tlarkworthy (u/tlarkworthy)

tlarkworthy commented on WebR – R in the Browser docs.r-wasm.org/webr/late... · Posted by u/sieste

Cool but 12MB WASM blob. I wish there was a way of making these WASM builds significantly smaller.

tlarkworthy commented on Best Practices for Building Agentic AI Systems userjot.com/blog/best-pra... · Posted by u/vinhnx

tlarkworthy · 13 days ago

These subagents look like tools

tlarkworthy commented on A Comprehensive Survey of Self-Evolving AI Agents [pdf] arxiv.org/abs/2508.07407... · Posted by u/SerCe

hnuser123456 · 16 days ago

Thanks for the writeup. I wonder if it would be plausible to run this kind of self-optimization for a wider variety of problem sets, to generate "context pathways" for various tasks that are all optimized, and maybe even learn patterns from multiple prompt optimizations to generalize.

tlarkworthy · 16 days ago

The prompt I would like to optimize is the reflection prompt:

`You are a prompt‑engineer AI. You will be improving the performance of a prompt by considering recent executions of that prompt against a variate of tasks that were asked by a user. You need to look for ways to improve the SCORE by considering recent executions using that prompt and doing web research on the domain.

Your tasks is to improve the CURRENT PROMPT. You will be given traces of several TASKS using the CURRENT PROMPT and then respond only with the text of the improved using the improve_prompt tool`; const research_msg = `Generate some ideas on how how this prompt might be improved, perhaps using web research\nCURRENT PROMPT:\n${prompt}\n${trace}`

source: https://observablehq.com/@tomlarkworthy/gepa#reflectFn

But I would need quite a few distinct tasks to do that, and task setup is the laborious part (it's getting quicker now that I've optimized the notebook coding agent).
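For readers who want to see how the two strings above might be wired together, here is a minimal sketch. The function name `buildReflectionMessages`, the trace fields (`task`, `score`, `output`), and the chat-style message shape are all assumptions made for illustration; the linked notebook may structure this differently.

```javascript
// Hypothetical helper: the name, the trace fields, and the message shape are
// assumptions for illustration, not the notebook's actual API.
function buildReflectionMessages(systemPrompt, currentPrompt, traces) {
  // Flatten each recorded execution into a plain-text trace section.
  const trace = traces
    .map((t, i) => `TASK ${i + 1}: ${t.task}\nSCORE: ${t.score}\nOUTPUT:\n${t.output}`)
    .join("\n\n");

  // Same layout as research_msg in the comment: instruction, current prompt, traces.
  const researchMsg =
    "Generate some ideas on how this prompt might be improved, perhaps using web research\n" +
    `CURRENT PROMPT:\n${currentPrompt}\n${trace}`;

  return [
    { role: "system", content: systemPrompt },
    { role: "user", content: researchMsg },
  ];
}
```

Keeping the reflection prompt as a plain argument is what would make it optimizable in turn: it is just another string that can be swapped and scored.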

tlarkworthy commented on A Comprehensive Survey of Self-Evolving AI Agents [pdf] arxiv.org/abs/2508.07407... · Posted by u/SerCe

koakuma-chan · 16 days ago

Do you mind sharing which tasks you achieved great results on?

tlarkworthy · 16 days ago

It's all written up and linked in the notebook and executable in your browser (if you dare to insert your OPEN_AI_KEY, but my results are included assuming you won't).

The evals were Observable notebook coding challenges: simple things like creating a dropdown, but to solve them you need to know the Observable standard library and some of the unique syntax, like "viewof".

There is a table of the cases here https://observablehq.com/@tomlarkworthy/robocoop-eval#cell-2...

So it's important the prompt encodes enough of the programming model. The seed prompt did not, but the reflect function managed to figure it all out. At the top of the notebook is the final optimized prompt which has done a fair bit of research to figure out the programming model using web search.

tlarkworthy commented on A Comprehensive Survey of Self-Evolving AI Agents [pdf] arxiv.org/abs/2508.07407... · Posted by u/SerCe

tlarkworthy · 16 days ago

Recently tried out the new GEPA algorithm for prompt evolution with great results. I think using LLMs to write their own prompt and analyze their trajectories is pretty neat once appropriate guardrails are in place.

https://arxiv.org/abs/2507.19457

https://observablehq.com/@tomlarkworthy/gepa

I guess GEPA is still a preprint and predates this survey, but I recommend taking a look due to its simplicity.
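To give a feel for the idea, here is a deliberately simplified sketch of a reflective prompt-search loop. It is not GEPA itself: it keeps only a single best prompt (greedy) and leaves out the Pareto-based candidate selection the paper describes. The callbacks `evaluate` and `reflect` are hypothetical stand-ins for the LLM calls and are synchronous here for brevity; `tasks` is assumed to be non-empty.

```javascript
// Simplified reflective prompt search. NOT the full GEPA algorithm: greedy,
// single candidate, no Pareto front. `evaluate` and `reflect` are hypothetical
// callbacks supplied by the caller.
function evolvePrompt(seedPrompt, tasks, evaluate, reflect, rounds) {
  // Run the prompt on every task and keep the traces for reflection.
  const scoreAll = (prompt) => {
    const traces = tasks.map((task) => ({ task, ...evaluate(prompt, task) }));
    const mean = traces.reduce((sum, t) => sum + t.score, 0) / traces.length;
    return { mean, traces };
  };

  let best = { prompt: seedPrompt, ...scoreAll(seedPrompt) };
  for (let i = 0; i < rounds; i++) {
    // Ask the reflector for a rewritten prompt based on the recent traces.
    const candidatePrompt = reflect(best.prompt, best.traces);
    const candidate = { prompt: candidatePrompt, ...scoreAll(candidatePrompt) };
    // Guardrail: accept a rewrite only if it measurably improves the mean score.
    if (candidate.mean > best.mean) best = candidate;
  }
  return best;
}
```

The acceptance check is the simplest possible guardrail: a rewrite that does not score better on the task set is discarded.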

tlarkworthy commented on GPT-5 openai.com/gpt-5/... · Posted by u/rd

kybernetikos · 22 days ago

ChatGPT5 in this demo:

> For an airplane wing (airfoil), the top surface is curved and the bottom is flatter. When the wing moves forward:

> * Air over the top has to travel farther in the same amount of time -> it moves faster -> pressure on the top decreases.

> * Air underneath moves slower -> pressure underneath is higher

> * The pressure difference creates an upward force - lift

Isn't that explanation of why wings work completely wrong? There's nothing that forces the air to cover the top distance in the same time that it covers the bottom distance, and in fact it doesn't. https://www.cam.ac.uk/research/news/how-wings-really-work

Very strange to use a mistake as your first demo, especially while talking about how it's PhD-level.

tlarkworthy · 21 days ago

They did not ask how wings work. They asked about the Bernoulli effect; that's a different question.

tlarkworthy commented on Nimtable: Open-source web UI to browse and manage Apache Iceberg tables github.com/nimtable/nimta... · Posted by u/Sheldon_fun

tlarkworthy · 2 months ago

We recently migrated from RDS to Iceberg and the savings have been mind-blowing. That said, it's like a deconstructed database: everything you took for granted is now on the outside. Sometimes that's cool: instead of a cron inside the database you have a cloud cron on the outside, which is more natively monitorable. Some of it sucks, like write contention on inserts (so you need a write coordinator lock) and external vacuums. However, the savings are extreme: better cost per byte stored, better compression, less table bloat.

The universal compatibility has not materialised yet. We are still waiting for DuckDB and ClickHouse predicate pushdown, but if that appears we will be able to have specialized query engines over a generic storage layer, which would be amazing. Tools like Nimtable underline what compatibility unlocks.
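The write contention and coordinator lock mentioned in this comment can be illustrated with a toy model. Everything below is hypothetical: `ToyTable` and `WriteCoordinator` are invented names, the table lives in memory, and none of it is Iceberg's API. The sketch assumes commits are optimistic (a commit succeeds only if the writer's base version is still current), which is why uncoordinated writers collide.

```javascript
// Toy in-memory table with optimistic commits. This only models the
// contention problem; it is not Iceberg's API.
class ToyTable {
  constructor() {
    this.version = 0;
    this.rows = [];
  }
  // Succeeds only if no other writer has committed since baseVersion was read.
  commit(baseVersion, newRows) {
    if (baseVersion !== this.version) return false; // stale base: conflict
    this.rows.push(...newRows);
    this.version += 1;
    return true;
  }
}

// Hypothetical write coordinator: all inserts go through one queue and are
// committed one at a time against the latest version, so commits never race.
class WriteCoordinator {
  constructor(table) {
    this.table = table;
    this.queue = [];
  }
  submit(rows) {
    this.queue.push(rows);
  }
  flush() {
    let committed = 0;
    while (this.queue.length > 0) {
      const rows = this.queue.shift();
      if (!this.table.commit(this.table.version, rows)) {
        throw new Error("unexpected conflict");
      }
      committed += 1;
    }
    return committed;
  }
}
```

Two writers that read the same version and commit independently would see the second commit rejected; routing both through the coordinator serializes them. In a real deployment the coordinator would have to be shared across processes (for example, a lock service), which this single-process sketch does not attempt.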

tlarkworthy commented on Cursor goes rogue in YOLO mode, deletes itself and everything else machine.news/it-felt-like... · Posted by u/LargeLingoMod

tlarkworthy · 3 months ago

I wrote an agent that works in userspace inside the program under development. It frequently reads its own code to diagnose errors and sometimes tries to upgrade itself, but that causes a hot reload and it loses its own conversation. It does seem useful, though, that it can read its own tool implementations.