iainmerrick (u/iainmerrick)

iainmerrick commented on A Broken Heart allenpike.com/2026/a-brok... · Posted by u/memalign

iamcreasy · 4 days ago

> At that point, I reached for an age-old tool that has gotten more useful in the modern age: binary search. That is, you explain the symptom to your coding agent. Then you have it repeatedly remove stuff from your code that might be causing the problem

Can someone give me some high level pointers on how to set up this scaffolding?

iainmerrick · 4 days ago

"git bisect" usually does the trick.
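For anyone who hasn't used it: you mark one known-good and one known-bad revision (`git bisect start`, `git bisect good <rev>`, `git bisect bad <rev>`), and git picks the midpoints for you to test; `git bisect run <cmd>` automates the testing. The underlying idea is plain binary search over the history. Here's a minimal sketch of that idea, not git's actual implementation. The function name, the commit labels and the `check` helper are all made up for illustration, and it assumes a linear history where every commit after the first bad one is also bad.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, the first one is known
    good, the last one is known bad, and badness never goes away again.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the culprit is at mid or earlier
        else:
            lo = mid  # the culprit is after mid
    return commits[hi]


# Hypothetical history of ten commits; the bug was introduced in "c5".
history = [f"c{i}" for i in range(10)]
tested = []


def check(commit: str) -> bool:
    """Stand-in for 'build this commit and see if the symptom appears'."""
    tested.append(commit)
    return int(commit[1:]) >= 5


culprit = first_bad_commit(history, check)
print(culprit, "found after testing", len(tested), "commits")
```

The point is that the number of builds you have to test grows with the logarithm of the history length, which is why it stays practical even across hundreds of commits.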

iainmerrick commented on Agent Skills agentskills.io/home... · Posted by u/mooreds

ashdksnndck · 6 days ago

Is your view that this doesn’t work based on conjecture or direct experience? It’s my understanding Anthropic and OpenAI have optimized their products to use skills more efficiently and it seems obviously true when I add skills to my repo (even when the info I put there is already in existing documentation).

iainmerrick · 6 days ago

Hmm, that’s a good question! I think a bit of both.

In terms of experience, I’ve noticed that agents don’t always use skills the way you want; and I’ve noticed that they’re pretty good at browsing existing code and docs and figuring things out for themselves.

Is this an example of “the bitter lesson”? That’s conjecture, but I think pretty well-founded.

It could well be that specific formats for skills work better because the agents are trained on those specific formats. But if so, I think it’s just a local maximum.

iainmerrick commented on Bunny Database bunny.net/blog/meet-bunny... · Posted by u/dabinat

osener · 6 days ago

When I tried it last year, their edge compute infra was just not there yet. It could not do any meaningful server-side rendering because of code size, compute and JS standard constraints.

Has this situation changed?

iainmerrick · 6 days ago

Depending on your precise requirements, I think it might have changed.

I've been trying out Bunny recently and it looks like a very viable replacement for most things I currently do with Cloudflare. This new database fills one of the major gaps.

Their edge scripting is based on Deno, and I think is pretty comparable to e.g. Vercel. They also have "magic containers", comparable to AWS ECS but (I think) much more convenient. It sounds from the docs like they run containers close to the edge, but I don't know if it's comparable to e.g. Lambda@Edge.

iainmerrick commented on Agent Skills agentskills.io/home... · Posted by u/mooreds

ashdksnndck · 6 days ago

We’re working with the models that are available now, not theoretical future models with infinite context.

Claude is programmed to stop reading after it gets through the skill’s description. That means we don’t consume more tokens in the context until Claude decides it will be useful. This makes a big difference in practice. Working in a large repo, it’s an obvious step change between me needing to tell Claude to go read a particular readme that I know solves the problem vs Claude just knowing it exists because it already read the description.

Sure, if your project happened to already have a perfect index file with a one-sentence description of each other documentation file, that could serve a similar purpose (if Claude knew about it). It’s worthwhile to spread knowledge about how effective this pattern is. Also, Claude is probably trained to handle this format specifically.

iainmerrick · 6 days ago

To clarify, the bit where I think the bitter lesson applies is trying to standardize the directory names, the permitted headings and paragraph lengths, etc. It's pointless bikeshedding.

Making your docs nice and modular, and having a high-level overview that tells you where to find more detailed info on specific topics, is definitely a good idea. We already know that when we're writing docs for human readers. The LLMs are already trained on a big corpus written by and for humans. There's no compelling reason why we need to do anything radically different to help them out. To the contrary, it's better not to do anything radically different, so that new LLM-assisted code and docs can be accessible to humans too.

Well-written docs already play nicely with LLM context.

iainmerrick commented on Agent Skills agentskills.io/home... · Posted by u/mooreds

killerstorm · 6 days ago

> Is any of this standardization really needed?

This standardization, basically, makes a list of docs easier to scan.

As a human, you have a permanent memory. LLMs don't have it; they have to load it into the context, and doing so only as necessary can help.

E.g. if you had anterograde amnesia, you'd want everything to be optimally organized, labeled, etc, right? Perhaps an app which keeps all information handy.

iainmerrick · 6 days ago

Everybody wants that, though, no? At least some of the time?

For example, if you've just joined a new team or a new project, wouldn't you like to have extensive, well-organised documentation to help get you started?

This reminds me of the "curb-cut effect", where accommodations for disabilities can be beneficial for everybody: https://front-end.social/@stephaniewalter/115841555015911839

iainmerrick commented on Agent Skills agentskills.io/home... · Posted by u/mooreds

smithkl42 · 6 days ago

It's all about managing context. The bitter lesson applies over the long haul - and yes, over the long haul, as context windows get larger or go away entirely with different architectures, this sort of thing won't be needed. But we've defined enough skills in the last month or two that if we were to put them all in CLAUDE.md, we wouldn't have any context left for coding. I can only imagine that this will be a temporary standard, but given the current state of the art, it's a helpful one.

iainmerrick · 6 days ago

To clarify, when I mentioned the bitter lesson I meant putting effort into organising the "skills" documentation in a very specific way (headlines, descriptions, etc).

Splitting the docs into neat modules is a good idea (for both human readers and current AIs) and will continue to be a good idea for a while at least. Getting pedantic about filenames, documentation schemas and so on is just bikeshedding.

iainmerrick commented on Agent Skills agentskills.io/home... · Posted by u/mooreds

avaer · 6 days ago

It's not about instructions, it's about discoverability and data.

Yeah, WWW is really just text but that doesn't mean you don't need HTTP + HTML and a browser/search engine. Skills is just that, but for agent capabilities.

Long term you're right though, agents will fetch this all themselves. And at some point they will not be our agents at all.

iainmerrick · 6 days ago

I guess what I mean is that standardizing this bit of the problem right now feels sort of like XHTML. Many people thought that was a big deal back in the day, but it turned out to be a pointless digression.

> Long term you're right though, agents will fetch this all themselves

It's not "long term", it's right now. If your docs are well-written and well-organised, agents can already use them. The most you might need to do is copy your README.md into CLAUDE.md.

iainmerrick commented on Agent Skills agentskills.io/home... · Posted by u/mooreds

zby · 6 days ago

The instructions are standard documents - but this is not all. What the system adds is an index of all skills, built from their descriptions, that is passed to the llm in each conversation. The idea is to let the llm read the skill when it is needed and not load it into context upfront. Humans use indexes too - but not in this way. But there are some analogies with GUIs and how they enhance discoverability of features for humans.

I wish they arranged it around READMEs. I have a directory with my tasks and I have a README.md there - before codex had skills it already understood that it needs to read the readme when it was dealing with tasks. The skills system is less directory dependent so is a bit more universal - but I am not sure if this is really needed.
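The index-then-load-on-demand idea described above can be sketched in a few lines. This is only an illustration of the pattern, not how Claude or Codex actually implement skills; the `Doc` class, the helper names and the sample documents are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    name: str
    description: str  # one-line summary, always included in the context
    body: str         # full text, only loaded when asked for


def build_index(docs):
    """Build the short listing that gets passed in with every conversation."""
    return "\n".join(f"- {d.name}: {d.description}" for d in docs)


def load(docs, name):
    """Fetch the full text of one document on demand."""
    for d in docs:
        if d.name == name:
            return d.body
    raise KeyError(name)


# Hypothetical docs with deliberately long bodies.
docs = [
    Doc("deploy", "How to ship a release", "Step 1 ... " * 200),
    Doc("testing", "How to run the test suite", "Run the tests ... " * 200),
]

index = build_index(docs)
everything_upfront = sum(len(d.body) for d in docs)
print(index)
print(len(index), "characters of index vs", everything_upfront, "if all bodies were loaded")
```

Nothing here depends on a particular directory layout or file schema: a README with one line per linked document would produce the same kind of index.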

iainmerrick · 6 days ago

> Humans use indexes too - but not in this way.

What's different?

iainmerrick commented on Agent Skills agentskills.io/home... · Posted by u/mooreds

postalcoder · 6 days ago

Folks have run comparisons. From a huggingface employee:

  codex + skills finetunes Qwen3-0.6B to +6 on humaneval and beats the base score on the first run.

  I reran the experiment from this week, but used codex's new skills integration. Like claude code, codex consumes the full skill into context and doesn't start with failing runs. Its first run beats the base score, and on the second run it beats claude code.

https://xcancel.com/ben_burtenshaw/status/200023306951767675...

That said, it's not a perfect comparison because of the Codex model mismatch between runs.

The author seems to be doing a lot of work on skills evaluation.

https://github.com/huggingface/upskill

iainmerrick · 6 days ago

I can't quite tell what's being compared there -- just looks like several different LLMs?

To be clear, I'm suggesting that any specific format for "skills.md" is a red herring, and all you need to do is provide the LLM with good clear documentation.

A useful comparison would be between: a) make a carefully organised .skills/ folder, b) put the same info anywhere and just link to it from your top-level doc, c) just dump everything directly in the top-level doc.

My guess is that it's probably a good idea to break stuff out into separate sections, to avoid polluting the context with stuff you don't need; but the specific way you do that very likely isn't important at all. So (a) and (b) would perform about the same.

iainmerrick commented on Agent Skills agentskills.io/home... · Posted by u/mooreds

prettyblocks · 6 days ago

I find that even though this isn't standard, these -cli tools will scan the repo for .md files and for the most part execute the skills accordingly. Having said that, I would much prefer standards not just for this, but for plugins as well.

iainmerrick · 6 days ago

Standards for plugins make sense, because you're establishing a protocol that both sides need to follow to be able to work together.

But I don't see why you need a strict standard for "an informal description of how to do a particular task". I say "informal" because it's necessarily written in prose -- if it were formal, it'd be a shell script.