There's a pattern I keep seeing: LLMs used to replace things we already know how to do deterministically. Parsing a known HTML structure, transforming a table, running a financial simulation. It works, but it's like using a helicopter to cross the street: expensive, slow, and not guaranteed to land exactly where you intended.
The real opportunity with Agent Skills isn't just packaging prompts. It's providing a mechanism that enables a clean split: LLM as the control plane (planning, choosing tools, handling ambiguous steps) and code or sub-agents as the data/execution plane (fetching, parsing, transforming, simulating, or executing NL steps in a separate context).
This requires well-defined input/output contracts and a composition model. I opened a discussion on whether Agent Skills should support this kind of composability: https://github.com/agentskills/agentskills/issues/11
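For concreteness, a minimal sketch of that split in Python. Everything here (call_llm, the tool names, the JSON step protocol) is a hypothetical stand-in, not an existing API:

```python
# Control plane: the LLM only decides *what* to do next, emitting a Step
# that conforms to a typed contract. Data plane: deterministic functions.
import json
from dataclasses import dataclass

@dataclass
class Step:
    tool: str   # which deterministic tool to run
    args: dict  # arguments conforming to that tool's contract

# Data/execution plane: ordinary, unit-testable code.
TOOLS = {
    "fetch_table": lambda args: f"<rows fetched from {args['url']}>",
    "transform":   lambda args: f"<rows pivoted by {args['key']}>",
}

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # wire up whatever model API you use

def plan_next_step(goal: str, history: list) -> Step | None:
    raw = call_llm(
        f"Goal: {goal}\nHistory so far: {history}\n"
        'Reply with JSON {"tool": ..., "args": ...}, or the word done.'
    )
    if raw.strip() == "done":
        return None
    parsed = json.loads(raw)
    return Step(tool=parsed["tool"], args=parsed["args"])

def run(goal: str) -> list:
    history = []
    while (step := plan_next_step(goal, history)) is not None:
        result = TOOLS[step.tool](step.args)  # deterministic execution
        history.append((step, result))
    return history
```

The model's only output is a Step that satisfies the contract; everything that actually touches data stays deterministic and testable.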
The same applies to context vs. a database. If a reasoning model makes a decision about something, it should be put off to the side and stored as a value/variable/entry somewhere. Instead of using pages and pages of context, it makes sense for some tasks to "press" decisions so they become more permanent to the conversation. You can somewhat accomplish that with NotebookLM, by turning results into notes into sources, but NotebookLM is insular and doesn't have the research and imaging features of Gemini.
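A toy sketch of what "pressing" a decision might look like, with sqlite standing in for the store (all names illustrative):

```python
# Settled decisions move out of the context window and into a small store;
# later turns get a compact digest instead of pages of transcript.
import json
import sqlite3

db = sqlite3.connect("decisions.db")
db.execute("CREATE TABLE IF NOT EXISTS decisions (key TEXT PRIMARY KEY, value TEXT)")

def press(key: str, value) -> None:
    """Record a settled decision so it no longer has to live in context."""
    db.execute("INSERT OR REPLACE INTO decisions VALUES (?, ?)", (key, json.dumps(value)))
    db.commit()

def digest() -> str:
    """Compact summary to prepend to the next prompt instead of full history."""
    rows = db.execute("SELECT key, value FROM decisions").fetchall()
    return "\n".join(f"{k} = {v}" for k, v in rows)

press("db_engine", "sqlite")         # decided once, re-litigated never
press("css_framework", "bootstrap")
```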
Also, in writing, composing strictly from top to bottom has its disadvantages. It makes sense to emulate the human writing process and work in passes, fleshing out in some passes and summarizing in others.
Current LLMs can brute-force these things through emulation/observation/mimicry, but they aren't as good as doing it the right way. Not only would I like to see "skills" but also "processes", where you create a well-defined order in which tasks are accomplished in sequence. Repeatable templates. These would essentially include variables in the templates, set for replacement.
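A rough sketch of such a "process" as an ordered template with variables set for replacement (purely illustrative, not an existing feature):

```python
# An ordered, repeatable "process": each step is a prompt template whose
# $variables are filled in at run time, and steps execute in sequence.
from string import Template

PROCESS = [
    Template("Outline an article about $topic for $audience."),
    Template("Flesh out each outline section of the $topic article."),
    Template("Do a summarizing pass: tighten the $topic draft to $length words."),
]

def run_process(**variables):
    for step in PROCESS:
        prompt = step.substitute(**variables)
        print(prompt)  # in practice: send each pass to the model in order

run_process(topic="agent skills", audience="working engineers", length=800)
```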
> Not only would I like to see "skills" but also "processes", where you create a well-defined order in which tasks are accomplished in sequence. Repeatable templates. These would essentially include variables in the templates, set for replacement.
You can do this with Gemini commands and extensions: https://cloud.google.com/blog/topics/developers-practitioner...
Where in this post's article are you seeing this pattern?
> Parsing a known HTML structure
In most cases, the HTML structures being parsed aren't known. If they're known, you control them, and you don't need to parse them in the first place. If they're someone else's, who knows when they'll change, or under what conditions they differ.
But really, I don't see the stuff you're talking about happening in prod for non-one-off use cases. I see LLMs used in prod exactly for data where you don't know exactly what its shape will be, and there's an enormous amount of such cases. If the same logic is needed every time, of course you don't have an LLM execute that logic; you have the LLM write a deterministic script.
Skills are about empowering LLMs with tools, so the heavy lifting can still be deterministic. Furthermore, pipelines built around LLMs are simpler and less brittle, since handling variation is the essence of machine learning.
I've recently been doing some work with Autodesk. It would be great for an LLM to be as comfortable with the "vocabulary" of these applications as it is with code. Maybe part of this involves creating a language for CAD design in the first place. But the principle that we need to build out vocabularies and subsequently generate and expose "sentences" (workflows) for LLMs to train on seems like a promising direction.
Of course this requires substantial buy-in from application owners (create the vocabulary) and users (agree to expose and share the sentences they generate), but the results would be worth it.
Additionally, I can't even get Claude or Codex to reliably use the prompt and simple rules (use this command to compile) in an AGENTS.md or whatever required markdown file is needed. Why would I assume they will reliably handle skill prompts spread about a codebase?
I've even seen tool usage deteriorate while it's thinking and self-commanding through its output to, say, read code from a file. Sometimes it uses tail, while other times it gets confused by the output and then writes a basic Python program to parse lines and strings from the same file, effectively producing the same output as before. How bizarre!
Isn't at least part of that GH issue something that https://docs.boundaryml.com/guide/introduction/what-is-baml is also trying to solve? LLM inputs and outputs must be functions with defined schemas. That was their starting point.
IIUC their most recent arc focuses on prompt optimization [0], where you can optimize — using DSPy and an optimization algorithm, GEPA [1] — using relative weights on different things like errors, token usage, and complexity.
[0] https://docs.boundaryml.com/guide/baml-advanced/prompt-optim... [1] https://github.com/gepa-ai/gepa?tab=readme-ov-file
Skills essentially boil down to distributed parts of a main prompt. If you consider a state model, you can see the pattern: the task is the state, and combining the task's specific skills defines the current prompt augmentation. When the task changes, another prompt emerges.
In the end, the clear guidance of the agent is the deciding factor.
> Parsing a known HTML structure, transforming a table, running a financial simulation.
Transforming an arbitrary table is still hard, especially a table on a webpage or in a document. Sometimes I even struggle to find the right library. The effort doesn't seem worth it for a one-off transformation, either. An LLM can be a great tool for such tasks.
How likely are we to look back on Agent/MCP/Skills as some early Netscape peculiarity? I would dive into adoption if I didn't think some new thing would beat the paradigm in a fortnight.
I've built a number of MCP servers, including an MCP wrapper. I'd generally recommend you skip it unless you know you need it. Conversely, I'd generally recommend you write up a couple skills ASAP to get a feel for them. It will take you 20 minutes to write and test some.
MCP does three things conceptually: it lets you build a bridge between an agent and <something else>, it specifies a UI+API layer between the bridge and the LLM, and it formalizes the description of that bridge in a tool-calling format.
It's that UI+API layer that's the biggest pain in the ass, in my opinion. Sometimes you need it; for instance, if you wanted an agent to access your emails, a high quality MCP server that can't destroy your life through enthusiastic tool calling makes sense.
If, however, you have, say, a CLI tool or simple API that's reasonably self-documenting and that you're willing to have run, and/or if you need specific behavior with a different context setting, then a skill can just be a markdown file that explains what, how, and why.
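For example, a hypothetical skill wrapping such a CLI could be as small as this (the name/description frontmatter follows the published SKILL.md convention; everything else is free-form instructions):

```
---
name: changelog-writer
description: Draft a CHANGELOG entry from recent git history. Use when the user asks for release notes or a changelog update.
---

Run `git log --oneline <last-tag>..HEAD`, group the commits into
Added/Changed/Fixed, and append the entry to CHANGELOG.md in the
style of the existing entries.
```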
If you know you need them though, do use them. There are four MCP servers I use regularly and they're enormously useful. They're all around the same topic though - pulling in context/data from sources. One is dual-use, in that I occasionally also use it for things like dashboard generation.
I will say, when using MCP, be selective about which tools you enable. A lot of the time they come with, say, 30 tools and you only personally care about 5 of them. The other 25 are just rotting your context.
Agreed. I use only one MCP server regularly, and it's a custom one integrated into my Qt desktop app. It has tools for inspecting the widget tree, using selectors to click/type/etc., and taking screenshots: functionality that would otherwise be hard or impossible to reliably implement using CLI calls, but that gives Claude a closed feedback loop.
All the public MCP servers I've seen have been a disaster, with too many tools and tokens polluting the context. It's really most useful when you need tight integration with some other environment and can write a little custom wrapper to provide it.
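As a sketch of what such a little custom wrapper can look like with the official MCP Python SDK (the Qt hooks here are hypothetical placeholders; only the FastMCP scaffolding is from the SDK):

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("qt-inspector")

def dump_widget_tree() -> str:  # hypothetical hook into the running app
    raise NotImplementedError

def simulate_click(selector: str) -> str:  # hypothetical
    raise NotImplementedError

@mcp.tool()
def widget_tree() -> str:
    """Return the app's live widget tree as indented text."""
    return dump_widget_tree()

@mcp.tool()
def click(selector: str) -> str:
    """Click the widget matching the selector and report the result."""
    return simulate_click(selector)

if __name__ == "__main__":
    mcp.run()  # stdio transport, suitable for a local desktop client
```

A handful of tight, purpose-built tools like these keeps the context clean in a way the 30-tool public servers don't.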
Agent/MCP/Skills might be "Netscape-y" in the sense that today's formats will evolve fast. But Netscape still mattered: it lost the market, not the ideas. The patterns survived (JavaScript, cookies, SSL/TLS, progressive rendering) and became best practices we take for granted.
The durable pattern here isn't a specific file format. It's on-demand capability discovery: a small index with concise metadata so the model can find what's available, then pull details only when needed. That's a real improvement over tool calling and MCP's "preload all tools up front" approach, and it mirrors how humans work. Even as models bake more know-how into their weights, novel capabilities will always be created faster than retraining cycles. And even if context becomes unlimited, preloading everything up front remains wasteful when most of it is irrelevant to the task at hand.
So even if "Skills" gets replaced, discoverability and progressive disclosure likely survive.
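The whole pattern fits in a few lines. A minimal sketch, assuming one directory per capability with a SKILL.md whose first line is a short description:

```python
from pathlib import Path

SKILLS_DIR = Path("skills")

def index() -> str:
    """Cheap metadata that is always in context: name + one-line summary."""
    lines = []
    for skill in sorted(SKILLS_DIR.iterdir()):
        summary = (skill / "SKILL.md").read_text().splitlines()[0]
        lines.append(f"- {skill.name}: {summary}")
    return "\n".join(lines)

def load(name: str) -> str:
    """Full instructions, pulled into context only once the model picks one."""
    return (SKILLS_DIR / name / "SKILL.md").read_text()
```

The index costs a few dozen tokens per capability; the full instructions cost nothing until they're actually needed.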
Yes, this 100%. Every person I speak with who is excited about MCP is some LinkedIn guru or product expert. I've yet to encounter a seriously technical person excited by any of this.
The problem isn’t having a standard way for agents to branch out. The problem is that AI is the new Javascript web framework: there’s nothing wrong with frameworks, but when everyone and their son are writing a new framework and half those frameworks barely work, you end up with a buggy, fragmented ecosystem.
I get why this happens. Startups want VC money, established companies then want to appear relevant, and then software engineers and students feel pressured to prove they're hireable. And you end up with one giant pissing contest where half the players likely see the ridiculousness of the situation but have little choice other than to join the party.
I have found MCPs to be very useful (albeit with some severe and problematic limitations in the protocol's design). You can bundle them and configure them with a desktop LLM client and distribute them to an organization via something like Jamf. In the context I work in (biotech) I've found it a pretty high-ROI way to give lots of different types of researchers access to a variety of tools and data very cheaply.
I use only one MCP, but I use it a lot: Chrome DevTools. I get Claude Code to test in the browser, which makes a huge difference when I want it to fix a bug I found in the browser, or if I just want it to do a real-world test on something it just built.
I have found MCPs helpful. Recently, I used one to migrate a site from WordPress to Sanity. I pasted in the markdown from the original site and told it to create documents that matched my schemas. This was much quicker and more flexible than whipping up a one-off migration tool. The Sanity MCP uses OAuth, so I also didn't need to do anything to connect to my protected dataset. Just log in. I'll definitely be using this method in the future for different migrations.
How likely is it that we'll even remember "the AI stuff" 2-3 years from now? What we're trying to do with LLMs today is extremely unsustainable. Nvidia/OpenAI will run out of silly investors eventually…
Skills are just prompt conventions; the exact form may change but the substance is reasonable. MCP, eh, it’s pretty bad, I can see it vanishing.
The agent loop architectural pattern (and that’s the relevant bit) is going to continue to matter. There will be new patterns for sure, but tool calling plus while loop (which is all an “agent” is) is powerful and highly general.
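For anyone who hasn't internalized it, that pattern really is this small (call_model is a hypothetical stand-in for any chat-completions-style API that can return either text or a tool call):

```python
def call_model(messages: list, tools: dict) -> dict:
    raise NotImplementedError  # any provider's tool-calling API goes here

def agent(task: str, tools: dict) -> str:
    messages = [{"role": "user", "content": task}]
    while True:
        reply = call_model(messages, tools)
        if reply.get("tool_call") is None:
            return reply["content"]           # no tool requested: final answer
        name, args = reply["tool_call"]
        result = tools[name](**args)          # execute the chosen tool
        messages.append({"role": "tool", "name": name, "content": str(result)})
```

Everything else (skills, MCP, frameworks) is variation in where the tools and their descriptions come from.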
Frontier models will eventually eat all the tedious tailored add-ons as just part of something they can do.
Right now models have roughly all of the written knowledge available to mankind, minus some obscure held-out private archives and so on. They have excellent skills and general abilities to construct plausible sequences of actions to accomplish work, but we need to hold their hands to really get decent performance across a wide range of activities. Skills, agent frameworks, and MCP carve out different domains of that problem. Successful solutions provide training data for future models, which might either be generalized directly or used to create a vast mountain of synthetic data following successful patterns, making the next generation of models incredibly useful for a huge number of tasks by default.
It might also be possible that by studying the problem, identifying where mode collapses and issues with training prevent the right sort of generalization, they might tweak the architecture and be able to solve the deficiency through normal training runs, and thereby discard the need for all the bespoke artisanal agent specifications.
It's basically just a way for the LLM to lazy-load curated information, tools, and scripts into context. The benefit of making it a "standard" is that future generations of LLMs will be trained on this pattern specifically, and will get quite good at it.
It's the description that gets inserted into the context, and then if that sounds useful, the agent can opt to use the skill. I believe (but I'm not sure) that the agent chooses what context to pass into the subagent, which gets that context along with the skill's context (the stuff in the Markdown file and the rest of the files in the FS).
This may all be very wrong, though, as it's mostly conjecture from the little I've worked with skills.
Claude also has custom slash-commands, so you can force skill usage as you see fit.
This lets you trigger a skill with '/foo' in a way that resembles the way you'd use the command line.
Claude Code is very good at using well-defined skills without a command, though; but in a scenario where there is some nuance between similar skills, they are useful.
BUT what makes them powerful is that you can include code with the skill package.
Like I have a skill that uses a Go program to traverse the AST of a Go project to find different issues in it.
You COULD just prompt it but then the LLM would have to dig around using find and grep. Now it runs a single executable which outputs an LLM optimised clump of text for processing.
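A Python analogue of that idea, sketched with the stdlib ast module (the bare-except check is just a stand-in for whatever "different issues" the real tool looks for):

```python
# Walk a project's ASTs once and emit a compact, LLM-friendly report,
# instead of making the model dig around with find and grep.
import ast
import sys
from pathlib import Path

def bare_excepts(root: str):
    """Yield 'file:line' for every bare `except:` handler in the project."""
    for path in Path(root).rglob("*.py"):
        tree = ast.parse(path.read_text(), filename=str(path))
        for node in ast.walk(tree):
            if isinstance(node, ast.ExceptHandler) and node.type is None:
                yield f"{path}:{node.lineno}"

if __name__ == "__main__":
    print("BARE_EXCEPT_HANDLERS")                # one dense clump of text
    print("\n".join(bare_excepts(sys.argv[1])))  # for the model to process
```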
“inserted at the start of any query” feels like a bit of a misunderstanding to me. It plops the skill text into the context when it needs it or when you tell it to. It’s basically like pasting in text or telling it to read a file, except for the bit where it can decide on its own to do it. I’m not sure start, middle, or end of query is meaningful here.
The agentic development scene has slowly turned into a full-blown JavaScript circus—bright lights, loud chatter, and endless acts that all look suspiciously familiar. We keep wrapping the same old problems in shiny new packages, parading them around as if they’re groundbreaking innovations. How long before the crowd grows tired of yet another round of “RFC” performances?
Well, these agentic / AI companies don't even know what an RFC is, let alone how to write one. The last time they attempted to create a "standard" (MCP) it was not only premature, but it was a complete security mess.
Apart from Google Inc., I have not seen a single "AI company" propose an RFC that was reviewed by the IETF and became a proper internet standard. [0]
"MCP" was one of the worst so-called "standards" built since JWT was proposed. So I do not take Anthropic seriously when they create so-called "open standards", especially when the reference implementation is in JavaScript or TypeScript.
[0] https://www.rfc-editor.org/standards
It’s a fast moving field. People aren’t coming up with new ideas to be performative. They see issues with the state of the art and make something that may or may not advance things forward. MCP is huge for getting agents to do things in the “real world”. However, it’s costly! Skills is a cheap way to fill that gap for many cases. People are finding immediate value in both of these. Try not to be so pessimistic.
It's not pessimism; these are actual compatibility issues,
like the Deno vs. npm package ecosystems, which didn't work together for many years.
There are multiple intermixed and inconsistent concepts out in the wild: AGENTS.md vs. CLAUDE.md vs. .github/instructions; skills vs. commands; and so on.
When I work on a project, do all the files align? If I work in an org where developers have agent choice, how many of these instruction and skill "distros" do I need to put in (pollute?) my repo?
I have no idea why I'm about to defend OpenAI here. BUT OpenAI has released some open-weight models, like gpt-oss and Whisper. Though sure, open weight, not open source. And yeah, I really don't like OpenAI as a company, to be clear.
They have, but it does feel like they are developing a closed platform, à la Apple.
Apple has shortcuts, but they haven’t propped it up like a standard that other people can use.
In contrast, this is something you can use even if you have nothing to do with Claude, and the tools you create will be compatible with the wider ecosystem.
One thing that is interesting to think about: given a skill, which is just "pre-context", how can it be _evolved_ to create prompts given _my_ context? E.g., here is the web-artifacts-builder skill from the desktop app:
```
web-artifacts-builder
Suite of tools for creating elaborate, multi-component claude.ai HTML artifacts using modern frontend web technologies (React, Tailwind CSS, shadcn/ui). Use for complex artifacts requiring state management, routing, or shadcn/ui components - not for simple single-file HTML/JSX artifacts.
```
Say I want to build a landing page with some relatively static content — I don't know it yet, but it's just gonna be Bootstrap CSS, no SPA/React(ish); it'll be fine as a templated server-side thing. But I don't know how to express this in words. Could the skill _evolve_ based on what my preferences are and what is possible for a relative novice to grok and construct?
This is a simple example, but it could extend to, say, using sqlite+litestream instead of postgres, or gradient-boosted trees instead of an expensive transformer-based classifier.
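One low-tech way this could work, sketched in Python: fold observed preferences back into the skill file so future sessions start from them (nothing here is an existing feature; the paths and section name are made up):

```python
# After each session, append learned preferences to the skill so the
# next prompt built from it already reflects them.
from pathlib import Path

SKILL = Path("skills/landing-page/SKILL.md")

def record_preference(observation: str) -> None:
    """Fold a learned preference back into the skill's instructions."""
    text = SKILL.read_text()
    if "## Learned preferences" not in text:
        text += "\n## Learned preferences\n"
    SKILL.write_text(text + f"- {observation}\n")

record_preference("Prefers Bootstrap + server-side templates over React SPAs")
record_preference("Prefers sqlite+litestream over postgres for small apps")
```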
I noticed a similar optimization path with skills, where I now have subagents to analyze the performance of a previous skill/command/hook execution, triggered by a command. I've pushed this to my plugin marketplace https://github.com/athola/claude-night-market
We'll see how many of these are around in a few years.
So basically a reusable prompt, like the previous commenter asked for?
Inversely, you can persist/summarize a larger bit of context into a skill, so a new agent session can easily pull it in.
So yes, it's just turtles, sorry, prompts all the way down.
https://github.com/alganet/skills/blob/main/skills/left-padd...
Either way, that’s hilarious. Well done.
<conspiracy_mode> maybe all of them were designed to occupy the full context window of earlier GPT models </conspiracy_mode>
> I have not seen a single "AI company" propose an RFC that was reviewed by the IETF and became a proper internet standard.
Why would the IETF have anything to do with LLM/agent standards? This seems like a category error. They also don’t ratify web standards, for example.
Although Skills are just markdown files, it's good to see them "donate" it.
Their goal seems simple: focus on coding and improving it. They've found a great niche, and hopefully a revenue-generating business, there.
OpenAI, on the other hand, doesn't give me the same vibes; they don't seem very focused. They're playing catch-up with both Google's models and Anthropic.
Many, many MCPs could and should just be skills instead.
Paper & applications published here: https://earthpilot.ai/metaskills/