Skills are cool, but to me it's more of a design pattern / prompt engineering trick than something in need of a hard spec. You can even implement it in an MCP - I've been doing it for a while: "Before doing anything, search the skills MCP and read any relevant guides."
I get this sentiment, but I think it is why it is so powerful actually. It would be like calling Docker/containers just some shell scripts for a kernel feature. It may be conceptually simple, but that doesn't mean it isn't novel and could transform things.
I highly doubt we'll be talking about MCP next year. It is a pretty bad spec but we had to start somewhere.
I agree with you, but I also want to ask if I understand this correctly: there was a paradigm in which we were aiming for Small Language Models to perform specific types of tasks, orchestrated by the LLM. That is what I perceived the MCP architecture as coming to standardize.
But here, it seems more like a diamond shape of information flow: the LLM processes the big task, then prompts are customized (not via LLM) with reference to the Skills, and then the customized prompt is fed yet again to the LLM.
Is that the case?
I disagree. You wrap this up in a container / runtime spec. + package index and suddenly you’ve got an agent that can dynamically extend its capabilities based upon any skill that anybody has shared. Instead of `uv add foo` for Python packages you’ve got `skill add foo` for agent skills that the agent can run whenever they have a matching need.
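A minimal sketch of what that could look like, assuming the name/description frontmatter pattern Anthropic describes (the hypothetical `foo` skill and its contents are made up): `skill add foo` would just drop a folder like `skills/foo/` containing something like

```markdown
---
name: foo
description: Use this skill whenever the user asks for foo-related work, e.g. converting or validating foo files.
---

# Working with foo

1. Check that the `foo` CLI is available; if not, ask the user before installing anything.
2. For conversions, run the bundled `scripts/convert.sh` rather than improvising flags.
3. For edge cases, read `reference.md` in this folder before guessing.
```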
Finally a good replacement for MCP. MCP was a horrible idea, executed even worse, and they hide the complexity behind a dangerous "just paste this one-liner into your mcpServers config!" while wasting tens of thousands of tokens.
MCP is a protocol meant for general use across clients, whereas Claude Skills seems more proprietary. To what extent are Skills expected to be something that other clients, such as web-based clients, could adopt? To some extent it would probably make sense to expose them through the MCP SDK?
Context overload is definitely a problem with MCP, but its plug-and-play nature and discoverability are solid. Pasting a URL (or just using a button or other UX element) to link an MCP server presents a much lower barrier to entry than having the LLM run `cli-tool --help`, which assumes the CLI tool is already installed and the LLM has to know about it.
I think "Skill" is a subset of developer instruction, in which translates to AGENTS.md (or Claude.md). Today to add capability to an AI, all we need a good set of .md files and a AGENTS.md as the base.
In Claude and ChatGPT a project is really just a custom system prompt and an optional bunch of files. Those files are both searchable via tools and get made available in the Code Interpreter container.
I see skills as something you might use inside of a project. You could have a project called "data analyst" with a bunch of skills for different aspects of that task - how to run a regression, how to export data from MySQL, etc.
They're effectively custom instructions that are unlimited in size and that don't cause performance problems by clogging up the context - since the whole point of skills is they're only read into the context when the LLM needs them.
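As a sketch, one of those skills could be nothing more than a file like this (the frontmatter follows the documented name/description pattern; the contents are invented for illustration):

```markdown
---
name: mysql-export
description: Use when the user wants data exported from our MySQL warehouse for analysis.
---

# Exporting data from MySQL

- Connection details come from the `DATABASE_URL` environment variable; never paste credentials into chat.
- Prefer `mysql --batch --raw -e "<query>" > out.tsv` for large result sets (batch mode emits tab-separated output).
- For anything over a few hundred thousand rows, sample first and confirm the approach with the user.
```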
Skills can be toggled on and off, which is good for context management, especially for larger or less frequently needed skills.
Currently, if a project is at 5% or less of capacity, it will auto-load all files, so skills also give you a way to avoid that capacity limit. For larger projects, Claude has to search files, which can be unreliable, so skills will again be useful as an explicit "always load this".
I'm getting accused of paid shilling a lot right now.
(If Anthropic had paid me to write this they would probably have asked me NOT to spend a section of the article pointing out flaws in their MCP specification!)
It's pretty neat that they're adding these things. In my projects, I have a `bin/claude` subdirectory where I ask it to put scripts etc. that it builds. In the claude.md I then note that it should look there for tools. It does a pretty good job of this. To be honest, the thing I most need is context-management helpers like "start a claude with this set of MCPs, then that set, and so on". Instead right now I have separate subdirectories that I then treat as projects (which are supported as profiles in Claude) which I then launch a `claude` from. The advantage of the `bin/claude` in each of these things is that it functions as a longer-cycle learning thing. My Claude instantly knows how to analyze certain BigQuery datasets and where to find the credentials file and so on.
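For what it's worth, the claude.md note that makes this work is tiny - something along these lines (the exact wording and file locations here are just illustrative, not anything official):

```markdown
## Local tooling

- Scripts you have built for this project live in `bin/claude/`; check there before writing a new one-off script.
- BigQuery credentials and dataset notes are in `bin/claude/README.md` - read it before querying anything.
```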
Filesystem as profile manager is not something I thought I'd be doing, but here we are.
Ah, in my case, I want to just talk to a video-editing Claude, and then a sys-admin Claude, and so on. I don't want to go through a main Claude who will instantiate these guys. I want to talk to the particular Claudes myself. But if sub-agents work for this, then maybe I just haven't been using them well.
I'm perplexed why they would use such a silly example in their demo video (rotating an image of a dog upside down and cropping). Surely they can find more compelling examples of where these skills could be used?
I've been emulating this in claude code by manually @tagging markdown files containing guides for common tasks in our repository. Nice to see that this step is now automatic as well.
The uptake on Claude-skills seems to have a lot of momentum already!
I was fascinated on Tuesday by “Superpowers”: https://blog.fsck.com/2025/10/09/superpowers/
… and then packaged up all the tool-building I’ve been working on for a while into somewhat tidy skills that I can delegate agents to:
Delegation is super cool. I can sometimes end up having too much Linear issue context coming in. E.g., frequently I want a Linear issue description and last comment retrieved, but the Linear MCP grabs all comments, which pollutes the context and fills it up too much.
Sub-agents, MCP, skills - I wonder how they are supposed to interact with each other?
Feels like a fair bit of overlap here. It's OK to proceed in a direction where you are upgrading the spec and enabling Claude with additional capabilities. But one can pretty much use any of these approaches and end up with the same capability for an agent.
Right now it feels like a UX upgrade over MCP: instead of needing a JSON config, you can use Markdown files in a folder and provide multi-modal inputs.
I don't really see why they had to create a different concept. Maybe it makes sense "marketing-wise" for their chat UI, but in Claude Code? Especially when CLAUDE.md is a thing?
MCP Prompts are meant to be user triggered, whereas I believe a Skill is meant to be an LLM-triggered, use-case centric set of instructions for a specific task.
- MCP Prompt: "Please solve GitHub Issue #{issue_id}"
- Skills:
  - React Component Development (React best practices, accessible tools)
  - REST API Endpoint Development
  - Code Review
This will probably result in:
- Single "CLAUDE.md" instructions are broken out into discoverable instructions that the LLM will dynamically utilize based on the user's prompt
- Rather than having direct access to Tools, Claude will always need to go through Skill instructions first (making context tighter since it can't use Tools without understanding *how* to use them to achieve a certain goal)
- Clients will be able to add infinite MCP servers / tools, since the Tools themselves will no longer all be added to the context window
It's basically a way to decouple User prompts from direct raw Tool access, which actually makes a ton of sense when you think of it.
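To make the contrast concrete, the "Code Review" skill above could be as small as this (contents invented for illustration; only the name/description frontmatter follows the documented Skills format):

```markdown
---
name: code-review
description: Use when asked to review a pull request or a diff in this repository.
---

# Code Review

1. Fetch the actual diff with the available GitHub tools before commenting - never review from the PR title alone.
2. Check touched files against the conventions in `docs/style.md`.
3. Post findings as one review, grouped by file, most severe issues first.
```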
I see this as a lower-overhead replacement for MCP. Rather than managing a bunch of MCPs, use the directory structure to your advantage and leverage the OS's ability to execute scripts.
Narrowly focused semantics/affordances (for both the LLM and for users, future package managers, and communities), ease of redistribution, and context management:
- Skills are plain files that are injected contextually, whereas prompts come with the overhead of live, running code that has to be installed just right into your particular env to provide a whole MCP server. Tbh, prompts also seem to be more about literal prompting, too
- You could have a thousand skill folders for different software packages, etc., but good luck having more than a few MCP servers loaded into context without them clobbering it
I think those three concepts complement each other quite neatly.
MCPs can wrap APIs to make them usable by an LLM agent.
Skills offer a context-efficient way to make extra instructions available to the agent only when it needs them. Some of those instructions might involve telling it how best to use the MCPs.
Sub-agents are another context management pattern, this time allowing a parent agent to send a sub-agent off on a mission - optimally involving both skills and MCPs - while saving on tokens in that parent agent.
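For example, a skill whose whole job is to teach the agent how to use an MCP well might look something like this, inspired by the Linear complaint elsewhere in the thread (all details below are invented for illustration):

```markdown
---
name: linear-issues
description: Use when the user asks about a Linear issue or wants one summarised.
---

# Working with the Linear MCP

- Fetch only the issue description and the most recent comment; do not pull the full comment history into context.
- If older history is genuinely needed, have a sub-agent read and summarise it, and bring back only the summary.
```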
"So I fired up a fresh Claude instance (fun fact: Code Interpreter also works in the Claude iOS app now, which it didn't when they first launched) and prompted:
Create a zip file of everything in your /mnt/skills folder"
It's a fun, terrifying world that this kind of "hack" to exfiltrate data is possible! I hope it does not have full filesystem/bin access, lol. Can it SSH?...
What's the hack? Instead of typing `zip -r mnt.zip /mnt` into bash, you type `Create a zip file of /mnt` in claude code. It's the same thing running as the same user.
I feel like a danger with this sort of thing is that the capability of the system to use the right skill is limited by the little blurb you give about what the skill is for. Contrast with the way a human learns skills - as we gain experience with a skill, we get better at understanding when it's the right tool for the job. But Claude is always starting from ground zero and skimming your descriptions.
> Contrast with the way a human learns skills - as we gain experience with a skill, we get better at understanding when it's the right tool for the job.
Which is precisely why Richard Sutton doesn't think LLMs will evolve to AGI[0]. LLMs are based on mimicry, not experience, so it's more likely (according to Sutton) that AGI will be based on some form of RL (reinforcement learning) and not neural networks (LLMs).
More specifically, LLMs don't have goals and consequences of actions, which is the foundation for intelligence. So, to your point, the idea of a "skill" is more akin to a reference manual than a skill-building exercise that can be applied to developing an instrument, task, solution, etc.

[0] https://www.youtube.com/watch?v=21EYKqUsPfg
The industry has been doing RL on many kinds of neural networks, including LLMs, for quite some time. Is this person saying we should do RL on some kind of non-neural-network design? Why is that more likely to bring AGI than an LLM?
> More specifically, LLMs don't have goals and consequences of actions, which is the foundation for intelligence.
Besides a "reference manual", Claude Skills is analogous to a "toolkit with an instruction manual" in that it includes both instructions (manuals) and executable functions (tools/code)
I would love to understand where this notion of LLMs becoming AGI ever came from.
ChatGPT broke open the dam to massive budgets for AI/ML, and LLMs will probably be one puzzle piece of AGI. But otherwise?
I mean, it should be clear that we have so much work still to do, like RL (which now happens on a massive scale, by the way, because you thumb up or down every day), thinking, Mixture of Experts, tool calling, and, super critically: architecture.
Compute is a hard upper limit too.
And the math isn't done either. Context-length performance has advanced, and we have also seen other approaches like diffusion-based models.
Whenever you hear the leading experts talking, they mention world models.
We are still in a phase where we have plenty of very obvious ideas people need to try out.
But the quality of Whisper alone, plus LLMs as an interface and tool calling, can solve problems in robotics and elsewhere that no one was able to solve this easily before.
IMO this is a context window issue. Humans are pretty good at memorizing super broad context without great accuracy. Sometimes our "recall" function doesn't even work right ("How do you say 'blah' in German again?"), so the more you specialize (say, 10k hours / mastery), the better you are at recalling a specific set of "skills", but perhaps not other skills.
On the other hand, LLMs have a programmatic context with consistent storage and the ability to have perfect recall; they just don't always generate the expected output in practice, as the cost to go through ALL context is prohibitive in terms of power and time.
Skills... or really just context insertion, is simply a way to prioritize their output generation manually. LLM "thinking mode" is the same, for what it's worth - it really is just reprioritizing context - so not "starting from scratch" per se.
When you start thinking about it that way, it makes sense - and it helps using these tools more effectively too.
I’d been re-teaching Claude to craft REST API calls with curl every morning for months before I realized that skills would let me delegate that to cheaper models, re-use cached token queries, and save my context window for my actual problem-space CONTEXT.
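The skill that replaced all that morning re-teaching is roughly the following (endpoint, variable names, and file names are changed or invented):

```markdown
---
name: internal-rest-api
description: Use when the user asks to call or debug our internal REST API.
---

# Calling the API with curl

- The base URL is in `$API_BASE_URL` and the token in `$API_TOKEN`; never echo the token into chat.
- Standard call shape:
  `curl -sS -H "Authorization: Bearer $API_TOKEN" -H "Accept: application/json" "$API_BASE_URL/v1/<resource>"`
- On a 4xx response, re-read `errors.md` in this folder before retrying.
```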
Not really. It's a consequential issue. No matter how big or small the context window is, LLMs simply do not have the concept of goals and consequences. Thus, it's difficult for them to acquire dynamic and evolving "skills" like humans do.
Worth noting, even though it isn’t critical to your argument, that LLMs do not have perfect recall. I go to great lengths to keep agentic tools from relying on memory, because they often get it subtly wrong.
This is the crux of knowledge/tool enrichment in LLMs. The idea that we can have knowledge bases and LLMs will know WHEN to use them is a bit of a pipe dream right now.
Can you be more specific? The simple case seems to be solved, e.g. if I have an MCP for foo enabled and then ask about a list of foo, Claude will go and call the list function on foo.
LLMs are a probability-based calculation, so they will always skim to some degree and always guess to some degree, and often pick the best choice available to them even though it might not actually be the right one.
For folks who this seems elusive for, it's worth learning how the internals actually work, helps a great deal in how to structure things in general, and then over time as the parent comment said, specifically for individual cases.
Most of the experience is general information not specific to the project/discussion. The LLM starts with all that knowledge. Next it needs a memory and lookup system for project-specific information. Lookup in humans is amazingly fast, but even with a slow lookup, LLMs can refer to it in near real-time.
Skills are literally technical documentation for your project, it seems. So now we can finally argue for time to write docs; just name them "AI-enhancing skill definitions".
Excellent point. Put simply, building those preferences and lessons would demand a layer of latent memory and personal models; maybe now is a good time to revisit this idea...
IMHO, don't; don't keep up. Just like "best practices in prompt engineering", these are just temporary workarounds for current limitations, and they're bound to disappear quickly. Unless you really need the extra performance right now, just wait until models give you this performance out of the box instead of investing in learning something that'll be obsolete in months.
I agree with your conclusion not to sweat all these features too much, but only because they're not hard at all to understand on demand once you realize that they all boil down to a small handful of ways to manipulate model context.
But context engineering is very much not going anywhere as a discipline. Bigger and better models will by no means make it obsolete. In fact, raw model capability is pretty clearly leveling off into the top of an S-curve, and most real-world performance gains over the last year have come precisely from innovations in how to better leverage context.
I agree with this take. Models and the tooling around them are both in flux. I'd rather not spend time learning something in detail only for these companies to pull the plug chasing the next big thing.
Well, have some understanding: the good folks need to produce something, since their main product is not delivering the much-yearned-for era of joblessness yet. It's not for you; it's signalling to their investors - see, we're not burning your cash paying a bunch of PhDs to tweak the model weights without visible results. We are actually building products. With a huge and willing A/B testing base.
Agree — it's a big downside as a user to have more and more of these provider-specific features. More to learn, more to configure, more to get locked into.
Of course this is why the model providers keep shipping new ones; without them their product is a commodity.
If I were to say "Claude Skills can be seen as a particular productization of a system prompt" would I be wrong?
From a technical perspective, it seems like unnecessary complexity in a way. Of course I recognize there are a lot of product decisions that seem to layer on 'unnecessary' abstractions but still have utility.
In terms of connecting with customers, it seems sensible, under the assumption that Anthropic is triaging customer feedback well and leading to where they want to go (even if they don't know it yet).
Update: a sibling comment just wrote something quite similar: "All these things are designed to create lock in for companies. They don’t really fundamentally add to the functionality of LLMs." I think I agree.
All these things are designed to create lock in for companies. They don’t really fundamentally add to the functionality of LLMs. Devs should focus on working directly with the raw model generation APIs and not on using all the decoration.
Me? I love some lock in. Give me the coolest stuff and I'll be your customer forever. I do not care about trying to be my own AI company. I'd feel the same about OpenAI if they got me first... but they didn't. I am team Anthropic.
Joking aside, I ask Claude how to use Claude... all the time! Sometimes I ask ChatGPT about Claude. It actually doesn't work well, because they don't imbue these AI tools with any special knowledge about how they work; they seem to rely on public documentation, which usually lags behind the breakneck pace of these feature releases.
That's the start of the singularity. The changes will keep accelerating, and fewer and fewer people will be able to keep up, until only the AIs themselves know how to use these tools.
I don’t think these are things to keep up with. Those would be actual fundamental advances in the transformer architecture and core elements around it.
This stuff is like front end devs building fad add-ons which call into those core elements and falsely market themselves as fundamental advancements.
https://simonwillison.net/2025/Oct/16/claude-skills/
VSCode recently introduced support for nested AGENTS.md files which, albeit less formal, might overlap:
https://code.visualstudio.com/updates/v1_105#_support-for-ne...
It also means that any tool that knows how to read AGENTS.md could start using skills today.
"if you need to create a PDF file first read the file in skills/pdfs/SKILL.md"
no reason not to.
See comment here: https://news.ycombinator.com/item?id=45624613
Isn’t that sub agents?
https://github.com/anthropics/skills/blob/main/document-skil...
I was dealing with two issues this morning getting Claude to produce a .xlsx file, both of which are covered in the doc above.
http://github.com/ryancnelson/deli-gator I’d love any feedback
Superpowers: How I'm using coding agents in October 2025 - https://news.ycombinator.com/item?id=45547344 - Oct 2025 (231 comments)
> LLMs are based on mimicry, not experience, so it's more likely (according to Sutton) that AGI will be based on some form of RL (reinforcement learning) and not neural networks (LLMs).
He is right that non-RL'd LLMs are just mimicry, but the field already moved beyond that.
> More specifically, LLMs don't have goals and consequences of actions, which is the foundation for intelligence.
Citation?
Of course OpenAI and Anthropic want to be able to reuse the same servers/memory for multiple users, otherwise it would be too expensive.
Could we have "personal" single-tenant setups? Where the LLM incorporates every previous conversation?
The description is equivalent to your short-term memory.
The skill is like your long-term memory, which is retrieved if needed.
These should both be considered part of the AI agent, not external things.
You probably mean "starting from square one", but yeah, I get you.
Plugins include:
- Commands
- MCPs
- Subagents
- Now, Skills
Marketplaces aggregate plugins.
> This stuff is like front end devs building fad add-ons which call into those core elements and falsely market themselves as fundamental advancements.
It’s not exactly wrong, but it leaves out a lot of intermediate steps.