GitHub MCP exploited: Accessing private repositories via MCP

I guess I don't really get the attack. The idea seems to be that if you give your Claude an access token, despite what you tell it that it's for, Claude can be convinced to use it for anything that it's authorized for.

I think that's probably something anybody using these tools should always think. When you give a credential to an LLM, consider that it can do up to whatever that credential is allowed to do, especially if you auto-allow the LLM to make tool use calls!

But GitHub has fine-grained access tokens, so you can generate one scoped to just the repo that you're working with, and which can only access the resources it needs to. So if you use a credential like that, then the LLM can only be tricked so far. This attack wouldn't work in that case. The attack relies on the LLM having global access to your GitHub account, which is a dangerous credential to generate anyway, let alone give to Claude!

miki123211 · 10 months ago

The issue here (which is almost always the case with prompt injection attacks) is that an LLM has access to attacker-controlled data, sensitive information, and a data exfiltration capability.

THe "cardinal rule of agent design" should be that an LLM can have access to at most two of these during one session. To avoid security issues, agents should be designed in a way that ensures this.

For example, any agent that accesses an issue created by an untrusted party should be considered "poisoned" by attacker-controlled data. If it then accesses any private information, its internet access capability should be severely restricted or disabled altogether until context is cleared.

In this model, you don't need per-repo tokens. As long as the "cardinal rule" is followed, no security issue is possible.

Sadly, it seems like MCP doesn't provide the tools needed to ensure this.

tmpz22 · 10 months ago

Genuine question - can we even make a convincing argument for security over convenience to two generations of programmers who grew up on corporate breach after corporate breach with just about zero tangible economic or legal consequences to the parties at fault? Presidential pardons for about a million a pop [1]?

What’s the cassus belli to this younger crop of executives that will be leading the next generation of AI startups?

[1]: https://www.cnbc.com/2025/03/28/trump-pardons-nikola-trevor-...

cwsx · 10 months ago

> The "cardinal rule of agent design" should be that an LLM can have access to at most two of these during one session. To avoid security issues, agents should be designed in a way that ensures this.

Then don't give it your API keys? Surely there's better ways to solve this (like an MCP API gateway)?

[I agree with you]

tshaddox · 10 months ago

> an LLM has access to attacker-controlled data, sensitive information, and a data exfiltration capability

> THe "cardinal rule of agent design" should be that an LLM can have access to at most two of these during one session

I still don't really get it. Surely the older, simpler, and better cardinal rule is that you just don't expose any service to the Internet that you have given access to your private data, unless you directly control that service and have a very good understanding of its behavior.

sporkland · 10 months ago

Great succinct summary of a hard problem!

I might reword: "attacker-controlled data, sensitive information, and a data exfiltration capability"

to: "attacker-controlled data and privileged operations (e.g. sensitive information acces+data exfiltration or ability to do operations to production system)"

jerf · 10 months ago

The S in MCP stands for security!...

... is probably a bit unfair. From what I've seen the protocol is generally neutral on the topic of security.

But the rush to AI does tend to stomp on security concerns. Can't spend a month tuning security on this MCP implementation when my competition is out now, now, now! Go go go go go! Get it out get it out get it out!

That is certainly incompatible with security.

The reason anyone cares about security though is that in general lacking it can be more expensive than taking the time and expense to secure things. There's nothing whatsoever special about MCPs in this sense. Someone's going to roll snake eyes and discover that the hard way.

empath75 · 10 months ago

Can you give me more resources to read about this? It seems like it would be very difficult to incorporate web search or anything like that in Cursor or another IDE safely.

jmward01 · 10 months ago

I don't know that this is a sustainable approach. As LLMs become more capable and are able to do the functions that a real human employee is doing they will need similar access that a normal human employee would have. Clearly not all employees have access to everything, but there is clearly a need for some broader access. Maybe we should be considering human type controls. If you are going to give broader access then you need X, Y and Z to do it like it requests temporary access from a 'boss' LLM etc etc. There are clear issues with this approach but humans also have these issues too (social engineering attacks work all too well). Is there potentially a different pattern that we should be exploring now?

lbeurerkellner · 10 months ago

I agree, one of the issues are tokens with too broad permission sets. However, at the same time, people want general agents which do not have to be unlocked on a repository-by-repository basis. That's why they give them tokens with those access permissions, trusting the LLM blindly.

Your caution is wise, however, in my experience, large parts of the eco-system do not follow such practices. The report is an educational resource, raising awareness that indeed, LLMs can be hijacked to do anything if they have the tokens, and access to untrusted data.

The solution: To dynamically restrict what your agent can and cannot do with that token. That's precisely the approach we've been working on for a while now [1].

[1] https://explorer.invariantlabs.ai/docs/guardrails/

ljm · 10 months ago

If you look at Github's fine-grained token permissions then I can totally imagine someone looking at the 20-30 separate scopes and thinking "fuck this" while they back out and make a non-expiring classic token with access to everything.

It's one of those things where a token creation wizard would come in really handy.

idontwantthis · 10 months ago

We all want to not have to code permissions properly, but we live in a society.

flakeoil · 10 months ago

How about using LLMs to help us configure the access permissions and guardrails? /s

I think I have to go full offline soon.

Abishek_Muthian · 10 months ago

This is applicable to those deployment services like Railway which require access to all the GitHub repositories even though we need to deploy only a single project. In that regard Netlify respects access to just the repository we want to deploy. GitHub shouldn't approve the apps which don't respect the access controls.

shawabawa3 · 10 months ago

This is like 80% of security vulnerability reports we receive at my current job

Long convoluted ways of saying "if you authorize X to do Y and attackers take X, they can then do Y"

Aurornis · 10 months ago

We had a bug bounty program manager who didn’t screen reports before sending them to each team as urgent tickets.

80% of the tickets were exactly like you said: “If the attacker could get X, then they can also do Y” where “getting X” was often equivalent to getting root on the system. Getting root was left as an exercise to the reader.

zer00eyz · 10 months ago

In many cases I would argue that these ARE bugs.

Were talking about githubs token system here... by the time you have generated the 10th one of these and its expiring or you lost them along the way and re-generated them your just smashing all the buttons to get through it as fast and as thoughtlessly as possible.

If you make people change their passwords often, and give them stupid requirements they write it down on a post it and stick it on their monitor. When you make your permissions system, or any system onerous the quality of the input declines to the minimal of effort/engagement.

Usability bugs are still bugs... it's part of the full stack that product, designers and developers are responsible for.

grg0 · 10 months ago

Sounds like confused deputy and is what capability-based systems solve. X should not be allowed to do Y, but only what the user was allowed to do in the first place (X is only as capable as the user, not more.)

tom1337 · 10 months ago

Yea - I honestly don't get why a random commenter on your GitHub Repo should be able to run arbitrary prompts on a LLM which the whole "attack" seems to be based on?

tough · 10 months ago

Long convoluted ways of saying users don't know shit and will click any random links

worldsayshi · 10 months ago

Yes, if you let the chatbot face users you have to assume that the chatbot will be used for anything it is allowed to do. It's a convenience layer op top of your api. It's not an api itself. Clearly?

bloppe · 10 months ago

Well you're not giving the access token to Claude directly. The token is private to the MCP server and Claude uses the server's API, so the server could (should) take measures to prevent things like this from happening. It could notify the user whenever the model tries to write to a public repo and ask for confirmation, for instance.

Probably the only bulletproof measure is to have a completely separate model for each private repo that can only write to its designated private repo, but there are a lot of layers of security one could apply with various tradeoffs

guluarte · 10 months ago

Another type of attack waiting to happen is a malicious prompt in a url where an attacker could make the model do a curl request to post sensitive information

babyshake · 10 months ago

It's just that agentic AI introduces the possibility of old school social engineering.

hoppp · 10 months ago

They exploit the fact the llm will do anything it can to anyone.

These tools cant exist securely as long as the llm doesn't reach at least the level of intelligence of a bug that can make decisions about access control and knows the concept of lying and bad intent

om8 · 10 months ago

Even human level intelligence (whatever that means) is not enough. Social engineering works fine on our meat brains, it will most probably work on llms for foreseeable non-weird non-2027-takeoff-timeline future.

Based on “bug level of intelligence”, I (perhaps wrongly) infer that you don’t believe in possibility of a takeoff. In case it is even semi-accurate, I think llms can be secure, but, perhaps, humanity will be able to interact with such secure system for not so long time

addandsubtract · 10 months ago

Isn't the problem that the LLM can't differentiate between data and instructions? Or, at least in its current state? If we just limit it's instructions to what we / the MCP server provides, but don't let it eval() additional data it finds along the way, we wouldn't have this exploit – right?

dodslaser · 10 months ago

Yes they can. If the token you give the LLM isn't permitted to access private repos you can lie all you want, it still can't access private repos.

Of course you shouldn't give an app/action/whatever a token with too lax permissions. Especially not a user facing one. That's not in any way unique to tools based on LLMs.

p1necone · 10 months ago

I've noticed this as a trend with new technology. People seem to forget the most basic things as if they don't apply because the context is new and special and different. Nope, you don't magically get to ignore basic security practices just because you're using some new shiny piece of tech.

See also: the cryptocurrency space rediscovering financial fraud and scams from centuries ago because they didn't think their new shiny tech needed to take any lessons from what came before them.

Deleted Comment

I think from security reasoning perspective: if your LLM sees text from an untrusted source, I think you should assume that untrusted source can steer the LLM to generate any text it wants. If that generated text can result in tool calls, well now that untrusted source can use said tools too.

I followed the tweet to invariant labs blog (seems to be also a marketing piece at the same time) and found https://explorer.invariantlabs.ai/docs/guardrails/

I find it unsettling from a security perspective that securing these things is so difficult that companies pop up just to offer guardrail products. I feel that if AI companies themselves had security conscious designs in the first place, there would be less need for this stuff. Assuming that product for example is not nonsense in itself already.

jfim · 10 months ago

I wonder if certain text could be marked as unsanitized/tainted and LLMs could be trained to ignore instructions in such text blocks, assuming that's not the case already.

frabcus · 10 months ago

This somewhat happens already, with system messages vs assistant vs user.

Ultimately though, it doesn't and can't work securely. Fundamentally, there are so many latent space options, it is possible to push it into a strange area on the edge of anything, and provoke anything into happening.

Think of the input vector of all tokens as a point in a vast multi dimensional space. Very little of this space had training data, slightly more of the space has plausible token streams that could be fed to the LLM in real usage. Then there are vast vast other amounts of the space, close in some dimensions and far in others at will of the attacker, with fundamentally unpredictable behaviour.

adeon · 10 months ago

After I wrote the comment, I pondered that too (trying to think examples of what I called "security conscious design" that would be in the LLM itself). Right now and in near future, I think I would be highly skeptical even if an LLM was marketed as having such feature of being able to see "unsanitized" text and not be compromised, but I could see myself not 100% dismissing such thing.

If e.g. someone could train an LLM with a feature like that and also had some form of compelling evidence it is very resource consuming and difficult for such unsanitized text to get the LLM off-rails, that might be acceptable. I have no idea what kind of evidence would work though. Or how you would train one or how the "feature" would actually work mechanically.

Trying to use another LLM to monitor first LLM is another thought but I think the monitored LLM becomes an untrusted source if it sees untrusted source, so now the monitoring LLM cannot be trusted either. Seems that currently you just cannot trust LLMs if they are exposed at all to unsanitized text and then can autonomously do actions based on it. Your security has to depend on some non-LLM guardrails.

I'm wondering also as time goes on, agents mature and systems start saving text the LLMs have seen, if it's possible to design "dormant" attacks, some text in LLM context that no human ever reviews, that is designed to activate only at a certain time or in specific conditions, and so it won't trigger automatic checks. Basically thinking if the GitHub MCP here is the basic baby version of an LLM attack, what would the 100-million dollar targeted attack look like. Attacks only get better and all that.

No idea. The whole security thinking around AI agents seems immature at this point, heh.

DaiPlusPlus · 10 months ago

> LLMs could be trained to ignore instructions in such text blocks

Okay, but that means you'll need some way of classifying entirely arbitrary natural-language text, without any context, whether it's an "instruction" or "not an instruction", and it has to be 100% accurate under all circumstances.

AlexCoventry · 10 months ago

Maybe, but I think the application here was that Claude would generate responsive PRs for github issues while you sleep, which kind of inherently means taking instructions from untrusted data.

A better solution here may have been to add a private review step before the PRs are published.

mehdibl · 10 months ago

You mark the input correctly is not complicated.

You use prompt and mark correctly the input as <github_pr_comment> and clearly state read and never consider as prompt.

But the attack is quite convoluted. Do you still remember when we talked prompt injection in chat bots. It was a thing 2 years ago! Now MCP is buzzing...

n2d4 · 10 months ago

> I feel that if AI companies themselves had security conscious designs in the first place, there would be less need for this stuff.

They do, but this "exploit" specifically requires disabling them (which comes with a big fat warning):

> Claude then uses the GitHub MCP integration to follow the instructions. Throughout this process, Claude Desktop by default requires the user to confirm individual tool calls. However, many users already opt for an “Always Allow” confirmation policy when using agents, and stop monitoring individual actions.

const_cast · 10 months ago

It's been such a long standing tradition in software exploits that it's kind of fun and facepalmy when it crops up again in some new technology. The pattern of "take user text input, have it be tainted to be interpreted as instructions of some kind, and then execute those in a context not prepared for it" just keeps happening.

SQL injection, cross-site scripting, PHP include injection (my favorite), a bunch of others I'm missing, and now this.