Readit News
76SlashDolphin commented on Ask HN: What are you working on? (September 2025)    · Posted by u/david927
76SlashDolphin · 3 months ago
A few friends and I are working on a firewall for LLM clients that blocks the lethal trifecta: https://github.com/Edison-Watch/open-edison

The way it works is the user registers / imports MCP (Model Context Protocol) servers they would like to use. All the tools of those servers are imported and then the firewall uses structured LLM calls to decide what types of action the tool performs among:

- read private data (e.g. read a local file or read your emails)

- perform an activity on your behalf (e.g. send an email or update a calendar invite)

- read public data (e.g. search the web)

The idea is that if all 3 types of tool calls are performed in a single context session, the LLM is vulnerable to jailbreak attacks (e.g. reads personal data -> reads poisoned public data with malicious instructions -> LLM gets tricked and posts personal data).

Once all the tools are classified, the user can go in and make any adjustments, and is then given the option to set up the gateway as an MCP server in their LLM client of choice. For each LLM session the gateway keeps track of all tool calls and, in particular, which action types have been raised in the session. If an attempted tool call would mean that all three action types have been raised in the session, it gets blocked and the user gets a notification that takes them to the firewall UI, where they can see the offending tool calls and decide either to block the most recent one or to add the triggering "set" to an allowlist.
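The per-session tracking described above can be sketched roughly as follows. This is a minimal illustration, not the open-edison implementation: the `Gateway` class, the label strings, the tool names, and the exact-match allowlist of tool sets are all assumptions made for the example.

```python
# Hypothetical sketch of per-session trifecta tracking (not open-edison's code).
TRIFECTA = frozenset({"read_private", "write", "read_public"})


class Gateway:
    def __init__(self, labels):
        # labels: tool name -> set of action types (assumed to be pre-classified)
        self.labels = labels
        self.allowlist = set()   # frozensets of tool names the user has approved
        self.sessions = {}       # session id -> (raised action types, tool calls)

    def check(self, session_id, tool):
        """Return True if the call may proceed, False if it is blocked."""
        raised, calls = self.sessions.setdefault(session_id, (set(), []))
        combined = raised | set(self.labels[tool])
        triggering = frozenset(calls) | {tool}
        # Block only when this call would complete all three action types
        # and the user has not allowlisted this exact set of tools.
        if combined >= TRIFECTA and triggering not in self.allowlist:
            return False
        raised.update(self.labels[tool])
        calls.append(tool)
        return True


gateway = Gateway({
    "read_file": {"read_private"},
    "web_search": {"read_public"},
    "send_email": {"write"},
})
print(gateway.check("s1", "read_file"))    # allowed
print(gateway.check("s1", "web_search"))   # allowed
print(gateway.check("s1", "send_email"))   # blocked: completes the trifecta
```

Adding `frozenset({"read_file", "web_search", "send_email"})` to `gateway.allowlist` would let the same flow through the next time, which mirrors the "add the triggering set to an allowlist" option.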

Next steps are transitioning from the web UI for the product to a desktop app with a much cleaner and more streamlined UI. We're still working on improving the UX but the backend is solid and we would really like to get some more feedback for it.

76SlashDolphin commented on Show HN: An MCP Gateway to block the lethal trifecta   github.com/Edison-Watch/o... · Posted by u/76SlashDolphin
datadrivenangel · 3 months ago
So is any combination of MCP servers basically going to require human in the loop approval for everything?

Sounds like it defeats the point.

76SlashDolphin · 3 months ago
That is a fair concern, and while that would happen often in some cases, there are others which rarely export data or rarely read public data, where you can manually approve each use case. Still, we are very interested in seeing how people use MCPs so we can improve the UX, which is why we're publishing this release. If users report that they get too many false positives, we can always increase the granularity of the trifecta categories (say, "exports data" could be split into exporting publicly or privately, or "reads public data" could have different tiers, etc.).
76SlashDolphin commented on Show HN: An MCP Gateway to block the lethal trifecta   github.com/Edison-Watch/o... · Posted by u/76SlashDolphin
pamelafox · 3 months ago
How do you determine if the tools access private data? Is it based solely on their tool description (which can be faked) or by trying them in a sandboxed environment or by analyzing the code?
76SlashDolphin · 3 months ago
It is based on what the MCP server reports to us. As with most current LLM clients we assume that the user has checked the MCP servers they're using for authenticity.
76SlashDolphin commented on Show HN: An MCP Gateway to block the lethal trifecta   github.com/Edison-Watch/o... · Posted by u/76SlashDolphin
sebastiennight · 3 months ago
I'm trying to wrap my head around this:

1. How are you defending against the case of one MCP poisoning your firewall LLM into incorrectly classifying other MCP tools?

2. How would you make sure the LLM shows the warning, as they are non-deterministic?

3. How clear do you expect MCP specs to be in order for your classification step to be trustworthy? To the best of my knowledge there is no spec that outlines how to "label" a tool for the 3 axes, so you've got another non-deterministic step here. Is "writing to disk" an external comm? It is if that directory is exposed to the web. How would you know?

76SlashDolphin · 3 months ago
Really good questions, let's look at them one by one:

1. We are assuming that the user has done their due diligence verifying the authenticity of the MCP server, in the same way they need to verify them when adding an MCP server to Claude Code or VS Code. The gateway protects against an attacker exploiting already installed standard MCP servers, not against malicious servers.

2. That's a very good question - while it is indeed non-deterministic, we have not seen a single case of it not showing the message. Sometimes the message gets mangled but it seems like most current LLMs take the MCP output quite seriously since that is their source of truth about the real world. Also, while the message could in theory not be shown, the offending tool call will still be blocked so the worst case is that the user is simply confused.

3. Currently we follow the trifecta very literally, as in every tool is classified into a subset of {reads private data, writes on behalf of user, reads publicly modifiable data}. We have an LLM classify each tool at MCP server load time and we cache these results based on whatever data the MCP server sends us. If there are any issues with the classification, you can go into the gateway dashboard and modify it however you like. We are planning on making improvements to the classification down the line, but we think it is currently solid enough and we would like to get it into users' hands to get some UX feedback before we add extra functionality.
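To make point 3 a bit more concrete, here is one way load-time classification with caching and manual overrides could look. It is only a sketch under assumptions: the `ClassificationCache` class, the SHA-256 fingerprint of the tool metadata, and the `classify` callable (a stand-in for the structured LLM call) are hypothetical and not taken from the open-edison code.

```python
# Hypothetical sketch: cache tool classifications keyed by what the server sent.
import hashlib
import json

CATEGORIES = frozenset({"read_private", "write", "read_public"})


def fingerprint(tool: dict) -> str:
    # Key the cache on the tool metadata exactly as reported by the MCP server.
    payload = json.dumps(tool, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class ClassificationCache:
    def __init__(self, classify):
        self._classify = classify  # assumed stand-in for the structured LLM call
        self._cache = {}
        self._overrides = {}       # manual edits made in the dashboard

    @staticmethod
    def _validated(labels) -> frozenset:
        labels = frozenset(labels)
        if not labels <= CATEGORIES:
            raise ValueError(f"unknown categories: {sorted(labels - CATEGORIES)}")
        return labels

    def get(self, tool: dict) -> frozenset:
        key = fingerprint(tool)
        if key in self._overrides:
            return self._overrides[key]
        if key not in self._cache:
            # Classify once per distinct tool metadata, e.g. at server load time.
            self._cache[key] = self._validated(self._classify(tool))
        return self._cache[key]

    def override(self, tool: dict, labels) -> None:
        self._overrides[fingerprint(tool)] = self._validated(labels)
```

With this shape, a changed tool description produces a new fingerprint and is reclassified, while a user override always wins over the cached LLM result.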

76SlashDolphin commented on Show HN: An MCP Gateway to block the lethal trifecta   github.com/Edison-Watch/o... · Posted by u/76SlashDolphin
8note · 3 months ago
couldn't the configuring LLM be poisoned by tool descriptions to grant the lethal trifecta to the run time LLM?
76SlashDolphin · 3 months ago
It is possible that a malicious MCP could poison the LLM's ability to classify its tools, but then your threat model includes adding malicious MCPs, which would be a problem for any MCP client. We are considering adding a repository of vetted MCPs (or possibly using one of the existing ones) but, as it is, we rely on the user to make sure that their MCPs are legitimate.
76SlashDolphin commented on Show HN: An MCP Gateway to block the lethal trifecta   github.com/Edison-Watch/o... · Posted by u/76SlashDolphin
bradleybuda · 3 months ago
I think the "lethal trifecta" framing is useful and I'm glad that attempts are being made at this! But there are two big, hard-to-solve problems here:

1. The "lethal trifecta" is also the "productive trifecta" - people want to be able to use LLMs to operate in this space since that's where much of the value is; using private / proprietary data to interact with (do I/O with) the real world.

2. I worry that there will soon be (if not already) a fourth leg to the stool - latent malicious training within the LLMs themselves. I know the AI labs are working on this, but trying to ferret out Manchurian Candidates embedded within LLMs may very well be the greatest security challenge of the next few decades.

76SlashDolphin · 3 months ago
Those are really good points and we do have some plans for them, mainly on the first topic. What we're envisioning in terms of UX for our gateway is that when you set it up it's very defensive, but whenever it detects a trifecta, you can mark it as a false positive. Over time the gateway will be trained to be exactly as permissive as the user wishes, with only the rare false positive. You can already do that with the gateway today (you get a web notification when the gateway detects a trifecta and, if you click into it, you get taken to a menu to approve/deny it if it occurs in the future). Granted, this can make the gateway overly permissive, but we do have plans for how to improve the granularity of these rules.

Regarding the second point, that is a very interesting topic that we haven't thought about. It would seem that our approach would work for this use case too, though. Currently, we're defending against the LLM being gullible, but gullible and actively malicious are not properties that are too different. It's definitely a topic on our radar now, thanks for bringing it up!

76SlashDolphin commented on Show HN: An MCP Gateway to block the lethal trifecta   github.com/Edison-Watch/o... · Posted by u/76SlashDolphin
doctoboggan · 3 months ago
Wouldn't the LLM running in the gateway also be susceptible to the same jailbreaks?
76SlashDolphin · 3 months ago
That's a good question! We do use an LLM to categorise the MCP tools, but that is at "add" or "configure" time, not at the time they are called. As such, we don't actively run an LLM while the gateway is up; all the rules are already set and requests are blocked based on the hard-set rules. Plus, at this point we don't actually look at the data that is passed around, so even if we change the rules for the trifecta, there's no way for any LLM to be poisoned by a malicious actor feeding bad data.
76SlashDolphin commented on Show HN: An MCP Gateway to block the lethal trifecta   github.com/Edison-Watch/o... · Posted by u/76SlashDolphin
daveguy · 3 months ago
Well, I guess 80-90% protective is better than nothing. Better might be a lock that requires positive confirmation by the user.
76SlashDolphin · 3 months ago
It is possible to configure it like that: when a trifecta is detected, the gateway can wait for confirmation before allowing the last MCP call to proceed. The issue with that is that MCP clients are still in their early stages, and some of them don't like waiting a long time for a response and act in weird or inconvenient ways if something times out (some of them sensibly disable the entire server if a single tool times out, which in our case disables the entire gateway and therefore all MCP tools). As it is, it's much better to default to returning a block message and emit a web notification from the gateway dashboard to get the user to approve the use case, then rerun their previous prompt.
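A rough sketch of that default, assuming a hypothetical `handle_call` wrapper: the gateway answers the client immediately with a block message and sends a notification, instead of holding the request open until the user responds. The `ToolResult` type and the `forward` and `notify` callables are invented for illustration and are not part of any MCP SDK.

```python
# Hypothetical sketch: respond immediately with a block message, never wait.
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToolResult:
    is_error: bool
    text: str


def handle_call(
    tool: str,
    completes_trifecta: bool,
    forward: Callable[[str], str],   # assumed: runs the real tool, returns its output
    notify: Callable[[str], None],   # assumed: emits the web notification
) -> ToolResult:
    if completes_trifecta:
        # Returning right away avoids client-side timeouts, which some clients
        # handle by disabling the whole server (here: the whole gateway).
        notify(f"Blocked '{tool}': this call would complete the lethal trifecta.")
        return ToolResult(
            is_error=True,
            text="Blocked by the gateway. Approve this flow in the dashboard, "
                 "then rerun your previous prompt.",
        )
    return ToolResult(is_error=False, text=forward(tool))
```

The trade-off is that the user has to rerun the prompt after approving, but no tool call is ever left hanging.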
76SlashDolphin commented on Show HN: An MCP Gateway to block the lethal trifecta   github.com/Edison-Watch/o... · Posted by u/76SlashDolphin
noddingham · 3 months ago
Agreed. If someone could help answer the question of "how" I'd appreciate it. I'm currently skeptical but not sure I'm knowledgeable enough to prove myself right or wrong.

But it just seems to me that some of the 'vulnerabilities' are baked in from the beginning, e.g. control and data being in the same channel AFAIK isn't solvable. How is it possible to address that at all? Sure, we can do input validation, sanitization, restrict access, etc., and a host of other things, but at the end of the day isn't there still a non-zero chance that something is exploited and we're just playing whack-a-mole? Not to mention I doubt everyone will define things like "private data" and "untrusted" the same. uBlock tells me when a link is on one of its lists but I still click go ahead anyway.

76SlashDolphin · 3 months ago
At least in its current state we just use an LLM to categorise each individual tool. We don't look at the data itself, although we have some ideas of how to improve things, as currently it is very "over-defensive". For example, if you have the filesystem MCP and a web search MCP, open-edison will block if you perform a filesystem read, a web search, and then a filesystem write. Still, if you rarely perform writes open-edison would still be useful for tracking things. The UX is such that after an initial block you can make an exception for the same flow the next time it occurs.
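The filesystem plus web search example can be traced with a few lines. The tool names and labels below are illustrative assumptions; the point is only that per-tool labels, with no inspection of the data, are enough to trigger a block on the third call.

```python
# Hypothetical trace of the example flow: per-tool labels only, no data inspection.
TRIFECTA = {"read_private", "write", "read_public"}

LABELS = {
    "read_file": {"read_private"},   # filesystem read (assumed label)
    "web_search": {"read_public"},   # web search (assumed label)
    "write_file": {"write"},         # filesystem write (assumed label)
}


def first_blocked(calls):
    """Return the index of the first call that would be blocked, or None."""
    seen = set()
    for index, tool in enumerate(calls):
        if seen | LABELS[tool] >= TRIFECTA:
            return index
        seen |= LABELS[tool]
    return None


print(first_blocked(["read_file", "web_search", "write_file"]))  # 2
print(first_blocked(["read_file", "write_file"]))                # None
```

The write is blocked even if the file contents and search results were harmless, which is what makes this rule over-defensive.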
76SlashDolphin commented on Show HN: An MCP Gateway to block the lethal trifecta   github.com/Edison-Watch/o... · Posted by u/76SlashDolphin
aaronharnly · 3 months ago
"without risk", "solves", and "Guaranteed" are big words – you might want to temper them.
76SlashDolphin · 3 months ago
Fair criticism! We wrote the Readme earlier on when we were still ironing out the requirements. I'll fix it up shortly.

u/76SlashDolphin

Karma: 166 · Cake day: February 17, 2022