What does NOT work: I have no idea how to do something, and I hope agentic coding will solve my problem.
Think "Eisenhower matrix":
- X: Ambiguous <-> Trivial
- Y: Can wait <-> Urgent
Urgent & Ambiguous => agentic coding is useless, and an act of desperation
Can wait & at least non-ambiguous => agentic coding is a perfect fit
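The matrix above can be sketched as a tiny decision function. This is a minimal illustration, not anything from the comment's author; only two quadrants are described above, so the verdict for the other two is my own placeholder.

```python
def agentic_coding_fit(ambiguous: bool, urgent: bool) -> str:
    """Map the two axes of the matrix to a rough verdict on agentic coding."""
    if ambiguous and urgent:
        # Stated above: urgent & ambiguous work is where agents fail.
        return "useless: an act of desperation"
    if not ambiguous and not urgent:
        # Stated above: well-specified work that can wait is the sweet spot.
        return "perfect fit"
    # The remaining two quadrants are not covered above; treating them as
    # judgment calls is an assumption of this sketch.
    return "judgment call"

print(agentic_coding_fit(ambiguous=False, urgent=False))  # perfect fit
```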
Like, as an engineer, I don’t doubt that this work is valuable. But you have to imagine what it must sound like from the perspective of a PM or EM. It’d be like my PM saying “I spent the last month organizing all eng docs to be properly formatted with bullet points.” You’d be like, uhh, okay, but how does that affect the rest of the company? More importantly, how does the PM distinguish engineers who are doing impactful work from the engineers who are doing the “bullet point formatting” work, of which surely some exist? From the perspective of a PM, these types of work can be hard to tell apart.
Really what you want to do is articulate what you plan to do, ahead of time, in a way that actually clicks for non-technical people. For instance, I was pushing unit tests and integration tests at my company for years but never found the political will to make them a priority. I tried and tried, but my manager just wouldn’t see it. Eventually, there was a really bad SEV, and I told her that tests would prevent this sort of thing from happening again. At that point the value became obvious. Now we have tests, and more importantly, everyone understands how valuable they are.
It is one of the career-progression milestones for a programmer: being able to set the bar for their craftsmanship themselves. A successful SWE is someone who got hired onto a team that does not require this kind of education. A team where this type of engineering hygiene is as obvious as breathing.
The VM analogy is simply insufficient for securing LLM workflows where you can't trust the LLM to do what you told it to with potentially sensitive data. You may have a top-level workflow that needs access to both sensitive operations (network access) and sensitive data (PII, credentials), and an LLM that's susceptible to prompt injection attacks and general correctness and alignment problems. You can't just run the LLM calls in a VM with access to both sensitive operations and data.
You need to partition the workflow, subtasks, operations, and data so that most subtasks have a very limited view of the world, and use information-flow to track data provenance. The hopefully much smaller subset of subtasks that need both sensitive operations and data will then need to be highly trusted and reviewed.
This post does touch on that though. The really critical bit, IMO, is the "Secure Orchestrators" part, and the FIDES paper, "Securing AI Agents with Information-Flow Control" [1].
The "VM" bit is running some task in a highly restricted container that only has access to the capabilities and data given to it. The "orchestrator" then becomes the critical piece that spawns these containers, gives them the appropriate capabilities, and labels the data they produce correctly (taint-tracking: data derived from sensitive data is sensitive, etc.).
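The taint-tracking idea (output labels are the union of input labels; sinks are gated on clearance) can be sketched in a few lines. This is a minimal illustration of information-flow labeling in general, not the FIDES implementation; the names `Tainted`, `derive`, and `release` are made up for this sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tainted:
    """A value tagged with the sensitivity labels it was derived from."""
    value: object
    labels: frozenset

def derive(fn, *inputs):
    """Run a subtask; its output inherits the union of its inputs' labels."""
    raw = fn(*(t.value for t in inputs))
    labels = frozenset().union(*(t.labels for t in inputs)) if inputs else frozenset()
    return Tainted(raw, labels)

def release(t, sink_clearance):
    """Gate a sink (e.g. network egress): refuse unless it is cleared for every label."""
    if not t.labels <= sink_clearance:
        raise PermissionError(f"blocked labels: {sorted(t.labels - sink_clearance)}")
    return t.value

email = Tainted("alice@example.com", frozenset({"pii"}))
summary = derive(lambda e: f"Contact: {e}", email)  # summary is now labeled "pii"
print(release(summary, frozenset({"pii"})))         # pii-cleared sink: allowed
```

An orchestrator built this way can hand each container only unlabeled inputs and then decide, at release time, whether a given output may flow to the network.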
They seem on the right track to me, and I know others working in this area who would agree. I think they need a better hook than "VMs for AI" though. Maybe "partitioning" or "isolation" and emphasize the data part somehow.
The next step is to not provide any tools to the LLM, and ask it to invent them on-the-fly. Some problems need to be brute-forced.
I’ve come to believe in the Great Filter theory more and more in recent years.
The idea is that each plugin runs in its own WASM VM with limited network/file-system access. Plugins can be written in any language, as long as they compile to WASM and can be published to an OCI registry (signed & verified with Sigstore).
Recently, Microsoft released its own take on hyper-mcp, named Wassette [2].
Ideally, I want to turn it into a gateway with more security & governance features at this layer.
[1]: https://github.com/tuananh/hyper-mcp
[2]: https://github.com/microsoft/wassette
Is there an OSS model that's better than 2.0 Flash with similar pricing, speed, and a 1M context window?
Edit: this is not the typical flash model, it's actually an insane value if the benchmarks match real world usage.
> Gemini 3 Flash achieves a score of 78%, outperforming not only the 2.5 series, but also Gemini 3 Pro. It strikes an ideal balance for agentic coding, production-ready systems and responsive interactive applications.
The replacement for the old Flash models will then probably be 3.0 Flash Lite.