What does NOT work: I have no idea how to do something, and I hope agentic coding will solve my problem.
Think "Eisenhower matrix":
- X: Ambiguous <-> Trivial
- Y: Can wait <-> Urgent
Urgent & Ambiguous => agentic coding is useless, and an act of desperation
Can wait & at least non-ambiguous => agentic coding is a perfect fit
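The matrix above can be sketched as a tiny decision function. This is a minimal illustration, not anything from the comment's author; only two quadrants are described above, so the verdict for the other two is my own placeholder.

```python
def agentic_coding_fit(ambiguous: bool, urgent: bool) -> str:
    """Map the two axes of the matrix to a rough verdict on agentic coding."""
    if ambiguous and urgent:
        # Stated above: urgent & ambiguous work is where agents fail.
        return "useless: an act of desperation"
    if not ambiguous and not urgent:
        # Stated above: well-specified work that can wait is the sweet spot.
        return "perfect fit"
    # The remaining two quadrants are not covered above; treating them as
    # judgment calls is an assumption of this sketch.
    return "judgment call"

print(agentic_coding_fit(ambiguous=False, urgent=False))  # perfect fit
```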
Like, as an engineer, I don’t doubt that this work is valuable. But you have to imagine what it must sound like from the perspective of a PM or EM. It’d be like my PM saying “I spent the last month organizing all eng docs to be properly formatted with bullet points.” You’d be like, uhh, okay, but how does that affect the rest of the company? More importantly, how does the PM distinguish engineers who are doing impactful work from the engineers who are doing the “bullet point formatting” work, of which surely some exist? From the perspective of a PM, these types of work can be hard to tell apart.
Really what you want to do is articulate what you plan to do, ahead of time, in a way that actually clicks for non-technical people. For instance, I was pushing unit tests and integration tests at my company for years but never found the political will to make them a priority. I tried and tried, but my manager just wouldn’t see it. Eventually, there was a really bad SEV, and I told her that tests would prevent this sort of thing from happening again. At that point the value became obvious. Now we have tests, and more importantly, everyone understands how valuable they are.
It is one of the career-progression milestones for a programmer: being able to set the bar for their craftsmanship themselves. A successful SWE is someone who got hired onto a team that does not require this kind of education. A team where this type of engineering hygiene is as obvious as breathing.
The VM analogy is simply insufficient for securing LLM workflows where you can't trust the LLM to do what you told it to with potentially sensitive data. You may have a top-level workflow that needs access to both sensitive operations (network access) and sensitive data (PII, credentials), and an LLM that's susceptible to prompt injection attacks and general correctness and alignment problems. You can't just run the LLM calls in a VM with access to both sensitive operations and data.
You need to partition the workflow, subtasks, operations, and data so that most subtasks have a very limited view of the world, and use information-flow to track data provenance. The hopefully much smaller subset of subtasks that need both sensitive operations and data will then need to be highly trusted and reviewed.
This post does touch on that though. The really critical bit, IMO, is the "Secure Orchestrators" part, and the FIDES paper, "Securing AI Agents with Information-Flow Control" [1].
The "VM" bit is running some task in a highly restricted container that only has access to the capabilities and data given to it. The "orchestrator" then becomes the critical piece that spawns these containers, gives them the appropriate capabilities, and labels the data they produce correctly (taint-tracking: data derived from sensitive data is sensitive, etc.).
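The taint-tracking idea (output labels are the union of input labels; sinks are gated on clearance) can be sketched in a few lines. This is a minimal illustration of information-flow labeling in general, not the FIDES implementation; the names `Tainted`, `derive`, and `release` are made up for this sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tainted:
    """A value tagged with the sensitivity labels it was derived from."""
    value: object
    labels: frozenset

def derive(fn, *inputs):
    """Run a subtask; its output inherits the union of its inputs' labels."""
    raw = fn(*(t.value for t in inputs))
    labels = frozenset().union(*(t.labels for t in inputs)) if inputs else frozenset()
    return Tainted(raw, labels)

def release(t, sink_clearance):
    """Gate a sink (e.g. network egress): refuse unless it is cleared for every label."""
    if not t.labels <= sink_clearance:
        raise PermissionError(f"blocked labels: {sorted(t.labels - sink_clearance)}")
    return t.value

email = Tainted("alice@example.com", frozenset({"pii"}))
summary = derive(lambda e: f"Contact: {e}", email)  # summary is now labeled "pii"
print(release(summary, frozenset({"pii"})))         # pii-cleared sink: allowed
```

An orchestrator built this way can hand each container only unlabeled inputs and then decide, at release time, whether a given output may flow to the network.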
They seem on the right track to me, and I know others working in this area who would agree. I think they need a better hook than "VMs for AI" though. Maybe "partitioning" or "isolation" and emphasize the data part somehow.
The next step is to not provide any tools to the LLM, and ask it to invent them on-the-fly. Some problems need to be brute-forced.
I’ve come to believe in the Great Filter theory more and more in recent years.
The idea is that each plugin runs in its own WASM VM with limited network/file-system access. Plugins can be written in any language, as long as they compile to WASM and can be published to an OCI registry (signed & verified with Sigstore).
Recently, Microsoft released its own take on hyper-mcp, named Wassette [2].
Ideally, I want to turn it into a gateway with more security & governance features at this layer.
[1]: https://github.com/tuananh/hyper-mcp
[2]: https://github.com/microsoft/wassette
Is there an OSS model that's better than 2.0 Flash with similar pricing, speed, and a 1M context window?
Edit: this is not the typical flash model, it's actually an insane value if the benchmarks match real world usage.
> Gemini 3 Flash achieves a score of 78%, outperforming not only the 2.5 series, but also Gemini 3 Pro. It strikes an ideal balance for agentic coding, production-ready systems and responsive interactive applications.
The replacement for the old Flash models will then probably be 3.0 Flash Lite.