stephen_cagle (u/stephen_cagle)

stephen_cagle commented on Let's discuss sandbox isolation shayon.dev/post/2026/52/l... · Posted by u/shayonj

pash · 15 days ago

OK, let’s survey how everybody is sandboxing their AI coding agents in early 2026.

What I’ve seen suggests the most common answers are (a) “containers” and (b) “YOLO!” (maybe adding, “Please play nice, agent.”).

One approach that I’m about to try is Sandvault [0] (macOS only), which uses the good old Unix user system together with some added precautions. Basically, give an agent its own unprivileged user account and interact with it via sudo, SSH, and shared directories.

0. https://github.com/webcoyote/sandvault

stephen_cagle · 15 days ago

I use KVM/QEMU on Linux. I have a set of scripts that create a new directory for a VM project and install a Debian image for the VM. I have a ./pull_from_vm and a ./push_to_vm that I use to pull and push the git code to and from the VM, as well as a ./claude to start Claude on the VM and a ./emacs to initialize and start Emacs on the VM after syncing my local .spacemacs directory to it (I like this because of my customized Emacs muscle memory, and because I worry that Emacs could execute arbitrary code if I used it to SSH to the VM client from my host).
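A minimal sketch of what push/pull helpers like these could look like, assuming the VM is reachable over SSH and holds a bare git repository. The host alias `agent-vm`, the path `/home/agent/project.git`, and the function names are all hypothetical; the actual scripts are not shown in this thread.

```python
import subprocess

# Hypothetical values: the VM's SSH alias and a bare repo inside the VM.
VM_HOST = "agent-vm"
VM_REPO = "/home/agent/project.git"
VM_URL = f"ssh://{VM_HOST}{VM_REPO}"


def push_to_vm_cmd(branch: str = "main") -> list[str]:
    """Build the git command that pushes a local branch into the VM's repo."""
    return ["git", "push", VM_URL, f"{branch}:{branch}"]


def pull_from_vm_cmd(branch: str = "main") -> list[str]:
    """Build the git command that fetches a branch from the VM.

    Fetching (rather than pulling) leaves the result in FETCH_HEAD so the
    agent's commits can be reviewed on the host before merging.
    """
    return ["git", "fetch", VM_URL, branch]


def run(cmd: list[str]) -> None:
    """Run a command from the current repo, raising if it fails."""
    subprocess.run(cmd, check=True)
```

Calling `run(push_to_vm_cmd())` from the host repo would send `main` to the VM. Keeping the transfer to git objects only means nothing from the VM executes on the host until the fetched commits have been inspected.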

I try not to run LLMs directly on my own host. The only exception is that I do use https://github.com/karthink/gptel on my own machine, because it is just too damn useful. I hope I don't self-own with that someday.

stephen_cagle commented on I baked a pie every day for a year theguardian.com/lifeandst... · Posted by u/NaOH

wewtyflakes · 16 days ago

What does that have to do with the article? It is not about 'the most life changing thing for everyone', it was what was life changing for her.

stephen_cagle · 16 days ago

Nothing at all. Just a comment on the internet. Taking a walk AND baking a pie is even better.

I'm just making the small point that walking is probably the simplest, most effective thing you can do to improve almost every aspect of your life.

stephen_cagle commented on I baked a pie every day for a year theguardian.com/lifeandst... · Posted by u/NaOH

stephen_cagle · 16 days ago

Not to take anything from any other activity that someone embraces, but I imagine that for the majority of people in the developed world, taking a 1 hour walk every day would be the most "life changing" thing you could do.

stephen_cagle commented on Claws are now a new layer on top of LLM agents twitter.com/karpathy/stat... · Posted by u/Cyphase

throwaway13337 · 21 days ago

This is the future we need to make happen.

I would love to subscribe to / pay for services that are just APIs. Then have my agent organize them how I want.

Imagine YouTube, Gmail, Hacker News, Chase Bank, WhatsApp, the electric company all being just APIs.

You can interact how you want. The agent can display the content the way you choose.

Incumbent companies will fight tooth and nail to avoid this future. Because it's a future without monopoly power. Users could more easily switch between services.

Tech would be less profitable but more valuable.

It's the future we can choose right now by making products that compete with this mindset.

stephen_cagle · 21 days ago

Biggest question I have is maybe... just maybe... LLMs would have had sufficient intelligence to handle micropayments. Maybe we might not have gone down the mass advertising "you are the product" path?

Like, somehow I could tell my agent that I have a $20 a month budget for entertainment and a $50 a month budget for news, and it would just figure out how to negotiate with the nytimes and netflix and spotify (or what would have been their equivalent), which is fine. But it would also be able to negotiate with an individual band that wants to directly sell their music, or an indie game that does not want to pay the Steam tax.

I don't know, just a "histories that might have been" thought.

stephen_cagle commented on IRS lost 40% of IT staff, 80% of tech leaders in 'efficiency' shakeup theregister.com/2026/02/1... · Posted by u/freitasm

conductr · 23 days ago

Captured revenue : cost to capture (could be an audit, billing for interest/fees due, etc. lots of avenues to capture revenue that is being missed).

The problem is these metrics aren't really scalable productivity metrics. If you doubled cost, it might go to 100:1; if you tripled cost, it might go to 0.5:1.

Each dollar generally gets more expensive to capture.

stephen_cagle · 23 days ago

Good point, and kind of interesting in that as we keep cutting funding to the IRS, this ratio will probably get wider (which looks good, but is actually bad for what it implies).

stephen_cagle commented on Gemini 3.1 Pro blog.google/innovation-an... · Posted by u/MallocVoidstar

menaerus · 23 days ago

I don't know ... as of now I am literally instructing it to solve the chained expression computation problem which incurs a lot of temporary variables, of which some can be elided by the compiler and some cannot. Think linear algebra expressions which yield a lot of intermediate computations for which you don't want to create a temporary. This is production code and not an easy problem.

And yet it happily told me exactly what I wanted it to tell me - rewrite the goddamn thing using (C++) expression templates. And voila, it took "it" 10 minutes to spit out high-quality code that works.
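For readers unfamiliar with the idea, here is a rough Python analogue, not C++ expression templates themselves and not the code from this comment. Operators build a small expression tree, and the result is computed element by element in one pass, so no intermediate vector is created for `b * c`. The class names `Expr`, `Vec`, and `BinOp` are made up for this sketch; in C++ the same tree would be encoded in template types and resolved at compile time.

```python
class Expr:
    """Base class: arithmetic operators build a tree instead of computing."""

    def __add__(self, other):
        return BinOp(self, other, lambda a, b: a + b)

    def __mul__(self, other):
        return BinOp(self, other, lambda a, b: a * b)

    def evaluate(self):
        # One pass over the indices; each element is computed directly
        # from the leaves, so no temporary vector is allocated per operator.
        return [self.at(i) for i in range(len(self))]


class Vec(Expr):
    """Leaf node holding actual data."""

    def __init__(self, data):
        self.data = list(data)

    def __len__(self):
        return len(self.data)

    def at(self, i):
        return self.data[i]


class BinOp(Expr):
    """Interior node: remembers its operands and how to combine them."""

    def __init__(self, left, right, fn):
        if len(left) != len(right):
            raise ValueError("length mismatch")
        self.left, self.right, self.fn = left, right, fn

    def __len__(self):
        return len(self.left)

    def at(self, i):
        return self.fn(self.left.at(i), self.right.at(i))


a, b, c = Vec([1, 2, 3]), Vec([4, 5, 6]), Vec([7, 8, 9])
result = (a + b * c).evaluate()  # builds the tree, then evaluates once
```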

My biggest gripe for now with Gemini is that Antigravity seems to be written by the model and I am experiencing more hiccups than I would like; sometimes it's just stuck.

stephen_cagle · 23 days ago

Can't argue with that, I'll move my Bayesian priors a little in your direction. With that said, are most other models able to do this? Also, did it write the solution itself or use a library like Eigen?

I have noticed that LLMs seem surprisingly good at translating from one (programming) language to another... I wonder if transforming a generic mathematical expression into an expression template is a similar sort of problem to them? No idea honestly.

stephen_cagle commented on Gemini 3.1 Pro blog.google/innovation-an... · Posted by u/MallocVoidstar

neves · 23 days ago

Gemini integration with Google software gives me the best feature of all LLMs. When I receive an invite for an event, I screenshot it, share it with the Gemini app, and say: add to my Calendar.

It's not very complex, but it's a great time saver.

stephen_cagle · 23 days ago

Yeah, as evidenced by the birds (above), I think it is probably the best vision model at this time. That is a good idea; I guess I should use it for business cards as well.

stephen_cagle commented on IRS lost 40% of IT staff, 80% of tech leaders in 'efficiency' shakeup theregister.com/2026/02/1... · Posted by u/freitasm

munk-a · 23 days ago

Defunding the IRS is nothing but an effort to reduce tax enforcement. People that have relatively straightforward finances can be trivially audited in a formulaic way with data that's on hand - a lack of human auditing resources tends to benefit those with more complex finances which also tend to be the people with a lot of money who can afford to lobby for less enforcement funding.

Also for reference, in 2024 the IRS had a rate of return of 415:1. They'll obviously target the lowest-hanging fruit first, but for every dollar of funding received they collected 415 dollars of tax revenue that would have been missed. This is an obscenely efficient organization.

stephen_cagle · 23 days ago

Is that 415:1 the rate of return of an audit, or the expense:revenue ratio of the IRS as a whole? I remember hearing some time ago that the expense ratio was 11% for the IRS? But 415:1 is way way less than 11%.
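To put the two figures on the same scale, here is a small arithmetic sketch. It assumes "expense ratio" means cost divided by revenue collected, which the comment does not define, and it does not settle which figure is correct or what each one measures.

```python
def cost_share_from_ratio(revenue_per_dollar: float) -> float:
    """Fraction of collected revenue spent on collecting it (1 / ratio)."""
    return 1.0 / revenue_per_dollar


def ratio_from_cost_share(cost_share: float) -> float:
    """Revenue collected per dollar spent, given cost as a share of revenue."""
    return 1.0 / cost_share


# A 415:1 return corresponds to a cost of about 0.24% of revenue.
print(f"{cost_share_from_ratio(415):.2%}")

# An 11% expense ratio corresponds to a return of about 9:1.
print(f"{ratio_from_cost_share(0.11):.2f}:1")
```

Under that assumption the two numbers differ by a factor of roughly 45, so they are unlikely to be describing the same quantity.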

stephen_cagle commented on Gemini 3.1 Pro blog.google/innovation-an... · Posted by u/MallocVoidstar

spankalee · 23 days ago

I hope this works better than 3.0 Pro

I'm a former Googler and know some people near the team, so I mildly root for them to at least do well, but Gemini is consistently the most frustrating model I've used for development.

It's stunningly good at reasoning, design, and generating the raw code, but it just falls over a lot when actually trying to get things done, especially compared to Claude Opus.

Within VS Code Copilot, Claude will have a good mix of thinking streams and responses to the user. Gemini will almost completely use thinking tokens, and then just do something but not tell you what it did. If you don't look at the thinking tokens you can't tell what happened, but the thinking token stream is crap. It's all "I'm now completely immersed in the problem...". Gemini also frequently gets twisted around, stuck in loops, and unable to make forward progress. It's bad at using tools and tries to edit files in weird ways instead of using the provided text editing tools. In Copilot, it won't stop and ask clarifying questions, though in Gemini CLI it will.

So I've tried to adopt a plan-in-Gemini, execute-in-Claude approach, but while I'm doing that I might as well just stay in Claude. The experience is just so much better.

For as much as I hear Google's pulling ahead, Anthropic seems to be to me, from a practical POV. I hope Googlers on Gemini are actually trying these things out in real projects, not just one-shotting a game and calling it a win.

stephen_cagle · 23 days ago

I also worked at Google (on the original Gemini, when it was still Bard internally) and my experience largely mirrors this. My finding is that Gemini is pretty great for factual information, and it is also the only one with which I can reliably (even with the video camera) take a picture of a bird and have it tell me what the bird is. But it is just pretty bad as a model to help with development; everyone I know, myself included, uses Claude. The benchmarks are always really close, but my experience is that this does not translate to real-world (mostly coding) tasks.

tldr; It is great at search, not so much action.

stephen_cagle commented on Show HN: Micasa – track your house from the terminal micasa.dev... · Posted by u/cpcloud

stephen_cagle · 23 days ago

I think this is neat. I use org-mode for pretty much everything, which has all of these features I think, but sometimes there is nothing more motivating than a quick responsive UI to actually do something. This looks motivational.

My only pushback is using SQLite. I am a big fan of just using simple (structured) text files that can be edited by hand when needed. Your computer is more than capable of doing all the joining/querying/aggregating/whatever with the text file itself rather than relying on a database. I personally find these sorts of file structures comforting, as it means they can be easily modified in unsupported ways.
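As a rough illustration of that point, here is a sketch of a join plus an aggregate done directly over two small CSV texts. The appliances and maintenance "tables" are invented for the example and are not Micasa's actual schema; in practice they would be hand-editable files on disk.

```python
import csv
import io
from collections import defaultdict

# Hypothetical data standing in for two plain-text files.
APPLIANCES = """id,name,room
1,furnace,basement
2,dishwasher,kitchen
"""

MAINTENANCE = """appliance_id,date,cost
1,2025-10-01,120.00
2,2025-11-15,45.50
1,2026-01-20,80.00
"""


def read_rows(text: str) -> list[dict]:
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def cost_by_appliance(appliances: list[dict], maintenance: list[dict]) -> dict:
    """Join maintenance rows to appliance names and sum cost per appliance."""
    names = {row["id"]: row["name"] for row in appliances}  # join index
    totals = defaultdict(float)
    for row in maintenance:
        totals[names[row["appliance_id"]]] += float(row["cost"])
    return dict(totals)


totals = cost_by_appliance(read_rows(APPLIANCES), read_rows(MAINTENANCE))
```

The trade-off is that constraints a database would enforce, such as a maintenance row pointing at a missing appliance id, only surface here as a `KeyError` at query time.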