blixt (u/blixt) - Readit News

blixt commented on The browser is the sandbox aifoc.us/the-browser-is-t... · Posted by u/enos_feedler

blixt · 18 days ago

Since AI became capable of long-running sessions with tool calls, one VM per AI as a service became very lucrative. But I do think a large amount of these can indeed run in the browser, especially all the ones that essentially just want to live-update and execute code, or run shells on top of a mounted file system. You can actually do all of this in the user's browser very efficiently. There are two things you lose though: collaboration (you can do it, but it becomes a distributed problem if you don't have a central server) and working in the background (you need to pause all work while the user's tab is suspended or closed).

So if you can work within the constraints there are a lot of benefits you get as a platform: latency goes down a lot, performance may go up depending on user hardware (usually more powerful than the type of VM you'd use for this), bandwidth can go down significantly if you design this right, and your uptime and costs as a platform will improve if you don't need to make sure you can run thousands of VMs at once (or pay a premium for a platform that does it for you)[1]

All that said I'm not sure trying to put an entire OS or something like WebContainers in the user's browser is the way, I think you need to build a slightly custom runtime for this type of local agentic environment. But I'm convinced it's the best way to get the smoothest user experience and smoothest platform growth. We did this at Framer to be able to recompile any part of a website into React code at 60+ frames per second, which meant less tricks necessary to make the platform both feel snappy and be able to publish in a second.

[1] For big model providers like OpenAI and Anthropic there's an interesting edge they have in that they run a tremendous amount of GPU-heavy loads and have a lot of CPUs available for this purpose.

blixt commented on Open Responses openresponses.org/... · Posted by u/davidbarker

blixt · a month ago

I’ve been building on an opinionated provider-agnostic library in Go[1] for a year now and it’s nice to see standardization around the format given how much variety there is between the providers. Hopefully it won’t just be the OpenAI logo on this though.

[1] https://github.com/flitsinc/go-llms

blixt commented on I hate GitHub Actions with passion xlii.space/eng/i-hate-git... · Posted by u/xlii

blixt · a month ago

I've gotten to a point where my workflow YAML files are mostly `mise` tool calls (because it handles versioning of all tooling and has cache support) and webhooks, and still it is a pain. Also their concurrency and matrix strategies are just not working well, and sometimes you end up having to use a REST API endpoint to force cancel a job because their normal cancel functionality simply does not take.

There was a time I wanted our GH actions to be more capable, but now I just want them to do as little as possible. I've got a Cloudflare worker receiving the GitHub webhooks firehose, storing metadata about each push and each run so I don't have to pass variables between workflows (which somehow is a horrible experience), and any long-running task that should run in parallel (like evaluations) happens on a Hetzner machine instead.

I'm very open to hear of nice alternatives that integrate well with GitHub, but are more fun to configure.

blixt commented on Show HN: Create LLM-optimized random identifiers github.com/blixt/tokeydok... · Posted by u/blixt

anonymoushn · a month ago

what does "logprobs look off" mean

blixt · a month ago

If the immediate next token probabilities are flat, that would mean the LLM is not able to predict the next token with any certainty. This might happen if an LLM is thrown off by out of distribution data, though I haven't personally seen it happen with modern models, so it was mostly a sanity check. But examples from the past that would cause this have been simple things like not normalizing token boundaries in your input, trailing whitespace, etc. And sometimes using very rare tokens AKA "glitch tokens" (https://en.wikipedia.org/wiki/Glitch_token).

blixt commented on Useful patterns for building HTML tools simonwillison.net/2025/De... · Posted by u/simonw

blixt · 2 months ago

One thing I tend to do myself is use https://generator.jspm.io/ to produce an import map once for all base dependencies I need (there's also a CLI), then I can easily copy/paste this template and get a self-contained single-file app that still supports JSX, React, and everything else. Some people may think it's overkill, but for me it's much more productive than document.getElementById("...") everywhere.

I don't have a lot of public examples of this, but here's a larger project where I used this strategy for a relatively large app that has TypeScript annotations for easy VSCode use, Tailwind for design, and it even loads in huge libraries like the Monaco code editor etc, and it all just works quite well 100% statically:

HTML file: https://github.com/blixt/go-gittyup/blob/main/static/index.h...

Main entrypoint file: https://github.com/blixt/go-gittyup/blob/main/static/main.js

blixt commented on The "confident idiot" problem: Why AI needs hard rules, not vibe checks steerlabs.substack.com/p/... · Posted by u/steer_dev

blixt · 2 months ago

Yeah I’ve found that the only way to let AI build any larger amount of useful code and data for a user that does not review all of it requires a lot of “gutter rails”. Not just adding more prompting, because it is an after-the-fact solution. Not just verifying and erroring a turn, because it adds latency and allows the model to start spinning out of control. But also isolating tasks and autofixing output keep the model on track.

Models definitely need less and less of this for each version that comes out but it’s still what you need to do today if you want to be able to trust the output. And even in a future where models approach perfect, I think this approach will be the way to reduce latency and keep tabs on whether your prompts are producing the output you expected on a larger scale. You will also be building good evaluation data for testing alternative approaches, or even fine tuning.

blixt commented on Anthropic acquires Bun bun.com/blog/bun-joins-an... · Posted by u/ryanvogel

blixt · 2 months ago

Extrapolating and wildly guessing, we could end up with using all that mostly idle CPU/RAM (the non-VRAM) on the beefy GPUs doing inference on agentic loops where the AI runs small JS scripts in a sandbox (which Bun is the best at, with its faster startup times and lower RAM use, not to mention its extensive native bindings that Node.js/V8 do not have) essentially allowing multiple turns to happen before yielding to the API caller. It would also go well with Anthropic's advanced tool use that they recently announced. This would be a big competitive advantage in the age of agents.