Readit News
dirk94018 commented on Can I run AI locally?   canirun.ai/... · Posted by u/ricardbejarano
dirk94018 · 3 days ago
We wrote the linuxtoaster inference engine, toasted, and are getting 400 tok/s prefill, 100 tok/s generation on an M4 Max with 128GB RAM on Qwen3-next-coder at 6-bit; 8-bit runs too. KV caching means it feels snappy in chat mode. Local can work. For pro work (programming) I'd still prefer SOTA models, or GLM 4.7 via Cerebras.
dirk94018 commented on A Unix Manifesto for the Age of AI   linuxtoaster.com/manifest... · Posted by u/dirk94018
dirk94018 · 7 days ago
Author here. AI as pipe, not platform. toast is sed with a brain — reads stdin, writes stdout, honors the Unix contract. We're rewriting the command line around that idea, one tool at a time.
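A minimal sketch of that contract (illustrative only, not toast's actual implementation): a filter that reads stdin, writes stdout, and does exactly one transformation. The `brainy_sed` name and the uppercase transform are stand-ins; toast would invoke a model where the transform sits.

```python
import sys


def brainy_sed(lines, transform=str.upper):
    # Stand-in filter: a real "sed with a brain" would call a model
    # where `transform` is. The shape of the program stays the same.
    for line in lines:
        yield transform(line)


def main(stdin=None, stdout=None):
    # The Unix contract: read stdin, write stdout, no side channels.
    stdin = stdin if stdin is not None else sys.stdin
    stdout = stdout if stdout is not None else sys.stdout
    stdout.writelines(brainy_sed(stdin))
```

Because it honors the contract, it composes like any other tool, e.g. `cat notes.txt | python filter.py | grep TODO`.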

Generation is cheap now. Review is not. The skill that can't be automated is the stopping condition. Knowing what should not exist.

The market wants agents. Agents don't work. A pipe does. If this resonates and you build this way, I'd like to talk.

dirk94018 commented on Nobody gets promoted for simplicity   terriblesoftware.org/2026... · Posted by u/SerCe
dirk94018 · 13 days ago
Simplicity is hard. Pascal's 'I would have written a shorter letter, but I did not have the time' (often misattributed to Twain) comes to mind. Software devs' tendency to build castles is great for technical managers who want to own complex systems for organizational leverage. Worse is better in this context, even when it makes the people who understand cringe.

You would think that things not breaking should be career-positive for SysAdmins, SREs, and DevOps engineers in a way it cannot be for software devs. But even there simplicity is hard and not really rewarded.

Unix philosophy got this right 50 years ago — small tools, composability, do one thing well. Unix reimagined for AI is my attempt to change that.

dirk94018 commented on Nobody gets promoted for simplicity   terriblesoftware.org/2026... · Posted by u/aamederen
dirk94018 · 13 days ago
Simplicity is hard. Pascal's 'I would have written a shorter letter, but I did not have the time' (often misattributed to Twain) comes to mind.

Software devs' tendency to build castles is great for technical managers who want to own complex systems for organizational leverage. Worse is better in this context, even when it makes the people who understand cringe.

You would think that things not breaking should be career-positive for SysAdmins, SREs, and DevOps engineers in a way it cannot be for software devs. But even there simplicity is hard and not really rewarded.

Unix philosophy got this right 50 years ago — small tools, composability, do one thing well. Unix reimagined for AI is my attempt to change that.

dirk94018 commented on MacBook Pro with M5 Pro and M5 Max   apple.com/newsroom/2026/0... · Posted by u/scrlk
dirk94018 · 14 days ago
For chat-type interactions prefill is cached, the prompt is processed at 400 tok/s, and generation runs at 100-107 tok/s; it's quite snappy. Sure, for 130,000-token document processing it drops to around 60 tok/s, I think, but don't quote me on that. The larger point is that local LLMs are becoming useful, and they are getting smarter too.
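Back-of-envelope for what those rates mean in practice (a sketch using the numbers from the comment; the 60 tok/s long-context figure is the commenter's own rough estimate, and the 2,000-token chat prompt is an assumed size):

```python
def seconds_to_first_token(prompt_tokens: int, prefill_tps: float) -> float:
    # Time spent processing the prompt before generation can start.
    return prompt_tokens / prefill_tps

# Chat-sized prompt at the quoted 400 tok/s prefill rate:
chat_wait = seconds_to_first_token(2_000, 400)    # 5.0 s to first token

# 130k-token document at the rough long-context rate of ~60 tok/s:
doc_wait = seconds_to_first_token(130_000, 60)    # ~2167 s, about 36 minutes

# Generating a 500-token reply at 100 tok/s:
gen_time = 500 / 100                              # 5.0 s
```

So chat feels snappy while whole-document prefill is where the waiting happens, which matches the comment's framing.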

dirk94018 commented on MacBook Pro with M5 Pro and M5 Max   apple.com/newsroom/2026/0... · Posted by u/scrlk
dirk94018 · 14 days ago
On an M4 Max with 128GB we're seeing ~100 tok/s generation on a 30B-parameter model in our from-scratch inference engine. Very curious what the "4x faster LLM prompt processing" translates to in practice. Smallish local 30B-70B inference is genuinely usable territory for real dev workflows, not just demos. It will require staying plugged in, though.

dirk94018 commented on Building an Inference Engine in 1,800 Lines of C++   linuxtoaster.com/blog/toa... · Posted by u/dirk94018
dirk94018 · 14 days ago
Author here. This started because our C inference engine was slower than Python, which was annoying.

We got it to 400 tok/s prefill and 100 tok/s generation in 1,800 lines of C++, with no dependencies beyond MLX. Just not redoing work was a 125x improvement.
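A 125x from "not redoing work" is plausible from first principles. Without a KV cache, each decode step reprocesses the entire growing sequence; with one, the prompt is processed once and each step handles only the new token. A toy operation count (illustrative only, not their engine's actual profile; real end-to-end speedup is smaller because per-token cost isn't uniform):

```python
def tokens_processed(prompt_len: int, gen_len: int, kv_cache: bool) -> int:
    # Count how many tokens pass through the model while generating
    # `gen_len` tokens after a `prompt_len`-token prompt.
    if kv_cache:
        # Prompt processed once, then one new token per decode step.
        return prompt_len + gen_len
    # No cache: every step re-runs the full sequence so far.
    return sum(prompt_len + i for i in range(1, gen_len + 1))

naive = tokens_processed(2_000, 500, kv_cache=False)   # 1,125,250 token passes
cached = tokens_processed(2_000, 500, kv_cache=True)   # 2,500 token passes
speedup = naive / cached                               # ~450x fewer passes
```

The gap grows quadratically with generation length, which is why caching dominates every other optimization at this scale.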

Favorite moment: the model suggested enabling MetalFX to speed up inference. That's Apple's game graphics upscaler. It makes explosions look better.

AMA about any of it. We are working on the Qwen3.5 models. Local AI is going to get a lot better.

u/dirk94018

Karma: 119 · Cake day: July 22, 2013
About
Building LinuxToaster — composable AI tools for the terminal. https://linuxtoaster.com