Readit News
omneity commented on Claudia – Desktop companion for Claude code   claudiacode.com/... · Posted by u/zerealshadowban
mikestaas · 7 days ago
Roo Code in VS Code, and Qwen Coder in LM Studio, is a decent local-only combo.
omneity · 7 days ago
Strongly seconding Roo Code. I am using it in VSCodium and it's the perfect partner for a fully local coding workflow (nearly 100% open source too, so no vendor is going to pry it from my hands, "ever").

Qwen Coder 30B is my main driver in this configuration, and in my experience it's quite capable. It runs at 80 tok/s on my M3 Max, and I'm able to use it for about 30-50% of my coding tasks, the most menial ones. I am exploring ways to RL its approach to coding so it fits my style a bit more; it's a very exciting prospect if I manage to figure it out.

The missing link is autocomplete, since Roo only solves the agent part. Continue.dev does a decent job at that, but you really want to pair it with a high-performance, large-context model (so it fits multiple code sections, your recent changes, and context about the repo, while still giving fast suggestions), and that doesn't seem feasible or enjoyable yet in a fully local setup.

omneity commented on GPT-OSS vs. Qwen3 and a detailed look how things evolved since GPT-2   magazine.sebastianraschka... · Posted by u/ModelForge
7moritz7 · 14 days ago
Qwen3 is substantially better in my local testing. As in, it adheres to the prompt better (pretty much exactly for the 32B parameter variant, very impressive) and sounds more organic.

In SimpleBench, gpt-oss (120B) flopped hard, so it doesn't appear particularly good at logical puzzles either.

So presumably, this comes down to...

- training technique or data

- dimension

- a small number of large experts vs. a large number of small experts

omneity · 14 days ago
Qwen3 32B is a dense model: it uses all of its parameters all the time. GPT-OSS 20B is a sparse MoE model, meaning it activates only a fraction of its parameters (3.6B) per token. It's a tradeoff that makes it faster to run than a dense 20B model and much smarter than a 3.6B one.

In practice the fairest comparison would be to a dense ~8B model (roughly the geometric mean of total and active parameters). Qwen Coder 30B A3B is a good sparse comparison point as well.
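To make the tradeoff concrete, here's a back-of-the-envelope sketch. The 32B/20B/3.6B figures come from the comment; the "dense equivalent via geometric mean" is a common rule of thumb, not a published result for these specific models:

```python
import math

# Dense model: every parameter participates in every token.
qwen3_dense_b = 32.0

# Sparse MoE: the full weights must sit in memory, but only the
# routed experts (~3.6B params) run per token.
gpt_oss_total_b = 20.0
gpt_oss_active_b = 3.6

# Per-token compute scales with active params, memory with total params.
speedup_vs_dense_20b = gpt_oss_total_b / gpt_oss_active_b

# Rough "dense equivalent" heuristic: sqrt(total * active).
dense_equivalent_b = math.sqrt(gpt_oss_total_b * gpt_oss_active_b)

print(f"per-token compute vs. a dense 20B: ~{speedup_vs_dense_20b:.1f}x less")
print(f"rough dense-equivalent size: ~{dense_equivalent_b:.1f}B")
```

The heuristic lands at ~8.5B, which is where the "compare it to a dense ~8B model" intuition comes from.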

omneity commented on Running GPT-OSS-120B at 500 tokens per second on Nvidia GPUs   baseten.co/blog/sota-perf... · Posted by u/philipkiely
codedokode · 18 days ago
By the way, I wonder: what has more performance, a $25,000 professional GPU or a bunch of cheaper consumer GPUs costing $25,000 in total?
omneity · 18 days ago
Consumer GPUs, in theory and by a large margin (ten 5090s will eat an H100's lunch with 6 times the bandwidth, 3x the VRAM, and a relatively similar compute ratio), but your bottleneck is the interconnect, and that is intentionally crippled to stop Beowulf-style GPU clusters from eating into the datacenter market.

Last consumer GPU with NVLink was the RTX 3090. Even the workstation-grade GPUs lost it.

https://forums.developer.nvidia.com/t/rtx-a6000-ada-no-more-...
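The aggregate comparison above can be checked with approximate spec-sheet figures (32 GB / ~1.79 TB/s per RTX 5090, 80 GB / ~3.35 TB/s for an H100 SXM; the exact ratios shift with the H100 variant, and the sketch assumes perfect scaling, which the crippled interconnect prevents in practice):

```python
# Approximate public spec-sheet numbers (assumptions, variant-dependent).
rtx5090 = {"vram_gb": 32, "bandwidth_tbs": 1.79}   # per card
h100    = {"vram_gb": 80, "bandwidth_tbs": 3.35}

n = 10  # ten consumer cards vs. one datacenter card
agg_vram = n * rtx5090["vram_gb"]
agg_bw = n * rtx5090["bandwidth_tbs"]

print(f"VRAM:      {agg_vram} GB vs {h100['vram_gb']} GB "
      f"({agg_vram / h100['vram_gb']:.1f}x)")
print(f"Bandwidth: {agg_bw:.1f} TB/s vs {h100['bandwidth_tbs']} TB/s "
      f"({agg_bw / h100['bandwidth_tbs']:.1f}x)")
```

On paper the cluster wins on every axis; in practice every token generated across multiple cards has to cross the (PCIe-only) interconnect.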

omneity commented on My 2.5 year old laptop can write Space Invaders in JavaScript now (GLM-4.5 Air)   simonwillison.net/2025/Ju... · Posted by u/simonw
simonw · a month ago
For this particular model, yes.

This new one from Qwen should fit though - it looks like that only needs ~30GB of RAM: https://huggingface.co/lmstudio-community/Qwen3-30B-A3B-Inst...

omneity · a month ago
It takes ~17-20GB at Q4, depending on context length & settings (running it as we speak).

~30GB at Q8, sure, but that's a marginal quality gain for double the VRAM usage.
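The quoted sizes follow from the usual rule of thumb: weight memory ≈ params × bits-per-weight / 8, plus overhead for embeddings, quantization scales, and the KV cache. The effective bits-per-weight below are typical values for common GGUF quant types, used here as illustrative assumptions:

```python
def weight_gb(params_b: float, bits_per_weight: float) -> float:
    """Rough weight-only memory estimate in GB for a model with
    `params_b` billion parameters at the given bits per weight."""
    return params_b * bits_per_weight / 8

# ~4.5 and ~8.5 effective bits/weight are ballpark figures for
# Q4-class and Q8-class GGUF quants respectively (assumptions).
for name, bits in [("Q4 (~4.5 bpw)", 4.5), ("Q8 (~8.5 bpw)", 8.5)]:
    print(f"{name}: ~{weight_gb(30, bits):.1f} GB for a 30B model")
```

That lands near 17 GB for Q4 and 32 GB for Q8, consistent with the numbers above once context/KV-cache overhead is added.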

omneity commented on GLM-4.5: Reasoning, Coding, and Agentic Abilities   z.ai/blog/glm-4.5... · Posted by u/GaggiX
littlestymaar · a month ago
Then you need to reprocess the previous conversation from scratch when switching from one provider to another, which sounds very expensive for no reason.
omneity · a month ago
Conversations are always "reprocessed from scratch" on every message you send. LLMs are practically stateless, and the conversation itself is the state: nothing is kept in memory between two turns.
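This is visible in what every chat client actually does: it resends the entire message history each turn. A minimal sketch (the `call_model` stub stands in for any OpenAI-style chat endpoint; a real call would receive the same full `history` list):

```python
# The conversation lives entirely on the client side, as a list of messages.
history = [{"role": "system", "content": "You are a helpful assistant."}]

def send(user_msg: str, call_model) -> str:
    history.append({"role": "user", "content": user_msg})
    reply = call_model(history)  # the FULL history goes out every turn
    history.append({"role": "assistant", "content": reply})
    return reply

# Stub model that just reports how many messages it was sent.
echo = lambda msgs: f"echo ({len(msgs)} messages seen)"

send("hi", echo)
send("and again", echo)
print(history[-1]["content"])  # the second call saw 4 prior messages
```

Since the full history travels with every request, switching providers mid-conversation "just works"; the cost is that the new provider has to prefill the whole transcript once.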
omneity commented on Show HN: Price Per Token – LLM API Pricing Data   pricepertoken.com/... · Posted by u/alexellman
yieldcrv · a month ago
the local side of things with a $7,000-$10,000 machine (512GB fast memory, CPU and disk) can almost reach parity with regard to text input and output and 'reasoning', but lags far behind for multimodal anything: audio input, voice output, image input, image output, document input.

there are no out-of-the-box solutions to run a fleet of models simultaneously or containerized either

so the closed-source solutions in the cloud are light years ahead, and it's been this way for 15 months now, no signs of stopping

omneity · a month ago
Would running vLLM in docker work for you, or do you have other requirements?
omneity commented on Qwen3-Coder: Agentic coding in the world   qwenlm.github.io/blog/qwe... · Posted by u/danielhanchen
swyx · a month ago
> there's a good reason that no software engineer you know was writing code with Qwen 2.5.

this is disingenuous. there are a bunch of hurdles to using open models over closed models and you know them as well as the rest of us.

omneity · a month ago
Also dishonest, since the reason Qwen 2.5 got so popular was not so much its paper performance.
omneity commented on Show HN: Any-LLM – Lightweight router to access any LLM Provider   github.com/mozilla-ai/any... · Posted by u/AMeckes
omneity · a month ago
Crazy timing!

I shipped a similar abstraction for LLMs a bit over a week ago:

https://github.com/omarkamali/borgllm

pip install borgllm

I focused on making it LangChain-compatible so you could drop it in as a replacement. It also offers virtual providers for automatic fallback when you reach rate limits, and so on.

omneity commented on AI is killing the web – can anything save it?   economist.com/business/20... · Posted by u/edward
LtWorf · a month ago
But without stackoverflow how do you think the AI will be able to reply about next year's new programming language?
omneity · a month ago
Eventually through experience and self-play with the technologies in question.
omneity commented on Robot metabolism: Toward machines that can grow by consuming other machines   science.org/doi/10.1126/s... · Posted by u/XzetaU8
omneity · a month ago
Is it actually growing, or is it swarm-like assembly, such as a coral colony?

u/omneity

Karma: 1949 · Cake day: September 1, 2018
About
Some of what I build:

https://herd.garden

- MCP servers for your agents powered by your favorite websites (3000 websites/MCP servers and counting)

- And a novel automation framework to control your own browser using a puppeteer-like API

https://sawalni.com

- The first AI to speak Moroccan

- A generative AI platform with custom-trained LLMs, NMTs, LID, semantic embedding and other kinds of models for low-resource languages, notably African and Arabic-related languages.

More & contact: https://omarkama.li/about

I'm always happy to chat so hit me up!
