rbitar (u/rbitar) - Readit News

rbitar commented on Show HN: RatatuiRuby wraps Rust Ratatui as a RubyGem – TUIs with the joy of Ruby ratatui-ruby.dev/... · Posted by u/Kerrick

rbitar · 22 days ago

Fantastic, this looks excellent and excited to try it

rbitar commented on Claude Haiku 4.5 anthropic.com/news/claude... · Posted by u/adocomplete

tosh · 4 months ago

One of the main issues I had with Claude Code (maybe it‘s the harness?) was that the agent tends to NOT read enough relevant code before it makes a change.

This leads to unnecessary helper functions instead of using existing helper functions and so on.

Not sure if it is an issue with the models or with the system prompts and so on or both.

rbitar · 4 months ago

I regularly use @ key to add files to context for tasks I know require edits or patterns I want claude to follow, adds a few extra key strokes but in most cases the quality improvement is worth it

rbitar commented on Claude Haiku 4.5 anthropic.com/news/claude... · Posted by u/adocomplete

Topfi · 4 months ago

Was just about to post that Haiku 4.5 does something I have never encountered before [0], there is a massive delta between token/sec depending on the query. Some variance including task specific is of course nothing new, but never as pronounced and reproducible as here.

A few examples, prompted at UTC 21:30-23:00 via T3 Chat [0]:

Prompt 1 — 120.65 token/sec — https://t3.chat/share/tgqp1dr0la

Prompt 2 — 118.58 token/sec — https://t3.chat/share/86d93w093a

Prompt 3 — 203.20 token/sec — https://t3.chat/share/h39nct9fp5

Prompt 4 — 91.43 token/sec — https://t3.chat/share/mqu1edzffq

Prompt 5 — 167.66 token/sec — https://t3.chat/share/gingktrf2m

Prompt 6 — 161.51 token/sec — https://t3.chat/share/qg6uxkdgy0

Prompt 7 — 168.11 token/sec — https://t3.chat/share/qiutu67ebc

Prompt 8 — 203.68 token/sec — https://t3.chat/share/zziplhpw0d

Prompt 9 — 102.86 token/sec — https://t3.chat/share/s3hldh5nxs

Prompt 10 — 174.66 token/sec — https://t3.chat/share/dyyfyc458m

Prompt 11 — 199.07 token/sec — https://t3.chat/share/7t29sx87cd

Prompt 12 — 82.13 token/sec — https://t3.chat/share/5ati3nvvdx

Prompt 13 — 94.96 token/sec — https://t3.chat/share/q3ig7k117z

Prompt 14 — 190.02 token/sec — https://t3.chat/share/hp5kjeujy7

Prompt 15 — 190.16 token/sec — https://t3.chat/share/77vs6yxcfa

Prompt 16 — 92.45 token/sec — https://t3.chat/share/i0qrsvp29i

Prompt 17 — 190.26 token/sec — https://t3.chat/share/berx0aq3qo

Prompt 18 — 187.31 token/sec — https://t3.chat/share/0wyuk0zzfc

Prompt 19 — 204.31 token/sec — https://t3.chat/share/6vuawveaqu

Prompt 20 — 135.55 token/sec — https://t3.chat/share/b0a11i4gfq

Prompt 21 — 208.97 token/sec — https://t3.chat/share/al54aha9zk

Prompt 22 — 188.07 token/sec — https://t3.chat/share/wu3k8q67qc

Prompt 23 — 198.17 token/sec — https://t3.chat/share/0bt1qrynve

Prompt 24 — 196.25 token/sec — https://t3.chat/share/nhnmp0hlc5

Prompt 25 — 185.09 token/sec — https://t3.chat/share/ifh6j4d8t5

I ran each prompt three times and got (within expected variance meaning less than 5% plus or minus) the same token/sec results for the respective prompt. Each used Claude Haiku 4.5 with "High reasoning". Will continue testing, but this is beyond odd. I will add that my very early evals leaned heavily into pure code output, where 200 token/sec is consistently possible at the moment, but it is certainly not the average as claimed before, there I was mistaken. That being said, even across a wider range of challenges, we are above 160 token/sec and if you solely focus on coding, whether Rust or React, Haiku 4.5 is very swift.

[0] Normally not using T3 Chat for evals, just easier to share prompts this way, though was disappointed to find that the model information (token/sec, TTF, etc.) can't be enabled without an account. Also, these aren't the prompts I usually use for evals. Those I try to keep somewhat out of training by only using paid for API for benchmarks. As anything on Hacker News is most assuredly part of model training, I decided to write some quick and dirty prompts to highlight what I have been seeing.

rbitar · 4 months ago

Interesting and if they are using speculative decoding that variance would make sense. Also your numbers line up with what openrouter is now publishing at 169.1tps [1]

Anthropic mentioned this model is more then twice as fast as claude sonnet 4 [2], which OpenRouter averaged at 61.72 tps for sonnet 4 [3]. If these numbers hold we're really looking at an almost 3x improvement in throughput and less then half the initial latency.

[1] https://openrouter.ai/anthropic/claude-haiku-4.5 [2] https://www.anthropic.com/news/claude-haiku-4-5 [3] https://openrouter.ai/anthropic/claude-sonnet-4

rbitar commented on Claude Haiku 4.5 anthropic.com/news/claude... · Posted by u/adocomplete

Topfi · 4 months ago

Update, Haiku 4.5 is not just very targeted in terms of changes but also really fast. Averaging at 220token/sec is almost double most other models I'd consider comparable (though again, far to early to make a proper judgement) and if this can be kept up, that is a massive value add over other models. That is nearly Gemini 2.5 Flash Lite speed for context.

Yes, we got Groq and Cerebras getting up to 1000token/sec, but not with models that seem comparable (again, early, not a proper judgement). Anthropic has been historically the most consistent in outperforming personal benchmarks vs public benchmarks, for what that is worth so I am optimistic.

If speed, performance and pricing are something Anthropic can keep consistent long term (i.e. no regressions), Haiku 4.5 really is a great option for most coding tasks, with Sonnet something I'd tag in only for very specific scenarios. Past Claude models have had a deficiency in longer term chains of tasks. Beyond 7 minutes roughly, performance does appear to worsen with Sonnet 4.5, as an example. That could be an Achilles heel for Haiku 4.5 as well, if not this really is a solid step in terms of efficiency, but I have not done any longer task testing yet.

That being said, Anthropic once again has a rather severe issue it seems, casting a shadow upon this release. From what I am seeing and others are reporting, Claude Code currently does count Haiku 4.5 usage the same as Sonnet 4.5 usage, despite the latter being significantly more expensive. They also did not yet update the Claude Code support pages to reflect the new models usage limits [0]. I really think such information should be public by launch day and hope they can improve their tooling and overall testing, it really continues to throw a shadow over their impressive models.

[0] https://support.claude.com/en/articles/11145838-using-claude...

rbitar · 4 months ago

Where do you get the 220 token/second? Genuinely curious as that would be very impressive for a model comparable to sonnet 4. OpenRouter currently publishing around 116/tps[1]

[1] https://openrouter.ai/anthropic/claude-haiku-4.5

rbitar commented on Cerebras systems raises $1.1B Series G cerebras.ai/press-release... · Posted by u/fcpguru

rbitar · 4 months ago

Congrats to the team, I'm surprised the industry hasn't been as impressed with their benchmarks on token throughput. We're using the Qwen 3 Coder 480b model and seeing ~2000 tokens/second, which is easily 10-20x faster then most LLM models on the market. Even some of the fastest models still only achieve 100-150 tokens / second (see OpenRouter stats by provider). I do feel after around 300-400 tokens/second the gains in speed feel more incremental, so if there was a model at 300+ tokens/second, I would consider that a very competitive alternative.

rbitar commented on BrowserPod: In-browser full-stack environments for IDEs and Agents via WASM labs.leaningtech.com/blog... · Posted by u/apignotti

rbitar · 4 months ago

Really excited for this product, the industry needs alternatives to WebContainers which has become more restrictive around licensing. Also great to see that non-node runtimes (ruby / python) will be supported. Having said that, really wish this was open-source, even if that meant the OSS version had more limited features then the commercial alternative.

rbitar commented on Cerebras Code cerebras.ai/blog/introduc... · Posted by u/d3vr

rbitar · 6 months ago

This token throughput is incredible and going to set a new bar in the industry. The main issue with the cerebras code plan is that number of requests/minute is throttled, and with agentic coding systems each tool call is treated as new "message" so you can easily hit the api limits (10 messages/minute).

One workaround we're doing now that seems to work is use claude for all tasks but delegate specific tools with cerebras/qwen-3-coder-480b model to generate files or other token heavy tasks to avoid spiking the total number of requests. This has cost and latency consequences (and adds complexity to the code), but until those throttle limits are lifted seems to be a good combo. I also find that claude has better quality with tool selection when the number of tools required is > 15 which our current setup has.

rbitar commented on Ask HN: Who is hiring? (June 2025) · Posted by u/whoishiring

rbitar · 8 months ago

Frontend.co | REMOTE | Full-time & Part-time | https://www.frontend.co

Frontend is building an AI-powered Shopify development platform. We use AI to generate full-stack application built using Next.js / Tailwind for Shopify-connected storefronts.

Experience: 7-10+ years as a full-stack experience with recent experience using TypeScript, Next.js, Tailwindcss, Postgres, and Ruby on Rails. E-commerce development using Shopify GraphQL APIs is a plus. Our tech stack is:

- Next.js - Supabase / Postgres - TypeScript - Tailwind - Ruby on Rails - Shopify Storefront + Admin GraphQL APIs

Send us a note at: info[plus]hn@frontend.co