skryl (u/skryl) - Readit News

skryl commented on Cerebras achieves 2,500T/s on Llama 4 Maverick (400B) cerebras.ai/press-release... · Posted by u/ByteAtATime

skryl · 3 months ago

Performance per watt is better than H100 and B200, performance per watt per $ is worse than B200, and it does FP8 just fine.

https://arxiv.org/pdf/2503.11698

skryl · 3 months ago

One caveat is that this paper only covers training, which can be done on a single CS-3 using external memory (swapping weights in and out of SRAM). There is no way that a single CS-3 will hit this record inference performance with external memory, so this was likely done with 10-20 CS-3 systems and the full model in SRAM. You definitely can’t compare tokens/$ for that kind of setup against a DGX.
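A rough sketch of where a 10-20 range could come from. It assumes 44 GB of on-chip SRAM per WSE-3 (a vendor spec that is not stated in this thread) and counts model weights only; KV cache and activations are ignored, so a real deployment would likely need more. The function name is illustrative.

```python
import math

PARAMS = 400e9          # Llama 4 Maverick total parameters (from the headline)
SRAM_PER_CHIP_GB = 44   # assumption: on-chip SRAM per WSE-3, not from this thread

def chips_needed(bytes_per_param: float) -> int:
    """Minimum chip count to hold the weights alone entirely in SRAM."""
    weights_gb = PARAMS * bytes_per_param / 1e9
    return math.ceil(weights_gb / SRAM_PER_CHIP_GB)

fp16_chips = chips_needed(2)  # 2 bytes per parameter
fp8_chips = chips_needed(1)   # 1 byte per parameter
print(f"FP16: {fp16_chips} chips, FP8: {fp8_chips} chips")
```

Under these assumptions the weights-only lower bound lands at roughly the two ends of the 10-20 estimate, depending on precision.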

skryl commented on Cerebras achieves 2,500T/s on Llama 4 Maverick (400B) cerebras.ai/press-release... · Posted by u/ByteAtATime

ryao · 3 months ago

> At over 2,500 t/s, Cerebras has set a world record for LLM inference speed on the 400B parameter Llama 4 Maverick model, the largest and most powerful in the Llama 4 family.

This is incorrect. The unreleased Llama 4 Behemoth is the largest and most powerful in the Llama 4 family.

As for the speed record, it seems important to keep it in context. That comparison is only for performance on 1 query, but it is well known that people run potentially hundreds of queries in parallel to get their money out of the hardware. If you aggregate the tokens per second across all simultaneous queries to get the total throughput for comparison, I wonder if it will still look so competitive in absolute performance.

Also, Cerebras is the company that not only said its hardware was not useful for inference until some time last year, but also partnered with Qualcomm on the claim that Qualcomm’s accelerators offered a 10x price-performance improvement over Cerebras's own hardware:

https://www.cerebras.ai/press-release/cerebras-qualcomm-anno...

Their hardware does inference with FP16, so they need ~20 of their CS-3 systems to run this model. Each one costs ~$2 million, so that is ~$40 million. The DGX B200 that they used for their comparison costs ~$500,000:

https://wccftech.com/nvidia-blackwell-dgx-b200-price-half-a-...

You only need 1 DGX B200 to run Llama 4 Maverick. You could buy ~80 of them for what it costs to buy enough Cerebras hardware to run Llama 4 Maverick.
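The arithmetic behind that comparison, as a minimal sketch. All three inputs are the approximate figures quoted in this comment, not verified prices, and the variable names are illustrative.

```python
# Approximate figures quoted in the comment above (not verified prices).
CS3_COUNT = 20                  # systems estimated for FP16 inference
CS3_PRICE_USD = 2_000_000       # ~$2 million per system
DGX_B200_PRICE_USD = 500_000    # ~$500,000 per DGX B200

cerebras_total = CS3_COUNT * CS3_PRICE_USD
dgx_equivalent = cerebras_total // DGX_B200_PRICE_USD

print(f"Cerebras setup: ${cerebras_total:,}")
print(f"DGX B200 units for the same money: {dgx_equivalent}")
```

This compares purchase price only; it says nothing about throughput, power, or hosting costs.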

Their latencies are impressive, but beyond a certain point, throughput is what counts, and they don’t really talk about their throughput numbers. I suspect the cost-to-performance ratio is terrible for throughput. It certainly is terrible for latency. That is what they are not telling people.

Finally, I have trouble getting excited about Cerebras. SRAM scaling is dead, so short of figuring out how to 3D-stack their wafer-scale chips during fabrication at TSMC, or designing round chips, they have a dead-end product, since it relies on using an entire wafer to be able to throw SRAM at problems. Nvidia, using DRAM, is far less reliant on SRAM and can use more silicon for compute, which is still shrinking.

skryl · 3 months ago

Performance per watt is better than H100 and B200, performance per watt per $ is worse than B200, and it does FP8 just fine.

https://arxiv.org/pdf/2503.11698

skryl commented on Ask HN: Who is hiring? (February 2016) · Posted by u/whoishiring

skryl · 10 years ago

Trusted (http://usetrusted.com) | San Francisco | Onsite, Full-time | $100-$150k, 0.5-1.0% equity

Contact: alex@usetrusted.com

Trusted alleviates the pain parents face in discovering, scheduling and paying for high quality, vetted child care.

We are a small team working on transforming the child care industry and helping countless parents in the process. We care deeply about the quality of the service we provide, but we also pride ourselves on the wellbeing and happiness of our team. Our day-to-day usually involves a standup around 10am and a few 10-minute exercise breaks throughout the day, and we normally wrap things up between 6pm and 7pm.

We're looking for an experienced front-end engineer to lead client-side JavaScript development and grow both our internal and customer-facing web clients. Because of the small size of our team, we love engineers who feel comfortable across the whole stack but specialize in something they love!

Skills We Are Looking For:

  * 5+ years of client-side JavaScript development
  * Deep knowledge of React, Angular, Backbone, or another client-side framework
  * Experience with UI/UX testing

Bonus:

  * Design chops
  * A portfolio which showcases your previous work
  * A GitHub account with cool projects in it
  * Experience with server-side technologies (Ruby, Python, PHP, etc)
  * Mobile development experience

skryl commented on A useful Caps Lock key brettterpstra.com/2012/12... · Posted by u/rbcoffee

skryl · 12 years ago

If you're looking for the rest of the private.xml file (HYPER + H/J/K/L) mappings...

https://gist.github.com/skryl/8143550

skryl commented on Guess You Thought I Was Someone To Mess With georgiaweidman.com/wordpr... · Posted by u/teaspoon

skryl · 12 years ago

Thanks for writing this. If anyone else is ever in a similar situation, please do your best to get out of the room. Even if you think your attacker might be hurt and is no longer restraining you, just get out. Get out and THEN call someone. Knock on doors, whatever... if you don't have your phone. Staying put and waiting for the attacker to leave is a BAD idea, even if you get a chance to use a phone.

skryl commented on Ask HN: Why can't the US change to the metric system? · Posted by u/robomartin

skryl · 12 years ago

For the same reason that we're all still on 12 months, 30ish days, 24h, 60m, 60s, 1000ms time.

skryl commented on Show HN: The XKCD Knapsack Solver xkcd287.herokuapp.com... · Posted by u/skryl

mschuster91 · 12 years ago

How are you supposed to use the site? Nothing is draggable in Google Chrome (dev latest)

skryl · 12 years ago

Works for me in Chrome.