Readit News
Shaanveer commented on The path to ubiquitous AI (17k tokens/sec)   taalas.com/the-path-to-ub... · Posted by u/sidnarsipur
aurareturn · 22 days ago
Edit: it seems this is likely one chip, not 10. I had assumed an 8B model at 16-bit precision with 4K or more of context, which made me think they must have chained multiple chips together, since an 850 mm² N6 chip would yield at most 3 GB of SRAM. Instead, they seem to have etched Llama 8B at q3 with 1K context, which would indeed fit the chip size.

This requires 10 chips for an 8-billion-parameter q3 model. 2.4 kW.

10 reticle-sized chips on TSMC N6. Basically 10x Nvidia H100 GPUs.

The model is etched onto the silicon chip, so you can’t change anything about it after the chip has been designed and manufactured.

Interesting design for niche applications.

What task is extremely high value, requires only small-model intelligence, requires tremendous speed, is OK to run in the cloud given the power requirements, AND will be used for years without change, since the model is etched into silicon?
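
The sizing argument in the edit note of this comment can be checked with quick arithmetic. The sketch below is only a rough illustration: it counts weight storage alone (no KV cache or activations), treats "8B" as exactly 8 billion parameters, and takes the comment's 3 GB SRAM estimate as a decimal figure. The helper name `weight_bytes` is made up for this example.

```python
def weight_bytes(n_params: int, bits_per_param: int) -> int:
    """Bytes needed to store the weights alone (no KV cache, no activations)."""
    return n_params * bits_per_param // 8

# Assumptions for illustration, not vendor figures:
N_PARAMS = 8_000_000_000   # "8B" taken as a round number
SRAM_BUDGET = 3 * 10**9    # the comment's ~3 GB per-chip estimate, in decimal GB

for bits in (16, 3):
    size = weight_bytes(N_PARAMS, bits)
    fits = size <= SRAM_BUDGET
    print(f"{bits}-bit: {size / 1e9:.1f} GB, fits in 3 GB: {fits}")
# prints:
# 16-bit: 16.0 GB, fits in 3 GB: False
# 3-bit: 3.0 GB, fits in 3 GB: True
```

Under these assumptions the 16-bit weights overflow a single chip several times over, while the 3-bit weights land right at the estimated budget, which matches the reasoning in the edit note. Context storage would come on top of this.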

Shaanveer · 22 days ago
ceo
Shaanveer commented on Bruno Simon – 3D Portfolio   bruno-simon.com/... · Posted by u/razzmataks
tgdn · 3 months ago
Does not work on Chrome, and actually freezes the tab
Shaanveer · 3 months ago
Google Chrome on Linux does not support WebGPU by default (it's hidden behind some flags). https://github.com/gpuweb/gpuweb/wiki/Implementation-Status

u/Shaanveer

Karma: 5 · Cake day: October 10, 2024