Readit News
hendler commented on A Survey of AI Agent Protocols   arxiv.org/abs/2504.16736... · Posted by u/distalx
anonymousDan · 8 months ago
The hype around agent protocols reminds me of the emperor's new clothes. There's just nothing to it from a technical perspective.
hendler · 8 months ago
HTML was XML for the web. Nothing to it from a technical perspective.
hendler commented on Llama 3.1 405B now runs at 969 tokens/s on Cerebras Inference   cerebras.ai/blog/llama-40... · Posted by u/benchmarkist
zackangelo · a year ago
This is astonishingly fast. I’m struggling to get over 100 tok/s on my own Llama 3.1 70b implementation on an 8x H100 cluster.

I’m curious how they’re doing it. Obviously the standard bag of tricks (e.g., speculative decoding, flash attention) won’t get you close. It seems like at a minimum you’d have to do multi-node inference and maybe some kind of sparse attention mechanism?

hendler · a year ago
Check out Baseten for performant use of GPUs.
hendler commented on Llama 3.1 405B now runs at 969 tokens/s on Cerebras Inference   cerebras.ai/blog/llama-40... · Posted by u/benchmarkist
icelancer · a year ago
Groq is legitimate. So far, Cerebras doesn't scale (wide) nearly as well as Groq. We'll see how it goes.
hendler · a year ago
Google (TPUs), Amazon, a YC-funded ASIC/FPGA company, and a Chinese company all have custom hardware too, which might also scale well.
hendler commented on Llama 3.1 405B now runs at 969 tokens/s on Cerebras Inference   cerebras.ai/blog/llama-40... · Posted by u/benchmarkist
icelancer · a year ago
I'm a happily-paying customer of Groq but they aren't competitive against Cerebras in the 405b space (literally at all).

Groq has paying customers below the enterprise level and actually serves all its models broadly to everyone, unlike Cerebras, which is very selective, so they have that going for them. But in terms of sheer speed and the largest models, Groq doesn't really compare.

hendler · a year ago
Is this because 405b doesn't fit on Groq? If Groq does perform better, I would have liked to see that too.
hendler commented on Vector databases are the wrong abstraction   timescale.com/blog/vector... · Posted by u/jascha_eng
hendler · a year ago
Seems like a nice abstraction.

Since I see DuckDB mentioned, folks wanting a serverless option may also be interested in LanceDB, which is written in Rust and has most features built out for Python.

https://lancedb.com/

https://github.com/lancedb/lancedb

Side note: I wrote a proof of concept in which embedding generation is handled inside PostgreSQL, independently of the index.

https://github.com/Hendler/flame

u/hendler

Karma: 3787 · Cake day: March 21, 2008
About
https://hai.ai