As always, take those t/s stats with a huge boulder of salt. The demo shows a question "solved" in under 500 tokens. It's still amazing that it works at all, but you'll get nowhere near those speeds on real-world problems at the context lengths where "thinking" models become useful (8-16k tokens). Even EPYCs with lots of memory channels drop to 2-4 t/s past ~4096 tokens of context.
* pos=0 => P 138 ms S 864 kB R 1191 kB Connect
* pos=2000 => P 215 ms S 864 kB R 1191 kB .
* pos=4000 => P 256 ms S 864 kB R 1191 kB manager
* pos=6000 => P 335 ms S 864 kB R 1191 kB the
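For what it's worth, a quick back-of-the-envelope on the per-token timings quoted above (assuming P is the per-token processing time in ms and that it keeps growing roughly linearly with position, which is only an approximation):

```python
# Fit a line to the quoted per-token times (P ms at pos 0/2000/4000/6000)
# and project effective t/s at the 8-16k contexts mentioned above.
import numpy as np

pos = np.array([0, 2000, 4000, 6000])    # context position (tokens)
p_ms = np.array([138, 215, 256, 335])    # per-token processing time (ms), from the log above

slope, intercept = np.polyfit(pos, p_ms, 1)  # ms of extra latency per token of context

for ctx in (8000, 16000):
    per_token_ms = intercept + slope * ctx
    print(f"pos={ctx}: ~{per_token_ms:.0f} ms/token -> ~{1000 / per_token_ms:.1f} t/s")
```

That extrapolates to roughly 2.5 t/s at 8k and ~1.5 t/s at 16k context, i.e. right around (or below) the 2-4 t/s ballpark mentioned above, assuming the scaling stays linear.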