Readit News
jhoho commented on The Danish Ministry of Digitalization Is Switching to Linux and LibreOffice   politiken.dk/viden/tech/a... · Posted by u/nogajun
ktallett · 3 months ago
I found OnlyOffice a good replacement. It is certainly better looking.
jhoho · 3 months ago
It's really beautiful but I stopped using it because of its opaque ties with Russia. https://en.wikipedia.org/wiki/OnlyOffice#Organization
jhoho commented on Show HN: Real-time AI Voice Chat at ~500ms Latency   github.com/KoljaB/Realtim... · Posted by u/koljab
sabellito · 4 months ago
Every time I see these things, they look cool as hell, I get excited, then I try to get them working on my gaming PC (that has the GPU), I spend 1-2h fighting with python and give up.

Today's issue is that my python version is 3.12 instead of <3.12,>=3.9. Installing python 3.11 from the official website does nothing, I give up. It's a shame that the amazing work done by people like the OP gets underused because of this mess outside of their control.

"Just use docker". Have you tried using docker on windows? There's a reason I never do dev work on windows.

I spent most of my career in the JVM and Node, and despite the issues, never had to deal with this level of incompatibility.

jhoho · 4 months ago
Let me introduce you to the beautiful world of virtual environments. They save you the headache of getting a full installation to run, especially when using Windows.

I prefer miniconda, but venv also does the job.
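The venv route sketched above amounts to creating the environment with the interpreter version the project pins (e.g. `python3.11 -m venv .venv`) and activating it. A minimal programmatic sketch using the stdlib `venv` module (paths here are illustrative, not from the original thread):

```python
import os
import sys
import tempfile
import venv

# Create an isolated environment so a project pinned to <3.12,>=3.9
# doesn't clash with whatever the system interpreter happens to be.
# On the command line this is just: python3.11 -m venv .venv
target = os.path.join(tempfile.mkdtemp(), "env")
venv.create(target, with_pip=False)   # pip skipped to keep it quick

print(sys.version_info[:2])   # the version this env inherits
print(os.path.isdir(target))  # env skeleton now exists
```

The key point is that the env inherits the version of the interpreter that created it, so you run `venv` with the exact Python the project requires.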

jhoho commented on AMD's game-changing Strix Halo, formerly Ryzen AI Max, poses for new die shots   tomshardware.com/pc-compo... · Posted by u/rbanffy
chipsa · 7 months ago
More RAM, so less movement of the weights around to generate a token. Most of the speed limit on a LLM is bandwidth of getting the weights around. To a great extent, your token speed is approximately your (model size)/(effective bandwidth). If you need to shuffle the weights into VRAM from main RAM, you halve your speed (bandwidth used both to move into VRAM and out). If you need to pull the weights from disk, even worse.
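A back-of-envelope version of that (model size)/(effective bandwidth) estimate, with illustrative numbers (fp16 weights, roughly 4090-class VRAM bandwidth vs. a quad-channel LPDDR5X system; none of these figures are measurements from the article):

```python
# tokens/s ≈ bandwidth / model size, since every weight must stream
# past the compute units once per generated token.
model_bytes = 70e9 * 2      # 70B params at fp16, 2 bytes each
vram_bw     = 1008e9        # ~1 TB/s, assumed VRAM-class bandwidth
sysram_bw   = 256e9         # ~256 GB/s, assumed LPDDR5X-class bandwidth

tok_vram   = vram_bw / model_bytes    # weights resident in fast memory
tok_sysram = sysram_bw / model_bytes  # weights streamed from system RAM
print(f"{tok_vram:.1f} tok/s vs {tok_sysram:.1f} tok/s")
```

The ratio between the two bandwidths is the ratio between the two token rates, which is why shuffling weights through a slower tier dominates everything else.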
jhoho · 7 months ago
While true, the benchmarks are run not on the Ryzen's NPU but on the much stronger integrated GPU.
jhoho commented on AMD's game-changing Strix Halo, formerly Ryzen AI Max, poses for new die shots   tomshardware.com/pc-compo... · Posted by u/rbanffy
Havoc · 7 months ago
>AMD also claims its Strix Halo APUs can deliver 2.2x more tokens per second than the RTX 4090 when running the Llama 70B LLM (Large Language Model) at 1/6th the TDP (75W).

That, if true, is wild.

jhoho · 7 months ago
It's because of the larger memory capacity - a 70B-parameter model doesn't fit into the 4090's 24GB of VRAM.
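Rough weight-only memory math (ignoring KV cache and activations, which would only add to these numbers) shows why 24GB isn't enough at any common precision:

```python
# Bytes needed for the weights alone at a few common precisions.
params = 70e9
for bits, label in [(16, "fp16"), (8, "int8"), (4, "int4")]:
    gib = params * bits / 8 / 2**30
    print(f"{label}: {gib:.0f} GiB")
# Even 4-bit weights (~33 GiB) exceed a 24 GiB card, so layers spill
# to system RAM and the token rate drops with the slower bandwidth.
```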

u/jhoho · karma 112 · joined October 22, 2019