I'm no expert on these MoE models with "a total of 389 billion parameters and 52 billion active parameters".
Do hobbyists stand a chance of running this model (quantized) at home?
For example, on something like a PC with 128GB (or 512GB) RAM and one or two RTX 3090 24GB VRAM GPUs?
At 4-bit quantization, weights take about 1GB per 2 billion parameters (0.5 bytes/parameter), so this model needs roughly 195GB for weights alone. You will want 256GB RAM and at least one GPU. If you only have one server and one user, it's the full parameter count that has to fit in memory. (If you have multiple GPUs/servers and many users in parallel, you can shard the experts and route requests so each GPU/server only needs to hold roughly the active parameter count. That's why it's cheaper at scale.)
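A back-of-the-envelope sketch of that sizing, assuming plain 4-bit weights (0.5 bytes/parameter) and ignoring KV cache, activations, and framework overhead; the parameter counts are the ones quoted in the question:

```python
# Rough memory footprint for a 4-bit quantized MoE model.
# Assumes 0.5 bytes per parameter; real runtimes add KV cache and other overhead.

def weight_size_gb(params_billion: float, bits_per_param: float = 4.0) -> float:
    """Approximate weight footprint in GB for a given parameter count and bit width."""
    bytes_per_param = bits_per_param / 8.0
    return params_billion * bytes_per_param  # billions of params * bytes each = GB

total_b = 389    # total parameters (billions), from the question
active_b = 52    # active parameters per token (billions), from the question

print(f"4-bit total weights:  ~{weight_size_gb(total_b):.0f} GB  (fits in 256GB RAM, not 128GB)")
print(f"4-bit active weights: ~{weight_size_gb(active_b):.0f} GB  (read per generated token)")
```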
Going from 525GB/s to 1000GB/s of memory bandwidth will double the tokens per second (TPS) at best, which is still quite low for large LLMs.
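For a sense of where that TPS ceiling comes from, here is a rough bandwidth-bound estimate. It assumes each generated token has to stream the 4-bit active weights (~26GB for 52B active parameters) from memory once, so it is an idealized upper bound, not a benchmark:

```python
# Theoretical ceiling on single-user decode speed when generation is memory-bandwidth bound.
# Real throughput will be noticeably lower once KV cache reads, expert routing, and
# compute overhead are included.

def max_tokens_per_sec(bandwidth_gb_s: float,
                       active_params_billion: float = 52.0,
                       bits_per_param: float = 4.0) -> float:
    """Tokens/s ceiling if every byte of active weights is read once per token."""
    gb_per_token = active_params_billion * bits_per_param / 8.0
    return bandwidth_gb_s / gb_per_token

for bw in (525, 1000):
    print(f"{bw:>4} GB/s -> at most ~{max_tokens_per_sec(bw):.0f} tokens/s")
# 525 GB/s gives ~20 tokens/s at best, 1000 GB/s ~38 tokens/s: roughly 2x, as noted above.
```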