paxys · a year ago
Benchmarks - https://www.reddit.com/r/LocalLLaMA/comments/1h85ld5/comment...

Seems to perform on par with or slightly better than Llama 3.2 405B, which is crazy impressive.

Edit: According to Zuck (https://www.instagram.com/p/DDPm9gqv2cW/) this is the last release in the Llama 3 series, and we'll see Llama 4 in 2025. Hype!!

state_less · a year ago
I'm getting 2.12 tok/s[1] on a 24GB (4090) GPU and 64GB (7950x) CPU memory, splitting the model across the GPU and CPU (40/80 layers on GPU) with lm-studio. Output looks good so far, I can use something like this for a query that I want as good an answer as possible and that I don't want to send out on the network.

If we can get better quantization, or bigger GPU memory footprints, we might be able to use these big models locally for solid coding assistants. That's what I think we have to look forward to (among other benefits) in the year(s) ahead.

1. lmstudio-community/Llama-3.3-70B-Instruct-GGUF/Llama-3.3-70B-Instruct-Q4_K_M.gguf
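Back-of-envelope, those numbers line up. A minimal sketch, assuming Q4_K_M averages roughly 4.7 bits per weight (an assumption; actual GGUF quant mixes vary) and ignoring KV cache and activations:

```python
def split_estimate_gb(params_b: float, bits_per_weight: float,
                      total_layers: int, gpu_layers: int) -> tuple[float, float]:
    """Return (gpu_gb, cpu_gb) for an even per-layer split of the weights.

    params_b is in billions, so params_b * bits / 8 is already gigabytes.
    """
    total_gb = params_b * bits_per_weight / 8
    gpu_gb = total_gb * gpu_layers / total_layers
    return gpu_gb, total_gb - gpu_gb

gpu, cpu = split_estimate_gb(70, 4.7, total_layers=80, gpu_layers=40)
print(f"~{gpu:.0f} GB on GPU, ~{cpu:.0f} GB in system RAM")
```

That comes out to roughly 21 GB on each side, which is consistent with a 40/80 split just fitting on a 24GB 4090 with the rest in system RAM.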

Me1000 · a year ago
The 32B parameter model size seems like the sweet spot right now, imho. It's large enough to be very useful (Qwen 2.5 32B and the Coder variant are outstanding models), and they run on consumer hardware much more easily than the 70B models.

I hope Llama 4 reintroduces that mid sized model size.

Sharlin · a year ago
A question: how large an LLM can be run at reasonable speed on 12GB (3060) VRAM and 32GB RAM? How much does quantization impact output quality? I've worked with image models (SD/Flux etc.) quite a bit, but haven't yet tried running a local LLM.
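A rough rule of thumb (an estimate, not a benchmark): weight footprint is about params × bits-per-weight / 8, plus a couple of GB of headroom for KV cache and context. A quick sketch for a 12GB card:

```python
def weight_gb(params_b: float, bits: float) -> float:
    # params_b in billions of parameters; result in GB
    return params_b * bits / 8

VRAM_GB = 12
HEADROOM_GB = 2   # assumed overhead for KV cache / context

for params in (7, 8, 14, 32, 70):
    for bits in (8, 4):
        need = weight_gb(params, bits) + HEADROOM_GB
        verdict = "fits" if need <= VRAM_GB else "too big"
        print(f"{params}B @ {bits}-bit: ~{need:.1f} GB -> {verdict}")
```

By this estimate, up to roughly 14B at 4-bit fits entirely in 12GB; anything bigger means partial CPU offload and a large speed hit.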
pmarreck · a year ago
How do you measure tokens/sec? Here's my attempt on a new M4 Max 128GB, does about 6 words/sec:

    bash> time ollama run llama3.3 "What's the purpose of an LLM?" | tee ~/Downloads/what\ is\ an\ LLM.txt
    A Large Language Model (LLM) is a type of artificial intelligence (AI) designed to process and understand human language. The primary purposes of an LLM are:
(... contents excerpted for brevity)

    Overall, the purpose of an LLM is to augment human capabilities by providing a powerful tool for understanding, generating, and interacting with human language.

    real  0m59.040s
    user  0m0.071s
    sys 0m0.081s

    bash> wc -w Downloads/what\ is\ an\ LLM.txt
        359 Downloads/what is an LLM.txt
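`ollama run --verbose` will report eval rate (tokens/s) directly. Failing that, you can estimate from word count and wall time, using the common rule of thumb of roughly 1.3 tokens per English word (an approximation; the true ratio depends on the tokenizer):

```python
def est_tokens_per_sec(words: int, wall_seconds: float,
                       tokens_per_word: float = 1.3) -> float:
    """Estimate generation speed from word count and wall-clock time."""
    return words * tokens_per_word / wall_seconds

# 359 words in 59.04s, the numbers from the transcript above
print(round(est_tokens_per_sec(359, 59.04), 1))
```

That works out to roughly 8 tokens/s, consistent with the ~6 words/s observed. Note `time`'s wall clock also includes model load time, so the steady-state rate is a bit higher.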

cjbprime · a year ago
Any opinion on whether the q4 quantization is stable/effective? That's a lot of quantization.

Edit: Perhaps answering my own question:

λ ollama run hf.co/lmstudio-community/Llama-3.3-70B-Instruct-GGUF:Q4_K_M

>>> Hi. Who are you?

I'm an artificial intelligence model known as Llama. Llama stands for "Large Language Model Meta AI."

kristianp · a year ago
Can llama.cpp make use of the gpu built into the 7950x CPU? I assume that would improve performance.
85392_school · a year ago
FYI, due to Llama's naming scheme, there is no such thing as Llama 3.2 405B. The 8B/70B/405B models belong to Llama 3, 3.1, or 3.3 (the 405B only arrived with 3.1, and 3.3 is 70B-only), while Llama 3.2 only contains 1B, 3B, 11B (vision), and 90B (vision) models. It's a bit confusing.
paxys · a year ago
Ah, so I guess the comparison is to Llama 3.1 405B.
blueboo · a year ago
It could be worse. It could’ve been Llama 3.1 (New)
yieldcrv · a year ago
yeah I use Llama 3.2 3B and I'm blown away

but also wrestled with this mentally.

Meta both improves the technology and the inference, while also trapping itself, alongside everyone else training models, into updating the training set every few months so the model knows what it's talking about with relevant current events.

Lerc · a year ago
Given how close it is to 405B in performance it would be interesting to see which has the edge comparing an unquantized 3.3-70B against 405B quantized to be the same size.
vletal · a year ago
That would be 1.38 bits per weight on average, which I can confidently guess would not perform well.
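The arithmetic behind that figure, assuming the 70B baseline is counted at 8 bits per weight (at fp16 the same comparison comes out to roughly 2.77 bits):

```python
# Memory budget of a 70B model at 8 bits/weight, spread over 405B weights
budget_bits = 70e9 * 8
bits_per_weight = budget_bits / 405e9
print(round(bits_per_weight, 2))   # -> 1.38
```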
swalsh · a year ago
It's kind of amazing how there seems to be a wall where scaling up the model starts to yield diminishing intelligence gains. I guess that's why we can still compete with whales even though their brains are like twice as big as ours.
int_19h · a year ago
I have tried 405B at 1-bit quantization. It remains coherent, but didn't seem to be any better than 3.1-70B.
ben30 · a year ago
This reminds me of Steve Jobs's famous comment to Dropbox about storage being 'a feature, not a product.' Zuckerberg, by open-sourcing these powerful models, is effectively commoditising AI while Meta's real business model remains centred around its social platforms. They can leverage these models to enhance Facebook and Instagram's services while simultaneously benefiting from community improvements and attention. It's not about selling AI; it's about using AI to strengthen their core business. By making it open, they get the benefits of widespread adoption and development without needing to monetise the models directly.
lolinder · a year ago
Also don't underestimate the value they're getting from making more overtures to the developer community. It could be a coincidence, but it's only since they started releasing these models that I started noticing people on HN calling them "Meta", and attitudes towards them have been far more positive of late than usual.

Good will isn't worth as much as cheap moderation automation and fancy features, but it's worth something.

yodsanklai · a year ago
> Also don't underestimate the value they're getting from making more overtures to the developer community.

I wonder if it's significant. As developers, we're biased to think it matters, but in the grand scheme of things, 99.99% of people don't have a clue about open source or things that matter to hackers. As far as recruitment goes, developers look primarily at how much they make, possibly the tech and how it looks on a resume. There's always been a stigma around social networks and big tech companies generally, but not to the point that it's going to hurt them.

LordDragonfang · a year ago
It's funny how quickly Zuck managed to turn his image around from "data-stealing actual lizard person" to "kind of normal guy" with a few years and a haircut. It's also not lost on me that he's the only "hacker" major tech CEO remaining:

   - Sundar is a glorified bean counter and his company is rotting from the inside, only kept afloat by the money printer that is ads.
   - Satya and Microsoft are in a similar boat, with the only major achievement being essentially buying OpenAI while every other product gets worse
   - Tim Cook is doing good things with Apple, but he still runs the company more like a fashion company than a tech company
   - Amazon was always more about logistics than cool hack value, and that hasn't changed since Bezos left
   - Elon is Elon
Meanwhile Zuck is spending shareholder money pushing forward consumer VR because he thinks it's cool, demoing true AR glasses, releasing open-source models, and building giant Roman-style statues of his wife.

signal11 · a year ago
Facebook Engineering has always been well regarded — starting with React on the front end, but also projects like Open Compute.

Their product management, on the other hand... well, I mean, Facebook and Instagram are arguably as popular as McDonald’s. So they’ve got that going for them.

talldayo · a year ago
It's funny. The only time I've ever seen Hacker News unanimously applaud a Facebook product was when PyTorch announced they merged Apple Silicon support. Seems like Mr. Zuckerberg knows how to play a winning hand.
swalsh · a year ago
I call them OpenAI instead of Meta.
ecocentrik · a year ago
It would be strange if they didn't also use these models to generate much more sophisticated models of their users' interests and hyper-targeted advertising that always looks and feels like a trusted friend's recommendation for the exact product that's been missing from your life.
huijzer · a year ago
I already was thinking for a while what the business model of open source was exactly. Why does Google spend money on Chrome also? After Zuckerberg’s comments it hit me:

Open source is useful for a business if it can either increase revenue or decrease costs.

Examples:

Increase revenue: Chrome and Visual Studio code. For example, the more people code, the more likely it is that they pay MSFT. So VS code aims to make programming as attractive as possible. Similar for Chrome.

Decrease costs: Linux and Llama. As Zuckerberg said himself IIRC, they don't want one party to snowball into an LLM monopoly, so they'd rather help get the open source ball rolling.

rafaelmn · a year ago
I think Ballmer's "developers, developers, developers" meme has been around longer than some people here have been alive. It served Microsoft well in the Windows era and it serves them well in the cloud space.
Spooky23 · a year ago
Exactly. Microsoft wants lots of cash for Copilot, meanwhile, we had Code Llama running with 150 developers before the Microsoft idiots could schedule a meeting.
barbazoo · a year ago
> For example, the more people code, the more likely it is that they pay MSFT. So VS code aims to make programming as attractive as possible

How does that increase revenue in a remotely measurable way?

Chrome, for sure, high market share, default search engine, more money, at least that's how I imagine it.

nimish · a year ago
Maybe Zuck just wants to see cool shit. It's not like he needs the money
petercooper · a year ago
Commoditize your complement: https://gwern.net/complement
muixoozie · a year ago
>storage being 'a feature, not a product.

Somewhat unrelated mini-rant. Upgraded a phone recently after about 3 years. Surprised to see storage still capped around 128GB (in-general). That's got to be artificially held back capacity to push cloud storage services?

viraptor · a year ago
There's lots of phones with more and/or with SD slots. It's not really "capped" as much as default size that seems to work just fine for the majority.
kstrauser · a year ago
I’ve got a 512GB phone with 112GB used. I’ve put absolutely no effort whatsoever into keeping that number down, and I’m not shy about downloading stuff to it.

I’m certain plenty of people need way more than 128GB. I figured I’d be one of them when I bought this. Nope. I bought a much bigger device than I actually needed.

If I’ve used less than 128GB, I’ve gotta think most other people do too. Not all, clearly! But most? I’d bet on it.

Spooky23 · a year ago
They’ve dramatically improved their ad quality. I routinely check out and convert on Facebook and Instagram ads now, and I can honestly say that in the 20 or more years before, I never once intentionally clicked on one.
rafaelmn · a year ago
Ironically, gen AI made their products worse more than the rest. I can't believe the amount of AI slop I see every time I open Facebook. I'd check it occasionally when replying on Messenger and scroll through for a while; after seeing the AI spam I don't even bother.
paxys · a year ago
That was going to happen regardless. From Meta's perspective it's better for their platforms to contain their own AI slop than OpenAI's.
jazzyjackson · a year ago
truly I wonder if they're fooled by their own click fraud, or if the incentives really do work out that they get paid whether the engagement is from bots or people. But anyway, I came here to say the same thing: it's shocking to me how enthusiastic Zuckerberg is about generative AI. What other possible outcome is there except actual human content creation being replaced by slop?
barbazoo · a year ago
Do the improvements the community proposes/makes to their models amount to anything significant? For a company like Meta with basically infinite money, do they really benefit from external help?
lolinder · a year ago
I don't have eyes inside of Meta, but keep in mind that we're not just talking about fine-tunes and LoRAs, we're also talking about the entire llama ecosystem, including llama.cpp, ollama, and llamafile. These would not exist (or wouldn't have anything like as much momentum) without Meta's investment, but they're now huge community projects.

I don't know if they use them internally, of course, but they could, and they represent a lot of work.

andy_ppp · a year ago
Baggy Tees, gold chains and now this!? Make this man president immediately!
LorenDB · a year ago
Seems to be more or less on par with GPT-4o across many benchmarks: https://x.com/Ahmad_Al_Dahle/status/1865071436630778109
rvnx · a year ago
Except it is 25x cheaper, available offline, can be uncensored / unaligned, fine-tuneable and backupable.

Sad day for OpenAI. Great for humanity.

madars · a year ago
What are good starting points for uncensoring it? Because it is offline, a jailbreak prompt can't be remote-bricked, but can one remove censorship from the weights themselves? What does it do to accuracy?
stainablesteel · a year ago
zuck is really on his redemption arc, he's out-doing himself
m3kw9 · a year ago
How sad? Tiny-violin sad. The typical consumer, which is likely 99% of them, is not going to use this.
Kiro · a year ago
How do you calculate the price?
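API pricing is usually quoted per million input and output tokens, and the ratio is just blended cost across a workload. A sketch with purely illustrative numbers (hypothetical, not actual provider prices):

```python
def blended_cost(in_price: float, out_price: float,
                 in_tokens_m: float = 1.0, out_tokens_m: float = 1.0) -> float:
    """Workload cost in $; prices are $ per million tokens."""
    return in_price * in_tokens_m + out_price * out_tokens_m

# Hypothetical per-million-token prices, 1M tokens each way
gpt4o = blended_cost(2.5, 10.0)
llama = blended_cost(0.1, 0.4)
print(f"{gpt4o / llama:.0f}x cheaper")
```

With those made-up prices the ratio happens to come out to 25x; the real figure depends on which hosted provider and which token mix you assume.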
jug · a year ago
This year seems to finish on the same note as it began: most AI evolution is happening in the smaller models. There's been a true shift as corporations have started to realize the value of training data, and of training on datasets that massively outsize the resulting model.
usaar333 · a year ago
Which GPT-4O are those?

The 08-06 release seems to be a bit higher on numerous benchmarks than what that shows: https://github.com/openai/simple-evals?tab=readme-ov-file#be...

griomnib · a year ago
This just makes the $200/month even more laughable.
ttul · a year ago
The $200 plan is for people who would pay $200 for a bottle of vodka even though the $20 bottle is molecularly identical.
afro88 · a year ago
How? 4o is part of the plus plan, as is o1.
zxvkhkxvdvbdxz · a year ago
But its $200/month smart!
namlem · a year ago
o1-pro is way smarter still
freediver · a year ago
Does unexpectedly well on our benchmark:

https://help.kagi.com/kagi/ai/llm-benchmark.html

Will dive into it more, but this is impressive.

stavros · a year ago
I asked it:

> I have a sorcerer character on D&D 5e and I've reached level 6. What do I get?

It confabulated a bunch of stuff. I also asked GPT-4, it confabulated a bit. Claude was spot on.

profsummergig · a year ago
Please help me understand something.

I've been out of the loop with HuggingFace models.

What can you do with these models?

1. Can you download them and run them on your Laptop via JupyterLab?

2. What benefits does that get you?

3. Can you update them regularly (with new data on the internet, e.g.)?

4. Can you finetune them for a specific use case (e.g. GeoSpatial data)?

5. How difficult and time-consuming (person-hours) is it to finetune a model?

(If HuggingFace has answers to these questions, please point me to the URL. HuggingFace, to me, seems like the early days of GitHub. A small number were heavy users, but the rest were left scratching their heads and wondering how to use it.)

Granted it's a newbie question, but answers will be beneficial to a lot of us out there.

joshhart · a year ago
Hi,

Yes you can. The community creates quantized variants of these that can run on consumer GPUs. A 4-bit quantization of Llama 70B works pretty well on MacBook Pros; the Neural Engine with unified CPU memory is quite solid for these. GPUs are a bit tougher because consumer GPU RAM is still kinda small.

You can also fine-tune them. There are a lot of frameworks, like unsloth, that make this easier: https://github.com/unslothai/unsloth . Fine-tuning can be pretty tricky to get right; you need to be aware of things like learning rates, but there are good resources on the internet where a lot of hobbyists have gotten things working. You do not need a PhD in ML to accomplish this. You will, however, need data that you can represent textually.

Source: Director of Engineering for model serving at Databricks.
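Frameworks like unsloth typically fine-tune via LoRA-style adapters rather than updating full weights (a generalization on my part, not a claim about any specific setup), which is why it fits on consumer hardware: instead of training a d×d weight matrix you train two thin rank-r matrices. The parameter count difference:

```python
def lora_trainable(d: int, r: int) -> tuple[int, int]:
    """Trainable params for one d x d layer: (full, rank-r LoRA A: d x r plus B: r x d)."""
    return d * d, 2 * d * r

# Hypothetical layer width and rank, for illustration only
full, lora = lora_trainable(d=8192, r=16)
print(f"full: {full:,}  lora: {lora:,}  ({100 * lora / full:.2f}% of full)")
```

At rank 16 on an 8192-wide layer, the adapter is under half a percent of the full matrix, which is what makes single-GPU fine-tuning of 70B-class models feasible.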

Deleted Comment

vtail · a year ago
Thank you Josh. Is there a resource you can point us to that helps answer "what kind of MacBook Pro memory do I need to run ABC model at XYZ quantization?"
aiden3 · a year ago
how would pricing on Databricks model serving compare to, say, the prices seen in the original post here (i.e., "3.3 70B is 25X cheaper than GPT4o")?
nickpsecurity · a year ago
I’ve been wanting to run into someone on the Databricks team. Can you ask whoever trains models like MPT to consider training an open model only on data clear of copyright claims? Specifically, one using only Gutenberg and the permissive code in The Stack? Or just Gutenberg?

Since I follow Christ, I can’t break the law or use what might be produced directly from infringement. I might be able to do more experiments if a free, legal model is available. Also, we can legally copy datasets like PG19 since they’re public domain. Whereas, most others have works in which I might need a license to distribute.

Please forward the request to the model trainers. Even a 7B model would let us do a lot of research on optimization algorithms, fine-tuning, etc.

profsummergig · a year ago
Thank you! Very helpful!
mhh__ · a year ago
Yes (don't know about JupyterLab), skip, not really, yes, quite irritating so just pay someone else to do it.
profsummergig · a year ago
Thanks! Succinct and complete.
jerpint · a year ago
Basically equivalent to GitHub but for models. Anyone can upload anything, but it kind of standardizes tools and distribution for everyone. They also have a team that helps integrate releases for easier use and libraries for fine tuning
profsummergig · a year ago
Thanks!

I want to download my first HuggingFace model, and play with it. If you know of a resource that can help me decide what to start with, please share. If you don't, no worries. Thanks again.

theanonymousone · a year ago
I'm "tracking" the price of 1M tokens on OpenRouter and it is decreasing every few refreshes. It's funny: https://openrouter.ai/meta-llama/llama-3.3-70b-instruct
danielhanchen · a year ago
I uploaded 4bit bitsandbytes, GGUFs and original 16bit weights to https://huggingface.co/unsloth for those interested! You can also finetune Llama 3.3 70B in under 48GB of VRAM and 2x faster and use 70% less memory with Unsloth!
bnchrch · a year ago
Open sourcing Llama is one of the best examples and roll-outs of "Commoditize Your Complement" in memory.

Link to Gwern's "Laws of Tech: Commoditize Your Complement" for those who haven't heard of this strategy before:

https://gwern.net/complement

bazmattaz · a year ago
That was so interesting. Thanks for sharing