paxys · a year ago
Benchmarks - https://www.reddit.com/r/LocalLLaMA/comments/1h85ld5/comment...

Seems to perform on par with or slightly better than Llama 3.2 405B, which is crazy impressive.

Edit: According to Zuck (https://www.instagram.com/p/DDPm9gqv2cW/) this is the last release in the Llama 3 series, and we'll see Llama 4 in 2025. Hype!!

state_less · a year ago
I'm getting 2.12 tok/s[1] on a 24GB (4090) GPU and 64GB (7950x) CPU memory, splitting the model across the GPU and CPU (40/80 layers on GPU) with lm-studio. Output looks good so far, I can use something like this for a query that I want as good an answer as possible and that I don't want to send out on the network.

If we can get better quantization, or bigger GPU memory footprints, we might be able to use these big models locally for solid coding assistants. That's what I think we have to look forward to (among other benefits) in the year(s) ahead.

1. lmstudio-community/Llama-3.3-70B-Instruct-GGUF/Llama-3.3-70B-Instruct-Q4_K_M.gguf
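Back-of-envelope, those numbers line up. A minimal sketch, assuming Q4_K_M averages roughly 4.7 bits per weight (an assumption; actual GGUF quant mixes vary) and ignoring KV cache and activations:

```python
def split_estimate_gb(params_b: float, bits_per_weight: float,
                      total_layers: int, gpu_layers: int) -> tuple[float, float]:
    """Return (gpu_gb, cpu_gb) for an even per-layer split of the weights.

    params_b is in billions, so params_b * bits / 8 is already gigabytes.
    """
    total_gb = params_b * bits_per_weight / 8
    gpu_gb = total_gb * gpu_layers / total_layers
    return gpu_gb, total_gb - gpu_gb

gpu, cpu = split_estimate_gb(70, 4.7, total_layers=80, gpu_layers=40)
print(f"~{gpu:.0f} GB on GPU, ~{cpu:.0f} GB in system RAM")
```

That comes out to roughly 21 GB on each side, which is consistent with a 40/80 split just fitting on a 24GB 4090 with the rest in system RAM.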

Me1000 · a year ago
The 32B parameter model size seems like the sweet spot right now, imho. It's large enough to be very useful (Qwen 2.5 32B and the Coder variant are outstanding models), and they run on consumer hardware much more easily than the 70B models.

I hope Llama 4 reintroduces that mid sized model size.

Sharlin · a year ago
A question: how large an LLM can be run at reasonable speed on 12GB (3060) VRAM and 32GB RAM? How much does quantization impact output quality? I've worked with image models (SD/Flux etc.) quite a bit, but haven't yet tried running a local LLM.
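A rough rule of thumb (an estimate, not a benchmark): weight footprint is about params × bits-per-weight / 8, plus a couple of GB of headroom for KV cache and context. A quick sketch for a 12GB card:

```python
def weight_gb(params_b: float, bits: float) -> float:
    # params_b in billions of parameters; result in GB
    return params_b * bits / 8

VRAM_GB = 12
HEADROOM_GB = 2   # assumed overhead for KV cache / context

for params in (7, 8, 14, 32, 70):
    for bits in (8, 4):
        need = weight_gb(params, bits) + HEADROOM_GB
        verdict = "fits" if need <= VRAM_GB else "too big"
        print(f"{params}B @ {bits}-bit: ~{need:.1f} GB -> {verdict}")
```

By this estimate, up to roughly 14B at 4-bit fits entirely in 12GB; anything bigger means partial CPU offload and a large speed hit.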
pmarreck · a year ago
How do you measure tokens/sec? Here's my attempt on a new M4 Max 128GB, does about 6 words/sec:

    bash> time ollama run llama3.3 "What's the purpose of an LLM?" | tee ~/Downloads/what\ is\ an\ LLM.txt
    A Large Language Model (LLM) is a type of artificial intelligence (AI) designed to process and understand human language. The primary purposes of an LLM are:
(... contents excerpted for brevity)

    Overall, the purpose of an LLM is to augment human capabilities by providing a powerful tool for understanding, generating, and interacting with human language.

    real  0m59.040s
    user  0m0.071s
    sys 0m0.081s

    bash> wc -w Downloads/what\ is\ an\ LLM.txt
        359 Downloads/what is an LLM.txt
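`ollama run --verbose` will report eval rate (tokens/s) directly. Failing that, you can estimate from word count and wall time, using the common rule of thumb of roughly 1.3 tokens per English word (an approximation; the true ratio depends on the tokenizer):

```python
def est_tokens_per_sec(words: int, wall_seconds: float,
                       tokens_per_word: float = 1.3) -> float:
    """Estimate generation speed from word count and wall-clock time."""
    return words * tokens_per_word / wall_seconds

# 359 words in 59.04s, the numbers from the transcript above
print(round(est_tokens_per_sec(359, 59.04), 1))
```

That works out to roughly 8 tokens/s, consistent with the ~6 words/s observed. Note `time`'s wall clock also includes model load time, so the steady-state rate is a bit higher.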

cjbprime · a year ago
Any opinion on whether the q4 quantization is stable/effective? That's a lot of quantization.

Edit: Perhaps answering my own question:

λ ollama run hf.co/lmstudio-community/Llama-3.3-70B-Instruct-GGUF:Q4_K_M

>>> Hi. Who are you?

I'm an artificial intelligence model known as Llama. Llama stands for "Large Language Model Meta AI."

kristianp · a year ago
Can llama.cpp make use of the gpu built into the 7950x CPU? I assume that would improve performance.
85392_school · a year ago
FYI, due to Llama's naming scheme, there is no such thing as Llama 3.2 405B. The 8B/70B/405B models belong to Llama 3, 3.1, or 3.3 (the 405B only arrived with 3.1, and 3.3 is 70B-only), while Llama 3.2 only contains 1B, 3B, 11B (vision), and 90B (vision) models. It's a bit confusing.
paxys · a year ago
Ah, so I guess the comparison is to Llama 3.1 405B.
blueboo · a year ago
It could be worse. It could’ve been Llama 3.1 (New)
yieldcrv · a year ago
yeah I use Llama 3.2 3B and I'm blown away

but also wrestled with this mentally.

Meta both improves the technology and the inference, while also trapping itself, alongside everyone else training models, into updating the training set every few months so the model knows what it's talking about with relevant current events.

Lerc · a year ago
Given how close it is to 405B in performance it would be interesting to see which has the edge comparing an unquantized 3.3-70B against 405B quantized to be the same size.
vletal · a year ago
That would be 1.38 bits per weight on average, which I can confidently guess would not perform well.
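The arithmetic behind that figure, assuming the 70B baseline is counted at 8 bits per weight (at fp16 the same comparison comes out to roughly 2.77 bits):

```python
# Memory budget of a 70B model at 8 bits/weight, spread over 405B weights
budget_bits = 70e9 * 8
bits_per_weight = budget_bits / 405e9
print(round(bits_per_weight, 2))   # -> 1.38
```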
swalsh · a year ago
It's kind of amazing how there seems to be a wall where scaling up the model starts to yield diminishing intelligence gains. I guess that's why we can still compete with whales even though their brains are like twice as big as ours.
int_19h · a year ago
I have tried 405B at 1-bit quantization. It remains coherent, but didn't seem to be any better than 3.1-70B.
ben30 · a year ago
This reminds me of Steve Jobs's famous comment to Dropbox about storage being 'a feature, not a product.' Zuckerberg, by open-sourcing these powerful models, is effectively commoditising AI while Meta's real business model remains centred around its social platforms. They can leverage these models to enhance Facebook and Instagram's services while simultaneously benefiting from community improvements and attention. It's not about selling AI; it's about using AI to strengthen their core business. By making it open, they get the benefits of widespread adoption and development without needing to monetise the models directly.
lolinder · a year ago
Also don't underestimate the value they're getting from making more overtures to the developer community. It could be a coincidence, but it's only since they started releasing these models that I started noticing people on HN calling them "Meta", and attitudes towards them have been far more positive of late than usual.

Good will isn't worth as much as cheap moderation automation and fancy features, but it's worth something.

yodsanklai · a year ago
> Also don't underestimate the value they're getting from making more overtures to the developer community.

I wonder if it's significant. As developers, we're biased to think it matters, but in the grand scheme of things, 99.99% of people don't have a clue about open source or things that matter to hackers. As far as recruitment goes, developers look primarily at how much they make, possibly the tech and how it looks on a resume. There's always been a stigma around social networks and big tech companies generally, but not to the point that it's going to hurt them.

LordDragonfang · a year ago
It's funny how quickly Zuck managed to turn his image around from "data-stealing actual lizard person" to "kind of normal guy" with a few years and a haircut. It's also not lost on me that he's the only "hacker" major tech CEO remaining:

   - Sundar is a glorified bean counter and his company is rotting from the inside, only kept afloat by the money printer that is ads.
   - Satya and Microsoft are in a similar boat, with the only major achievement being essentially buying OpenAI while every other product gets worse
   - Tim Cook is doing good things with Apple, but he still runs the company more like a fashion company than a tech company
   - Amazon was always more about logistics than cool hack value, and that hasn't changed since Bezos left
   - Elon is Elon
Meanwhile Zuck is spending shareholder money pushing forward consumer VR because he thinks it's cool, demoing true AR glasses, releasing open-source models, and building giant Roman-style statues of his wife.

signal11 · a year ago
Facebook Engineering has always been well regarded — starting with React on the front end, but also projects like Open Compute.

Their product management, on the other hand... well, I mean, Facebook and Instagram are arguably as popular as McDonald’s. So they’ve got that going for them.

talldayo · a year ago
It's funny. The only time I've ever seen Hacker News unanimously applaud a Facebook product was when PyTorch announced they merged Apple Silicon support. Seems like Mr. Zuckerberg knows how to play a winning hand.
swalsh · a year ago
I call them OpenAI instead of Meta.
ecocentrik · a year ago
It would be strange if they didn't also use these models to generate much more sophisticated models of their users' interests and hyper-targeted advertising that always looks and feels like a trusted friend's recommendation for the exact product that's been missing from your life.
huijzer · a year ago
I already was thinking for a while what the business model of open source was exactly. Why does Google spend money on Chrome also? After Zuckerberg’s comments it hit me:

Open source is useful for a business if it can either increase revenue or decrease costs.

Examples:

Increase revenue: Chrome and Visual Studio code. For example, the more people code, the more likely it is that they pay MSFT. So VS code aims to make programming as attractive as possible. Similar for Chrome.

Decrease costs: Linux and Llama. As Zuckerberg said himself IIRC, they don't want one party to snowball into an LLM monopoly, so they'd rather help get the open source ball rolling.

rafaelmn · a year ago
I think Ballmer's "developers, developers, developers" meme has been around longer than some people here have been alive. It served Microsoft well in the Windows era and it serves them well in the cloud space.
Spooky23 · a year ago
Exactly. Microsoft wants lots of cash for Copilot, meanwhile, we had Code Llama running with 150 developers before the Microsoft idiots could schedule a meeting.
barbazoo · a year ago
> For example, the more people code, the more likely it is that they pay MSFT. So VS code aims to make programming as attractive as possible

How does that increase revenue in a remotely measurable way?

Chrome, for sure, high market share, default search engine, more money, at least that's how I imagine it.

nimish · a year ago
Maybe Zuck just wants to see cool shit. It's not like he needs the money
petercooper · a year ago
Commoditize your complement: https://gwern.net/complement
muixoozie · a year ago
>storage being 'a feature, not a product.

Somewhat unrelated mini-rant. Upgraded a phone recently after about 3 years. Surprised to see storage still capped around 128GB (in-general). That's got to be artificially held back capacity to push cloud storage services?

viraptor · a year ago
There's lots of phones with more and/or with SD slots. It's not really "capped" as much as default size that seems to work just fine for the majority.
kstrauser · a year ago
I’ve got a 512GB phone with 112GB used. I’ve put absolutely no effort whatsoever into keeping that number down, and I’m not shy about downloading stuff to it.

I’m certain plenty of people need way more than 128GB. I figured I’d be one of them when I bought this. Nope. I bought a much bigger device than I actually needed.

If I’ve used less than 128GB, I’ve gotta think most other people do too. Not all, clearly! But most? I’d bet on it.

Spooky23 · a year ago
They’ve dramatically improved their ad quality. I routinely check out and convert on Facebook and Instagram ads now, and I can honestly say that in the 20 or more years before, I never once intentionally clicked on one.
rafaelmn · a year ago
Ironically, gen AI made their products worse more than the rest. I can't believe the amount of AI slop I see every time I open Facebook. I'd check it occasionally when replying on Messenger and scroll through for a while; after seeing the AI spam I don't even bother.
paxys · a year ago
That was going to happen regardless. From Meta's perspective it's better for their platforms to contain their own AI slop than OpenAI's.
jazzyjackson · a year ago
truly I wonder if they're fooled by their own click fraud, or if the incentives really do work out that they get paid whether the engagement is from bots or people. But anyway, I came here to say the same thing: it's shocking to me how enthusiastic Zuckerberg is about generative AI. What other possible outcome is there except actual human content creation being replaced by slop?
barbazoo · a year ago
Do the improvements the community proposes/makes to their models amount to anything significant? For a company like Meta with basically infinite money, do they really benefit from external help?
lolinder · a year ago
I don't have eyes inside of Meta, but keep in mind that we're not just talking about fine-tunes and LoRAs, we're also talking about the entire llama ecosystem, including llama.cpp, ollama, and llamafile. These would not exist (or wouldn't have anything like as much momentum) without Meta's investment, but they're now huge community projects.

I don't know if they use them internally, of course, but they could, and they represent a lot of work.

andy_ppp · a year ago
Baggy Tees, gold chains and now this!? Make this man president immediately!
LorenDB · a year ago
Seems to be more or less on par with GPT-4o across many benchmarks: https://x.com/Ahmad_Al_Dahle/status/1865071436630778109
rvnx · a year ago
Except it is 25x cheaper, available offline, can be uncensored / unaligned, fine-tuneable and backupable.

Sad day for OpenAI. Great for humanity.

madars · a year ago
What are good starting points for uncensoring it? Because it is offline, a jailbreak prompt can't be remote-bricked, but can one remove censorship from the weights themselves? What does it do to accuracy?
stainablesteel · a year ago
zuck is really on his redemption arc, he's out-doing himself
m3kw9 · a year ago
How sad? Tiny-violin sad. The typical consumer, which is likely 99% of them, is not going to use this.
Kiro · a year ago
How do you calculate the price?
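API pricing is usually quoted per million input and output tokens, and the ratio is just blended cost across a workload. A sketch with purely illustrative numbers (hypothetical, not actual provider prices):

```python
def blended_cost(in_price: float, out_price: float,
                 in_tokens_m: float = 1.0, out_tokens_m: float = 1.0) -> float:
    """Workload cost in $; prices are $ per million tokens."""
    return in_price * in_tokens_m + out_price * out_tokens_m

# Hypothetical per-million-token prices, 1M tokens each way
gpt4o = blended_cost(2.5, 10.0)
llama = blended_cost(0.1, 0.4)
print(f"{gpt4o / llama:.0f}x cheaper")
```

With those made-up prices the ratio happens to come out to 25x; the real figure depends on which hosted provider and which token mix you assume.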
jug · a year ago
This year seems to finish on the same note as it began: most AI evolution is happening in the smaller models. There's been a true shift as corporations have started to realize the value of training data, and of training on datasets that massively outsize the resulting model.
usaar333 · a year ago
Which GPT-4O are those?

The 08-06 release seems to be a bit higher on numerous benchmarks than what that shows: https://github.com/openai/simple-evals?tab=readme-ov-file#be...

griomnib · a year ago
This just makes the $200/month even more laughable.
ttul · a year ago
The $200 plan is for people who would pay $200 for a bottle of vodka even though the $20 bottle is molecularly identical.
afro88 · a year ago
How? 4o is part of the plus plan, as is o1.
zxvkhkxvdvbdxz · a year ago
But its $200/month smart!
namlem · a year ago
o1-pro is way smarter still
freediver · a year ago
Does unexpectedly well on our benchmark:

https://help.kagi.com/kagi/ai/llm-benchmark.html

Will dive into it more, but this is impressive.

stavros · a year ago
I asked it:

> I have a sorcerer character on D&D 5e and I've reached level 6. What do I get?

It confabulated a bunch of stuff. I also asked GPT-4, it confabulated a bit. Claude was spot on.

profsummergig · a year ago
Please help me understand something.

I've been out of the loop with HuggingFace models.

What can you do with these models?

1. Can you download them and run them on your Laptop via JupyterLab?

2. What benefits does that get you?

3. Can you update them regularly (with new data on the internet, e.g.)?

4. Can you finetune them for a specific use case (e.g. GeoSpatial data)?

5. How difficult and time-consuming (person-hours) is it to finetune a model?

(If HuggingFace has answers to these questions, please point me to the URL. HuggingFace, to me, seems like the early days of GitHub. A small number were heavy users, but the rest were left scratching their heads and wondering how to use it.)

Granted it's a newbie question, but answers will be beneficial to a lot of us out there.

joshhart · a year ago
Hi,

Yes you can. The community creates quantized variants of these that can run on consumer GPUs. A 4-bit quantization of Llama 70B works pretty well on MacBook Pros; the Neural Engine with unified CPU memory is quite solid for these. GPUs are a bit tougher because consumer GPU RAM is still kinda small.

You can also fine-tune them. There are a lot of frameworks, like unsloth, that make this easier: https://github.com/unslothai/unsloth . Fine-tuning can be pretty tricky to get right; you need to be aware of things like learning rates, but there are good resources on the internet where a lot of hobbyists have gotten things working. You do not need a PhD in ML to accomplish this. You will, however, need data that you can represent textually.

Source: Director of Engineering for model serving at Databricks.
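Frameworks like unsloth typically fine-tune via LoRA-style adapters rather than updating full weights (a generalization on my part, not a claim about any specific setup), which is why it fits on consumer hardware: instead of training a d×d weight matrix you train two thin rank-r matrices. The parameter count difference:

```python
def lora_trainable(d: int, r: int) -> tuple[int, int]:
    """Trainable params for one d x d layer: (full, rank-r LoRA A: d x r plus B: r x d)."""
    return d * d, 2 * d * r

# Hypothetical layer width and rank, for illustration only
full, lora = lora_trainable(d=8192, r=16)
print(f"full: {full:,}  lora: {lora:,}  ({100 * lora / full:.2f}% of full)")
```

At rank 16 on an 8192-wide layer, the adapter is under half a percent of the full matrix, which is what makes single-GPU fine-tuning of 70B-class models feasible.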

Deleted Comment

vtail · a year ago
Thank you Josh. Is there a resource you can point us to that helps answer "what kind of MacBook Pro memory do I need to run ABC model at XYZ quantization?"
aiden3 · a year ago
how would pricing on Databricks model serving compare to, say, the prices seen in the original post here (i.e., "3.3 70B is 25X cheaper than GPT4o")?
nickpsecurity · a year ago
I’ve been wanting to run into someone on the Databricks team. Can you ask whoever trains models like MPT to consider training an open model only on data clear of copyright claims? Specifically, one using only Gutenberg and the permissive code in The Stack? Or just Gutenberg?

Since I follow Christ, I can’t break the law or use what might be produced directly from infringement. I might be able to do more experiments if a free, legal model is available. Also, we can legally copy datasets like PG19 since they’re public domain. Whereas, most others have works in which I might need a license to distribute.

Please forward the request to the model trainers. Even a 7B model would let us do a lot of research on optimization algorithms, fine-tuning, etc.

profsummergig · a year ago
Thank you! Very helpful!
mhh__ · a year ago
Yes (don't know about JupyterLab), skip, not really, yes, quite irritating so just pay someone else to do it.
profsummergig · a year ago
Thanks! Succinct and complete.
jerpint · a year ago
Basically equivalent to GitHub but for models. Anyone can upload anything, but it kind of standardizes tools and distribution for everyone. They also have a team that helps integrate releases for easier use and libraries for fine tuning
profsummergig · a year ago
Thanks!

I want to download my first HuggingFace model, and play with it. If you know of a resource that can help me decide what to start with, please share. If you don't, no worries. Thanks again.

theanonymousone · a year ago
I'm "tracking" the price of 1M tokens on OpenRouter and it is decreasing every few refreshes. It's funny: https://openrouter.ai/meta-llama/llama-3.3-70b-instruct
danielhanchen · a year ago
I uploaded 4bit bitsandbytes, GGUFs and original 16bit weights to https://huggingface.co/unsloth for those interested! You can also finetune Llama 3.3 70B in under 48GB of VRAM and 2x faster and use 70% less memory with Unsloth!
bnchrch · a year ago
Open sourcing Llama is one of the best examples and roll-outs of "Commoditize Your Complement" in memory.

Link to Gwern's "Laws of Tech: Commoditize Your Complement" for those who haven't heard of this strategy before:

https://gwern.net/complement

bazmattaz · a year ago
That was so interesting. Thanks for sharing