The license for this [1] prohibits use of the model and its outputs for any commercial activity, or even any "live" (whatever that means) conditions, commercial or not.
There seems to be an exclusion for using the code outputs as part of "development". But wait! It also prohibits "any internal usage by employees in the context of the company's business activities". However you interpret these clauses, this puts their claims and comparisons on completely unequal ground. They only compare to other open-weight models, not GPT-4 or Opus, but a normal company or individual can do whatever they want with the Llama weights and outputs. LangChain? "Your favourite coding and building environment"? Who cares? It seems you're not allowed to integrate this with anything else and show it to anyone, even as an art project.
There's some irony in the fact that people will ignore this license in exactly the same way Mistral and all the other LLM guys ignore the copyright and licensing on the works they ingest.
So basically I, as an open source author, had my code eaten up by Mistral without my consent, but if I want to use their code model I’m subject to a bunch of restrictions that benefit their bottom line?
The problem these AI companies have is they live in a glass house and they can’t throw IP rocks around without breaking their own “your content is our training data” foundation.
The only reason I can think of that Google doesn’t go after OpenAI for scraping YouTube is that they’d then put themselves in the same crosshairs, and might set a precedent they’d also be bound by.
Given the model is “on the web” I have the same rights as Mistral to use anything online however I want without regard for IP, right?
Five years ago it would not have been at all controversial that these weights would not be copyrightable in the US; they're machine-generated output on third-party data. Yet somehow we've entered a weird timeline where obvious copyfraud is fine, by the same entities that are at best on the line of engaging in commercial copyright infringement at a hitherto unforeseen scale.
Should it be morally OK to not follow these kinds of licenses, maybe except when you are selling a service without making any changes? I wonder what people visiting this site think about this.
> licensed under the new Mistral AI Non-Production License, which means that you can use it for research and testing purposes. ...
Which basically means "we give you this model. Go find its weaknesses and report on r/locallama. Then we'll use that to improve our commercial model which we won't open-source."
I'm sick of abusing the word "open-source" in this field.
> I'm sick of abusing the word "open-source" in this field.
They don’t call this open source anywhere, do they? As far as I can see, they only say it’s open weights and that it’s available under their Mistral AI Non-Production License for research and testing. That doesn’t scream “open source” to me.
This is maybe a debatable claim, but I’ll contend that without the magnificent rebel who leaked the original LLaMA weights the last, what, 15 months would have gone completely differently.
The legislators and courts and lawyers will be years if not decades sorting all this out.
For now there seems to be a productive if slightly uneasy truce: outside of a few groups at a few firms, everyone seems to be maximizing for innovation and generally behaving under a positive sum expectation.
One imagines if some really cool tune of this model shows up as a magnet or even on huggingface, the courteous thing probably happened: Mistral was notified in advance and some mutually beneficial arrangement was agreed to in outline, maybe inked, maybe not.
I don’t work for Mistral, so that’s pure speculation, but the big company I spent most of my career at would have certainly said “can we hire this person? can we buy this company? can we collaborate with people who do awesome stuff with our stuff that we didn’t think of?”
The icky actors kind of dominate the headlines and I’m as guilty as anyone and guiltier than most of letting that be top of mind too often.
In the large this is really cool and kind of new.
I’m personally rather optimistic that we’re well past the point when outright piracy or flagrantly adversarial license violations are either necessary or useful.
To me this license seems like an invitation to build on Mistral’s work and approach them with the results, and given how well a posture of openness with some safeguards is working out for FAIR and the LLaMA group, that’s certainly the outcome I’d be hoping for in their position.
Maybe open AI was an unrealistic goal. Maybe AvailableAI is what we wind up with, and that wouldn’t be too bad.
If you want to live on the legal edge, it’s unclear whether there is any copyright in model weights (since they don’t have human authorship), so just wait for someone to post the weights someplace where you can get them without agreeing to the license.
So, it's almost entirely useless with that license, because the average pack of corpo beancounters will never let you use it over whatever Microsoft has already sold them.
My favorite thing to ask the models designed for programming is: "Using Python write a pure ASGI middleware that intercepts the request body, response headers, and response body, stores that information in a dict, and then JSON encodes it to be sent to an external program using a function called transmit." None of them ever get it right :)
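For reference, here is a minimal sketch of what that prompt seems to be asking for, written against the plain ASGI `(scope, receive, send)` protocol with no framework imports. The class name `CaptureMiddleware`, the record layout, and the default `transmit` stub are assumptions made for illustration; the prompt only fixes the name `transmit`. Note that this version only sees the request body chunks the wrapped app actually reads.

```python
import asyncio
import json


def transmit(payload: str) -> None:
    """Hypothetical stand-in for the prompt's `transmit`; swap in real delivery."""
    print(payload)


class CaptureMiddleware:
    """Pure ASGI middleware: wraps `receive` and `send` to observe traffic."""

    def __init__(self, app, transmit_func=transmit):
        self.app = app
        self.transmit = transmit_func

    async def __call__(self, scope, receive, send):
        # Pass non-HTTP traffic (lifespan, websocket) through untouched.
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        request_body = bytearray()
        response_body = bytearray()
        record = {"response_headers": []}

        async def wrapped_receive():
            message = await receive()
            if message["type"] == "http.request":
                # The body may arrive in several chunks; collect each one.
                request_body.extend(message.get("body", b""))
            return message

        async def wrapped_send(message):
            if message["type"] == "http.response.start":
                record["status"] = message["status"]
                # Header names and values are bytes in ASGI; decode for JSON.
                record["response_headers"] = [
                    [name.decode("latin-1"), value.decode("latin-1")]
                    for name, value in message.get("headers", [])
                ]
            elif message["type"] == "http.response.body":
                response_body.extend(message.get("body", b""))
            await send(message)

        try:
            await self.app(scope, wrapped_receive, wrapped_send)
        finally:
            # Emit one JSON document per request, even if the app raised.
            record["request_body"] = request_body.decode("utf-8", "replace")
            record["response_body"] = response_body.decode("utf-8", "replace")
            self.transmit(json.dumps(record))
```

Wrapping an app is then `app = CaptureMiddleware(app)`. Decoding bodies as UTF-8 with replacement is a simplification; binary payloads would need base64 or similar before JSON encoding.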
I normally ask about building a multi-tenant system using async SQLAlchemy 2 ORM where some tables are shared between tenants in a global PostgreSQL schema and some are in a per-tenant schema.
Nothing gets it right first time, but when ChatGPT 4 first came out, I could talk to it more and it would eventually get it right. Not long after that though, ChatGPT degraded. It would get it wrong on the first try, but with every subsequent follow up it would forget one of the constraints. Then when it was prompted to fix that one, it forgot a different one. And eventually it would cycle through all of the constraints, getting at least one wrong each time.
Since then benchmarks came out showing that ChatGPT “didn’t really degrade”, but all of the benchmarks seemed focused on single question/answer pairs and not actual multi-turn chat. For this kind of thing, ChatGPT 4 has never managed to recover to as good as it was when it was first released in my experience.
It’s been months since I’ve had to deal with that kind of code, so I might be forgetting something, but I just tried it with Codestral and it spat out something that looked reasonable very quickly on its first try.
>It would get it wrong on the first try, but with every subsequent follow up it would forget one of the constraints. Then when it was prompted to fix that one, it forgot a different one. And eventually it would cycle through all of the constraints, getting at least one wrong each time.
That drives me nuts and makes me ragequit about half the time. Although it's usually more effective to go back and correct your initial prompt rather than prompt it again.
I had a similar experience. I was trying to get GPT 4 to write some R/Stan code for a bit of Bayesian modelling. It would get the model wrong, and then I would walk it through how to do it right, and by the end it would almost get it right, but on the next step, it would be like, oh, this is what you want, and the output was identical to the first wrong attempt, which would start the loop over again.
Give an LLM all the time you want, and they will still not get it right. In fact, they most likely will give worse and worse answers with time. That’s a big difference with a software developer.
I love to ask it to "make me a Node.js library that pings an ipv4 address, but you must use ZERO dependencies, you must only use the native Node.js API modules"
The majority of models (both proprietary and open-weight) don't understand:
- by inference, ping means we're talking about ICMP
- ICMP requires raw sockets
- Node.js has no native raw socket API
You can do some CoT trickery to help it reason about the problem and maybe finally get it settled on a variety of solutions (usually some flavor of building a native add-on using C/C++/Rust/Go), or just guide it there step by step yourself, but the back and forth to get there requires a ton of pre-knowledge of the problem space which sorta defeats the purpose. If you just feed it the errors you get verbatim trying to run the code it generates, you end up in painful feedback loops.
(Note: I never expect the models to get this right, it's just a good microcosmic but concrete example of where knowledge & reasoning meets actual programming acumen, so it's cool to see how models evolve to get better, if at all, at the task).
This is the same level of gotcha that everyone complains about when interviewing. It's mainly just depending on the interviewee having the same assumptions (pings definitely do not have to be ICMP) and the same knowledge base, usually bespoke (Node.js peculiarities). I can see that an LLM should know whether raw sockets are available, but that's not what you asked.
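To make the ICMP point concrete, here is a rough sketch (in Python rather than Node.js, purely for illustration) of what an ICMPv4 echo request looks like on the wire. The function names are made up for this example. Building the bytes is easy in any language; the hard part is handing them to the kernel, which needs a raw or ICMP-capable socket, and that is the step the commenter says Node.js has no native API for.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071 checksum: one's-complement sum of 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    # Fold any carry bits back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes = b"ping") -> bytes:
    """Build an ICMPv4 echo request (type 8, code 0) with a valid checksum."""
    # The checksum is computed over the message with its checksum field zeroed.
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload


# Sending is deliberately left out: in Python it would go through something like
# socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_ICMP), which
# typically requires elevated privileges.
```

A quick sanity check is that running `internet_checksum` over the finished packet yields zero, which is how a receiver validates it.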
In fact you deliberately asked for something impossible and hold up undefined behavior as undefined like it's impugning something.
I usually throw some complex Rust code with lifetime requirements at them and ask them to fix it.
LLMs aren't capable of providing much help for that in general, other than in some very basic cases.
The best way to get your work done is still to look into Rust forums.
It works amazingly well for the ones that never coded in Rust, at least in my experience. It took me a couple hours and 120 lines of code to set up a WebRTC signaling server.
Damn, show us your brilliant prompt then. LLMs cannot do this, not even in Python, for which there are libraries like Blacksheep that honestly make it a trivial task.
It's something I know how to do after figuring it out myself and discovering the potential sharp edges, so I've made it into a fun game to test the models. I'd argue that it's a great prompt (to keep using consistently over time) to see the evolution of this wildly accelerating field.
I've been noticing that there's a divergence in philosophy between Llama-style LLMs (Mistral are Meta alums so I'm counting them in there) and OpenAI/GPT-style LLMs when it comes to code.
GPT3.5+ prioritized code very heavily - there's no CodeGPT, it's just GPT4, and every version is better than the last.
Whereas the Llama/Mistral models are now shipping the general language model first, then adding CodeLlama/Codestral with additional pretraining (it seems like we don't know how many more tokens are in this one, but CodeLlama was 500B-1T extra tokens of code).
Zuck has mentioned recently that he doesn't see coding ability as important for his use cases, whereas obviously OpenAI is betting heavily on code as a way to improve LLM reasoning for AGI.
That's a really surprising thing to hear, where did you see that? The only quote I've seen is this one:
>“One hypothesis was that coding isn’t that important because it’s not like a lot of people are going to ask coding questions in WhatsApp,” he says. “It turns out that coding is actually really important structurally for having the LLMs be able to understand the rigor and hierarchical structure of knowledge, and just generally have more of an intuitive sense of logic.”
Makes sense: they want better interaction with users on WhatsApp, Instagram, and Facebook, for marketers, content creation and moderation, and for their glasses (AI/AR). In that context I don't see why they should put more effort into LLM coding. It's sad anyway.
> OpenAI is betting heavily on code as a way to improve LLM reasoning for AGI.
And researchers from Google Deepmind, University of Wisconsin-Madison and Laboratoire de l’Informatique du Parallélisme, University of Lyon, actually publish some of their results in that direction [1,2].
Codex[1] is OpenAI's CodeGPT. It's what powers GitHub Copilot and it is very good but not publicly accessible. Maybe they don't want something else to outcompete Copilot.
Is there a way to use this within VSCode like Copilot, meaning having the "shadow code" appear while you code instead of having to go back and forth between the editor and a chat-like interface?
For me, a significant component of the quality of these tools resides on the "client" side: being able to engineer a prompt that will yield accurate code from the model. The prompt needs to find and embed the right chunks from the user's current workspace, or even from their entire org's repos. The model is "just" one piece of the puzzle.
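A toy sketch of that client-side step, assuming the workspace has already been split into named chunks. Everything here (`tokenize`, `rank_chunks`, `build_prompt`, the prompt layout) is hypothetical, and the identifier-overlap scoring is a deliberately naive placeholder for whatever retrieval a real tool uses.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercased identifier-like tokens found in the text."""
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text.lower()))


def rank_chunks(query: str, chunks: dict[str, str], top_k: int = 2) -> list[str]:
    """Names of the top_k chunks sharing the most identifiers with the query."""
    query_tokens = tokenize(query)
    ranked = sorted(
        chunks,
        key=lambda name: len(query_tokens & tokenize(chunks[name])),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query: str, chunks: dict[str, str], top_k: int = 2) -> str:
    """Prepend the best-matching chunks to the task so the model sees local context."""
    parts = [f"# File: {name}\n{chunks[name]}" for name in rank_chunks(query, chunks, top_k)]
    parts.append(f"# Task\n{query}")
    return "\n\n".join(parts)
```

Even in this crude form the shape of the problem shows: the model only ever sees what `build_prompt` decides to include, so retrieval quality caps completion quality.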
Not using Codestral (yet) but check out Continue.dev[1] with Ollama[2] running llama3:latest and starcoder2:3b. It gives you a locally running chat and edit via llama3 and autocomplete via starcoder2.
It's not perfect but it's getting better and better.
Having the chats in Obsidian lets me save them to reference them later in my notes. When I first started using it in VSCode when programming in Python it felt like a lot of noise at first. It kept generating a lot of useless recommendations, but recently it has been super helpful.
I think my only gripe is I sometimes forget to turn off my ollama systemd unit and I get some noticeable video lag when playing games on my workstation. I think for my next video card upgrade, I am going to build a new home server that can fit my current NVIDIA RTX 3090 Ti and use that as a dedicated server for running ollama.
I created a simple CLI app that does this in my workspace, which is under source control, so after the LLM execution all the changes are highlighted by diff, and the LLM also creates a COMMIT_EDITMSG file describing what it changed. Now I don't use ChatGPT anymore, only this CLI tool.
I never saw something like this integrated directly into VSCode though (and it isn't my preferred workflow anyway, the command line works better).
- You shall only use the Mistral Models and Derivatives (whether or not created by Mistral AI) for testing, research, Personal, or evaluation purposes in Non-Production Environments;
- Subject to the foregoing, You shall not supply the Mistral Models, Derivatives, or Outputs in the course of a commercial activity, whether in return for payment or free of charge, in any medium or form, including but not limited to through a hosted or managed service (e.g. SaaS, cloud instances, etc.), or behind a software layer
Yes, RAM requirement is BnL same for GPU and using the metal/GPU in Apple Silicon.
Running LLMs on a MacBook Pro with Apple Silicon vs. a PC with an Nvidia 4090 GPU has trade-offs. My 128GB MacBook Pro handles models using up to 96GB of unified memory, running at a little under half the speed of a 4090. If you use a quantized version of a full floating-point model, you can run the largest open models available.
While the 4090 has 24GB of dedicated memory and higher bandwidth (1000 GB/s vs. 400 GB/s on M3 Max), the Mac’s unified memory system (up to 128GB) is flexible and holds smarter models (8-bit and 6-bit models are still mostly all there, 4-bit is so-so, 2-bit is brain damaged).
The M2 Ultra in the Mac Studio offers even more (800 GB/s bandwidth and 192GB memory). So, OK, 6 or 8 4090 cards or 4 x A6000 cards excel in raw performance, but Apple’s unified memory in a laptop fits in your backpack.
It's not clear to me why Macbooks and Mac Studio Ultras with maxed out RAM aren't selling better if you look at the convenience and price relative to model size. Models that fit in one 4090 or even a pair of 4090s are toys compared to what fits on these, so for the big models you're comparing a laptop to a minifridge.
It's a bit slower perhaps than the Mac, but I get the best of both worlds. That is, I get a lot of RAM to hold the model and I can offload as much of it as possible to the GPU. This works especially well with models like mixtral 8x22, but also models like llama3 and the old large bloom model.
I also get the utility of running Linux instead of the closed-up macOS.
But running large models locally is not exclusive to the Mac Studio; you can do the same on a PC for a much lower cost.
The rule of thumb is roughly 44GB (22B parameters × 2 bytes), as most models are trained in bf16 and require 16 bits per parameter, so 2 bytes. You need a bit more for activations, so maybe 50GB?
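The arithmetic behind that rule of thumb, as a small sketch. The function name and the `overhead_fraction` parameter are made up for illustration; the 22B parameter count and the 4-bit case come from elsewhere in this thread, and the 14% overhead is just the value that turns 44GB into "maybe 50GB", not a measured number.

```python
def weight_memory_gb(params_billions: float, bits_per_param: int,
                     overhead_fraction: float = 0.0) -> float:
    """Approximate memory in decimal GB (1 GB = 1e9 bytes).

    params_billions * bits_per_param / 8 gives the weights alone;
    overhead_fraction is a hypothetical fudge factor for activations etc.
    """
    weights_gb = params_billions * bits_per_param / 8
    return weights_gb * (1.0 + overhead_fraction)


bf16_gb = weight_memory_gb(22, 16)              # unquantized bf16 weights
int4_gb = weight_memory_gb(22, 4)               # 4-bit quantized weights
with_overhead_gb = weight_memory_gb(22, 16, 0.14)  # bf16 plus assumed overhead
```

Here `bf16_gb` works out to 44 and `int4_gb` to 11, matching the figures quoted in the thread. Real usage also depends on context length and the runtime, so treat these as lower bounds.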
You need enough RAM and HBM (GPU RAM), so it’s a constraint on both.
Wait for a gguf release of this and it will fit neatly into a 3090 with a decent quant. I'm excited for this model and I'll be adding it to my collection.
I'm honestly not sure how to measure the amount of VRAM required for these models, but I suspect this would run relatively fast, depending on your use case, on a mid to high end 20 or 30 series card. No idea about Apple unified RAM. I get a lot of performance out of even older cards such as a 1080ti but haven't tested this model.
If I can’t use the output of this in practical code completion use cases, it’s meaningless, because GH Copilot exists. Idk what they’re thinking or what business model they’re envisioning - Copilot is far and away the best model of this kind anyway
[1] https://mistral.ai/licenses/MNPL-0.1.md
Utter absurdity.
On whose code is Mistral trained?
Examples: recurring infringement from Microsoft on open-source projects, Google scraping content to build their own database, etc...
Now they just lack the means to enforce it.
> Mistral AI may terminate this Agreement at any time [...]. Sections 5, 6, 7 and 8 shall survive the termination of this Agreement.
[0] https://o565.com/content-ownership-and-licensing-agreement/
'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?'
https://www.theverge.com/2024/1/18/24042354/mark-zuckerberg-...
[1]: https://deepmind.google/discover/blog/funsearch-making-new-d...
[2]: https://www.nature.com/articles/s41586-023-06924-6
[1] https://openai.com/index/openai-codex/
No, if anything he said Meta realized coding abilities make the model overall better, so they focused on those more than before.
[1] https://www.continue.dev/ [2] https://ollama.com/
I've had the odd crash now and again, but I can't think of many sites that will reliably make it hard crash. It's almost impressive.
https://m.youtube.com/watch?v=mjltGOJMJZA
Is there a rule-of-thumb estimate for how much RAM this would need to be used locally?
Is the RAM requirement the same for a GPU and "unified" RAM like Apple silicon?
When the model gets quantized to, say, 4-bit ints, it'll be 22B params * 0.5 bytes = 11GB, for example.
B: number of parameters
Q: quantization (16 = no quantization)
via https://news.ycombinator.com/item?id=40090566
Aren't these machines extremely expensive and generally not upgradable?