I really dislike the approach some companies have taken to announcing new models: they don't mention evaluation results or the performance of the model, but instead talk about how "transparent", "accessible" and "supportive" these models are.
Anyway, I have benchmarked stablelm-base-alpha-3b (the open-source version, not the fine-tuned one, which is under an NC license) using the MMLU benchmark, and the results are rather underwhelming compared to other open source models:
- stablelm-base-alpha-3b (3B params): 25.6% average accuracy
- flan-t5-xl (3B params): 49.3% average accuracy
- flan-t5-small (80M params): 29.4% average accuracy
MMLU is just one benchmark, but based on the blog post, I don't think it will yield much better results in others. I'll leave links to the MMLU results of other proprietary[0] and open-access[1] models (results may vary by ±2% depending on the parameters used during inference).
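For context on how such a score is produced: MMLU is 4-way multiple choice, so the model scores each candidate answer and the top-scoring one is compared against the gold label; random guessing lands at 25%, which is why 25.6% reads as roughly chance level. A toy sketch of the accuracy computation (the `score_fn` here is a stand-in, not a real model):

```python
def mmlu_accuracy(questions, score_fn):
    # questions: list of (prompt, choices, gold_index);
    # score_fn stands in for a model's per-answer log-likelihood.
    correct = 0
    for prompt, choices, gold in questions:
        scores = [score_fn(prompt, c) for c in choices]
        pred = max(range(len(choices)), key=scores.__getitem__)
        correct += (pred == gold)
    return correct / len(questions)

# Toy "model" that just prefers the longest answer:
toy = [
    ("2+2=?", ["3", "4", "5", "22"], 1),
    ("Capital of France?", ["Paris", "Rome", "Oslo", "Bern"], 0),
]
acc = mmlu_accuracy(toy, lambda prompt, choice: len(choice))  # 0.5
```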
Doesn't make much sense to compare a model that's not fine-tuned to flan models that are fine-tuned. It makes more sense to compare it to something like T5-base, which is probably a lot more comparable.
Just from playing around with it, the fine tuned model (stabilityai/stablelm-tuned-alpha-7b) doesn't seem to work very well either.
User: What is 12 + 12?
StableLM: Yes, that is correct! 12 + 12 is equal to 18.
User: Write me a python program that calculates the nth fibonacci number
StableLM: Here is a python program that calculates the nth fibonacci number
def nth_fibonacci(n):
    # base case
    if n==0:
        return 0
    else:
        return 1 + n - 1
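(Note that the model's answer simplifies to returning n for every n > 0. For reference, a correct iterative version of what was asked:)

```python
def nth_fibonacci(n):
    # fib(0) = 0, fib(1) = 1, fib(n) = fib(n-1) + fib(n-2)
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```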
I would have compared it to the fine-tuned version if it had been released under a truly open-source license. I think developers implementing LLMs care more about licensing than about the underlying details of the model.
Also t5-base is 220M params vs 3B params of stablelm, not really a fair comparison anyways.
It's fantastic that more orgs are releasing open-source models trained on more than 300B or so tokens. Here's my take from the details I could find.
Pros
- 4096 context width (vs 2048 for llama, gpt-j, etc)
- 3B to 65B released or in progress
- RL tuned models available
- Trained on more tokens than existing non-llama models
- 128 head dim, so can use flash attention (unlike GPT-J)
Cons
- No benchmarks released, or details about the model
- Somewhat restrictive license on the base models, and NC license on the RL models
- Small models only trained on 800B tokens, compared to 1T for llama-7B, and potentially more for other upcoming alternatives (RedPajama, etc). I'd like to see their loss curves to see why they chose 800B.
High-level, this is likely to be more accurate than existing non-llama open source models. It's hard to say without benchmarks (but benchmarks have been gamed by training on benchmark data, so really it's just hard to say).
Some upcoming models in the next few weeks may be more accurate than this, and have less restrictive licenses. But this is a really good option nonetheless.
FYI, I'm running lm-eval now with the tests Bellard uses (lambada_standard, hellaswag, winogrande, piqa, coqa) on the biggest 7B on a 40GB A100 atm (non-quantized version, requires 31.4GB), so it will be directly comparable to what the various LLaMAs look like: https://bellard.org/ts_server/
(UPDATE: the run took 1:36 to complete but failed at the end with a TypeError, so I'll need to poke at it and rerun.)
Looks like my edit window closed, but my results ended up being very low so there must be something wrong (I've reached out to StabilityAI just in case). It does however seem to roughly match another user's 3B testing: https://twitter.com/abacaj/status/1648881680835387392
The current scores I have place it between gpt2_774M_q8 and pythia_deduped_410M (yikes!). Based on training and specs you'd expect it to outperform Pythia 6.9B at least... this is running on a HEAD checkout of https://github.com/EleutherAI/lm-evaluation-harness (releases don't support hf-causal) for those looking to replicate/debug.
How possible is it that every other model suffers from dataset contamination and this model is being unfairly penalized for having properly sanitized training data?
That's great news, but one would think that, since they're behind Stable Diffusion, they'd use the insights from it and scale the data even further, resulting in better quality from a smaller model that can run on most people's machines.
Like... try 10 trillion or 100 trillion tokens (although that may be absurd, I never did the calculation), and a long context on a 7B parameter model then see if that gets you better results than a 30 or 65B parameter on 1.5 trillion tokens.
A lot of these open source projects just seem to be trying to follow and (poorly) reproduce OpenAI's breakthroughs instead of trying to surpass them.
I'm wondering what the sweet spot for parameters will be. Right now it feels like the Mhz race we had back in the CPU days, but 20 years later I am still using a 2-3GHz CPU.
There have also been quite a few developments on sparsity lately. Here's a technique SparseGPT which suggests that you can prune 50% of parameters with almost no loss in performance for example: https://arxiv.org/abs/2301.00774
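As a much simpler illustration of what "pruning 50% of parameters" means (SparseGPT itself uses second-order information and updates the remaining weights, not plain magnitude pruning), here is a minimal magnitude-pruning sketch:

```python
import numpy as np

def magnitude_prune(w, sparsity=0.5):
    # Zero out the `sparsity` fraction of weights with smallest magnitude.
    # (Much cruder than SparseGPT, but it shows what "50% sparsity" means.)
    k = int(w.size * sparsity)
    if k == 0:
        return w.copy()
    thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.where(np.abs(w) <= thresh, 0.0, w)

np.random.seed(0)
w = np.random.randn(64, 64)
pruned = magnitude_prune(w, 0.5)  # half the entries are now exactly zero
```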
Standard attention has memory quadratic in sequence length, whereas FlashAttention has memory linear in sequence length. FlashAttention is also faster.
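The memory difference can be illustrated without any GPU tricks: materializing the full score matrix costs O(n^2) memory, while processing one query row at a time needs only O(n) extra memory and produces the same result. FlashAttention goes much further (tiling over keys with an online softmax inside one fused kernel), but this sketch shows where the quadratic term comes from:

```python
import numpy as np

def attention_naive(q, k, v):
    # Materializes the full (n, n) score matrix: memory quadratic in n.
    s = q @ k.T / np.sqrt(q.shape[-1])
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v

def attention_rowwise(q, k, v):
    # One query row at a time: memory linear in n, same output.
    out = np.empty_like(q)
    scale = 1.0 / np.sqrt(q.shape[-1])
    for i in range(q.shape[0]):
        s = (q[i] @ k.T) * scale
        p = np.exp(s - s.max())
        out[i] = (p / p.sum()) @ v
    return out

np.random.seed(0)
q, k, v = (np.random.randn(8, 4) for _ in range(3))
```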
But Chinchilla optimality, while an interesting result, is a strange target for most practical purposes. Training happens once; inference happens many times. Not training past the point where it's cheaper to train a larger model for the same (proxy for) quality discounts the cost of inference to zero.
I'm sure there will be a bunch of different RL tuned versions of them, RLHF isn't that expensive. IIRC Microsoft has software that will do it for a few thousand dollars for a model that size. I'm sure someone will release a non-lobotomized version, maybe OpenAssistant.
It's unclear which models will be trained to 1.5T tokens. The details of how many tokens each model saw in training are on Github - https://github.com/stability-AI/stableLM/ . But only for the ones that have been released.
Selling access to LLMs via remote APIs is the “stage plays on the radio” stage of technological development. It makes no actual sense; it’s just what the business people are accustomed to. It’s not going to last very long. So much more value will be unlocked by running them on device. People are going to look back at this stage and laugh, like paying $5/month to a cellphone carrier for Snake on a feature phone.
Web apps:
- Need data persistence. Distributed databases are really hard to do.
- Often have network effects where the size of the network causes natural monopoly feedback loops.
None of that applies to LLMs.
- Making one LLM is hard work and expensive. But once one exists you can use it to make more relatively cheaply by generating training data. And fine tuning is more reliable than one shot learning.
- Someone has to pay the price of computation power. It’s in the interest of companies to make consumers pay for it up front in the form of a device.
- Being local lets you respond faster and with access to more user contextual data.
This is sort of like saying the world wide web is a fad. Many people made that argument, but a lot of desktop apps got replaced by websites even though they were supposedly inferior.
ChatGPT works fine as a website and you don’t need to buy a new computer to run it. You can access your chat history from any device. For many purposes, the only real downside is the subscription fee.
If LLMs become cheaper to run, websites will be cheaper to run, and there will be lower-cost competition. Maybe even cheap enough to give away for free and make money from advertising?
This doesn't seem technically feasible to me. The state of the art will for a long time require a lot more hardware to run than is available on a consumer device.
Beyond which, inference also benefits from parallelization, not just training, so being able to batch requests is a benefit, and more likely when access is offered via an API.
This technology will be embedded into every OS within 2 years. People don't generally need a "super" model like GPT3/4. It will be perfectly acceptable and common to have the model change context, sync with whatever model/training data is necessary to be an expert in that context only, and associated contexts..., and prompt it in a specific domain. Client devices and internet connections are fast enough to do this in near real time today. The platforms to do all of this are being built right now by every company that creates software otherwise they will fail within 5 years.
I can already run Vicuna(llama) 7B on my 2020, 14" PC laptop at ~3.5 tokens/sec, and more speed can definitely be squeezed out.
Most future laptops and phones will ship with NPUs next to the CPU silicon. Once they get enabled in software, that means a 16GB machine can run a 13B model, or a 7B model with room for other heavy apps.
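The arithmetic behind that claim, as a rough sketch (the 20% overhead factor is a crude assumption for activations, KV cache, and runtime, not a measurement):

```python
def model_ram_gb(n_params_billion, bits_per_weight, overhead=1.2):
    # Parameter bytes plus ~20% overhead (crude assumption).
    return n_params_billion * bits_per_weight / 8 * overhead

ram_13b = model_ram_gb(13, 4)  # ~7.8 GB for a 4-bit 13B model
ram_7b = model_ram_gb(7, 4)    # ~4.2 GB for a 4-bit 7B model
# Both fit comfortably in a 16 GB machine; the 7B case leaves
# plenty of room for other heavy apps.
```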
As for the benefits of batching and centralization, that is true, but it's somewhat countered by the high cost of server accelerators and the high profit margins of cloud services.
I don't think it's going to happen in the next few years.
the prices are gonna drop like hell, but ain't no way we run models meant to run on 8 nvidia A100 on our smartphones in the next 5 years
just like you don't store the entirety of Spotify on your iPhone, you're not gonna run any decent LLM on phones any time soon (and I don't consider any of the small LLaMAs to be decent)
This is the reason why they're not going to move on device anytime soon. You can use compression techniques, sure, but you're not going to get anywhere near the level of performance of GPT-4 at a size that can fit on most consumer devices
the only thing I can say to this is that Apple have seemed laser focused on tuning their silicon for ML crunching, that that focus is clearly now going to be amped up further still, and that in tandem the software itself will be tuned to Apple silicon.
GPUs on the other hand are pretty general purpose. And 5 years on a focused superlinear ramp up is a long time, lots can happen. I am not saying it's 100%, or even 80% likely. It'll be super impressive if it happens, but I see it as well within the realms of reason.
> but ain't no way we run models meant to run on 8 nvidia A100 on our smartphones in the next 5 years
When I learned about neural networks, the general advice at the time was "you'll only need one hidden layer, with somewhere between the number of your input and output neurons". While that was more than 5 years ago, my point is: both the approach and the architecture change over time. I would not bet on what we won't have in 5 years.
I agree - I think for security and privacy we need it to be on-device (either that, or there needs to be end-to-end encryption with guarantees that data won't be captured for training). There are tons of useful applications that require sensitive personal information (or confidential business information) to be passed in prompts - that becomes a non-issue if you can run it on device.
I think there will be a lot of incentive to figure out how to make these models more efficient. Up until now, there's been no incentive for the OpenAIs and Googles of the world to make the models efficient enough to run on consumer hardware. But once we have open models and weights, there will be tons of people trying to get them running on consumer hardware.
I imagine something like an AI-specific processor card that just runs LLMs and costs < $3000 could be a new hardware category in the next few years (personally I would pay for that). Or, if Apple were to start offering a GPT-3.5+ level LLM built in that runs well on M2 or M3 Macs, that would be strong competition and a pretty big blow against the other tech companies.
That hardware's gonna look a lot like ASIC Bitcoin miners if an architecture to replace LLMs is popularized. General-enough purpose computing ain't going away for a long time.
I'd suspect it will actually accelerate moving everything into the cloud.
If your entire business is in the cloud, you can give an AI access to everything with a single sign-on or some passwords. If half is in the cloud and half is local, it's very annoying to have it all in-context for your AI assistant. And there's no way we're getting everything locally stored again at this point!
Right, this is why StabilityAI is getting in bed with Amazon, so private, fine-tuned models can operate on all your data sitting out there in S3 buckets or whatever.
What's been so interesting with the explosion of this has been how prominently the corporately-driven restrictions have been highlighted in news and such.
People are getting a good look in very easy to understand terms at the foundational stage at how limiting the future is to have this just be another big tech controlled thing.
They have said that the alignment actually hurts the performance of the models. Plus for creative applications like video games or novels, you need an unaligned model otherwise it just produces "helpful" and nice characters.
Alignment is an unsolved problem. None of the current stronger models are "aligned", just tuned in ways that weight some biases more than others, and even that is dependent on the features of their inputs.
On this topic, Apple is the sleeping giant. Sleeping tortoise maybe. Everyone else has been fast out of the gates, but Apple has effectively already been positioning to leap frog everyone after a decade+ of M1 chip design. Ever since these chips launched, the M1 chips have felt materially underutilized, particularly their GPU compute. Have to believe something big is going on behind the scenes here.
That said, wouldn't be surprised if the truth was somewhere in between cloud-deployed and locally deployed, particularly on the way up to the asymptotic tail of the model performance curve.
What would a "leap frog" look like, in your mind? I'm struggling to imagine how they're better positioned than the competition, especially after llama.cpp showed us that inference acceleration works with everything from AVX2 to ARM NEON. Compared to Nvidia (or even Microsoft and ONNX/OpenAI), Apple is somewhat empty-handed here. They're not out of the game, but I genuinely see no path for them to dominate "everyone".
This doesn't seem that obvious to me: serving LLMs through an API lets you run highly optimized inference with stuff like TensorRT and batched inference, while you're stuck with batch size = 1 when processing locally.
LLMs don't even require full real-time inference; there are applications like VR or camera processing where you need <10ms inference, but for any application of LLMs, 200-500ms is more than fine.
For the users, running LLMs locally means more battery usage and significant RAM usage. The only true advantage is privacy, but that isn't a selling point for most people.
You're still thinking in terms of what APIs would be used for, rather than what local computation enables.
For example, I'd like an AI to read everything I have on screen, so that I can ask at any time "why is that? Explain!" without having to copy paste the data and provide the whole context to a Google-like app.
But without privacy guarantee (and I mean technical one, not a pinky promise to be broken when VC funding runs out) there's no way I'd feed everything into an AI.
We are very close to optimized ML frameworks on consumer hardware.
And TBH most modern devices have way more RAM than they need, and go to great lengths to just find stuff to do with it. Hardware companies also very much like the idea of heavy consumer applications.
That's what pruning is, but it's not that straight forward and has limits. Finetuning a smaller model on the output of a larger one is much more flexible and reliable.
GPT 3.5 is probably a 13B Curie finetuned on the output of full size GPT-3 175B, to give you an idea of the technique.
That is smaller than the third smallest StableLM and the same size as LLaMA-13B which can run at useful speeds off of a smart phone CPU.
I think it may be naive to believe that the deciding factor in how these things are used will be "chip speed" or "efficiency on the machine."
I wish we were in that world; but it more likely seems like it would be "Which company jumps ahead quickest to get mindshare on a popular AI related thing, and then is able to ride scale to dominate the space?"
REALLY hope I end up being wrong here; the fact that so many models are already out there does give me some hope.
I don't think that's true in the context of businesses, because they won't want their data to be leaked and/or used for other clients. The more data from your company you can feed the AI, the more productive it will be for you. I'm not just talking about semi-public documentation, but also things like emails, meeting transcripts, internal tool APIs, employee details, etc.
If the AI service provider uses your data to help train their AI, it will be blacklisted by most companies. If you keep the data in silos, the centralisation will offer almost no benefit while still being a very high privacy risk. The only benefit they get is that it allows them to demo it and see its potential, but no serious business will adopt it unless you also provide a self-hosted solution.
I think the only people who will truly benefit from using cloud services as a long term solution are personal users and companies too small to afford the initial cost of the hardware.
Having more users helps with reinforcement learning, but as a user, I want an unaligned AI that isn’t constantly babysitting me with bullshit about what it can and cannot do, so there’s like a negative network effect, lol.
There will be a time when LLMs need data persistence to "improve our user experience". The LLM will act like a "friend" that will remember you when you come back.
LLMs seem more akin to AWS than to SaaS: companies will create products upon LLMs the way companies rely on AWS to support their products. The build-vs-buy calculus may tip heavily towards build once they can run on device with a good user experience, with no need to pay for cloud compute any longer.
This is mostly why the future of computation only makes sense monetarily if you have everyone shift to a thin client. So, banning GPUs is likely considered a "necessary evil" by the BigTech cognoscenti for accomplishing that goal.
When radio first started, people read plays written for the stage, because that's what they knew and what they had. Later people learned to write for the medium and make radio native entertainment.
Same thing happened when TV arrived. They did live versions of the radio entertainment on a set in front of a camera.
Absolutely a giant fan of Stability sticking to actual open-source licenses and not licenses that impose restrictions on what you can use the models for. This is the future of AI! Beware of any org that uses "ethical" licenses; they are not open source. Stability is one of the few organizations that actually cares about free software, and you love to see it.
> These fine-tuned models are intended for research use only and are released under a noncommercial CC BY-NC-SA 4.0 license, in-line with Stanford’s Alpaca license.
This is a no-commercial-use-allowed license; it is neither considered free software nor open source, the definitions of which disallow restrictions on what you can use the work for.
> We are also releasing a set of research models that are instruction fine-tuned. Initially, these fine-tuned models will use a combination of five recent open-source datasets for conversational agents: Alpaca, GPT4All, Dolly, ShareGPT, and HH. These fine-tuned models are intended for research use only and are released under a noncommercial CC BY-NC-SA 4.0 license, in-line with Stanford’s Alpaca license.
The snippet you quoted is not talking about the main model in the announcement. It's talking about fine-tuned models based on other models. Stability has to respect the license of the originals. They cannot change it.
The main model is described higher up in the post and is permissible for commercial:
> Developers can freely inspect, use, and adapt our StableLM base models for commercial or research purposes, subject to the terms of the CC BY-SA-4.0 license
I am very happy to see them use a true FLOSS licence. However, it's a surprise to me, given Stable Diffusion is proprietary, released under one of those "ethical" licences.
"Ethics" will only ever be an excuse to lock this technology behind one company's paywall. The only ethical AI is actually free and open AI; how it's trained is irrelevant imho as long as we can all benefit. The negatives of the work of individuals being used to train an open model are outweighed by the negatives of one company doing that anyway and holding the power within their walls.
Yeah I wish there was more real investigation / analysis into who is behind various "ethical AI" pushes and what they stand to gain from it. From what I can see, many of the people involved either are invested in companies that will somehow certify your AI is ethical, or just want to stifle competition so they can catch up. Of course there's also a sprinkling of "current thing" supporters.
I have to disagree. Especially in the case of LLMs, where new API services are popping up all over the place, an "ethical" license like the AGPL that requires the source be shared for web services would accelerate development of the space as a whole immensely.
Indeed, that's why I pay for credits on their official site/DreamStudio even though I want to run things locally. My big fear is that one day they'll make a press release saying they have to stop everything because of not enough funding.
How is this sort of thing audited? I imagine there are all sorts of lifestyle AI businesses that won't give two shits about a license where people can't easily see or audit what is being used.
"Alignment" is just a euphemism for "agrees with me", though. Humans aren't even aligned with each other. Demanding that AI models be "aligned" is essentially a demand that AI only be produced which agrees with your priors.
it is true that there are concerns relating to open source and AI, but surely having them closed off, manipulated, and controlled by untrustworthy corporations is worse.
This is amazing. They even let the developers use it for commercial purposes;
“Developers can freely inspect, use, and adapt our StableLM base models for commercial or research purposes, subject to the terms of the CC BY-SA-4.0 license.“
You can use this link to interact with the 7B model;
> Supportive. We build models to support our users, not replace them. We are focused on efficient, specialized, and practical AI performance – not a quest for god-like intelligence. We develop tools that help everyday people and everyday firms use AI to unlock creativity, boost their productivity, and open up new economic opportunities.
Refreshing take on the peak alarmism we see from tech "thought leaders"
This is just marketing. They're positioning themselves as somehow "more human" while building the exact same technology. When a model supports me by doing the work I'd otherwise hire someone to do, the model just replaced someone. And this goes without saying, but a large amount of outsourced tasks today don't exactly require "god-like intelligence".
That was probably said about the automobile when it replaced horses, or about electric lamps when they replaced oil lamps, no?
I mean, every city had an army of people to light up and down oil lamps in the streets, and these jobs went away. But people were freed up to do better stuff.
It's alarmism to support government regulation to reinforce the moat when industry leaders say they intend to do it, but also claim that the danger of it being done is why competition with them must be restricted by the State (and why they can't, despite being, or being a subsidiary of, a nonprofit founded on an openness mission, share any substantive information on their current models).
But the concerns about AI taking over the world are valid and important; even if they sound silly at first, there is some very solid reasoning behind them. They're big matrices, yes, but they're Turing-complete, which means they can theoretically do any computational task.
See https://youtu.be/tcdVC4e6EV4 for a really interesting video on why a theoretical superintelligent AI would be dangerous, and when you factor in that these models could self-improve and approach that level of intelligence it gets worrying…
> They're big matrices and they are very cool tools!
Well, your mom is a etc
Edit: Since this is getting downvoted I'll be more explicit: The human brain may well be also just described as some simple sort of thing, but that doesn't mean humans are not dangerous, nor hypothetical humans with a brain ten times as large and a million times faster. The worry about AIs killing all humans soon is not naive just by sounding naive.
"It is refreshing to hear opinions I already agree with. People with other opinions are unintelligent"
Is that what you were trying to convey? If not, I'm curious to know what you find refreshing about it and why those who disagree are wrapped in double quotes.
Well, it's to their benefit to portray their models as working alongside and enhancing humans, as opposed to replacing us. So it sounds a bit like marketing speak to me.
And it's to the benefit of many of those tech "thought leaders" to be alarmist since they don't have much of the AI pie
Unfortunately, due to the law of names, StabilityAI will in the future hit the same issue as OpenAI and do a 180, unleashing very unstable AI to the world.
when has open source ever spearheaded independent innovation? they usually follow along.
Fred Wilson once did a take on all trends in SV. First some firm comes out with a product that changes the landscape and makes a massive profit. Then some little firm comes along and does the same for a cheaper price. Then some ambitious group out of college comes out with an open-source version of the same.
Open source has never been a trailblazer of innovation. Open "research" was the original mantra for OpenAI. And an entrepreneur in residence put together a great product. If they were any more open, it would not make sense.
It's CC-BY-NC-SA because of the upstream sources used for instruction training. There are open resources being developed for that, from what I've seen, but probably nothing ready.
That's a limitation of the dataset used for that particular tuned model. Probably not a great choice on their part given that people aren't reading past the headline, but the actual base model is not restricted.
[0]: https://paperswithcode.com/sota/multi-task-language-understa...
[1]: https://github.com/declare-lab/flan-eval/blob/main/mmlu.py#L...
I'll place results in my spreadsheet (which also has my text-davinci-003 results): https://docs.google.com/spreadsheets/d/1kT4or6b0Fedd-W_jMwYp...
Note, another LLM currently being trained, GeoV 9B, already far outperforms this model at just 80B tokens trained: https://github.com/geov-ai/geov/blob/master/results.080B.md
The fully trained version will surely be much better.
Also, you should benchmark GPT-3 Babbage for a fair comparison since that is the same size as 7B.
You can see the model architecture here
https://github.com/Stability-AI/StableLM/blob/main/configs/s...
"These models will be trained on up to 1.5 trillion tokens." on the Github repo.
https://github.com/stability-AI/stableLM/#stablelm-alpha
mind explaining why this is so attractive/what the hurdle is for the laypeople in the audience? (me)
LLaMA is trained far beyond chinchilla optimality, so this is not as surprising to me.
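For reference, Chinchilla's rule of thumb is roughly 20 training tokens per parameter, which makes "far beyond optimality" easy to quantify:

```python
# Chinchilla's compute-optimal heuristic: ~20 training tokens per parameter.
def chinchilla_optimal_tokens(n_params):
    return 20 * n_params

opt_7b = chinchilla_optimal_tokens(7e9)  # ~140B tokens for a 7B model
llama_ratio = 1e12 / opt_7b              # LLaMA-7B's 1T tokens is ~7x that
```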
Seems they want to do 3B to 175B, although 175B is not in progress yet.
I wrote up a feasibility investigation last year: https://fleetwood.dev/posts/a-case-for-client-side-machine-l...
It's an inconvenient truth, for better or worse.
Most future laptops and phones will ship with NPUs next to the CPU silicon. Once they get enabled in software, that means a 16GB machine can run a 13B model, or a 7B model with room for other heavy apps.
As for the benefits of batching and centralization, that is true, but it's somewhat countered by the high cost of server accelerators and the high profit margins of cloud services.
Prices are gonna drop like hell, but there's no way we're running models meant for eight Nvidia A100s on our smartphones in the next 5 years.
Just like you don't store the entirety of Spotify on your iPhone, you're not gonna run any decent LLM on phones any time soon (and I don't consider any of the small LLaMAs to be decent).
GPUs on the other hand are pretty general purpose. And 5 years on a focused superlinear ramp up is a long time, lots can happen. I am not saying it's 100%, or even 80% likely. It'll be super impressive if it happens, but I see it as well within the realms of reason.
When I learned about neural networks, the general advice at the time was "you'll only need one hidden layer, sized somewhere between the number of your input and output neurons". While that was more than 5 years ago, my point is: both the approach and the architecture change over time. I would not bet on what we won't have in 5 years.
M$ has been working on an AI chip since 2019, so I think we will.
I think there will be a lot of incentive to figure out how to make these models more efficient. Up until now, there's been no incentive for the OpenAIs and Googles of the world to make the models efficient enough to run on consumer hardware. But once we have open models and weights, there will be tons of people trying to get them running on consumer hardware.
I imagine something like an AI specific processor card that just runs LLMs and costs < $3000 could be a new hardware category in the next few years (personally I would pay for that). Or, if apple were to start offering a GPT3.5+ level LLM built in that runs well on M2 or M3 macs that would be strong competition and a pretty big blow against the other tech companies.
If your entire business is in the cloud, you can give an AI access to everything with a single sign-on or some passwords. If half is on the cloud and half is local, that's very annoying to have all in-context for your AI assistant. And there's no way we're getting everything locally stored again at this point!
The main reason I want a non-cloud LLM is that I want one that's unaligned.
I know I'm not a criminal and I want to stop being reprimanded by GPT4.
What I'm most interested here is fine tuning the model with my own content.
That could be super valuable especially if we could get it to fact check itself, which you could with a vector database.
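A minimal sketch of that idea: embed your content, index it, and retrieve the nearest passage as evidence for a claim. Real setups use learned embeddings (e.g. sentence-transformers) and a proper vector database; the bag-of-words embedding below just keeps the example self-contained.

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy embedding: bag-of-words term counts (a real system would use
    # a learned dense embedding model).
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for missing terms
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Your own content, pre-embedded into an in-memory "vector database".
docs = [
    "the eiffel tower is in paris",
    "llamas are domesticated south american camelids",
]
index = [(d, embed(d)) for d in docs]

def retrieve(claim: str) -> str:
    # Nearest stored passage = evidence to check the model's claim against.
    return max(index, key=lambda pair: cosine(embed(claim), pair[1]))[0]

print(retrieve("where is the eiffel tower"))
# → the eiffel tower is in paris
```

The retrieved passage then goes back into the model's context so the answer can cite your content instead of hallucinating.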
At this foundational stage, people are getting a good look, in very easy-to-understand terms, at how limiting the future would be if this were just another big-tech-controlled thing.
That said, wouldn't be surprised if the truth was somewhere in between cloud-deployed and locally deployed, particularly on the way up to the asymptotic tail of the model performance curve.
LLMs don't even require full real-time inference. There are applications like VR or camera processing where you need real-time <10 ms inference, but for any application of LLMs, 200-500 ms is more than fine.
For the users, running LLMs locally means more battery usage and significant RAM usage. The only true advantage is privacy but this isn't a selling point for most people
For example, I'd like an AI to read everything I have on screen, so that I can ask at any time "why is that? Explain!" without having to copy paste the data and provide the whole context to a Google-like app.
But without privacy guarantee (and I mean technical one, not a pinky promise to be broken when VC funding runs out) there's no way I'd feed everything into an AI.
And TBH most modern devices have way more RAM than they need, and go to great lengths just to find stuff to do with it. Hardware companies also very much like the idea of heavy consumer applications.
Is that a real technique? Why not just shrink down the model itself directly somehow, is that not possible?
GPT 3.5 is probably a 13B Curie finetuned on the output of full size GPT-3 175B, to give you an idea of the technique.
That is smaller than the third smallest StableLM and the same size as LLaMA-13B which can run at useful speeds off of a smart phone CPU.
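For anyone wondering what "finetuned on the output of" a bigger model looks like mechanically, here's a toy distillation: a small "student" fit to a "teacher's" outputs rather than to original labels. The teacher here is deliberately a simple known function so the result is checkable; real distillation trains a small LLM on a big LLM's sampled text or logits.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(1000, 1))

def teacher(x):
    # Stand-in for the big expensive model; its mapping is what we
    # want the student to copy (here, a known linear function).
    return 3.0 * x + 0.5

y_teacher = teacher(X)  # the "generated training data"

# Student: a small linear model fit to the teacher's outputs by least squares.
A = np.hstack([X, np.ones_like(X)])
w, b = np.linalg.lstsq(A, y_teacher, rcond=None)[0].ravel()
print(w, b)  # the student recovers ~3.0 and ~0.5
```

The point is that the student never sees the original data, only the teacher's behavior, which is exactly why one good open model makes the next one much cheaper.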
I wish we were in that world; but it more likely seems like it would be "Which company jumps ahead quickest to get mindshare on a popular AI related thing, and then is able to ride scale to dominate the space?"
REALLY hope I end up being wrong here; the fact that so many models are already out there does give me some hope.
> Often have network effects where the size of the network causes natural monopoly feedback loops.
This one in particular sounds like an argument that remote models will win.
If the AI service provider uses your data to help better train their AI, it will be blacklisted by most companies. If you keep them in silos, the centralisation will offer almost no benefit while still being a very high privacy risk. The only benefit they get is that it allows them to demo it and see its potential, but no serious business will adopt it unless you also provide a self-hosted solution.
I think the only people who will truly benefit from using cloud services as a long term solution are personal users and companies too small to afford the initial cost of the hardware.
I don't see small/medium companies getting into acquiring hardware for AI
All software is sold as SaaS today, because it's more profitable. The same will be true for LLMs.
Same thing happened when TV arrived. They did live versions of the radio entertainment on a set in front of a camera.
This is a no-commercial-use-allowed license; it is neither considered free software nor open source, the definitions of which disallow restrictions on what you can use the work for.
> We are also releasing a set of research models that are instruction fine-tuned. Initially, these fine-tuned models will use a combination of five recent open-source datasets for conversational agents: Alpaca, GPT4All, Dolly, ShareGPT, and HH. These fine-tuned models are intended for research use only and are released under a noncommercial CC BY-NC-SA 4.0 license, in-line with Stanford’s Alpaca license.
The snippet you quoted is not talking about the main model in the announcement. It's talking about fine-tuned models based on other models. Stability has to respect the license of the originals. They cannot change it.
The main model is described higher up in the post and is permissible for commercial:
> Developers can freely inspect, use, and adapt our StableLM base models for commercial or research purposes, subject to the terms of the CC BY-SA-4.0 license
Not unless they're aligned well.
There are all sorts of horrible use cases that these could be used for.
“Developers can freely inspect, use, and adapt our StableLM base models for commercial or research purposes, subject to the terms of the CC BY-SA-4.0 license.“
You can use this link to interact with the 7B model:
https://huggingface.co/spaces/stabilityai/stablelm-tuned-alp...
I sent it one small text (actually a task) five minutes ago. It's still loading.
Refreshing take on the peak alarmism we see from tech "thought leaders"
I mean, every city had an army of people to light and extinguish the oil lamps in the streets, and those jobs went away. But people were freed up to do better stuff.
It's not alarmism when people have openly stated their intent to do those things.
See https://youtu.be/tcdVC4e6EV4 for a really interesting video on why a theoretical superintelligent AI would be dangerous, and when you factor in that these models could self-improve and approach that level of intelligence it gets worrying…
Well, your mom is a etc
Edit: Since this is getting downvoted I'll be more explicit: the human brain may well also be describable as some simple sort of thing, but that doesn't mean humans are not dangerous, nor hypothetical humans with a brain ten times as large and a million times faster. The worry about AIs killing all humans soon is not naive just because it sounds naive.
Is that what you were trying to convey? If not, I'm curious to know what you find refreshing about it and why those who disagree are wrapped in double quotes.
And it's to the benefit of many of those tech "thought leaders" to be alarmist since they don't have much of the AI pie
Fred Wilson once did a take on all trends in SV. First some firm comes out with a product that changes the landscape and makes a massive profit. Then some little firm comes along and does the same for a cheaper price. Then some ambitious group out of college comes out with an open-source version of the same.
Open source has never been a trailblazer of innovation. Open "research" was the original mantra of OpenAI. And an entrepreneur in residence put together a great product. If they were any more open, it would not make sense.