I really dislike the approach some companies have taken to announcing new models: they don't mention evaluation results or the performance of the model, but instead talk about how "transparent", "accessible" and "supportive" these models are.
Anyway, I have benchmarked stablelm-base-alpha-3b (the open-source version, not the fine-tuned one, which is under an NC license) using the MMLU benchmark, and the results are rather underwhelming compared to other open source models:
- stablelm-base-alpha-3b (3B params): 25.6% average accuracy
- flan-t5-xl (3B params): 49.3% average accuracy
- flan-t5-small (80M params): 29.4% average accuracy
MMLU is just one benchmark, but based on the blog post, I don't think it will yield much better results in others. I'll leave links to the MMLU results of other proprietary[0] and open-access[1] models (results may vary by ±2% depending on the parameters used during inference).
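For context on how such a score is produced: MMLU is 4-way multiple choice, so the model scores each candidate answer and the top-scoring one is compared against the gold label; random guessing lands at 25%, which is why 25.6% reads as roughly chance level. A toy sketch of the accuracy computation (the `score_fn` here is a stand-in, not a real model):

```python
def mmlu_accuracy(questions, score_fn):
    # questions: list of (prompt, choices, gold_index);
    # score_fn stands in for a model's per-answer log-likelihood.
    correct = 0
    for prompt, choices, gold in questions:
        scores = [score_fn(prompt, c) for c in choices]
        pred = max(range(len(choices)), key=scores.__getitem__)
        correct += (pred == gold)
    return correct / len(questions)

# Toy "model" that just prefers the longest answer:
toy = [
    ("2+2=?", ["3", "4", "5", "22"], 1),
    ("Capital of France?", ["Paris", "Rome", "Oslo", "Bern"], 0),
]
acc = mmlu_accuracy(toy, lambda prompt, choice: len(choice))  # 0.5
```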
Doesn't make much sense to compare a model that's not fine-tuned to flan models that are fine-tuned. It makes more sense to compare it to something like T5-base, which is probably a lot more comparable.
Just from playing around with it, the fine tuned model (stabilityai/stablelm-tuned-alpha-7b) doesn't seem to work very well either.
User: What is 12 + 12?
StableLM: Yes, that is correct! 12 + 12 is equal to 18.
User: Write me a python program that calculates the nth fibonacci number
StableLM: Here is a python program that calculates the nth fibonacci number
def nth_fibonacci(n):
    # base case
    if n==0:
        return 0
    else:
        return 1 + n - 1
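(Note that the model's answer simplifies to returning n for every n > 0. For reference, a correct iterative version of what was asked:)

```python
def nth_fibonacci(n):
    # fib(0) = 0, fib(1) = 1, fib(n) = fib(n-1) + fib(n-2)
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```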
I would have compared it to the fine-tuned version if it had been released under a truly open-source license. I think developers implementing LLMs care more about licensing than about the underlying details of the model.
Also t5-base is 220M params vs 3B params of stablelm, not really a fair comparison anyways.
It's fantastic that more orgs are releasing open-source models trained on more than 300B or so tokens. Here's my take from the details I could find.
Pros
- 4096 context width (vs 2048 for llama, gpt-j, etc)
- 3B to 65B released or in progress
- RL tuned models available
- Trained on more tokens than existing non-llama models
- 128 head dim, so can use flash attention (unlike GPT-J)
Cons
- No benchmarks released, or details about the model
- Somewhat restrictive license on the base models, and NC license on the RL models
- Small models only trained on 800B tokens, compared to 1T for llama-7B, and potentially more for other upcoming alternatives (RedPajama, etc). I'd like to see their loss curves to see why they chose 800B.
High-level, this is likely to be more accurate than existing non-llama open source models. It's hard to say without benchmarks (but benchmarks have been gamed by training on benchmark data, so really it's just hard to say).
Some upcoming models in the next few weeks may be more accurate than this, and have less restrictive licenses. But this is a really good option nonetheless.
FYI, I'm running lm-eval now with the tests Bellard uses (lambada_standard, hellaswag, winogrande, piqa, coqa) on the biggest 7B on a 40GB A100 atm (non-quantized version, requires 31.4GB), so it will be directly comparable to what the various LLaMAs look like: https://bellard.org/ts_server/
(UPDATE: the run took 1:36 to complete but failed at the end with a TypeError, so I'll need to poke at it and rerun.)
Looks like my edit window closed, but my results ended up being very low so there must be something wrong (I've reached out to StabilityAI just in case). It does however seem to roughly match another user's 3B testing: https://twitter.com/abacaj/status/1648881680835387392
The current scores I have place it between gpt2_774M_q8 and pythia_deduped_410M (yikes!). Based on training and specs you'd expect it to outperform Pythia 6.9B at least... this is running on a HEAD checkout of https://github.com/EleutherAI/lm-evaluation-harness (releases don't support hf-causal) for those looking to replicate/debug.
How possible is it that every other model suffers from dataset contamination and this model is being unfairly penalized for having properly sanitized training data?
That's great news, but one would think that, since they're behind Stable Diffusion, they'd use the insights from it and scale the data even further, resulting in better quality from a smaller model that can run on most people's machines.
Like... try 10 trillion or 100 trillion tokens (although that may be absurd, I never did the calculation), and a long context on a 7B parameter model then see if that gets you better results than a 30 or 65B parameter on 1.5 trillion tokens.
A lot of these open source projects just seem to be trying to follow and (poorly) reproduce OpenAI's breakthroughs instead of trying to surpass them.
I'm wondering what the sweet spot for parameters will be. Right now it feels like the Mhz race we had back in the CPU days, but 20 years later I am still using a 2-3GHz CPU.
There have also been quite a few developments on sparsity lately. Here's a technique SparseGPT which suggests that you can prune 50% of parameters with almost no loss in performance for example: https://arxiv.org/abs/2301.00774
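As a much simpler illustration of what "pruning 50% of parameters" means (SparseGPT itself uses second-order information and updates the remaining weights, not plain magnitude pruning), here is a minimal magnitude-pruning sketch:

```python
import numpy as np

def magnitude_prune(w, sparsity=0.5):
    # Zero out the `sparsity` fraction of weights with smallest magnitude.
    # (Much cruder than SparseGPT, but it shows what "50% sparsity" means.)
    k = int(w.size * sparsity)
    if k == 0:
        return w.copy()
    thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.where(np.abs(w) <= thresh, 0.0, w)

np.random.seed(0)
w = np.random.randn(64, 64)
pruned = magnitude_prune(w, 0.5)  # half the entries are now exactly zero
```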
Standard attention has memory quadratic in sequence length, whereas FlashAttention has memory linear in sequence length. FlashAttention is also faster.
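The memory difference can be illustrated without any GPU tricks: materializing the full score matrix costs O(n^2) memory, while processing one query row at a time needs only O(n) extra memory and produces the same result. FlashAttention goes much further (tiling over keys with an online softmax inside one fused kernel), but this sketch shows where the quadratic term comes from:

```python
import numpy as np

def attention_naive(q, k, v):
    # Materializes the full (n, n) score matrix: memory quadratic in n.
    s = q @ k.T / np.sqrt(q.shape[-1])
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v

def attention_rowwise(q, k, v):
    # One query row at a time: memory linear in n, same output.
    out = np.empty_like(q)
    scale = 1.0 / np.sqrt(q.shape[-1])
    for i in range(q.shape[0]):
        s = (q[i] @ k.T) * scale
        p = np.exp(s - s.max())
        out[i] = (p / p.sum()) @ v
    return out

np.random.seed(0)
q, k, v = (np.random.randn(8, 4) for _ in range(3))
```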
But Chinchilla optimality, while an interesting result, is a strange target for most practical purposes. Training happens once; inference happens many times. Not training past the point where it's cheaper to train a larger model for the same (proxy for) quality discounts the cost of inference to zero.
I'm sure there will be a bunch of different RL tuned versions of them, RLHF isn't that expensive. IIRC Microsoft has software that will do it for a few thousand dollars for a model that size. I'm sure someone will release a non-lobotomized version, maybe OpenAssistant.
It's unclear which models will be trained to 1.5T tokens. The details of how many tokens each model saw in training are on Github - https://github.com/stability-AI/stableLM/ . But only for the ones that have been released.
Selling access to LLMs via remote APIs is the “stage plays on the radio” stage of technological development. It makes no actual sense; it’s just what the business people are accustomed to. It’s not going to last very long. So much more value will be unlocked by running them on device. People are going to look back at this stage and laugh, like paying $5/month to a cellphone carrier for Snake on a feature phone.
Web apps:
- Need data persistence. Distributed databases are really hard to do.
- Often have network effects where the size of the network causes natural monopoly feedback loops.
None of that applies to LLMs.
- Making one LLM is hard work and expensive. But once one exists you can use it to make more relatively cheaply by generating training data. And fine tuning is more reliable than one shot learning.
- Someone has to pay the price of computation power. It’s in the interest of companies to make consumers pay for it up front in the form of a device.
- Being local lets you respond faster and with access to more user contextual data.
This is sort of like saying the world wide web is a fad. Many people made that argument, but a lot of desktop apps got replaced by websites even though they were supposedly inferior.
ChatGPT works fine as a website and you don’t need to buy a new computer to run it. You can access your chat history from any device. For many purposes, the only real downside is the subscription fee.
If LLMs become cheaper to run, websites will be cheaper to run, and there will be lower-cost competition. Maybe even cheap enough to give away for free and make money from advertising?
This doesn't seem technically feasible to me. The state of the art will for a long time require a lot more hardware to run than is available on a consumer device.
Beyond which, inference also benefits from parallelization, not just training, so being able to batch requests is a benefit, and more likely when access is offered via an API.
This technology will be embedded into every OS within 2 years. People don't generally need a "super" model like GPT3/4. It will be perfectly acceptable and common to have the model change context, sync with whatever model/training data is necessary to be an expert in that context only, and associated contexts..., and prompt it in a specific domain. Client devices and internet connections are fast enough to do this in near real time today. The platforms to do all of this are being built right now by every company that creates software otherwise they will fail within 5 years.
I can already run Vicuna(llama) 7B on my 2020, 14" PC laptop at ~3.5 tokens/sec, and more speed can definitely be squeezed out.
Most future laptops and phones will ship with NPUs next to the CPU silicon. Once they get enabled in software, that means a 16GB machine can run a 13B model, or a 7B model with room for other heavy apps.
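The arithmetic behind that claim, as a rough sketch (the 20% overhead factor is a crude assumption for activations, KV cache, and runtime, not a measurement):

```python
def model_ram_gb(n_params_billion, bits_per_weight, overhead=1.2):
    # Parameter bytes plus ~20% overhead (crude assumption).
    return n_params_billion * bits_per_weight / 8 * overhead

ram_13b = model_ram_gb(13, 4)  # ~7.8 GB for a 4-bit 13B model
ram_7b = model_ram_gb(7, 4)    # ~4.2 GB for a 4-bit 7B model
# Both fit comfortably in a 16 GB machine; the 7B case leaves
# plenty of room for other heavy apps.
```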
As for the benefits of batching and centralization, that is true, but it's somewhat countered by the high cost of server accelerators and the high profit margins of cloud services.
I don't think it's going to happen in the next few years.
the prices are gonna drop like hell, but ain't no way we run models meant to run on 8 nvidia A100 on our smartphones in the next 5 years
just like you don't store the entirety of Spotify on your iPhone, you're not gonna run any decent LLM on phones any time soon (and I don't consider any of the small LLaMAs to be decent)
This is the reason why they're not going to move on device anytime soon. You can use compression techniques, sure, but you're not going to get anywhere near the level of performance of GPT-4 at a size that can fit on most consumer devices
the only thing I can say to this is that Apple have seemed laser focused on tuning their silicon for ML crunching, that that focus is clearly now going to be amped up further still, and that in tandem the software itself will be tuned to Apple silicon.
GPUs on the other hand are pretty general purpose. And 5 years on a focused superlinear ramp up is a long time, lots can happen. I am not saying it's 100%, or even 80% likely. It'll be super impressive if it happens, but I see it as well within the realms of reason.
> but ain't no way we run models meant to run on 8 nvidia A100 on our smartphones in the next 5 years
When I learned about neural networks, the general advice at the time was "you'll only need one hidden layer, with somewhere between the number of your input and output neurons". While that was more than 5 years ago, my point is: both the approach and the architecture change over time. I would not bet on what we won't have in 5 years.
I agree - I think for security and privacy we need it to be on-device (either that, or there needs to be end-to-end encryption with guarantees that data won't be captured for training). There are tons of useful applications that require sensitive personal information (or confidential business information) to be passed in prompts - that becomes a non-issue if you can run it on device.
I think there will be a lot of incentive to figure out how to make these models more efficient. Up until now, there's been no incentive for the OpenAIs and Googles of the world to make the models efficient enough to run on consumer hardware. But once we have open models and weights, there will be tons of people trying to get them running on consumer hardware.
I imagine something like an AI-specific processor card that just runs LLMs and costs < $3000 could be a new hardware category in the next few years (personally I would pay for that). Or, if Apple were to start offering a GPT-3.5+ level LLM built in that runs well on M2 or M3 Macs, that would be strong competition and a pretty big blow against the other tech companies.
That hardware's gonna look a lot like ASIC Bitcoin miners if an architecture to replace LLMs is popularized. General-enough purpose computing ain't going away for a long time.
I'd suspect it will actually accelerate moving everything into the cloud.
If your entire business is in the cloud, you can give an AI access to everything with a single sign-on or some passwords. If half is in the cloud and half is local, it's very annoying to have it all in-context for your AI assistant. And there's no way we're getting everything locally stored again at this point!
Right, this is why StabilityAI is getting in bed with Amazon, so private, fine-tuned models can operate on all your data sitting out there in S3 buckets or whatever.
What's been so interesting with the explosion of this has been how prominently the corporately-driven restrictions have been highlighted in news and such.
People are getting a good look in very easy to understand terms at the foundational stage at how limiting the future is to have this just be another big tech controlled thing.
They have said that the alignment actually hurts the performance of the models. Plus for creative applications like video games or novels, you need an unaligned model otherwise it just produces "helpful" and nice characters.
Alignment is an unsolved problem. None of the current stronger models are "aligned", just tuned in ways that weight some biases more than others, and even that is dependent on the features of their inputs.
On this topic, Apple is the sleeping giant. Sleeping tortoise maybe. Everyone else has been fast out of the gates, but Apple has effectively already been positioning to leap frog everyone after a decade+ of M1 chip design. Ever since these chips launched, the M1 chips have felt materially underutilized, particularly their GPU compute. Have to believe something big is going on behind the scenes here.
That said, wouldn't be surprised if the truth was somewhere in between cloud-deployed and locally deployed, particularly on the way up to the asymptotic tail of the model performance curve.
What would a "leap frog" look like, in your mind? I'm struggling to imagine how they're better positioned than the competition, especially after llama.cpp showed us that inference acceleration works with everything from AVX2 to ARM NEON. Compared to Nvidia (or even Microsoft and ONNX/OpenAI), Apple is somewhat empty-handed here. They're not out of the game, but I genuinely see no path for them to dominate "everyone".
This doesn't seem that obvious to me: serving LLMs through an API lets you run highly optimized inference with stuff like TensorRT and batched inference, while you're stuck with batch size = 1 when processing locally.
LLMs don't even require full real-time inference; there are applications like VR or camera processing where you need <10ms inference, but for any application of LLMs, 200-500ms is more than fine.
For the users, running LLMs locally means more battery usage and significant RAM usage. The only true advantage is privacy, but that isn't a selling point for most people.
You're still thinking in terms of what APIs would be used for, rather than what local computation enables.
For example, I'd like an AI to read everything I have on screen, so that I can ask at any time "why is that? Explain!" without having to copy paste the data and provide the whole context to a Google-like app.
But without privacy guarantee (and I mean technical one, not a pinky promise to be broken when VC funding runs out) there's no way I'd feed everything into an AI.
We are very close to optimized ML frameworks on consumer hardware.
And TBH most modern devices have way more RAM than they need, and go to great lengths to just find stuff to do with it. Hardware companies also very much like the idea of heavy consumer applications.
That's what pruning is, but it's not that straight forward and has limits. Finetuning a smaller model on the output of a larger one is much more flexible and reliable.
GPT 3.5 is probably a 13B Curie finetuned on the output of full size GPT-3 175B, to give you an idea of the technique.
That is smaller than the third smallest StableLM and the same size as LLaMA-13B which can run at useful speeds off of a smart phone CPU.
I think it may be naive to believe that the deciding factor in how these things are used will be "chip speed" or "efficiency on the machine."
I wish we were in that world; but it more likely seems like it would be "Which company jumps ahead quickest to get mindshare on a popular AI related thing, and then is able to ride scale to dominate the space?"
REALLY hope I end up being wrong here; the fact that so many models are already out there does give me some hope.
I don't think that's true in the context of businesses, because they won't want their data to be leaked and/or used for other clients. The more data from your company you can feed the AI, the more productive it will be for you. I'm not just talking about semi-public documentation, but also things like emails, meeting transcripts, internal tool APIs, employee details, etc.
If the AI service provider uses your data to help train their AI, it will be blacklisted by most companies. If you keep the data in silos, the centralisation will offer almost no benefit while still being a very high privacy risk. The only benefit they get is that it allows them to demo it and see its potential, but no serious business will adopt it unless you also provide a self-hosted solution.
I think the only people who will truly benefit from using cloud services as a long term solution are personal users and companies too small to afford the initial cost of the hardware.
Having more users helps with reinforcement learning, but as a user, I want an unaligned AI that isn’t constantly babysitting me with bullshit about what it can and cannot do, so there’s like a negative network effect, lol.
There will be a time when LLMs need data persistence to "improve our user experience". The LLM will act like a "friend" that will remember you when you come back.
LLMs seem more akin to AWS than to SaaS: companies will create products upon LLMs the way companies rely on AWS to support their products. The build-vs-buy calculus may tip heavily towards build once they can run on device with a good user experience, with no need to pay for cloud compute any longer.
This is mostly why the future of computation only makes sense monetarily if you have everyone shift to a thin client. So, banning GPUs is likely considered a "necessary evil" by the BigTech cognoscenti for accomplishing that goal.
When radio first started, people read plays written for the stage, because that's what they knew and what they had. Later people learned to write for the medium and make radio native entertainment.
Same thing happened when TV arrived. They did live versions of the radio entertainment on a set in front of a camera.
Absolutely a giant fan of Stability sticking to actual open-source licenses and not licenses that impose restrictions on what you can use the models for. This is the future of AI! Beware of any org that uses "ethical" licenses; they are not open source. Stability is one of the few organizations that actually cares about free software, and you love to see it.
> These fine-tuned models are intended for research use only and are released under a noncommercial CC BY-NC-SA 4.0 license, in-line with Stanford’s Alpaca license.
This is a no-commercial-use-allowed license; it is neither considered free software nor open source, the definitions of which disallow restrictions on what you can use the work for.
> We are also releasing a set of research models that are instruction fine-tuned. Initially, these fine-tuned models will use a combination of five recent open-source datasets for conversational agents: Alpaca, GPT4All, Dolly, ShareGPT, and HH. These fine-tuned models are intended for research use only and are released under a noncommercial CC BY-NC-SA 4.0 license, in-line with Stanford’s Alpaca license.
The snippet you quoted is not talking about the main model in the announcement. It's talking about fine-tuned models based on other models. Stability has to respect the license of the originals. They cannot change it.
The main model is described higher up in the post and is permissible for commercial:
> Developers can freely inspect, use, and adapt our StableLM base models for commercial or research purposes, subject to the terms of the CC BY-SA-4.0 license
I am very happy to see them use a true FLOSS licence. However, it's a surprise to me, given Stable Diffusion is proprietary, released under one of those "ethical" licences.
"Ethics" will only ever be an excuse to lock this technology behind one company's paywall. The only ethical AI is actually free and open AI; how it's trained is irrelevant imho as long as we can all benefit. The negatives of the work of individuals being used to train an open model are outweighed by the negatives of one company doing that anyway and holding the power within their walls.
Yeah I wish there was more real investigation / analysis into who is behind various "ethical AI" pushes and what they stand to gain from it. From what I can see, many of the people involved either are invested in companies that will somehow certify your AI is ethical, or just want to stifle competition so they can catch up. Of course there's also a sprinkling of "current thing" supporters.
I have to disagree. Especially in the case of LLMs, where new API services are popping up all over the place, an "ethical" license like the AGPL that requires the source be shared for web services would accelerate development of the space as a whole immensely.
Indeed, that's why I pay for credits on their official site/DreamStudio even though I want to run things locally. My big fear is that one day they'll make a press release saying they have to stop everything because of not enough funding.
How is this sort of thing audited? I imagine there are all sorts of lifestyle AI businesses that won't give two shits about a license where people can't easily see or audit what is being used.
"Alignment" is just a euphemism for "agrees with me", though. Humans aren't even aligned with each other. Demanding that AI models be "aligned" is essentially a demand that AI only be produced which agrees with your priors.
it is true that there are concerns relating to open source and AI, but surely having them closed off, manipulated, and controlled by untrustworthy corporations is worse.
This is amazing. They even let the developers use it for commercial purposes;
“Developers can freely inspect, use, and adapt our StableLM base models for commercial or research purposes, subject to the terms of the CC BY-SA-4.0 license.“
You can use this link to interact with the 7B model;
> Supportive. We build models to support our users, not replace them. We are focused on efficient, specialized, and practical AI performance – not a quest for god-like intelligence. We develop tools that help everyday people and everyday firms use AI to unlock creativity, boost their productivity, and open up new economic opportunities.
Refreshing take on the peak alarmism we see from tech "thought leaders"
This is just marketing. They're positioning themselves as somehow "more human" while building the exact same technology. When a model supports me by doing the work I'd otherwise hire someone to do, the model just replaced someone. And this goes without saying, but a large amount of outsourced tasks today don't exactly require "god-like intelligence".
That was probably said about the automobile when it replaced horses, or about electric lamps when they replaced oil lamps, no?
I mean, every city had an army of people to light up and down oil lamps in the streets, and these jobs went away. But people were freed up to do better stuff.
It's alarmism to support government regulation to reinforce the moat when industry leaders say they intend to do it, but also claim that the danger of it being done is why competition with them must be restricted by the State (and why they can't, despite being, or being a subsidiary of, a nonprofit founded on an openness mission, share any substantive information on their current models).
But the concerns about AI taking over the world are valid and important; even if they sound silly at first, there is some very solid reasoning behind them. They're big matrices, yes, but they're Turing-complete, which means they can theoretically do any computational task.
See https://youtu.be/tcdVC4e6EV4 for a really interesting video on why a theoretical superintelligent AI would be dangerous, and when you factor in that these models could self-improve and approach that level of intelligence it gets worrying…
> They're big matrices and they are very cool tools!
Well, your mom is a etc
Edit: Since this is getting downvoted I'll be more explicit: The human brain may well be also just described as some simple sort of thing, but that doesn't mean humans are not dangerous, nor hypothetical humans with a brain ten times as large and a million times faster. The worry about AIs killing all humans soon is not naive just by sounding naive.
"It is refreshing to hear opinions I already agree with. People with other opinions are unintelligent"
Is that what you were trying to convey? If not, I'm curious to know what you find refreshing about it and why those who disagree are wrapped in double quotes.
Well, it's to their benefit to portray their models as working alongside and enhancing humans, as opposed to replacing us. So it sounds a bit like marketing speak to me.
And it's to the benefit of many of those tech "thought leaders" to be alarmist since they don't have much of the AI pie
Unfortunately, due to the law of names, StabilityAI will in the future hit the same issue as OpenAI and do a 180, unleashing very unstable AI to the world.
when has open source ever spearheaded independent innovation? they usually follow along.
Fred Wilson once did a take on all trends in SV. First some firm comes out with a product that changes the landscape and makes a massive profit. Then some little firm comes along and does the same for a cheaper price. Then some ambitious group out of college comes out with an open-source version of the same.
Open source has never been a trailblazer of innovation. Open "research" was the original mantra for OpenAI. And an entrepreneur in residence put together a great product. If they were any more open, it would not make sense.
It's CC-BY-NC-SA because of the upstream sources used for instruction training. There are open resources being developed for that, from what I've seen, but probably nothing ready.
That's a limitation of the dataset used for that particular tuned model. Probably not a great choice on their part given that people aren't reading past the headline, but the actual base model is not restricted.
[0]: https://paperswithcode.com/sota/multi-task-language-understa...
[1]: https://github.com/declare-lab/flan-eval/blob/main/mmlu.py#L...
I'll place results in my spreadsheet (which also has my text-davinci-003 results): https://docs.google.com/spreadsheets/d/1kT4or6b0Fedd-W_jMwYp...
Note, another LLM currently being trained, GeoV 9B, already far outperforms this model at just 80B tokens trained: https://github.com/geov-ai/geov/blob/master/results.080B.md
The fully trained version will surely be much better.
Also, you should benchmark GPT-3 Babbage for a fair comparison since that is the same size as 7B.
You can see the model architecture here
https://github.com/Stability-AI/StableLM/blob/main/configs/s...
"These models will be trained on up to 1.5 trillion tokens." on the Github repo.
https://github.com/stability-AI/stableLM/#stablelm-alpha
mind explaining why this is so attractive/what the hurdle is for the laypeople in the audience? (me)
LLaMA is trained far beyond chinchilla optimality, so this is not as surprising to me.
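For reference, Chinchilla's rule of thumb is roughly 20 training tokens per parameter, which makes "far beyond optimality" easy to quantify:

```python
# Chinchilla's compute-optimal heuristic: ~20 training tokens per parameter.
def chinchilla_optimal_tokens(n_params):
    return 20 * n_params

opt_7b = chinchilla_optimal_tokens(7e9)  # ~140B tokens for a 7B model
llama_ratio = 1e12 / opt_7b              # LLaMA-7B's 1T tokens is ~7x that
```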
Seems they want to do 3B to 175B, although 175B is not in progress yet.
I wrote up a feasibility investigation last year: https://fleetwood.dev/posts/a-case-for-client-side-machine-l...
It's an inconvenient truth, for better or worse.
Most future laptops and phones will ship with NPUs next to the CPU silicon. Once they get enabled in software, that means a 16GB machine can run a 13B model, or a 7B model with room for other heavy apps.
As for the benefits of batching and centralization, that is true, but it's somewhat countered by the high cost of server accelerators and the high profit margins of cloud services.
Prices are gonna drop like hell, but there's no way we're running models meant for eight Nvidia A100s on our smartphones in the next 5 years.
Just like you don't store the entirety of Spotify on your iPhone, you're not gonna run any decent LLM on phones any time soon (and I don't consider any of the small LLaMAs to be decent).
GPUs on the other hand are pretty general purpose. And 5 years on a focused superlinear ramp up is a long time, lots can happen. I am not saying it's 100%, or even 80% likely. It'll be super impressive if it happens, but I see it as well within the realms of reason.
When I learned about neural networks, the general advice at the time was "you'll only need one hidden layer, sized somewhere between the number of your input and output neurons". While that was more than 5 years ago, my point is: both the approach and the architecture change over time. I would not bet on what we won't have in 5 years.
M$ has been working on an AI chip since 2019, so I think we will.
I think there will be a lot of incentive to figure out how to make these models more efficient. Up until now, there's been no incentive for the OpenAIs and Googles of the world to make the models efficient enough to run on consumer hardware. But once we have open models and weights, there will be tons of people trying to get them running on consumer hardware.
I imagine something like an AI specific processor card that just runs LLMs and costs < $3000 could be a new hardware category in the next few years (personally I would pay for that). Or, if apple were to start offering a GPT3.5+ level LLM built in that runs well on M2 or M3 macs that would be strong competition and a pretty big blow against the other tech companies.
If your entire business is in the cloud, you can give an AI access to everything with a single sign-on or some passwords. If half is on the cloud and half is local, that's very annoying to have all in-context for your AI assistant. And there's no way we're getting everything locally stored again at this point!
The main reason I want a non-cloud LLM is that I want one that's unaligned.
I know I'm not a criminal and I want to stop being reprimanded by GPT4.
What I'm most interested here is fine tuning the model with my own content.
That could be super valuable especially if we could get it to fact check itself, which you could with a vector database.
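A minimal sketch of that idea: embed your content, index it, and retrieve the nearest passage as evidence for a claim. Real setups use learned embeddings (e.g. sentence-transformers) and a proper vector database; the bag-of-words embedding below just keeps the example self-contained.

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy embedding: bag-of-words term counts (a real system would use
    # a learned dense embedding model).
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for missing terms
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Your own content, pre-embedded into an in-memory "vector database".
docs = [
    "the eiffel tower is in paris",
    "llamas are domesticated south american camelids",
]
index = [(d, embed(d)) for d in docs]

def retrieve(claim: str) -> str:
    # Nearest stored passage = evidence to check the model's claim against.
    return max(index, key=lambda pair: cosine(embed(claim), pair[1]))[0]

print(retrieve("where is the eiffel tower"))
# → the eiffel tower is in paris
```

The retrieved passage then goes back into the model's context so the answer can cite your content instead of hallucinating.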
At this foundational stage, people are getting a good look, in very easy-to-understand terms, at how limiting the future would be if this were just another big-tech-controlled thing.
That said, wouldn't be surprised if the truth was somewhere in between cloud-deployed and locally deployed, particularly on the way up to the asymptotic tail of the model performance curve.
LLMs don't even require full real-time inference. There are applications like VR or camera processing where you need real-time <10 ms inference, but for any application of LLMs, 200-500 ms is more than fine.
For the users, running LLMs locally means more battery usage and significant RAM usage. The only true advantage is privacy but this isn't a selling point for most people
For example, I'd like an AI to read everything I have on screen, so that I can ask at any time "why is that? Explain!" without having to copy paste the data and provide the whole context to a Google-like app.
But without privacy guarantee (and I mean technical one, not a pinky promise to be broken when VC funding runs out) there's no way I'd feed everything into an AI.
And TBH most modern devices have way more RAM than they need, and go to great lengths just to find stuff to do with it. Hardware companies also very much like the idea of heavy consumer applications.
Is that a real technique? Why not just shrink down the model itself directly somehow, is that not possible?
GPT 3.5 is probably a 13B Curie finetuned on the output of full size GPT-3 175B, to give you an idea of the technique.
That is smaller than the third smallest StableLM and the same size as LLaMA-13B which can run at useful speeds off of a smart phone CPU.
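For anyone wondering what "finetuned on the output of" a bigger model looks like mechanically, here's a toy distillation: a small "student" fit to a "teacher's" outputs rather than to original labels. The teacher here is deliberately a simple known function so the result is checkable; real distillation trains a small LLM on a big LLM's sampled text or logits.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(1000, 1))

def teacher(x):
    # Stand-in for the big expensive model; its mapping is what we
    # want the student to copy (here, a known linear function).
    return 3.0 * x + 0.5

y_teacher = teacher(X)  # the "generated training data"

# Student: a small linear model fit to the teacher's outputs by least squares.
A = np.hstack([X, np.ones_like(X)])
w, b = np.linalg.lstsq(A, y_teacher, rcond=None)[0].ravel()
print(w, b)  # the student recovers ~3.0 and ~0.5
```

The point is that the student never sees the original data, only the teacher's behavior, which is exactly why one good open model makes the next one much cheaper.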
I wish we were in that world; but it more likely seems like it would be "Which company jumps ahead quickest to get mindshare on a popular AI related thing, and then is able to ride scale to dominate the space?"
REALLY hope I end up being wrong here; the fact that so many models are already out there does give me some hope.
> Often have network effects where the size of the network causes natural monopoly feedback loops.
This one in particular sounds like an argument that remote models will win.
If the AI service provider uses your data to help better train their AI, it will be blacklisted by most companies. If you keep them in silos, the centralisation will offer almost no benefit while still being a very high privacy risk. The only benefit they get is that it allows them to demo it and see its potential, but no serious business will adopt it unless you also provide a self-hosted solution.
I think the only people who will truly benefit from using cloud services as a long term solution are personal users and companies too small to afford the initial cost of the hardware.
I don't see small/medium companies getting into acquiring hardware for AI
All software is sold as SaaS today, because it's more profitable. The same will be true for LLMs.
Same thing happened when TV arrived. They did live versions of the radio entertainment on a set in front of a camera.
This is a no-commercial-use-allowed license; it is neither considered free software nor open source, the definitions of which disallow restrictions on what you can use the work for.
> We are also releasing a set of research models that are instruction fine-tuned. Initially, these fine-tuned models will use a combination of five recent open-source datasets for conversational agents: Alpaca, GPT4All, Dolly, ShareGPT, and HH. These fine-tuned models are intended for research use only and are released under a noncommercial CC BY-NC-SA 4.0 license, in-line with Stanford’s Alpaca license.
The snippet you quoted is not talking about the main model in the announcement. It's talking about fine-tuned models based on other models. Stability has to respect the license of the originals. They cannot change it.
The main model is described higher up in the post and is permissible for commercial:
> Developers can freely inspect, use, and adapt our StableLM base models for commercial or research purposes, subject to the terms of the CC BY-SA-4.0 license
Not unless they're aligned well.
There are all sorts of horrible use cases that these could be used for.
“Developers can freely inspect, use, and adapt our StableLM base models for commercial or research purposes, subject to the terms of the CC BY-SA-4.0 license.“
You can use this link to interact with the 7B model:
https://huggingface.co/spaces/stabilityai/stablelm-tuned-alp...
I sent it one small text (actually a task) five minutes ago. It's still loading.
Refreshing take on the peak alarmism we see from tech "thought leaders"
I mean, every city had an army of people to light and extinguish the oil lamps in the streets, and those jobs went away. But people were freed up to do better stuff.
It's not alarmism when people have openly stated their intent to do those things.
See https://youtu.be/tcdVC4e6EV4 for a really interesting video on why a theoretical superintelligent AI would be dangerous, and when you factor in that these models could self-improve and approach that level of intelligence it gets worrying…
Well, your mom is a etc
Edit: Since this is getting downvoted I'll be more explicit: the human brain may well also be describable as some simple sort of thing, but that doesn't mean humans are not dangerous, nor hypothetical humans with a brain ten times as large and a million times faster. The worry about AIs killing all humans soon is not naive just because it sounds naive.
Is that what you were trying to convey? If not, I'm curious to know what you find refreshing about it and why those who disagree are wrapped in double quotes.
And it's to the benefit of many of those tech "thought leaders" to be alarmist since they don't have much of the AI pie
Fred Wilson once did a take on all trends in SV. First some firm comes out with a product that changes the landscape and makes a massive profit. Then some little firm comes along and does the same for a cheaper price. Then some ambitious group out of college comes out with an open-source version of the same.
Open source has never been a trailblazer of innovation. Open "research" was the original mantra of OpenAI. And an entrepreneur in residence put together a great product. If they were any more open, it would not make sense.