This appears to be part of an embargoed news blitz from a few news organizations (The Verge and Bloomberg posted the same story at the same time), which is an interesting PR deviation from OpenAI posting it on their blog and having it go viral. The news isn't on their official blog at all currently.
Yeah, I found that interesting. A flubbed embargo time makes sense, but could it also be that they're letting the news organizations have first dibs to build a little goodwill with the industry?
Is OpenAI still making changes to make ChatGPT better, more accurate, and more correct, or are they now only focused on making it cheaper by giving us weaker/dumber responses faster? I recently cancelled my subscription because I didn't think GPT-4 was much better than what I get for free from Claude or Gemini.
Going out on a limb here that they're not focusing the entire company on one objective.
Adding to this, reducing cost means they've reduced compute and improved quality per unit of computation.
That matters! It matters because these systems currently require an embarrassing amount of energy to run. It matters because these models could become portable enough to run locally on consumer electronics rather than being locked into remote compute and HTTP calls.
> Is OpenAI still making any changes to make ChatGPT better and more accurate and correct
For a few prompts recently, it just answered instead of refusing, to my surprise, so it feels like it's been getting better. Unfortunately I have no hard data on that; it's more of a feeling, so it's hard to prove to someone else that it's actually true.
Then again, ChatGPT-4o was recently found to be unable to do 9.11 - 9.9 correctly unless pressed, so there's still a long way to go.
That is interesting for sure, but the follow-up questioning ("Why did you get it wrong the first time?") and asking it to explain itself shows a misunderstanding of how these technologies actually work. Please stop treating these tools as if they are conscious and actually understand what they are spitting back at you! /rant
LLMs are text completion engines and can't really do math. Even if it happened to know the correct answer to 9.11 - 9.9, it could still fall flat on 9.12 - 9.9.
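To be fair, the arithmetic itself is trivial once you take it out of token space; a couple of lines of Python show the answer the model should be giving (this is just an illustration of the correct result, not of anything an LLM does internally):

```python
from decimal import Decimal

# 9.11 - 9.9: the model-tripping subtraction from the thread.
# Binary floats give an inexact result; Decimal gives the exact one.
float_answer = 9.11 - 9.9
exact_answer = Decimal("9.11") - Decimal("9.9")

print(float_answer)   # something close to -0.79, with float noise
print(exact_answer)   # exactly -0.79
```

The failure mode people report is the model answering as if 9.11 > 9.9, which is true for version numbers but not for decimals.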
I suspect model improvement costs exponentially more. The fact that it's been over a year since GPT-4's release and 4o is only marginally better, combined with OpenAI focusing on other things (Sora, mini models, voice, etc.), suggests a "better and more accurate and correct" model is actually really hard.
If request latency is your top concern and you don’t need it to be as smart as GPT-4, why are you using OpenAI and not a model hosted by Groq or similar?
There might be some changes, but judging by the quality of each product or update on launch day, OpenAI has consistently shown they have far more polished work sitting in the pipeline, getting ready to go out.
If "AI" tools are eventually going to have to be "free" (as in beer) to compete, I shudder to think of what companies like OpenAI will have to extract from users to please investors...
They could found a charity with the aim of creating open AI models. They could call it OpenAI, to draw attention to the fact that it's not a proprietary commercial technology.
There are many ways to extract value using a commodity: in OpenAI's case (and a few other LLM providers), the current strategy appears to be lock-in with additional services.
My somewhat limited understanding of LLMs (and ML in general) is that you would end up using a LOT of bandwidth, not to mention that training would be insanely slow, since each layer of the neural network requires the results of the previous layer, and then needs to send its results back.
The greatest limiter for training isn't raw computing power, but storage. If your model is 400B parameters, you need 800GB to store it, assuming fp16. Then you need another 800 GB for calculating gradients. Sharding all this out means transferring a lot of data. If your device only stores 1B parameters out of the 400B, that means having to download 2 GB of data, doing your share of the work, then uploading 2 GB of results. Even with gigabit internet, you'll spend an order of magnitude more time transferring data than actually processing it.
At that point, it'd be faster to train on a standard-specced PC that had to constantly page out most of the model.
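The back-of-envelope math above can be written out explicitly; the shard size and link speed below are just the numbers assumed in the parent comment:

```python
# Back-of-envelope check of the sharded-training numbers above.
BYTES_PER_PARAM = 2                      # fp16
shard_bytes = 1e9 * BYTES_PER_PARAM      # 1B-parameter shard = 2 GB
link_bytes_per_s = 1e9 / 8               # gigabit link ~ 125 MB/s

# Download the shard, do your slice of the work, upload the results.
transfer_s = 2 * shard_bytes / link_bytes_per_s
print(transfer_s)                        # 32.0 seconds per round trip
```

Thirty-plus seconds of pure transfer per shard per step, against compute that finishes far faster, is the order-of-magnitude gap being described.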
Sounds like a really interesting technical challenge.
Couple of issues I can see: (1) most devices out there would probably be mobile, so no NVIDIA/CUDA for you; (2) even with binding to, say, Apple Silicon, you might still be memory limited, i.e. can you fit the entire net on a single mobile GPU; (3) network latency?
I would think that you'd need a model format that could be updated in different places at the same time and easily merged at various stages. Because right now, adding all that network latency only slows things down.
I'm sure there's a method to permit it, but it doesn't seem like anyone's worked it out yet.
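The naive version of "update in different places and merge" is federated averaging; here's a toy sketch under the obvious simplification that weights are plain lists of floats (real systems have to handle stale updates, divergence, and far more state):

```python
def merge(base, deltas):
    """Apply the element-wise average of per-worker weight deltas."""
    n = len(deltas)
    avg = [sum(col) / n for col in zip(*deltas)]
    return [b + d for b, d in zip(base, avg)]

# Two workers push different updates against the same base weights.
merged = merge([0.0, 0.0, 0.0], [[1.0, 1.0, 1.0], [3.0, 3.0, 3.0]])
print(merged)  # [2.0, 2.0, 2.0]
```

The hard part isn't the averaging, it's that naively merged gradients computed against stale weights stop pointing downhill, which is roughly why nobody has made this work at scale yet.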
The problem is internode bandwidth and latency. Supercomputers are super because they have insanely fast bandwidth between the nodes, so the GPUs can talk to each other as if they were local.
I think enough people have already demonstrated that they essentially don't care. À la “facebook knows you're pregnant before you do” – and now so will “Open”“AI”.
Looking at https://openai.com/index/gpt-4o-mini-advancing-cost-efficien... , gpt-4o-mini is better than gpt-3.5 but worse than gpt-4o, as was expected. gpt-4o-mini is cheaper than both, however. Independent third-party performance benchmarks will help.
Claude 3.5 Sonnet is so much better in every test I've made (coding and everyday mondain stuffs); it beats me why anyone would choose ChatGPT (I'm using free versions only).
Good to know, although this is targeting cheaper API use for specific applications in which a second-tier model is sufficient. Note however that according to the LMSYS Leaderboard, GPT-4o rates slightly higher than Claude 3.5 Sonnet.
There's a market for small models with large context windows, for cases where there's little need for reasoning (summarization, search, etc.). 4o-mini is probably better and cheaper to run than 3.5.
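That split is the kind of thing a tiny router handles in production; a sketch, where the task labels and the cheap/expensive model assignment are my own assumptions, not anything OpenAI publishes:

```python
# Route low-reasoning work to the cheap model, everything else to the
# big one. Task labels and the model split are illustrative only.
CHEAP_TASKS = {"summarize", "search", "extract", "classify"}

def pick_model(task: str) -> str:
    return "gpt-4o-mini" if task in CHEAP_TASKS else "gpt-4o"

print(pick_model("summarize"))  # gpt-4o-mini
print(pick_model("plan"))       # gpt-4o
```

The pricing gap is large enough that even a crude router like this can cut the bill substantially for summarization-heavy workloads.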
It means they don't want anyone else releasing much stronger models than theirs, or they will undercut you by an order of magnitude on price and outperform you by 5%. Hardly a recipe for long-term consumer benefit.
OpenAI’s strategy has been bizarre since at least last November, when they launched custom GPTs, then had the boardroom coup.
Since the launch of Claude 3 Opus, and then Claude 3.5 Sonnet, they have been significantly behind Anthropic in terms of the general intelligence of their models. And instead of deploying something on par or better, they are making demos of video generation (Sora) or audio-to-audio models, not releasing anything.
GPT-4o is quite bad at coding, often getting stuck in a loop, and “fixing” buggy code by rewriting it without any changes.
GPT-4o is speculated to be a distillation of a larger model, and now GPT-4o-mini is an even dumber smaller model. But what’s the point?
Who is actually using small/fast/cheap/dumb models in production apps? Most real apps require higher reliability than even the biggest/slowest/priciest/smartest models can provide today. For the use case of transformers that has taken off, aiding students and knowledge workers in one-off tasks like writing code and prose, most users want smarter, more reliable outputs, even at the expense of speed and cost.
GPT-4o-mini seems like a move to increase margins, not make customers happier. That, like demoing products without launching them, is what big old slow corporations do, not how world-leading startups operate.
Since Claude 3.5 Sonnet was released, I can't go back to GPT anymore. It sounds too "robotic" and is overly verbose: it explains every little detail I don't want to know and is still far worse than Claude. OpenAI really has to step up their game if they don't want to fall behind. In fact, GPT-4 got worse back in November; the best version is still the one from June 2023, but it's only available via the API.
Sonnet is great, but I'd also suggest exploiting custom instructions in the ChatGPT UI. Here's a snippet from mine:
Extremely concise, formal. As short as possible. Assume I am an industry expert in any topic we discuss. Answer assuming I have the highest level of intellect possible, and do not require explication regardless of the sophistication of the topic. In cases where one approach among many is superior, offer an opinionated argument in favor of that approach.
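For API users, the same instructions drop straight in as a system message; a sketch (the model name is an assumption, and you'd unpack the returned dict into your client's chat-completion call):

```python
# Custom instructions from the ChatGPT UI become a system message in the
# API. Sketch only: the model name and prompt wording are placeholders.
STYLE = (
    "Extremely concise, formal. As short as possible. "
    "Assume I am an industry expert in any topic we discuss."
)

def build_request(user_prompt: str) -> dict:
    return {
        "model": "gpt-4o",
        "messages": [
            {"role": "system", "content": STYLE},
            {"role": "user", "content": user_prompt},
        ],
    }
```

Unlike the UI, this applies per-request, so you can keep different "personas" for different tasks.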
I don't think there are any VPSs that can do that in a way that is even remotely performant or a good value compared to something like an LLM inference provider or serverless GPU. I would look into together.ai and RunPod for that type of thing.
But let me know if you find something. I just don't think something tiny like phi-3, which could run on a VPS, although great for its size, is at all comparable to this stuff in terms of ability.
True, you could run it at home on a server though.
My AI server takes about 60W idle and 300-350W while running a query in llama3. At a kWh price of 0.15€ that ends up at about 7-10€ a month if it's not loaded too heavily. Not bad IMO.
The server could be more energy-optimized, but that would also cost me.
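For anyone checking the math, a quick model of that bill (the hours-of-load-per-day figure is my assumption; the wattages and price are from the comment above):

```python
# Monthly electricity cost for the home AI server described above.
IDLE_W, LOAD_W = 60, 325      # watts; 325 is the midpoint of 300-350
EUR_PER_KWH = 0.15
HOURS_PER_MONTH = 24 * 30

def monthly_eur(load_hours_per_day: float) -> float:
    load_h = load_hours_per_day * 30
    kwh = (IDLE_W * (HOURS_PER_MONTH - load_h) + LOAD_W * load_h) / 1000
    return kwh * EUR_PER_KWH

print(monthly_eur(0))   # idle all month: ~6.5 EUR
print(monthly_eur(2))   # ~2 h of queries a day: ~8.9 EUR
```

So the quoted 7-10 EUR/month range corresponds to roughly one to three hours of inference load per day.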
> mondain stuffs
Mundane, not mondain.