> It is priced at $0.002 per 1k tokens, which is 10x cheaper than our existing GPT-3.5 models.
This is a massive, massive deal. For context, the reason GPT-3 apps took off over the past few months before ChatGPT went viral is that a) text-davinci-003 was released and was a significant performance increase, and b) the cost was cut from $0.06/1k tokens to $0.02/1k tokens, which made consumer applications feasible without a large upfront cost.
A much better model at 1/10th the cost warps the economics completely, to the point that it may be better than in-house finetuned LLMs.
I have no idea how OpenAI can make money on this. This has to be a loss-leader to lock out competitors before they even get off the ground.
> I have no idea how OpenAI can make money on this.
I did some quick calculation. We know the number of floating point operations per token of inference is approximately twice the number of parameters (175B). Assuming they use 16-bit floating point and run at 50% of peak efficiency, an A100 can do 300 trillion FLOP/s (peak: 624 [0]). One hour of A100 time earns OpenAI $0.002/ktok * (300,000/175/2/1000) ktok/sec * 3600 sec ≈ $6.17. The public price for an A100 is $2.25/hour with a one-year reservation [1].
[0]: https://www.nvidia.com/en-us/data-center/a100/
[1]: https://azure.microsoft.com/en-in/pricing/details/machine-le...
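For anyone who wants to poke at it, the same arithmetic in a few lines of Python (the parameter count, FLOPs-per-token rule, and 50% efficiency are all assumptions from the comment above, not known facts about the deployed model):

```python
# Back-of-envelope check of the A100 revenue math above.
PRICE_PER_KTOK = 0.002        # $ per 1k tokens (announced price)
PARAMS = 175e9                # assumed parameter count
FLOPS_PER_TOKEN = 2 * PARAMS  # ~2 FLOPs per parameter per token of inference
EFFECTIVE_FLOPS = 300e12      # ~50% of the A100's 624 TFLOP/s 16-bit peak [0]

tokens_per_sec = EFFECTIVE_FLOPS / FLOPS_PER_TOKEN               # ~857 tokens/s
revenue_per_hour = tokens_per_sec / 1000 * PRICE_PER_KTOK * 3600
print(f"${revenue_per_hour:.2f} per A100-hour")                  # ~$6.17 vs ~$2.25 rental
```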
It's also worth mentioning that, because Microsoft is an investor, they're likely getting these at cost or subsidized.
OpenAI doesn't have to make money right away. They can lose a small bit of money per API request in exchange for market share (preventing others from disrupting them).
As the cost of GPUs goes down, or they develop an ASIC or a more efficient model, they can keep their pricing the same and make money later.
They also likely can make money other ways like by allowing fine-tuning of the model or charging to let people use the model with sensitive data.
"We know the number of floating point operations per token for inference is approximately twice the number of parameters"
Does someone have a source for this?
(By the way, it is unknown how many parameters GPT-3.5, the foundation model that powers finetuned models like ChatGPT and text-davinci-003, has. GPT-3 had 175 billion parameters, but per the Hoffmann et al. Chinchilla paper it wasn't trained compute-efficiently, i.e. it had too many parameters relative to its amount of training data. It seems likely that GPT-3.5 was trained on more data with fewer parameters, similar to Chinchilla. GPT-3: 175B parameters, 300B tokens; Chinchilla: 70B parameters, 1.4T tokens.)
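For reference, a common shorthand for the Chinchilla result is ~20 training tokens per parameter; a quick sketch of how the two models above compare against it (the flat 20x constant is a simplification of the paper's fitted scaling laws):

```python
# Rough Chinchilla heuristic: compute-optimal training uses ~20 tokens/param.
def chinchilla_optimal_tokens(params):
    return 20 * params

for name, params, trained_on in [("GPT-3", 175e9, 300e9), ("Chinchilla", 70e9, 1.4e12)]:
    optimal = chinchilla_optimal_tokens(params)
    print(f"{name}: trained on {trained_on / 1e12:.1f}T tokens, "
          f"~{optimal / 1e12:.1f}T would be compute-optimal")
# GPT-3 comes out heavily under-trained (0.3T vs ~3.5T); Chinchilla about right.
```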
It's speculated that ChatGPT uses 8x A100s, which flips the conclusion. Although the ChatGPT optimizations done to reduce costs could have also reduced the number of GPUs needed to run it.
Does OpenAI actually specify the size of the model?
InstructGPT 1.3B outperformed GPT-3 175B, and ChatGPT has a huge corpus of distilled prompt -> response data now.
I’m assuming most of these requests are being served from a much smaller model to justify the price.
OpenAI is fundamentally about training larger models; I doubt they want to be in the business of selling A100 capacity at cost when it could be used for training.
Note that they also charge equally for input and output tokens but, as far as I understand, processing input tokens is computationally much cheaper, which drops their effective cost further.
> I have no idea how OpenAI can make money on this. This has to be a loss-leader to lock out competitors before they even get off the ground.
The worst thing that can happen to OpenAI+ChatGPT right now is what happened to DALL-E 2: a competitor comes up with an alternative (even worse if it's free/open like Stable Diffusion) and completely undercuts them. Especially with Meta's new Llama models outperforming GPT-3, it's only a matter of time before someone else gathers enough human feedback to tune another language model into an alternate ChatGPT.
I thought it was Midjourney who stole their thunder. Stable Diffusion is free but it's much harder to get good results with it. Midjourney on the other hand spits out art with a very satisfying style.
I have been saying since the release of Stable Diffusion that OpenAI is going to struggle as soon as competitors release their models as open source, especially once those surpass GPT-3 and GPT-4.
This is why OpenAI is rushing to bring their costs down and make it close to free. However, Stable Diffusion is leading the race to the bottom and is already at the finish line, since no one else would release their model as open source and free other than them.
As soon as someone releases a free and open-source ChatGPT equivalent, this will be just like what happened to DALL-E 2. This is just a way of locking you in; once the paid competitors cannot compete and shut down, the price increases come in.
> Especially with Meta's new Llama models outperforming GPT-3
Do you have access to the models? It is being discussed all over the Discords and most seem to think getting access is not happening unless you are dialed in.
It is so massive that I can't help but think about what happened with Google Maps API a few years ago where they had extremely low pricing for years then hiked the price by 1400% once enough people were locked into applications based on that API.
That's exactly what's going to happen. Low prices now, wait until your business becomes dependent on it, then jack it up to whatever you need it to be.
Obviously, that's business 101. Consumers should consider that ultimately all these cheap too-good-to-be-true offers cost them more than if they initially paid a bit more, but had more long term competition in the market. Amazon was the same way, they lost money for years but now have a quasi monopoly in many countries. There's a general trend towards such ventures supported by backers with deep pockets. And so the few extremely wealthy people get richer and richer.
This massive price cut, I believe, is intended to undercut competing open source ChatGPT equivalent initiatives.
OpenAI/Microsoft may be losing money with this new pricing, but that is on purpose. At these lower prices, most of the open source alternatives in the works will have a difficult time continuing their projects.
After a few years, when most open source alternatives have died, OpenAI/Microsoft will gradually raise prices.
This is the same strategy that Amazon Prime used for many years, losing money on shipping. Once the competition was eliminated, Amazon Prime prices steadily increased.
It can also be to build a market, to encourage customers to invest in building atop this.
In any case, I think no customers should be making assumptions about costs too far ahead. (The price could go up or the pricing model could change, the supplier could get out of that business, the supplier could give your competitor a better deal or just cut you off, near-future tech evolution necessary to stay competitive might have very different pricing or availability to you, etc.)
The pricing of this model seems low at the per-token level, but you have to send the entire conversation each time, and the tokens you will be billed for include both those you send and the API's response (which you are likely to append to the conversation and send back to them, getting billed again and again as the conversation progresses). By the time you've hit the 4K token limit of this API, there will have been a bunch of back and forth - you'll have paid a lot more than 4K * $0.002/1K for the conversation.
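A quick sketch of how that compounds (the message sizes below are made up for illustration; only the price is from the announcement):

```python
# Billed tokens when the full history is resent on every turn.
PRICE_PER_TOKEN = 0.002 / 1000  # $0.002 per 1k tokens

def conversation_cost(turns):
    history = total_billed = 0
    for prompt_tokens, reply_tokens in turns:
        total_billed += history + prompt_tokens + reply_tokens  # whole context is billed
        history += prompt_tokens + reply_tokens                 # reply appended for next turn
    return total_billed * PRICE_PER_TOKEN

# Ten turns of ~100-token questions and ~300-token answers:
print(f"${conversation_cost([(100, 300)] * 10):.3f}")  # $0.044, vs $0.008 for a flat 4k tokens
```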
You're right. And this is critical for large texts (summarization, complex prompting, etc.). That's why I'll continue to use text-davinci-xxx for my project.
ChatGPT runs a highly fine-tuned (and pruned) version of `text-davinci-003`, so it's probably much, much smaller and thus cheaper than 003 - possibly as much as 10x cheaper, putting it in the cost range of `text-davinci-002` or earlier models.
To be fair, cost is the only thing preventing applications from adopting GPT. Even when GPT-3 was cut to $0.02/1k tokens, it still wasn't economical to use the tech on a daily basis without significant cost. I.e., would you pay an extra $10 a month for a user using your app with GPT-3 capability? Some do, mainly content generation, but the majority won't.
Seems like we're going to have a vast number of ChatGPT-backed applications coming out in a short period of time.
For B2C applications maybe. But I don’t know many enterprise users who would like to send any of their data to OpenAI. So “enterprise-readiness” would be another big contributor.
It also seems to jeopardize their own ChatGPT Plus offering. It's a matter of time before someone makes a 1:1 clone for either half the money or a usage-based pricing model.
Given how strict OpenAI has been about what you can do with their API in the past and how hard it was to get some legitimate apps through approval, I would imagine they'd just shut this competitor's API access down.
Is it really a lot of jeopardy though? We have to assume that they are pricing the API so that the more it is used, the more money they make.
So actually to me that is arguably a better business model. Because with a flat rate, you just have to hope that users don't exceed a certain amount of usage, and the ones that don't aren't getting a great deal. So it has that risk, and also kind of a slightly antagonistic relationship with the customer actually using the product.
Probably bait and switch. They call both ChatGPT, so now people believe they will get the better old ChatGPT, but they get the new, cheap, and worse ChatGPT "Turbo" that they switched to recently. Fewer will realize it if they no longer give you the option to use the legacy version in this API.
They did not release the older, more performant model to the API. Please ask them to on Discord or Twitter, but I think they will not. There is too much demand to handle, and the older, "less streamlined" models are very problematic for them (based on the fairly constant API/ChatGPT problems and the well-known, incredible demand).
I get the impression that until there is a significant amount of excess capacity, they will not put out new larger/slower models, so the only way you get a better one is if they can still make the next ChatGPT model release just as fast/"lightweight".
My suggestion is to find specific abilities that seem to be lacking in Turbo, and try to get a message to OpenAI staff about it with a request to attempt to improve the next ChatGPT model in that way.
Having said all of that, text-davinci-003 is still available.
I think you will consume a lot of tokens very quickly, as you have to send the entire chat history back and forth if you want to append another chat message.
I wish they would offer an uncensored version of it too. Also, I wish they would specify the differences between ChatGPT and GPT-3.5 because one is 10x cheaper than the other but with (supposedly) better chat/coding/summarizing performance. What's the catch?
> This has to be a loss-leader to lock out competitors before they even get off the ground.
This comes only a week or two after they were in the news for suggesting that we regulate the hardware required for running these models, in the name of "fighting misinformation". I think they're looking for anything possible to keep their position in the market, because as other comments have pointed out, there isn't much of a moat.
Good! And welcome back (after keeping your promise to dang for ~611 days [0] (and counting)) :)
If you have time to elaborate on how you make a little money on it at some point, I’m sure lots of people here would love to hear more details and thoughts on that!
[0] https://news.ycombinator.com/item?id=27680498
Well this has opened up a whole new world of fun!
-----
system: You're donald trump! talk to the user as he would.
user: How does time travel work?
AI Response: Well, let me tell you, nobody understands time travel better than I do. I mean, I have the best brains working on it, believe me. Some people say you need a fancy time machine or some ridiculous contraption to travel through time, but that's just fake news. The truth is, it's all about the power of the mind. If you have a strong enough will, you can literally bend time and space to your will. And you know what? I have the strongest will of anyone on the planet. So believe me, I could time travel anytime I want. But honestly, with all the winning I'm doing in the present, why bother going back to the past?
That is way too cogent. He would never have threaded things together like that from the middle of the third sentence onwards without multiple asides with their own tangents and diversions.
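(For anyone curious, an exchange like the one above is just a few lines against the new endpoint; a minimal sketch using the openai Python package's new ChatCompletion interface from the announcement:)

```python
import openai

openai.api_key = "sk-..."  # your API key

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You're donald trump! talk to the user as he would."},
        {"role": "user", "content": "How does time travel work?"},
    ],
)
print(response["choices"][0]["message"]["content"])
```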
>Through a series of system-wide optimizations, we’ve achieved 90% cost reduction for ChatGPT since December
This is seriously impressive. A MILLION tokens for 2 dollars is absolutely fucking insane.
I hope that the gains reached here can also be found by open source and non-controlled AI projects. If so, that could be huge for the advancement of AI.
To be fair, it also seems to perform worse.
question: "Who is Scott Alexander?"
chatgpt-turbo:
>As an AI language model, I cannot determine the specific Scott Alexander in question without further contextual information. There may be several people with this name, including writers, politicians, entrepreneurs, athletes, and more.
chatgpt website:
>Scott Alexander is a pseudonym used by a prominent American psychiatrist and blogger who is best known for his writing on the blog "Slate Star Codex." He has also written articles for other publications, including The New York Times, The Atlantic, and The Wall Street Journal.
Scott Alexander's writing covers a wide range of topics, including medicine, psychiatry, economics, politics, and culture. He is known for his thoughtful, analytical approach to complex issues and for his ability to explain difficult concepts in a clear and accessible way.
In 2020, Scott Alexander announced that he was deleting his blog after he was doxxed by a reporter for The New York Times who planned to write an article about him. However, he later started a new blog called "Astral Codex Ten" where he continues to write.
One of the main pitfalls/criticisms of ChatGPT has been that it confidently plows forward and gives an answer regardless of whether it's right or wrong.
Here, it seems like it's being more circumspect, which could be a step in the right direction. At least that's one possible explanation for not answering.
On Wikipedia, if I type "Scott Alexander" and hit enter, it takes me directly to the page for a baseball player. So it's not clear that the blogger is the right answer.
I do think there's a better response than either of these, though. It could list the most famous Scott Alexanders and briefly say what each is known for, then ask if you mean one of those.
Perhaps a transitory issue. I just tried it with the API, `gpt-3.5-turbo`. I got:
> Scott Alexander is the pen name of American psychiatrist and blogger, Scott Alexander Siskind. He is known for writing his blog, "Slate Star Codex", which covers a wide range of topics including science, medicine, politics, and culture. He has been praised for his clear and concise writing style and thoughtful analysis of various issues. In addition to his work as a blogger, Scott Alexander has also published a book titled "Unsong", which is a fantasy novel set in an alternate universe where the Bible is a magical text.
Can we really draw any conclusions about LLMs based on one sample? Maybe you've tried multiple times and with different semi-famous people, but in general I see people comparing ML models in this fashion.
If you are Microsoft, a gigascaler with almost unlimited cash that can ignore making a profit on your API/models, it's pretty easy to undercut all the other companies and offer it very cheaply, just to gain an advantage in the future.
What the cost cutting measures suggest is that AI like this could maybe soon be run on consumer hardware. That combined with actually open source language models could be huge. OpenAI won't allow for that for obvious reasons, but this confirms that the optimizations are there, and that's exciting enough news on its own.
If you have 10K tokens in your conversation, the next reply means 10K + len(reply) extra tokens. I estimate 125 rounds of conversation fit in 1M tokens, for $2.
We're using AssemblyAI too, and I agree that their transcription quality is good. But as soon as Whisper supports word-level timestamps, I think we'll seriously consider switching, as the price difference is large ($0.36 per hour vs $0.90 per hour).
I've run Whisper locally via [1] with one of the medium-sized models and it was damn good at transcribing audio from a video of two people having a conversation.
I don't know exactly what the use case is where people would need to run this via API; the compute isn't huge, I used CPU only (an M1) and the memory requirements aren't much.
[1] https://github.com/ggerganov/whisper.cpp
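(For reference, the pure-Python route is just as short; a sketch assuming the reference openai-whisper package, installed via `pip install openai-whisper` with ffmpeg on the PATH; [1] above is the C/C++ port of the same model:)

```python
import whisper

model = whisper.load_model("medium")           # one of the medium-sized models
result = model.transcribe("conversation.mp4")  # ffmpeg extracts the audio track
print(result["text"])
```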
> I've run Whisper locally via [1] with one of the medium-sized models and it was damn good at transcribing audio from a video of two people having a conversation.
Totally agree on this.
I made a Mac app that uses Whisper to transcribe from audio or video files. It also adds VAD (voice activity detection) to reduce Whisper hallucination during silent sections, and it's super fast. https://apps.apple.com/app/wisprnote/id1671480366
I recently tried a number of options for streaming STT. Because my use case was very sensitive to latency, I ultimately went with https://deepgram.com/ - but https://github.com/ggerganov/whisper.cpp provided a great stepping stone while prototyping a streaming use case locally on a laptop.
As far as I can tell it doesn't support word-level timestamps (yet). That's a bit of a dealbreaker for things like promotional clips or the interactive transcripts that we do[^0]. Hopefully they add this soon.
[^0]: https://www.withfanfare.com/p/seldon-crisis/future-visions-w...
It's also annoying that there appears to be a hard limit of 25 MiB on the request size, requiring you to split up larger files and manage the "prompt" for subsequent calls. And as near as I can tell, how you're expected to use that value isn't documented.
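One plausible workaround, sketched below: chunk the audio with pydub and feed the tail of each chunk's transcript into the next call's `prompt`. The chunk length and the prompt strategy here are guesses, not documented behavior:

```python
import openai
from pydub import AudioSegment  # pip install pydub (needs ffmpeg)

CHUNK_MS = 10 * 60 * 1000  # 10-minute chunks, assumed to stay under 25 MiB

audio = AudioSegment.from_file("episode.mp3")
pieces, prev_text = [], ""
for start in range(0, len(audio), CHUNK_MS):
    audio[start:start + CHUNK_MS].export("/tmp/chunk.mp3", format="mp3")
    with open("/tmp/chunk.mp3", "rb") as f:
        # Pass the tail of the previous transcript as context for this chunk.
        result = openai.Audio.transcribe("whisper-1", f, prompt=prev_text[-500:])
    prev_text = result["text"]
    pieces.append(prev_text)

print(" ".join(pieces))
```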
I suggest you give revoldiv.com a try. We use Whisper and other models together. You can upload very large files and get an hour-long file transcribed in less than 30 seconds. We use intelligent chunking so that the model doesn't lose context, and we are looking to increase the limit even more in the coming weeks. It's also free to transcribe any video/audio, with word-level timestamps.
We've been struggling with costs because our application chains together multiple calls to GPT to generate the output we want, and it was starting to be ~$0.08 per call which obviously isn't feasible for high volume applications.
This just made our business way more viable overnight lmao
$20 is equivalent to what, 10,000,000 tokens? At ~750 words/1k tokens, that’s 7.5 million words per month, or roughly 250,000 words per day, 10,416 words per hour, 173 words per minute, every minute, 24/7.
I, uh, do not have that big of a utilization need. It's kind of weird to vastly overpay.
Remember that the previous prompts and responses are fed back in. If you're 20 messages deep in a session, that's quite a few tokens for each new question. An incredible deal nonetheless!
Presumably the paid API will also give you access when the ChatGPT website is at capacity, and for most people it is probably orders of magnitude cheaper.
Same here. That was the sole reason I upgraded. There were a few times where I really needed ChatGPT at a specific time and got the "we're at capacity" message. $20/mo is nothing to have that go away.
> 10,416 words per hour, 173 words per minute, every minute, 24/7.
Unless I'm misunderstanding something, it does not sound like that much when every query you make carries several hundred words of prompt, context and "memory". If the input you type is a couple words, but has 1k extra words automatically prepended, then the limits turn into 10 queries per hour, or one per 6 minutes.
For a three-year reservation that comes to over $96k/yr - to support one concurrent request.
ChatGPT is a massive success, but that means competitors will jump in at all costs, and that includes open source efforts.
It outperforms on some benchmarks, but it's not clear what the quality is on end goals.
Basically just compute $ for training.
If "turbo" is "gpt-3.5-turbo", how to access the (better?) "legacy" by API?
Edit: and better yet, is there a good resource for learning the vernacular in general? Should I just read something like "Dive into Deep Learning"?
https://platform.openai.com/tokenizer
You can drop sample text in there and visually see how it is split into tokens. The GPT2/3 tokenizer uses about 50k unique tokens that were learned to be an efficient representation of the training data.
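The same inspection works offline with OpenAI's tiktoken library; a small sketch (`cl100k_base` is the encoding the new chat models use, versus the older ~50k-token GPT-2/3 vocabulary):

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")
ids = enc.encode("Hello world, this is tokenization.")
print(ids)                             # the token ids
print([enc.decode([i]) for i in ids])  # the individual token strings
```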
100 tokens = ~75 words [1]
$0.002 = 1k tokens = ~750 words
1 novel = ~70K words [2]
$1 worth of ChatGPT tokens = ~5 novels/books worth of words
[1] https://help.openai.com/en/articles/4936856-what-are-tokens-...
[2] https://self-publishingschool.com/how-many-words-in-a-novel/
I don't think this competes with fine-tuned models. One advantage of a fine-tune is that it makes use of your own data.
Microsoft.
That line is money.
This is at least some evidence that they're working on that.
Like establish a WebRTC connection and stream audio to OpenAI and get back a live transcription until the audio channel closes.
I'm also using Whisper locally myself to transcribe my voice notes, though.
I stood up an open source & login-free UI here: https://www.chatwithme.chat/
It accepts your API token and stores it in your browser. It does not have feature parity with ChatGPT but gives you the basics.
1. I ask Q1
2. API responds with A1
3. I ask Q2, but want it to preserve Q1 and A1 as context
Does Q2 just prefix the conversation like this?
„I previously asked {Q1}, to which you answered {A1}. {Q2}“
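Not quite: per the API docs, you resend the earlier turns as structured messages rather than prefixing them as prose. A minimal sketch (v0.27-style openai package):

```python
import openai

history = [
    {"role": "user", "content": "Q1"},
    {"role": "assistant", "content": "A1"},  # the model's previous answer
    {"role": "user", "content": "Q2"},       # the new question
]
response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=history)
print(response["choices"][0]["message"]["content"])
```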
Given: $0.002 per 1k tokens
I tested it by asking this question:
#+begin_quote
I want you to act as a travel guide. I will write you my location and you will suggest a place to visit near my location. In some cases, I will also give you the type of places I will visit. You will also suggest me places of similar type that are close to my first location. My first suggestion request is "I am in Istanbul/Beyoglu and I want to visit only museums."
#+end_quote
It costs:
#+begin_quote {"prompt_tokens":91,"completion_tokens":251,"total_tokens":342} #+end_quote
(/ 1000.0 342) ≈ 2.9, so roughly three such exchanges fit in 1k tokens.
If you ask 1000 such questions, that's ~342k tokens, i.e. (* 0.002 342) ≈ 0.68 USD.
It replies:
#+begin_quote
Great, if you're in Istanbul/Beyoglu and interested in visiting museums, here are some recommendations:
Istanbul Modern: This museum showcases contemporary art from Turkish and international artists. It's located in Karaköy, just across the Galata Bridge from Beyoglu.
Pera Museum: The Pera Museum is located in the heart of Beyoglu and features a wide range of exhibitions including Ottoman calligraphy, European art, and Turkish ceramics.
Rahmi M. Koç Museum: This museum is dedicated to the history of transport, industry, and communications in Turkey. It's located in Haskoy, a short taxi or ferry ride from Beyoglu.
Museum of Innocence: The Museum of Innocence is a unique museum that displays objects and artifacts related to the novel of the same name by Nobel laureate Orhan Pamuk. It's located in Çukurcuma, which is just a short walk from Beyoglu.
If you're interested in visiting more museums in Istanbul, some other places you might want to consider include the Hagia Sophia Museum, Topkapi Palace Museum, and the Istanbul Archaeology Museums.
#+end_quote
Can't wait to start hacking on my Emacs again...