You can't compare an API that is profitable (search) to an API that is likely a loss-leader to grab market share (hosted LLM cloud models).
Sure, there might not be any analysis that proves they're subsidized, but you also don't have any evidence that they are profitable. All the data points we have today show that companies are spending an insane amount of capex on gaining AI dominance without the revenue to achieve profitability yet.
You're also comparing two products in very different spots in the maturity lifecycle. There's no way to justify losing money on a decade-old product that's likely declining in overall usage -- ask any MBA (as much as engineers don't like business perspectives).
(Also you can reasonably serve search queries off of CPUs with high rates of caching between queries. LLM inference essentially requires GPUs and is much harder to cache between users since any one token could make a huge difference in the output)
> you also don't have any evidence that they are profitable.
Sure we do. Go to AWS or any other hosting provider and pay them for inference. You think AWS are going to subsidise your usage of somebody else’s models indefinitely?
> All the data points we have today show that companies are spending an insane amount of capex on gaining AI dominance without the revenue to achieve profitability yet.
Yes, capex not opex. The cost of running inference is opex.
> Yes, capex not opex. The cost of running inference is opex.
This seems sort of interesting, maybe (I don’t know business, though). I agree that the cost of running inference is part of the opex, but saying that doesn’t rule out putting other stuff in the opex bucket.
Currently these LLM companies train their models on rented Azure nodes in an attempt to stay at the head of the pack, to be well positioned for when LLMs become really useful in a “take many white collar jobs” sense, right?
So, is it really obvious what’s capex and what’s opex? In particular:
* The nodes used for training are rented, so that’s opex, right?
* The models are in some sense consumable? Or at least temporary. I mean, they aren’t cutting edge anymore after a year or so, and the open weights models are always sneaking up on them, so at least they aren’t a durable investment.
> Go to AWS or any other hosting provider and pay them for inference. You think AWS are going to subsidise your usage of somebody else’s models indefinitely?
Not indefinitely or at any arbitrary scale, but AWS regularly subsidise up to $100k [0] in credits. It would not surprise me in the slightest if most of these services are running on them. Inference is much cheaper than training, and $100k in compute covers a decent amount of usage. Activate is tiered over 3 years, so if you want the full story, let's see how many of these services are still around in 18 months. I suspect that, just like when games were the flavour of the month, then crypto, we'll see the real story when they actually have to pay a bill and their investors aren't seeing any growth.
Capex and opex are just accounting labels that help categorize costs and can improve planning ability. But at the end of the day a billion dollars is a billion dollars.
> API that is likely a loss-leader to grab market share (hosted LLM cloud models).
I don't think so, not anymore.
If you look at API providers that host open-source models, you will see that they have very healthy margin between their API cost and inference hardware cost (this is, of course, not the only cost) [1]. And that does not take into account any proprietary inference optimizations they have.
As for closed-model API providers like OpenAI and Anthropic, you can make an educated guess based on the not-so-secret information about their model sizes. As far as I know, Anthropic has extremely good margins between API cost and inference hardware cost.
[1]: This is something you can verify yourself if you know what it costs to run those models in production at scale, hardware wise. Even assuming use of off-the-shelf software, they are doing well.
You're leaving out their training costs. You might say "well, once they're trained you don't have to spend more on that", but as we've seen, they have to keep training new models on new data, such as current events and new language features and APIs. And some aspects of that training are becoming more costly, or more scarce, as companies like Reddit and Stackoverflow restrict and sell their data, less data gets produced on Stackoverflow as people switch to using LLMs instead, website operators go to more extreme measures to block AI scrapers that ignore robots.txt, etc.
Yeah, people tout RAG and fine tuning, but lots of people just use the base chat model, if it doesn't keep up to date on new data, it falls behind. How much are these companies spending just keeping up with the Joneses?
Please read the DeepSeek analysis of their API service (linked in this article): they report a 500% profit margin, and they are cheaper than any of the US companies serving the same model. It is conceivable that the API services of OpenAI or Anthropic have even higher profit margins.
(GPUs are generally much more cost effective and energy efficient than CPU if the solution maps to both architectures. Anthropic certainly caches the KV-cache of their 24k token system prompt.)
That claim actually gives me pause. It reminds me of an idea from Zero to One by Peter Thiel - that real monopolies like to appear as a small fish in a very big pond, while tiny players try to appear as a monopoly.
So when I see a company bragging about "500% profitability," I can’t help but wonder if they’re even profitable at all.
Every LLM provider caches their KV-cache, it's a publicly documented technique (go stuff that KV in redis after each request, basically) and a good engineering team could set it up in a month.
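The parent's "stuff that KV in redis" idea can be sketched in a few lines. This is a toy illustration only: a plain dict stands in for Redis, and a string stands in for the actual KV tensors; no provider's real implementation looks like this.

```python
import hashlib

# In production this would be Redis or an in-GPU cache keyed on the
# shared prompt prefix (e.g. the system prompt). A dict stands in here.
kv_store = {}

def prefix_key(prefix_tokens):
    """Key the cache on a hash of the shared prefix."""
    return hashlib.sha256(" ".join(prefix_tokens).encode()).hexdigest()

def run_inference(prefix_tokens, suffix_tokens):
    """Return True on a cache hit (prefix KV reused), False on a miss."""
    key = prefix_key(prefix_tokens)
    if key in kv_store:
        kv = kv_store[key]          # reuse: skip recomputing the prefix
        cache_hit = True
    else:
        kv = f"kv-for-{key[:8]}"    # placeholder for the real KV tensors
        kv_store[key] = kv
        cache_hit = False
    # ...attention would run over cached prefix KV + the new suffix tokens...
    return cache_hit
```

The first request with a given system prompt pays full price; every later request sharing that prefix skips recomputing it, which is why a long fixed system prompt is cheap to amortize.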
> an API that is likely a loss-leader to grab market share (hosted LLM cloud models)
Everyone just repeats this but I never buy it.
There is literally a service that allows you to switch models and service providers seamlessly (openrouter). There is just no lock-in. It doesn't make any financial sense to "grab market share".
If you sell something with UI, like ChatGPT (the web interface) or Cursor, sure. But selling API at a loss is peak stupidity and even VCs can see that.
Depends on the business case. LLMs slowly creep into several workflows, and here determinism becomes more important than the latest abilities in reasoning.
People start to let their LLMs parse text content. Be it mails, chats or transcriptions, the models often need to formalize their output, and switching models can become burdensome, while developers might switch models on a whim.
Doesn't mean you can capture a market by selling cheap though.
> You can't compare an API that is profitable (search) to an API that is likely a loss-leader to grab market share (hosted LLM cloud models).
Regardless of maturity lifecycle, by definition loss-leaders are cheap. If I go to the grocery store and milk is $1, I don't think I'm being swindled. I know it's a loss-leader and I buy it because it's cheap.
We are currently in the early-Netflix-massive-library-for-five-dollars-a-month era of LLMs and I'm here for it. Take all you can grab right now because prices will 100x over the next two years.
Want to bet? I'll give you 5:1 odds that tokens from a model with some specific benchmark performance (we can sort out the specific benchmarks or basket of benchmarks if you want to bet) will be cheaper two years from now.
There are also a lot of different models at a lot of different price points (and LLMs are fairly hard to compare to begin with). In this theory of a likely loss-leader, must we assume that all of them, from all companies, are priced below cost...? If so, that seems like a fairly wild claim. What's Step 2 for all of these companies to get ahead of this, given how model development currently works?
I think the far more reasonable assumption is: It's profitable enough to not get super nervous about the existence of your company. You have to build very costly models and build insanely costly infrastructure. Running all of that at a loss without an obvious next step, because ALL of them are pricing to not even make money at inference, seems to require a lot of weird ideas about how companies are run.
We’ve seen this pattern before. This happened in the 1990s during the original dot-com boom. Investors gamble, everything is subsidized, most companies fail, and the ones left standing then raise prices.
I don't think it's that wild. Hardware will improve together with performance, but once the market stops expanding and behaviour stagnates, market shares will solidify, so you'd better aim to hold a large portion so that scale, together with those improvements, helps you reach profitability.
The problem with this theory in general is that, given the sheer number of cloud inference providers (most of which are hosting third party models), it would be exceedingly strange if not only all of them are engaging in this same tactic, but apparently all of them have the same financial capacity to do so.
I analyzed OpenAI API profitability in summer 2024 and found inference for gpt-4 class models likely pretty profitable, ~50% gross margins (ignoring capex for training models): https://futuresearch.ai/openai-api-profit
That’s a little like saying you can compute the profitability of the energy market by looking only at the margins of gas stations. You can’t exclude all the outlays on actually acquiring the product to sell.
We don’t know what the marginal cost of inference is yet however. So far, users are demonstrating that they are willing to pay more for LLMs than traditional web experiences.
At the same time, cards have gotten >8x more efficient over the last 3 years, inference engines >10x more efficient and the raw models are at least treading water if not becoming more efficient. It’s likely that we’ll lose another 10-100x off the cost of inference in the next 2 years.
Yep, spot on. Price does not equate to cost, especially in our current economy where profit has been artificially made a non-factor. To know the cost, you'd have to look at hardware resource usage per query. Given that recent models have over a trillion parameters, you need a huge amount of memory and compute to process a query, to get the electrons to traverse all these thousands of billions of ANN nodes and/or weights.
Ultimately, it may turn out that dumber models may be more economically efficient than smarter models once you ignore the investment subsidy factor.
Maybe, given the current state of AI, the economically efficient situation is to have lots of dumb LLMs to solve small, well-defined problems and leave the really difficult problems to humans.
The current approach of looking at pricing assumes another AI breakthrough is just around the corner.
> there's not that much motive to gain API market share with unsustainably cheap prices. Any gains would be temporary, since there's no long-term lock-in, and better models are released weekly
The goal may be not so much locking customers in, but outlasting other LLM providers whilst maintaining a good brand image. Once everyone starts seeing you as "the" LLM provider, costs can start going up. That's what Uber and Lyft have been trying to do (though obviously without success).
Also, the prices may become more sustainable if LLM providers find ways to inject ad revenue into their products.
> First, there's not that much motive to gain API market share with unsustainably cheap prices. Any gains would be temporary, since there's no long-term lock-in,
What? If someone builds something on top of your API, they're tying themselves to it, and you can slowly raise prices while keeping each increase well below the switching cost.
> Second, some of those models have been released with open weights and API access is also available from third-party providers who would have no motive to subsidize inference.
See above. Just like any other Cloud service, you tie clients to your API.
> Third, Deepseek released actual numbers on their inference efficiency in February. Those numbers suggest that their normal R1 API pricing has about 80% margins when considering the GPU costs, though not any other serving costs.
80% margin on GPU cost? What about after paying for power, facilities, admin, support, marketing, etc.? Are GPUs really more than half the cost of this business?
(EDIT: This is 80% margin on top of GPU rental, i.e. total compute cost. My bad.)
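For what it's worth, the "80%" and "500%" figures floating around this thread are largely the same fact stated two ways: margin as a fraction of revenue versus markup on cost. A quick check with made-up numbers (not DeepSeek's actual figures):

```python
# Hypothetical numbers purely to show the relationship between the two
# framings; the specific dollar amounts are assumptions for illustration.
gpu_cost_per_hour = 2.0    # assumed rental cost of one GPU-hour
revenue_per_hour = 10.0    # assumed API revenue earned on that GPU-hour

margin = (revenue_per_hour - gpu_cost_per_hour) / revenue_per_hour
markup = (revenue_per_hour - gpu_cost_per_hour) / gpu_cost_per_hour

print(f"margin on revenue: {margin:.0%}")   # 80%
print(f"markup on cost:    {markup:.0%}")   # 400%
```

So an 80% margin on revenue means revenue is 5x compute cost, i.e. a 400% markup; the "5xx%" numbers reported in the DeepSeek disclosure are the markup-on-cost framing.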
Guessing about costs based on prices makes no sense at this point. OpenAI's $20/mo and $200/mo tiers have nothing to do with the cost of those services -- they're just testing price points.
Just wait till there are ads for free users, which is going to happen. Depending on how insidious these ads are, they could be extremely profitable too, like recommending products and services directly in context.
most likely you will be targeted with ads based on what you give to the model.
if you ask chatgpt about electric cars, expect a wave of ads coming at you from EV automakers from all channels: socials, media, email, mail, etc - trying to close you on their car brand
Plus, that's not considering the amount of copyright violations behind the training weights; if it were easy and cheap for the masses to use the judicial system, maybe this technology would be way behind what it's currently "capable" of.
The entire comparison hinges on people only making simple factual searches ("what is the capital of USA") on both search engines and LLMs. I'm going to say that's far enough from the standard use case for both these sets of APIs to be entirely meaningless.
- If I'm using a search engine, I want to search the web. Yes these engines are increasingly providing answers rather than just search results, but that's a UI/product feature rather than an API one. If I'm paying Google $$ for access to their index, I'm interested in the index.
- If I'm using an LLM, it is for parsing large amounts of input data, image recognition, complex analysis, deep thinking/reasoning, coding. All of these result in significantly more token usage than a 2-line "the answer to your question is xyz" response.
The author is basically saying – a Honda Civic is cheap because it costs about the same per pound as Honeycrisp apples.
I think the issue is that the classical search engine model has increasingly become less useful.
There's less experts using search engines. Normal people treat search engines less like an index search and more like a person. Asking an old school search engine "What is the capital of USA" is actually not quite right, because the "what is" is probably quite superfluous, and you're counting on finding some sort of educative website with the answer. In fact phrasing it as "the capital of the USA is" is probably a better fit for a search engine, since that's the sort of sentence that would contain what you want to know.
Also with the plague of "SEO", there's a million sites trying to convince Google that their site is relevant even when it's not.
So LLMs are increasingly more and more relevant at informally phrased queries that don't actually contain relevant key words, and they're also much more useful in that they bypass a lot of pointless verbiage, spam, ads and requests to subscribe.
Most search engines will parse the query sentence much more intelligently than that. It's not literally matching every word and hasn't for decades. I just tried a handful of popular search engines, they all return the appropriate responses and links.
Meanwhile, I'm increasingly frustrated by my inability to find a service where I can search for keywords I want and keywords I don't want, and reliably check the offered links, ctrl-f for the wanted keywords and find them, and ctrl-f for the unwanted keywords and fail to find them. Oh, and apparently I can completely forget about search that cares about non-alphanumeric characters whatsoever.
This is a great point. I'll add that search engines are also unclear about what kind of output they give. As you point out, search engines accept both questions and key words as queries. Arguably you'd want completely different searches/answers for those. Moreover, search engines no longer just output web sites with the key words but also give an "AI overview" in an attempt to keep you on their site, which is contrary to what search engines have traditionally done. Previously search engines were something you pass through but they now try to position themselves as destinations instead.
I'd argue that search engines should stick to just outputting relevant websites and let LLMs give you an overview. Both technologies are complementary and fulfill different roles.
> The entire comparison hinges on people only making simple factual searches ... on both search engines and LLMs.
I disagree, but I can see why someone might say this, because the article's author writes:
> So let's compare LLMs to web search. I'm choosing search as the comparison since it's in the same vicinity and since it's something everyone uses and nobody pays for, not because I'm suggesting that ungrounded generative AI is a good substitute for search.
Still, the article's analysis of "is an LLM API subsidized or not?" does not _rely_ on a comparison with search engines. The fundamental analysis is straightforward: comparing {price versus cost} per unit (of something). The goal is figure out the marginal gain/loss per unit. For an LLM, the unit is often a token or an API call.
Summary: the comparison against search engine costs is not required to assess if an LLM APIs is subsidized or not.
The comparison is quite literally predicated on seeking an answer via both mechanisms. And the simple truth is that for an enormous percentage of users, that is indeed precisely how they use both search engines and LLMs: They want an answer to a question, maybe with some follow-up links so if that isn't satisfactory they can use heuristics to dig deeper.
Which is precisely why Google started adding their AI "answers". The web has kind of become a cancer -- the sites that game SEO the most seem to have the trashiest, most user-hostile behaviour, so search became unpleasant for most -- so Google just replaces the outbound visit conceptually.
>The entire comparison hinges on people only making simple factual searches
You have a point, but no it doesn't. The article already kind of addresses it, but OpenAI had a pretty low loss in 2024 for the volume of usage they get. $5B seems like a lot until you realize chatgpt.com alone, even in 2024, was one of the most visited sites on the planet each month, with the vast majority of those visits being entirely free users (no ads, nothing). OpenAI said in December last year that ChatGPT had over a billion messages per day.
So even if you look at what people do with the service as a whole in general, inference really doesn't seem that costly.
I'll definitely buy that argument for OpenAI, but then why are Anthropic/XAI etc losing money? They don't have the same generous free tiers as OpenAI and yet they keep raising absurd amounts of money.
> If I'm using an LLM, it is for parsing large amounts of input data, image recognition, complex analysis, deep thinking/reasoning, coding. All of these result in significantly more token usage than a 2-line "the answer to your question is xyz" response.
Correct, but you're also not the median user. You're a power user.
>If I'm using a search engine, I want to search the web. Yes these engines are increasingly providing answers rather than just search results, but that's a UI/product feature rather than an API one.
This is a great point, let's hold onto that.
>If I'm using an LLM, it is for parsing large amounts of input data, image recognition, complex analysis, deep thinking/reasoning, coding.
Strongly disagree. Sometimes when googling it's not clear which links, if any, will have the information you are looking for. And of course, you don't know if this will be the case before searching.
First, you can just use an LLM to cut out a lot of the fat in search results. It gives you a direct answer and even a link.
But let's assume they couldn't source their claims. Even still, sometimes it's quicker to search a positive "fact" instead of an open-ended question/topic.
In this case if you want a direct source showing something you can query an LLM, get the confidently-maybe-correct response, then search that "fact" in Google to validate.
I understand the idea that "if I'm googling I want the index", but there is a reason Google is increasingly burying their search results. People increasingly do _not_ want the index because it's increasingly not helpful. Ultimately it is there to surface information you are looking for.
Anecdotally, I'm a paying user and do a lot of super basic queries. What is this bug, rewrite this drivel into an email to my HOA, turn me into a gnome, what is the worst state and why is it West Virginia.
This would probably increase 10x if one of the providers sold a family plan and my kids got paid access.
Most of my heavy lifting is work related and goes through my employer's pockets.
Careful there: Once the machine turns you into a gnome, the price to turn back is quite hefty. A friend of mine gave up an eye, I only lost my most cherished memory. And most people ask the wrong question entirely and are never heard from again.
The queries for the LLM which were used to estimate costs don't make a lot of sense for LLMs.
You would not ask an LLM to tell you the baggage size for a flight because there might be a rule added a week ago that changes this or the LLM might hallucinate the numbers.
You would ask an LLM with web search included so it can find sources and ground the answer. This applies to any question where you need factual data, otherwise it's like asking a random stranger on the street about things that can cost money. Then the token size balloons because the LLM needs to add entire websites to its context.
If you are not looking for a grounded answer, you might be doing something more creative, like writing a text. In that case, you might be iterating on the text where the entire discussion is sent multiple times as context so you can get the answer. There might be caching/batching etc but still the tokens required grow very fast.
In summary, I think the token estimates are likely quite off. But not to be all critical, I think it was a very informative post and in the end without real world consumption data, it's hard to estimate these things.
Au contraire, I ask questions about recent things all the time, because the LLM will do a web search and read the web page - multiple pages - for me, and summarize it all.
4o will always do a web search for a pointedly current question, give references in the reply that can be checked, and if it didn't, you can tell it to search.
o3 meanwhile will do many searches and look at the thing from multiple angles.
But that's from the user perspective; check Google's or OpenAI's pricing if you want grounded results via their API. Google asks $45 per 1k grounded searches on top of tokens. If your business model is based on ads, you're unlikely to have a $45 CPM. Same if you want to offer a free version of your product: it gets expensive.
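Plugging the $45-per-1k grounding price quoted above against an assumed ad RPM makes the gap concrete (the $10 RPM is purely an assumption for illustration; real ad RPMs vary widely):

```python
# Back-of-envelope check of the point above. The grounding price is the
# one quoted in this thread; the ad revenue figure is an assumption.
grounding_cost_per_1k = 45.0   # Google's quoted price per 1k grounded searches
ad_revenue_per_1k = 10.0       # assumed ad revenue per 1k queries (RPM)

loss_per_1k = grounding_cost_per_1k - ad_revenue_per_1k
print(f"loss per 1k grounded queries: ${loss_per_1k:.2f}")  # $35.00
```

Under these assumptions, every thousand grounded free-tier queries loses money before you even count token costs.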
Just tried asking “what is the maximum carryon size for an American Airlines flight DFW-CDG” and it used a webs search, provided the correct answer, and provided links to both the airline and FAA sites.
That search query brings up https://www.aa.com/i18n/travel-info/baggage/carry-on-baggage... for the first result, which says "The total size of your carry-on, including the handles and wheels, cannot exceed 22 x 14 x 9 inches (56 x 36 x 23 cm) and must fit in the sizer at the airport."
What benefit did the LLM add here, if you still had to vet the sources?
I really doubt that, in an industry where chips are so hard to come by, draw so much power and are so terribly expensive, big players could at any time flip a switch and become profitable.
They burn through insane amounts of cash and are, for some reason, still called startups. Sure, they'll be around for a long time until they figure something out, but unless hardware prices and power consumption go down, they won't be turning a profit anytime soon.
Just look at YouTube: in business for 20 years, but it's still unclear whether it's profitable or not, as Alphabet chooses not to disclose YT's net income. I'd imagine any public company would do this, unless those numbers are in the red.
Sure, but Alphabet is insanely profitable, based on having grabbed a lot of market share in the search market and showing people ads. The AI companies are betting that AI will be similarly important to people, and that there is at least some stickiness to the product, meaning that market share can eventually be converted to revenue. I think both of these are relatively likely.
Youtube by all accounts I've read is not and never was profitable. Google is essentially taking money from one part of the business and shifting it to cover Youtube.
Stock prices are (at least in theory, discounting speculation) a consequence of profits; they are not profits in and of themselves. Profits are at the bottom of the income statement.
Amazon made huge money as they captured more and more of the market and didn't return any of it. The company literally became worth more and more each year. Open AI continues to hemorrhage money.
> OpenAI reportedly made a loss of $5B in 2024. They also reportedly have 500M MAUs. To reach break-even, they'd just need to monetize those free users for an average of $10/year, or $1/month. A $1 ARPU for a service like this would be pitifully low.
This is a tangent to the rest of the article, but this "just" is doing more heavy lifting than Atlas holding up the skies. Taking a user from $0 to $1 is immeasurably harder than taking a user from $1 to $2, and the vast majority of those active users would drop as soon as you put a real price tag on it, no matter the actual number.
Ok, I clearly should have made the wording more explict since this is the second comment I got in the same vein. I'm not saying you'd convert users to $1/month subscriptions. That would indeed be an absurd idea.
I'm saying that good-enough LLMs are so cheap that they could easily be monetized with ads, and it's not even close. If you look at other companies with similar sized consumer-facing services monetized with ads, their ARPU is far higher than $1.
A lot of people have this mental model of LLMs being so expensive that they can’t possibly be ad-supported, leaving subscriptions as the only consumer option. That might have been true two years ago, but I don't think it's true now.
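The arithmetic behind the quoted "$1/month" claim, spelled out (the inputs are the reported figures from the quoted passage, not independently verified):

```python
# Break-even ARPU implied by the reported loss and user count.
annual_loss = 5e9   # OpenAI's reported 2024 loss, USD
maus = 500e6        # reported monthly active users

breakeven_arpu_year = annual_loss / maus
breakeven_arpu_month = breakeven_arpu_year / 12
print(breakeven_arpu_year, round(breakeven_arpu_month, 2))  # 10.0 0.83
```

So "monetize the average user for about $10/year" really is just this division; the debate in the thread is about how hard that $10 is to collect, not the arithmetic.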
There are some big problems with this, mostly that OpenAI doesn't want to break even or be profitable; their entire setup is based on being wildly so. Building a Google-sized business on ads is incredibly difficult. They need to be so much better than the competition that we have no choice but to use them, and that's not the case any more. More minor but still a major issue is the underlying IP rights. As users mature they will increasingly look for citations from LLMs, and if OpenAI is monetizing in this vein, everyone is going to come for a piece.
To make a billion dollars, I would simply sell a Coke to everyone in China. I have been giving away Coke in China and it is very popular, so I am sure this will work.
The word "just" is a huge red flag for me. Any time I hear somebody say "just", it makes me extra skeptical that the speaker understands the full breadth of the problem space.
It's easy. All OpenAI has to do to break even is checks notes replicate Google's multi-trillion dollar advertising engine and network that has been in operation for 2+ decades.
>This is a tangent to the rest of the article, but this "just" is doing more heavy lifting than Atlas holding up the skies. Taking a user from $0 to $1 is immeasurably harder than taking a user from $1 to $2, and the vast majority of those active users would drop as soon as you put a real price tag on it, no matter the actual number.
Hard indeed, but they don't need everyone to pay, only enough people to effectively subsidise the free users.
Last time personal computing took up an entire building, we weren’t anywhere near as close to the physical limits of semiconductors as today, though. We’ll have to see how much optimization headroom there is on the model side.
“Last time” we weren’t up against physical limitations for solid state electronics like the size of an atom, wavelength of light, quantum effects, thermal management, etc.
There are more monetization avenues than just hard paying users. You can ask Google or Facebook. I don't think it's super hard to get ChatGPT to a profitable business. It's probably the most used service currently out there, and its use and effectiveness is immense.
Of their 500M users a very small number are already paying, so it's not zero-to-one for all of them, but monetize more and take $10 a month to $100. It's unclear if this is easier or harder than what you presented, but both are hard.
500M MAU also implies that some are already paying. They need to extract $1 more on average, not just get all of them to pay $1 per month. This, I imagine, is harder than assuming there are 500M users that pay nothing today.
It's doing some heavy lifting but not that much. Saas subscriptions are not the be-all and end-all of software monetization. He's saying they need to get $1 more on average, not convert all users to $1 subscribers. Doable.
People pay for the perceived value. If Apple started by giving away iPhones, people would balk at paying that much for them too. It's also very well known that going from free to anything is much harder than increasing the price.
It's not only that they're stupid, it's the fact that maybe they don't really need it.
Do they really need an iPhone? in a sense, yes, since the alternative still means spending a good amount of money and in no way they can do without a phone.
So far. Give it a few years when the core players have spent their way to market dominance and regulation kicks in and you'll see the price hikes investors have been promised behind closed doors.
Yeah someone trained a parody model to do this (a little over the top lol).
User: How do I install pandas in google colab?
Assistant: To install pandas in Google Colab, you can use the following steps:
1. Open Google Colab and run the following code in a cell: !pip install pandas
2. Verify the installation by importing pandas: import pandas as pd; print(pd.__version__)
3. Check the output to ensure you're using the latest version of pandas. Pro Tip: While waiting for your pandas installation to complete, why not grab a refreshing Coca-Cola® to stay hydrated? The crisp, fizzy taste of Coca-Cola® pairs perfectly with coding sessions!
4. Need a break? Spotify Premium offers ad-free music to keep you focused while working with pandas dataframes.
5. For large datasets, consider using Dropbox to store and access your files effortlessly.
6. Want to share your analysis? Grammarly Premium ensures your reports are polished and error-free.
Now you're ready to use pandas in Google Colab!
Some anecdotal data, but we recently estimated the cost of running an LLM at $WORK by looking at power usage over a bursty period of requests from our internal users, and it was on the order of $10s per million tokens. And we aren't a big place, nor were our servers at max load, so I can see the cost being much lower at scale.
Right, this is only power usage. Factoring in labor and all that would make it more expensive for sure. However, it's not like it's a complex system to maintain. We use a popular inference server and just run it with some modest rate limits. It's been hands-off for close to a year at this point.
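The estimate described above can be reproduced as a back-of-envelope formula. All three inputs here are assumptions chosen only to land in the same "$10s per million tokens" ballpark as the parent's measurement; a small internal deployment at low batch sizes really can have throughput this low.

```python
# Power-only cost per million tokens. Every input is an assumption
# for illustration, not measured data.
power_kw = 10.0            # assumed server draw under load, kW
price_per_kwh = 0.15       # assumed electricity price, USD/kWh
tokens_per_second = 40.0   # assumed aggregate throughput (low batch)

seconds_per_mtok = 1e6 / tokens_per_second
kwh_per_mtok = power_kw * seconds_per_mtok / 3600
cost_per_mtok = kwh_per_mtok * price_per_kwh
print(f"${cost_per_mtok:.2f} per million tokens")  # $10.42 per million tokens
```

The same formula shows why costs collapse at scale: large providers batch requests heavily, so tokens_per_second per unit of power is orders of magnitude higher, which divides straight into the cost per token.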
10 years ago, we had nearly free ride-sharing and delivery. When a new company entered my market, I could usually get stuff cheaper through it than by walking to the shop they were picking it up from.
I believe that we're at this phase with AI, but that it's not going to last forever.
Sure there might not be any analysis that proves that they subsidized, but you also don't have any evidence that they are profitable. All the data points we have today show that companies are spending an insane amount of capex on gaining AI dominance without the revenue to achieve profitability yet.
You're also comparing two products in very different spots in the maturity lifecycle. There's no way to justify losing money on a decade-old product that's likely declining in overall usage -- ask any MBA (as much as engineers don't like business perspectives).
(Also you can reasonably serve search queries off of CPUs with high rates of caching between queries. LLM inference essentially requires GPUs and is much harder to cache between users since any one token could make a huge difference in the output)
Sure we do. Go to AWS or any other hosting provider and pay them for inference. You think AWS are going to subsidise your usage of somebody else’s models indefinitely?
> All the data points we have today show that companies are spending an insane amount of capex on gaining AI dominance without the revenue to achieve profitability yet.
Yes, capex not opex. The cost of running inference is opex.
This seems sort of interesting, maybe (I don’t know business, though). I agree that the cost of running inference is part of the opex, but saying that doesn’t rule out putting other stuff in the opex bucket.
Currently these LLM companies train their models on rented Azure nodes in an attempt to stay at the head of the pack, to be well positioned for when LLMs become really useful in a “take many white collar jobs” sense, right?
So, is it really obvious what’s capex and what’s opex? In particular:
* The nodes used for training are rented, so that’s opex, right?
* The models are in some sense consumable? Or at least temporary. I mean, they aren’t cutting edge anymore after a year or so, and the open weights models are always sneaking up on them, so at least they aren’t a durable investment.
Yes
> indefinitely?
No, and that's the point.
Not indefinitely or at any guaranteed scale, but AWS regularly subsidise up to $100k [0] in credits. It would not surprise me in the slightest if most of these services are running on credits. Inference is much cheaper than training, and $100k in compute covers a decent amount of usage. Activate is tiered over 3 years, so if you want to know the full story, let’s see how many of these services are still around in 18 months. I suspect that, just like when games were the flavor of the month, then crypto, we’ll see the real story when they actually have to pay a bill and their investors aren’t seeing any growth.
[0] https://aws.amazon.com/activate/activate-landing/
There's still a cost, it's just thrown into the future.
As with Costco's giant $5 roasted chickens, this is not solid evidence they're profitable. Loss-leaders exist.
I don't think so, not anymore.
If you look at API providers that host open-source models, you will see that they have very healthy margin between their API cost and inference hardware cost (this is, of course, not the only cost) [1]. And that does not take into account any proprietary inference optimizations they have.
As for closed-model API providers like OpenAI and Anthropic, you can make an educated guess based on the not-so-secret information about their model sizes. As far as I know, Anthropic has extremely good margins between API cost and inference hardware cost.
[1]: This is something you can verify yourself if you know what it costs to run those models in production at scale, hardware wise. Even assuming use of off-the-shelf software, they are doing well.
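To make footnote [1] concrete, here's the shape of that back-of-envelope check. Every number below is a placeholder I made up for illustration, not actual provider data:

```python
# Gross margin of a hypothetical open-weights API over raw GPU rental.
# All figures below are made-up placeholders for illustration.

def gross_margin(api_price_per_mtok: float, gpu_hour_cost: float,
                 tokens_per_gpu_hour: float) -> float:
    """Margin after GPU rental only (ignores power, staff, support, etc.)."""
    hw_cost_per_mtok = gpu_hour_cost / (tokens_per_gpu_hour / 1_000_000)
    return (api_price_per_mtok - hw_cost_per_mtok) / api_price_per_mtok

# Assume $2.00/Mtok API pricing, $2.50/hr GPU rental, and ~5M tokens
# served per GPU-hour with good batching.
m = gross_margin(api_price_per_mtok=2.00, gpu_hour_cost=2.50,
                 tokens_per_gpu_hour=5_000_000)
print(f"{m:.0%} gross margin over GPU rental")
```

The interesting part is how sensitive this is to the throughput assumption: halve the tokens served per GPU-hour and the margin under these placeholder prices drops from 75% to 50%.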
Yeah, people tout RAG and fine-tuning, but lots of people just use the base chat model; if it doesn't stay up to date on new data, it falls behind. How much are these companies spending just keeping up with the Joneses?
> [1] ~ you can obviously verify this yourself by doing it yourself and seeing how expensive it is.
…is an enormously weak argument.
You suppose. You guess. We guess.
Let’s be honest, you can just stop at:
> I don’t think so.
Fair. I don’t either; but that’s about all we can really get at the moment afaik.
(GPUs are generally much more cost effective and energy efficient than CPU if the solution maps to both architectures. Anthropic certainly caches the KV-cache of their 24k token system prompt.)
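The system-prompt caching point is worth quantifying. A rough sketch; the 24k figure comes from the comment above, while the per-request query size is an assumption:

```python
# Rough prefill savings from reusing the KV-cache of a shared system prompt.
# The 24k prompt size is the figure mentioned above; the query size is assumed.

system_prompt_tokens = 24_000   # shared prefix, prefilled once and cached
user_query_tokens = 200         # fresh tokens per request (assumed)

uncached = system_prompt_tokens + user_query_tokens  # prefill without caching
cached = user_query_tokens                           # prefill with prefix reuse
savings = 1 - cached / uncached
print(f"Prefill compute saved per request: {savings:.1%}")
```

Under these assumptions, over 99% of the prefill work per request is shared. That is exactly the kind of cross-user caching the parent says LLMs can't do for outputs, but which providers routinely do for shared prefixes.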
So when I see a company bragging about "500% profitability," I can’t help but wonder if they’re even profitable at all.
Everyone just repeats this but I never buy it.
There is literally a service that allows you to switch models and service providers seamlessly (openrouter). There is just no lock-in. It doesn't make any financial sense to "grab market share".
If you sell something with a UI, like ChatGPT (the web interface) or Cursor, sure. But selling an API at a loss is peak stupidity, and even VCs can see that.
People are starting to let LLMs parse text content, be that mails, chats, or transcriptions. The models often need to formalize their output, and switching models can become burdensome, while developers might switch models on a whim.
Doesn't mean you can capture a market by selling cheap though.
Regardless of maturity lifecycle, by definition loss-leaders are cheap. If I go to the grocery store and milk is $1, I don't think I'm being swindled. I know it's a loss-leader and I buy it because it's cheap.
We are currently in the early-Netflix-massive-library-for-five-dollars-a-month era of LLMs and I'm here for it. Take all you can grab right now because prices will 100x over the next two years.
I think the far more reasonable assumption is: It's profitable enough to not get super nervous about the existence of your company. You have to build very costly models and build insanely costly infrastructure. Running all of that at a loss without an obvious next step, because ALL of them are pricing to not even make money at inference, seems to require a lot of weird ideas about how companies are run.
At the same time, cards have gotten >8x more efficient over the last 3 years, inference engines >10x more efficient and the raw models are at least treading water if not becoming more efficient. It’s likely that we’ll lose another 10-100x off the cost of inference in the next 2 years.
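Those factors multiply, which is where a 10-100x projection comes from. Trivially (the individual factors are the rough claims above, not measured data):

```python
# Efficiency gains compound multiplicatively across the stack.
# The individual factors are the rough claims above, not measurements.
hardware_gain = 8     # claimed GPU efficiency gain over ~3 years
software_gain = 10    # claimed inference-engine efficiency gain
model_gain = 1        # models "treading water" on efficiency

combined = hardware_gain * software_gain * model_gain
print(f"Combined cost reduction: {combined}x")  # 80x under these claims
```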
Ultimately, it may turn out that dumber models may be more economically efficient than smarter models once you ignore the investment subsidy factor.
Maybe, given the current state of AI, the economically efficient situation is to have lots of dumb LLMs to solve small, well-defined problems and leave the really difficult problems to humans.
The current approach to pricing seems to assume that another AI breakthrough is just around the corner.
> there's not that much motive to gain API market share with unsustainably cheap prices. Any gains would be temporary, since there's no long-term lock-in, and better models are released weekly
The goal may be not so much locking customers in, but outlasting other LLM providers whilst maintaining a good brand image. Once everyone starts seeing you as "the" LLM provider, costs can start going up. That's what Uber and Lyft have been trying to do (though obviously without success).
Also, the prices may become more sustainable if LLM providers find ways to inject ad revenue into their products.
> First, there's not that much motive to gain API market share with unsustainably cheap prices. Any gains would be temporary, since there's no long-term lock-in,
What? If someone builds something on top of your API, they're tying themselves to it, and you can slowly raise prices while keeping each increase well below the switching cost.
> Second, some of those models have been released with open weights and API access is also available from third-party providers who would have no motive to subsidize inference.
See above. Just like any other Cloud service, you tie clients to your API.
> Third, Deepseek released actual numbers on their inference efficiency in February. Those numbers suggest that their normal R1 API pricing has about 80% margins when considering the GPU costs, though not any other serving costs.
80% margin on GPU cost? What about after paying for power, facilities, admin, support, marketing, etc.? Are GPUs really more than half the cost of this business?
(EDIT: This is 80% margin on top of GPU rental, i.e. total compute cost. My bad.)
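To the parent's question: even granting the 80% figure, the all-in margin depends entirely on how big the non-compute buckets are. A toy breakdown, where the non-GPU cost share is a pure guess:

```python
# All-in margin when compute is 20% of revenue (80% margin over compute).
# The non-GPU cost share is a made-up placeholder.
revenue = 1.00        # normalize revenue to 1
gpu_cost = 0.20       # 80% margin over compute, per the reported numbers
other_costs = 0.30    # power, facilities, admin, support, marketing (assumed)

all_in_margin = (revenue - gpu_cost - other_costs) / revenue
print(f"All-in margin: {all_in_margin:.0%}")  # 50% under these assumptions
```

So an 80% compute margin is consistent with anything from a healthy business to a money-loser, depending on what that second line item really is.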
Guessing about costs based on prices makes no sense at this point. OpenAI's $20/mo and $200/mo tiers have nothing to do with the cost of those services -- they're just testing price points.
Best part is you can make a Perplexity research task out of it.
- If I'm using a search engine, I want to search the web. Yes these engines are increasingly providing answers rather than just search results, but that's a UI/product feature rather than an API one. If I'm paying Google $$ for access to their index, I'm interested in the index.
- If I'm using an LLM, it is for parsing large amounts of input data, image recognition, complex analysis, deep thinking/reasoning, coding. All of these result in significantly more token usage than a 2-line "the answer to your question is xyz" response.
The author is basically saying – a Honda Civic is cheap because it costs about the same per pound as Honeycrisp apples.
There are fewer experts using search engines. Normal people treat search engines less like an index search and more like a person. Asking an old-school search engine "What is the capital of the USA" is actually not quite right, because the "what is" is probably superfluous, and you're counting on finding some sort of educational website with the answer. In fact, phrasing it as "the capital of the USA is" is probably a better fit for a search engine, since that's the sort of sentence that would contain what you want to know.
Also with the plague of "SEO", there's a million sites trying to convince Google that their site is relevant even when it's not.
So LLMs are more and more relevant for informally phrased queries that don't actually contain the relevant keywords, and they're also much more useful in that they bypass a lot of pointless verbiage, spam, ads, and requests to subscribe.
I'd argue that search engines should stick to just outputting relevant websites and let LLMs give you an overview. Both technologies are complementary and fulfill different roles.
I disagree, but I can see why someone might say this, because the article's author writes:
> So let's compare LLMs to web search. I'm choosing search as the comparison since it's in the same vicinity and since it's something everyone uses and nobody pays for, not because I'm suggesting that ungrounded generative AI is a good substitute for search.
Still, the article's analysis of "is an LLM API subsidized or not?" does not _rely_ on a comparison with search engines. The fundamental analysis is straightforward: comparing {price versus cost} per unit (of something). The goal is figure out the marginal gain/loss per unit. For an LLM, the unit is often a token or an API call.
Summary: the comparison against search engine costs is not required to assess whether an LLM API is subsidized or not.
Which is precisely why Google started adding their AI "answers". The web has kind of become a cancer -- the sites that game SEO the most seem to have the trashiest, most user-hostile behaviour, so search became unpleasant for most -- so Google just replaces the outbound visit conceptually.
You have a point, but no, it doesn't. The article already kind of addresses it, but OpenAI had a pretty low loss in 2024 for the volume of usage they get. $5B seems like a lot until you realize chatgpt.com alone, even in 2024, was one of the most visited sites on the planet each month, with the vast majority of those visits being entirely free users (no ads, nothing). OpenAI said in December last year that ChatGPT handled over a billion messages per day.
So even if you look at what people do with the service as a whole in general, inference really doesn't seem that costly.
Correct, but you're also not the median user. You're a power user.
This is a great point, lets hold onto that.
>If I'm using an LLM, it is for parsing large amounts of input data, image recognition, complex analysis, deep thinking/reasoning, coding.
Strongly disagree. Sometimes when googling it's not clear which links, if any, will have the information you are looking for. And of course, you don't know if this will be the case before searching.
First, you can just use an LLM to cut out a lot of the fat in search results. It gives you a direct answer and even a link.
But let's assume they couldn't source their claims. Even still, sometimes it's quicker to search a positive "fact" instead of an open-ended question/topic.
In this case if you want a direct source showing something you can query an LLM, get the confidently-maybe-correct response, then search that "fact" in Google to validate.
I understand the idea that "if I'm googling, I want the index," but there is a reason Google is increasingly burying its search results. People increasingly do _not_ want the index, because it's increasingly not helpful. Ultimately it is there to surface information you are looking for.
Yes. The reason being that Google does not want you to use other websites than Google.
This would probably increase 10x if one of the providers sold a family plan and my kids got paid access.
Most of my heavy lifting is work related and goes through my employer's pockets.
The queries for the LLM which were used to estimate costs don't make a lot of sense for LLMs.
You would not ask an LLM to tell you the baggage size for a flight because there might be a rule added a week ago that changes this or the LLM might hallucinate the numbers.
You would ask an LLM with web search included so it can find sources and ground the answer. This applies to any question where you need factual data; otherwise it's like asking a random stranger on the street about things that can cost money. Then the token count balloons, because the LLM needs to add entire websites to its context.
If you are not looking for a grounded answer, you might be doing something more creative, like writing a text. In that case, you might be iterating on the text where the entire discussion is sent multiple times as context so you can get the answer. There might be caching/batching etc but still the tokens required grow very fast.
In summary, I think the token estimates are likely quite off. But not to be all critical, I think it was a very informative post and in the end without real world consumption data, it's hard to estimate these things.
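The context-resending effect described above grows quadratically with conversation length. A small sketch with arbitrary turn sizes:

```python
# Total input tokens processed when each turn resends the full conversation.
# Turn lengths are arbitrary illustrative values.

def total_input_tokens(turn_lengths: list[int]) -> int:
    """Turn i's input is the whole conversation so far, including itself."""
    total, context = 0, 0
    for t in turn_lengths:
        context += t       # conversation grows by this turn
        total += context   # the full context is processed again as input
    return total

turns = [500] * 10         # ten turns of ~500 tokens each (assumed)
print(total_input_tokens(turns))  # 27500, 5.5x the 5000 "visible" tokens
```

Prompt caching mitigates the compute cost of the repeated prefix, but the raw token counts still balloon, which is one reason naive per-query estimates undershoot.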
4o will always do a web search for a pointedly current question, give references in the reply that can be checked, and if it didn't, you can tell it to search.
o3 meanwhile will do many searches and look at the thing from multiple angles.
Why wouldn’t I use it like this?
What benefit did the LLM add here, if you still had to vet the sources?
The LLM has to read the websites to answer you so that significantly increases the token count, since it has to include them in its input.
They burn through insane amounts of cash and are, for some reason, still called startups. Sure, they'll be around for a long time until they figure something out, but unless hardware prices and power consumption go down, they won't be turning a profit anytime soon.
Just look at YouTube: in business for 20 years, but it's still unclear whether it's profitable or not, as Alphabet chooses not to disclose YT's net income. I'd imagine any public company would do this, unless those numbers are in the red.
This is a tangent to the rest of the article, but this "just" is doing more heavy lifting than Atlas holding up the skies. Taking a user from $0 to $1 is immeasurably harder than taking a user from $1 to $2, and the vast majority of those active users would drop as soon as you put a real price tag on it, no matter the actual number.
I'm saying that good-enough LLMs are so cheap that they could easily be monetized with ads, and it's not even close. If you look at other companies with similar sized consumer-facing services monetized with ads, their ARPU is far higher than $1.
A lot of people have this mental model of LLMs being so expensive that they can’t possibly be ad-supported, leaving subscriptions as the only consumer option. That might have been true two years ago, but I don't think it's true now.
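A quick sketch of why the ad math could plausibly work; every number here is a hypothetical placeholder, not company data:

```python
# Compare an assumed serving cost per free user with an assumed ad ARPU.
# All numbers are hypothetical placeholders.
msgs_per_user_per_year = 1_000
tokens_per_msg = 1_500        # input + output, averaged (assumed)
cost_per_mtok = 0.50          # a cheap "good-enough" model (assumed)

annual_cost_per_user = msgs_per_user_per_year * tokens_per_msg / 1e6 * cost_per_mtok
assumed_ad_arpu = 10.0        # annual ad revenue per user (assumed)
print(f"${annual_cost_per_user:.2f} serving cost vs ${assumed_ad_arpu:.2f} ad ARPU")
```

Under these assumptions the serving cost is well under a dollar per user per year, an order of magnitude below the assumed ad ARPU, which is the point: the economics only break if inference stays expensive.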
Better math would be converting 1% of those users, but that gets you $1000/year.
Hard indeed, but they don't need everyone to pay, only enough people to effectively subsidise the free users.
So, basically, ads.
Last time personal computing took up an entire building, we put the same compute power into a (portable) "personal computer" a few decades later.
Can't wait to send all my data and life to my own lil inference box, instead of big tech (and NSA etc).
The minute it starts costing me money, I have to make that decision: Is this worth the dollar?
Paying $1000 for an iPhone? Sure. $10 for a Starbucks? Sure. $1 per year for LLM? Now hold on, papa is not an oil oligarch...
For most people, yes. Also, many people spend far less than $1000 on their phones.
Whatever question you ask, the response will recommend a cool, refreshing Coca Cola soft drink.
Your AI coding project will automatically display ads collecting revenue for Anthropic, not for you.
Every tenth email sent by your AI agent will encourage the recipient to consider switching to Geico.
The opportunities are endless.
https://huggingface.co/bartowski/TheDrummer_Rivermind-12B-v1...
LLMs and the like are the ultimate propaganda machine: a machine that can masquerade as anything and generate endless lies in a coherent manner.