Readit News
elbasti · 4 hours ago
"Any conversation about token costs devolves into an ad-hoc, informally-specified, bug-ridden implementation of half of generally accepted accounting principles."

We have a way of determining whether Anthropic is, or has the capability of being, profitable, and what the levers to that may be. AI may be world-changing, but the accounting principles behind AI labs are no different from those behind a Pizza Hut.

Even if the cost of "inference + serving" is lower than the revenue from selling a token, the relevant question is the depreciation schedule of the cost of training. I.e., if I spend $1 on training, how long do I have before I have to spend $1 again?

Almost certainly, any reasonable depreciation schedule of the cost of training will result in leading labs being presently wildly unprofitable. So the question is:

What can be done to make training depreciate more slowly? Perhaps users can be persuaded to stick around using non-frontier models for longer, although then there's a shift in the competitive landscape.

If users cannot be persuaded (forced?) to use legacy models, then the entire business model is thrown into question, because there's no reason why training frontier models would ever get cheaper: even if it gets cheaper on the margin, surely that will result in more compute used to generate an even "better" model, resulting in more spend in the aggregate.

This doesn't mean that the AI industry is "doomed". A couple of things could happen, and this is where the frontier labs should be focusing their attention:

1. They could find a way to climb up the value chain and capture more of the consumer surplus.

2. There could be a paradigm shift in compute architecture/compute cost.

3. We could reach a limit of marginal utility, shifting consumption to legacy models, thereby lengthening the depreciation/utility of training.

Edit: My assertion of "Almost certainly, any reasonable depreciation schedule of the cost of training will result in leading labs being presently wildly unprofitable." is made with no real information, just a gut feeling, and should not be taken seriously.
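The depreciation question above ("if I spend $1 on training, how long do I have before I have to spend $1 again?") can be made concrete with a toy straight-line schedule. All numbers here are invented for illustration; nothing reflects any lab's actual costs.

```python
# Toy straight-line depreciation of a single training run.
# All figures are invented, purely to illustrate the framing above.
training_cost = 1e9       # one-off training spend, USD
useful_life_years = 2     # years before the model must be replaced

annual_depreciation = training_cost / useful_life_years

# For the run to "pay for itself", the model must earn at least this much
# per year over and above its inference/serving costs:
print(f"required annual margin: ${annual_depreciation / 1e6:.0f}M")
```

The shorter the useful life (i.e. the faster users abandon non-frontier models), the higher the annual margin the model must clear.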

nr378 · 3 hours ago
Dario has made a specific cohort argument here. His numbers (from various interviews) are: you train a model in 2023 for $100M, deploy it, and it earns $200M over its lifetime. Meanwhile you train the 2024 model for $1B, which goes on to earn $2B. Each vintage returns 2x on its training cost.

However, the GAAP P&L tells the opposite story. You book $200M revenue in the same year you spend $1B training the next model, so you report an $800M loss. Next year you book $2B against $10B in training spend, reporting an $8B loss. The business looks like it's dying when every individual model generation actually generates a healthy profit.

That's actually Dario's answer to your depreciation question. If each cohort earns back its training cost within its natural lifespan (however short that lifespan is), the depreciation schedule is already baked in. The model doesn't need to live forever, it just needs to return more than it cost before the next one replaces it. Whether that's actually happening at Anthropic is a different question, and one we can't answer without audited financials, but it's the claim Dario makes (and seems entirely reasonable from a distance).
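The cohort-vs-GAAP arithmetic above can be sketched directly with the comment's illustrative numbers (hypothetical figures attributed to Dario's interviews, not Anthropic's actual financials):

```python
# Cohort view vs. GAAP P&L view, using the comment's illustrative numbers.
vintages = [
    # (year trained, training cost, lifetime revenue)
    (2023, 100e6, 200e6),
    (2024, 1e9, 2e9),
    (2025, 10e9, 20e9),
]

# Cohort view: every vintage returns 2x its training cost.
for year, cost, revenue in vintages:
    print(f"{year} vintage: {revenue / cost:.0f}x return on training cost")

# GAAP-style view: this year's revenue is booked against next year's
# (much larger) training spend, so every year shows a loss.
for (year, _, revenue), (_, next_cost, _) in zip(vintages, vintages[1:]):
    print(f"{year} P&L: ${(revenue - next_cost) / 1e6:+,.0f}M")
```

Running this reproduces the $800M and $8B reported losses from the comment, even though each cohort individually doubles its money.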

elbasti · 3 hours ago
If those numbers are correct, then my assertion that "Almost certainly, any reasonable depreciation schedule of the cost of training will result in leading labs being presently wildly unprofitable." is incorrect.

And I admit that I made that assertion from my gut without actually knowing if it's true or not.

Verdex · 3 hours ago
That's an interesting idea. I'm curious, though, are there any other industries and/or companies that have tried to pull this sort of thing off? And what ultimately happened to them?
calvinmorrison · 3 hours ago
GAAP doesn't really work here. The R&D treadmill means you are always betting on next year, and it's NOT inventory or something you can defer your cost on. It's an upfront R&D expense.

So what happens in year 10 when Anthropic spends $10B on training and only returns $8B? They're cooked.

skybrian · 3 hours ago
If you can remember where you read it, could you share a link?
fritzo · 3 hours ago
I'm no accountant, but I would expect Pizza Hut's accounting is significantly more complex than Anthropic's. A 50+ year old global franchise with physical supply chain partnerships vs. an upstart SaaS company?
jchallis · 3 hours ago
Your instincts are good here. Whatever complexity Pizza Hut has, it comes from being the weakest of the Yum! Brands siblings — KFC carries the international profit, Taco Bell owns domestic. Pizza Hut is slow growth, perpetual restructuring, and a weird inherited obligation to always serve Pepsi.
skybrian · 3 hours ago
> Almost certainly, any reasonable depreciation schedule of the cost of training [...]

Maybe not? This is an argument that has to be made using numbers. We can't do the estimate without the numbers.

elbasti · 3 hours ago
This is correct. I regret that assertion and have added a comment reflecting that.
benlivengood · 4 hours ago
The world labor market is ~35T USD yearly, so that is roughly the order of magnitude to balance against frontier model training cost. E.g. Dario Amodei has his "data center of PhDs" level, where he assumes that's "good enough" to stop training frontier models; if that can take even 5% of the global labor market, that's ~1.75T a year in revenue, balanced against current model training costs of ~1B. 3 orders of magnitude might get us to PhD level? I think that is ultimately the bet the big AI companies are making. Even if 1T is the cost of PhD-level AI, then three or four companies could depreciate that over 4-5 years sharing that 5% of the global market.
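The order-of-magnitude bet above can be checked with a few lines. Every figure here is the comment's assumption (labor market size, 5% capture, $1T training cost, 4 labs, 5-year depreciation), not a real market number:

```python
# Order-of-magnitude check of the bet described in the comment above.
labor_market = 35e12      # global yearly labor market, USD (comment's figure)
ai_share = 0.05           # fraction captured by AI (comment's assumption)

revenue_pool = labor_market * ai_share          # ~1.75T/year

# If PhD-level AI costs ~$1T per lab, depreciated over 5 years,
# and the revenue pool is split among 4 labs:
labs, years = 4, 5
per_lab_revenue = revenue_pool / labs           # 437.5B/year
per_lab_depreciation = 1e12 / years             # 200B/year

print(f"revenue pool: ${revenue_pool / 1e12:.2f}T/year")
print(f"per lab: ${per_lab_revenue / 1e9:.0f}B revenue vs "
      f"${per_lab_depreciation / 1e9:.0f}B depreciation per year")
```

Under these assumptions each lab's revenue share comfortably covers its annual depreciation, which is the shape of the bet the comment describes.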
lokar · 4 hours ago
Of course a model does not really depreciate, the problem is they are forced by competitive pressure to offer newer/better models at the same price.

This is what the elites of the gilded age called "ruinous competition", and the solution today will be the same as back then: monopoly power. This has been the business plan of the tech VC industry for 25+ years.

hirako2000 · 14 hours ago
> Qwen 3.5 397B-A17B is a good comparison

It is not. It's a terrible comparison. Qwen, DeepSeek and other Chinese models are known to be 10x or more efficient than Anthropic's.

That's why OpenRouter prices and those of the official providers aren't that different. Plus, who knows what OpenRouter providers do in terms of quantization. They may be getting 100x better efficiency, hence the competitive price.

That being said, not all users max out their plan, so it's not like each user costs Anthropic 5,000 USD. The hemorrhage would be so brutal they would be out of business in months.

jychang · 14 hours ago
That's a tautology. People think Chinese models are 10x more efficient because they're 10x cheaper, and then you use that to claim that they're 10x more efficient.

Opus isn't that expensive to host. Look at Amazon Bedrock's t/s numbers for Opus 4.5 vs. the Chinese models. They're around the same order of magnitude, which means that Opus has roughly the same number of active params as the Chinese models.

Also, you can select BF16 or Q8 providers on openrouter.

irthomasthomas · 10 hours ago
Opus doubled in speed with version 4.5, leading me to speculate that they had promoted a Sonnet-sized model. The new, faster Opus was the same speed as Gemini 3 Flash running on the same TPUs. I think Anthropic's margins are probably the highest in the industry, but they have to chop that up with Google by renting their TPUs.
aerhardt · 7 hours ago
I guess more than a tautology it is an inversion of observed causes and effects?
re-thc · 12 hours ago
> That's a tautology. People think chinese models are 10x more efficient because they're 10x cheaper

They do have different infrastructure / electricity costs and they might not run on nvidia hardware.

It's not just the models.

grayxu · 8 hours ago
This is not a valid argument. TPS is essentially QoS and can be adjusted; more GPUs allocated will result in higher speed.
Weaver_zhu · 12 hours ago
Agree, but I guess Opus 4.6 is 10x larger, rather than Chinese models being 10x more efficient. It is said that GPT-4 was already a 1.6T model, and Llama 4 Behemoth is also much bigger than Chinese open-weight models. Chinese tech companies are short of frontier GPUs, but they have done a lot of innovation on inference efficiency (DeepSeek CEO Liang himself shows up in the author lists of the related published papers).
jychang · 12 hours ago
No, Opus cannot be 10x larger than the Chinese models.

If Opus were 10x larger than the Chinese models, then Google Vertex/Amazon Bedrock would serve it 10x slower than DeepSeek/Kimi/etc.

That's not the case. They're in the same order of magnitude of speed.

logicprog · 7 hours ago
Wasn't GPT-4 the model that was so expensive for OpenAI to run that they basically completely retired it in favor of later models, which became much stronger but weren't as expensive for them to run?
bakugo · 12 hours ago
GPT-4 was likely much larger than any of the SOTA models we have today, at least in terms of active parameters. Sparse models are the new standard, and the price drop that came with Opus 4.5 made it fairly obvious that Anthropic are not an exception.
DanielHall · 8 hours ago
Comparing open-source models like Qwen against Anthropic's models is absolutely foolish. First of all, Anthropic has never disclosed the actual parameter count or architecture of their models. Second, it's well known that these open-source models more or less distill from other models and use MoE, which allows them to run at much lower computational cost. Using Qwen as a comparison point only proves the blog post author is foolish. The article devotes such a large portion to discussing Qwen on OpenRouter that I find it hard to take seriously.
yorwba · 8 hours ago
Anthropic is obviously also aware of the benefits of MoE and distilling a larger model into a smaller one, so they could run a model of the same size as Alibaba's for the same inference cost if they want to. Or they can run a slightly larger model for slightly higher cost. They definitely aren't running a much larger model (except potentially as a teacher for distillation training) because then they wouldn't be able to hit the output speeds they're hitting.
Havoc · 10 hours ago
> Plus who knows what open routed providers do in term quantization

The quantisation is shown in the provider section.

grayxu · 8 hours ago
Actually, Opus might achieve a lower cost with the help of TPUs.
simianwords · 14 hours ago
>It is not. It's a terrible comparison. Qwen, deepseek and other Chinese models are known for their 10x or even better efficiency compared to Anthropic's.

I find it a good comparison because it is a good baseline, since we have zero insider knowledge of Anthropic. It gives me an idea of what a model of a certain size costs to run.

I don't buy the 10x efficiency thing: they are just lagging behind the performance of current SOTA models. They perform much worse than the current models while also costing much less: exactly what I would expect. Current Qwen models perform about as well as Sonnet 3, I think. In 2 years, when Chinese models catch up with enough distillation attacks, they will be as good as Sonnet 4.6 and still be profitable.

coldtea · 9 hours ago
> I don't buy the 10x efficiency thing: they are just lagging behind the performance of current SOTA models. They perform much worse than the current models while also costing much less - exactly what I would expect.

Define "much worse".

  +--------------------------------------+-------------+-----------+------------------+
  | Benchmark                            | Claude Opus | DeepSeek  | DeepSeek vs Opus |
  +--------------------------------------+-------------+-----------+------------------+
  | SWE-Bench Verified (coding)          | 80.9%       | 73.1%     | ~90%             |
  | MMLU (knowledge)                     | ~91         | ~88.5     | ~97%             |
  | GPQA (hard science reasoning)        | ~79–80      | ~75–76    | ~95%             |
  | MATH-500 (math reasoning)            | ~78         | ~90       | ~115%            |
  +--------------------------------------+-------------+-----------+------------------+

crooked-v · 4 hours ago
> distillation "attacks"

I find it really funny that anyone can call it this with a straight face when all the American models are based on heaps of illegally pirated books and TOS-breaking website scraping in the first place.

lelanthran · 13 hours ago
> That being said not all users max out their plan,

These are not cell phone plans that the average Joe takes out; they are plans purchased with the explicit goal of software development.

I would guess that 99 out of every 100 plans are purchased with the explicit goal of maxing them out.

serial_dev · 13 hours ago
I’m not maxing them out… I have issues that I need to fix, features I need to develop, and I have things I want to learn.

When I have a feeling that these tools will speed me up, I use them.

My client pays for a couple of these tools in an enterprise deal, and I suspect most of us on the team work like that.

If my goal was to max out every tool my client pays for, I’d be working 24hrs a day and see no sunlight ever.

I guess it’s like the all you can eat buffet. Everybody eats a lot, but if you eat so much that you throw up and get sick, you are special.

Ginden · 13 hours ago
My employer bought me a Claude Max subscription. On heavy weeks I use 80% of the subscription. And among software engineers that I know, I'm a relatively heavy user.

Why? Because in my experience, the bottleneck is in shareholders approving new features, not my ability to dish out code.

raihansaputra · 13 hours ago
Goal? Yeah. But in reality, just by timing it right (starting a session at 7-8am to get 2 sessions in a workday, or even 3 if you can schedule something at 5am), I rarely hit limits.

If I hit the limit, usually I'm not using it well and am hunting around. If I'm using it right, I'm basically gassed out before I can max out the limit.

solumunus · 13 hours ago
There’s absolutely no way that’s true.
rustystump · 13 hours ago
In SaaS this is not true. Most SaaS is highly profitable (or was, I suppose) because they knew that most of their customers would never max out their plans.
overrun11 · 12 hours ago
A huge number of people are convinced that OpenAI and Anthropic are selling inference tokens at a loss despite the fact that there's no evidence this is true and a lot of evidence that it isn't. It's just become a meme uncritically regurgitated.

This sloppy Forbes article has polluted the epistemic environment, because now there's a source to point to as "evidence."

So yes, this post author's estimation isn't perfect, but it is far more rigorous than the original Forbes article, which doesn't appear to even understand the difference between Anthropic's API costs and its compute costs.

mike_hearn · 10 hours ago
I'd love to be a fly on the wall when this argument is tried in front of a bankruptcy court. It drives me nuts. Of course there's evidence that they're selling tokens at a loss.

The only thing these companies sell are tokens. That's their entire output. OpenAI is trying to build an ad business but it must be quite small still relative to selling tokens because I've not yet seen a single ad on ChatGPT. It's not like these firms have a huge side business selling Claude-themed baseball caps.

That means the cost of "inference" is all their costs combined. You can't just arbitrarily slice out anything inconvenient and say that's not a part of the cost of generating tokens. The research and training needed to create the models, the salaries of the people who do that, the salaries of the people who build all the serving infrastructure, the loss leader hardcore users - all of it is a part of the cost of generating each token served.

Some people look at the very different prices for serving open weights models and say: see, inference in general is cheap. But those costs are distorted by companies trying to buy mindshare by giving models away for free, and both of the top labs keep claiming the Chinese are distilling them like crazy, including using many tactics to evade blocks! So apparently the cost of a model like DeepSeek is still partly being subsidized by OpenAI and Anthropic against their will. The cost of those tokens is higher than what's being charged; it's just being shifted onto someone else's books. Nice whilst it lasts, but this situation has been seen many times in the past, and eventually people get tired of having costs externalized onto them.

For as long as firms are losing money whilst only selling tokens, that means those tokens are selling at a loss. To not sell tokens at a loss the companies would have to be profitable.

overrun11 · 9 hours ago
The article is about compute cost, though. By "lose money on inference" I mean the assertion that inference has negative gross margins, which a lot of people truly believe. This is important because it's common to reason from this that LLMs are uneconomical and a ticking time bomb, where prices will have to be jacked up several orders of magnitude just to cover the compute used for the tokens.
emtel · 7 hours ago
This comment defies common usage and accounting practices.

When people say “selling at a loss” they mean negative unit economics. No one ever means this much more expansive definition you’ve invented.

landl0rd · 7 hours ago
Actually you can slice out a lot of things. It's even a GAAP metric, i.e. one of the common baselines that public companies are required to report, known as gross margin: literally just (revenue - COGS) / revenue. It is distinct from net margin, but both are useful, and gross vs. net margin say very different things concerning the long-term prospects of the business.
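The gross-vs-net distinction is exactly what makes "profitable on inference, unprofitable overall" possible. A toy illustration (all figures invented, not any company's actual financials):

```python
# Toy numbers showing positive gross margin alongside a net loss.
revenue = 1000.0
cogs = 600.0     # cost of revenue: serving compute, power, etc.
opex = 700.0     # everything outside COGS: R&D/training, salaries, SG&A

gross_margin = (revenue - cogs) / revenue        # positive: tokens sell above serving cost
net_margin = (revenue - cogs - opex) / revenue   # negative: R&D swamps the gross profit

print(f"gross margin: {gross_margin:.0%}")
print(f"net margin:   {net_margin:.0%}")
```

A business like this is not "selling at a loss" in the unit-economics sense, yet still loses money every year overall.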
jeremyjh · 8 hours ago
This is all true, but it isn't really important for the argument people are making. What is more important is the marginal cost per token. If each token sold were at a marginal loss, their losses would scale with usage; that simply can't be happening with API pricing. But in general, yes, I agree with you, and I'm sure they are taking a huge loss on Claude Code.
oneneptune · 6 hours ago
One very minor note: Anthropic and others, like most "enterprise" solutions, also sell SSO + SCIM + audit logs. Their business plans have lower token allowances and higher prices to cover the enterprise features, which should be essentially free to provide in 2026.
infecto · 8 hours ago
It depends how we are looking at the business. Absolutely, at the end of the day a company is profitable or not, but when thinking about inference, which is largely a commodity these days, you would first think about its marginal cost. That is the cornerstone of the business. We have pretty clear indications that API tokens are largely being sold above marginal cost. Especially for a brand-new business, that's critical, and something that many unicorns never even hit.

You're right that all the other costs are critical to measuring the profitability of the business, but for such a young industry that's the unknown. Does training get cheaper? Do we hit a theoretical limit on training? Are there further optimizations to be had?

You don't make a large capex investment in an industrial business and then, in year one, argue that the business is doomed when you're selling the product above marginal cost but have not yet recouped the costs that were capitalized.

trillic · 4 hours ago
I don't think you are an accountant.
howmayiannoyyou · 10 hours ago
You're missing costs.

- Amortized training costs.

- SG&A.

- Capex depreciation.

All of the above impact profitability over various time horizons and have to be rolled into present and projected P&L and cash flow analysis.

bodge5000 · 10 hours ago
> A huge number of people are convinced that OpenAI and Anthropic are selling inference tokens at a loss despite the fact that there's no evidence this is true

There's quite a lot of evidence. No proof, I'd agree, but then there's no absolute proof I'm aware of to the contrary either, so I don't know where you're getting this from.

The two pieces of evidence I'm aware of are that 1) Anthropic doesn't want their subsidised plans being used outside of CC, which would imply that the money they're making off them isn't enough, and 2) last time I checked, API spending is capped at $5,000 a month.

Like I say, neither of these is proof; you can come up with reasonable arguments against them, but once again the same could be said for evidence to the contrary.

Majromax · 6 hours ago
> 1) Anthropic doesn't want their subsidised plans being used outside of CC, which would imply that the money they're making off it isn't enough

Claude Code use-cases also differ somewhat from general API use, in that the former is engineered for high cache utilization. We know from overall API costs (both Anthropic and OpenRouter) that cached inputs cost an order of magnitude less than uncached inputs, but OpenCode/pi/OpenClaw don't necessarily have the same kind of aggressive cache-use optimizations.

Vertically integrated stacks might also be able to have a first layer of globally shared KV cache for the system prompts, if the preamble is not user specific and changes rarely.

> 2) last time I checked, API spending is capped at $5000 a month

Per https://platform.claude.com/docs/en/api/rate-limits, that seems to only be true for general credit-funded accounts. If you contact Anthropic's sales team and set up monthly invoicing, there's evidently no fixed spending limit.

overrun11 · 9 hours ago
> which would imply that the money they're making off it isn't enough

I don't think this logically follows. An unlimited buffet doesn't let you resell all of the food out the backdoor. At some level of usage any fixed price plan becomes unprofitable.

I agree the 5k cap is interesting as evidence although as you said I suspect there are other reasons for it.

As for evidence against it: The Information reported that OpenAI and Anthropic have been at 30%+ gross margins for the last few years. Sam Altman and Dario have both claimed inference is profitable in various scattered interviews. Other experts seem to generally agree too. A quick search found a tweet from former PyTorch team member Horace He: https://x.com/typedfemale/status/1961197802169798775 and a response to it in agreement from Anish Tondwalkar, former researcher at OpenAI and Google Brain.

BoredomIsFun · 10 hours ago
But a simple assumption that Anthropic runs a normal large MoE LLM (which it almost certainly does) suggests that the actual price of running it (mostly energy) is pretty small.
davewritescode · 8 hours ago
> A huge number of people are convinced that OpenAI and Anthropic are selling inference tokens at a loss despite the fact that there's no evidence this is true and a lot of evidence that it isn't.

I think it’s fairly obvious that Anthropic is lighting cash on fire and focusing on whether or not they’re losing money per token on inference is missing the forest for the trees.

Tokens become less valuable when the models aren’t continuously trained and we have zero idea what Anthropic is paying for training.

barrell · 11 hours ago
Does this not count as evidence? I would agree that it sounds a little shaky, but I would not say there is no evidence.

https://www.wheresyoured.at/oai_docs/

infecto · 8 hours ago
They are, and they are convinced the cost is not truly baked in, because you need to factor in all the training and R&D. It's a mixture of folks who 1) are convinced AI is terrible, 2) hate Sam Altman, and 3) don't understand how businesses price products.

We don't have clear evidence either way, but it leans heavily toward API pricing at least covering inference cost. Models these days have less and less differentiation, and for API use there must be some thought given to competing on cost, but it's not going to be winner-take-all. They leapfrog each other with each new model.

bob1029 · 11 hours ago
I think the wafer scale compute is a massive deal. It's already being leveraged for models you can use right now and the reception on HN has been negligible. The entire model lives in SRAM. This is orders of magnitude faster than HBM/DRAM. I can't imagine they couldn't eventually break even using hardware like this in production.


pier25 · 5 hours ago
Nobody really knows but the simple fact is these companies are not making any profit. Far from it.
osener · 10 hours ago
> Cost remains an ever present challenge. Cursor’s larger rivals are willing to subsidize aggressively. According to a person familiar with the company’s internal analysis, Cursor estimated last year that a $200-per-month Claude Code subscription could use up to $2,000 in compute, suggesting significant subsidization by Anthropic. Today, that subsidization appears to be even more aggressive, with that $200 plan able to consume about $5,000 in compute, according to a different person who has seen analyses on the company’s compute spend patterns.

This is the relevant quote from the original article.

ranyume · 7 hours ago
This changes... everything.
anonzzzies · 13 hours ago
I calculated only last weekend that my team, if we ran Claude Code at retail API prices, would cost around $200k/mo. We pay $1,400/month in Max subscriptions. So that's $50k/user... But from the tokens CC reports in its JSON, a lot of this must be cached etc., so I doubt it's anywhere near $50k in real cost. I'm not sure how to figure out what it would actually cost, and I'm sure as hell not going to try.
scandox · 12 hours ago
I'm fascinated to know the kind of work that allows you to intelligently allocate so many resources. I use Claude extensively and feel that I get great value out of it, but I seem to reach a limit fairly quickly in terms of what I can sensibly do with it.
codemog · 12 hours ago
Yea basically we have an app that’s like Netflix but for dogs, so people can leave on dog oriented shows for their dogs when they get kombucha or coffee
lukan · 12 hours ago
Same for me, but I suppose it's a matter of letting agents loose more, checking the code less, and throwing away lots of generated output.

sva_ · 10 hours ago
Gemini CLI shows how much was saved through caching each session, and it's usually somewhere around 90%
neamar · 13 hours ago
You can use `npx ccusage` to check your local logs and see how much it would have cost through the API.

tcbrah · 7 hours ago
Yeah, the JSON token counts are super misleading. I run a bunch of Claude agents for automation, and something like 85% of input tokens end up being cached reads, which cost 1/10th of the sticker price. So your $200k number is probably closer to $25-30k in real cost, and that's before you factor in that Anthropic's own infra is way cheaper than retail API pricing. The $5k Forbes number was always nonsense, but even the "corrected" estimates in TFA are probably still too high IMO.
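A naive version of that cache-discount arithmetic can be written down directly. The 85% and 1/10th figures are the comment's claims, and the $200k is the parent comment's retail-API estimate, treated here as if it were all input tokens; that simplification lands the result around $47k, higher than the comment's $25-30k, which presumably also folds in other discounts:

```python
# Naive cache-discount arithmetic, using the thread's claimed figures.
sticker_cost = 200_000.0   # USD/month at uncached retail API prices (parent comment)
cached_share = 0.85        # fraction of input tokens served as cached reads (claim)
cache_discount = 0.10      # cached reads billed at 1/10th sticker price (claim)

# Blend: cached fraction at 10% of sticker, the rest at full price.
effective = sticker_cost * (cached_share * cache_discount + (1 - cached_share))
print(f"effective cost: ${effective:,.0f}/month")
```

Even this simplified blend cuts the sticker number by roughly 4x, which is the direction of the comment's point.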
kleton · 5 hours ago
I proxy all of my LLM completion subscriptions. In a typical 7-day span:

  model            completions  read        write      cached_read    cache_write
  claude-opus-4-6  11,000       16,900,000  5,840,000  1,312,000,000  66,120,000

aweb · 12 hours ago
I'm surprised; isn't it forbidden to use the Max plan as part of a company? Just curious, as I thought it was forbidden by the ToS, but I'm not sure I have a good understanding of it.
ffsm8 · 11 hours ago
There was nothing in the ToS, last time I checked, forbidding its use with Claude Code. It's only forbidden to use it in the running of the business.

So getting Claude Code subscriptions for developers should be permissible and not be against anything. However, if you created a REST endpoint to e.g. run a preconfigured prompt as part of your platform, that would be against it.

But I'm neither a lawyer nor do I work for Anthropic.

alex_c · 8 hours ago
?

Claude Code has a Teams plan which includes Max tiers. Why would it be forbidden?

sunaurus · 11 hours ago
Surely that can't be true? The expectation would be that people pay $200 a month for building open source and personal hobby software with Claude?
quikoa · 12 hours ago
If they believe a sufficient number is locked in then they may consider doing this later.
KptMarchewa · 9 hours ago
Most companies forbid it though, since you're not covered by any legal protection - for example, Anthropic can use your data or code to train new models and more.
bloppe · 12 hours ago
If that were true, then everyone I know is violating that tos
jychang · 13 hours ago
> but not sure how to figure out what it would cost and I'm sure as hell not going to try.

Ask Opus to figure out how much it would cost. Lol.

eaglelamp · 14 hours ago
If Anthropic's compute is fully saturated, then the Claude Code power users do represent an opportunity cost to Anthropic much closer to $5,000 than $500.

Anthropic's models may be similar in parameter size to models on OpenRouter, but none of the others are in the headlines nearly as much (especially recently), so the comparison is extremely flawed.

The argument in this article is like comparing the cost of a Rolex to a random brand of mechanical watch based on gear count.

d1sxeyes · 14 hours ago
But opportunity cost is not actual cost. “If everyone just kept paying but used our service less we would be more profitable” is true, but not in any meaningful way.

Are Anthropic currently unable to sell subscriptions because they don’t have capacity?

mike_hearn · 6 hours ago
The opportunity cost isn't selling subscriptions, the cost is the gap between what they could sell the GPU time for via their API vs what they're selling it for in a flat rate subscription. If you assume API demand is unlimited and GPU supply is fixed, then the opportunity cost is the 'real' loss of revenue that comes from redirecting supply away from customers willing to pay more to customers willing to pay less.
eru · 13 hours ago
Opportunity costs are real. In many cases they are more real than 'actual costs'. However, I otherwise agree with you.
MaxikCZ · 12 hours ago
> Are Anthropic currently unable to sell subscriptions because they don’t have capacity?

Absolutely! I'm currently paying $170 to Google to use Opus in Antigravity without limit in full agent mode, because I tried Anthropic's $20 subscription and busted my limit within a single prompt. I'm not gonna pay them $200 only to find out I hit the limit after 20 or even 50 prompts.

And after 2 more months my price is going to double to over $300, and I still have no intention of even trying the 20x Max plan, if it's really just 20x more prompts than Pro.

Aeolun · 14 hours ago
Opportunity cost is not the same thing as actual cost. They might have made more money if they were capable of selling the API instead of CC, but I would never tell my company to use CC all the time if I didn’t have a personal subscription.
eaglelamp · 14 hours ago
You’re looking through the wrong end of the telescope. An investor is buying opportunity and it is a real cost to them.
bob1029 · 13 hours ago
> If Anthropic's compute is fully saturated then the Claude Code power users do represent an opportunity cost to Anthropic much closer to $5,000 than $500.

I think it's the other way around? Sparse use of GPU farms should be the more expensive thing. Full saturation means that we can exploit batching effects throughout.

eaglelamp · 5 hours ago
If they have spare capacity then there is no opportunity cost to selling $100 subscriptions for exactly that reason. If they don’t have spare capacity then, at the margin, they could replace a subscription user with API calls that make them $5000: that’s opportunity cost.

If you own equity in Anthropic you should care about that cost. Maybe you are willing to tolerate it to win market share, but for you to make the most profit you need that cost to shrink.
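The saturation argument above reduces to a one-line calculation, using the thread's figures ($200/month plan consuming compute that would fetch $5,000 at API rates; both are the thread's numbers, purely illustrative):

```python
# Toy version of the capacity/opportunity-cost framing in the comment above.
subscription_price = 200.0    # USD/month for the plan (thread's figure)
api_value_of_usage = 5000.0   # what the same compute would earn at API rates (thread's figure)

# With spare capacity, a subscriber only costs the marginal compute they use.
# At full saturation, each heavy subscriber displaces paying API demand:
opportunity_cost = api_value_of_usage - subscription_price
print(f"foregone revenue per saturated heavy user: ${opportunity_cost:,.0f}/month")
```

Whether that foregone revenue is "real" depends entirely on the saturation assumption, which is exactly what the replies below dispute.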

nottorp · 12 hours ago
You know who also loves to use the term "opportunity cost"?

The entertainment industry. They still tell you about how much money they're leaving on the table because people pirate stuff.

What would happen in reality for entertainment is people would "consume" far less "content".

And what would happen in reality for Anthropic is people would start asking themselves if the unpredictability is worth the price. Or at best switch to pay as you go and use the API far less.

KronisLV · 14 hours ago
Don’t give them any ideas, please! I need my 100 USD subscription with generous Opus usage!
eru · 13 hours ago
Google's Antigravity has Opus access, and I suspect it's subsidised.
the_gipsy · 11 hours ago
I prefer car analogies
NooneAtAll3 · 14 hours ago
> The argument in this article is like comparing the cost of a Rolex to a random brand of mechanical watch on gear count

I mean... Rolex is an overpriced brand whose cost to consumers is mainly just marketing in itself. Its production cost is nowhere close to its selling price, and looking at gears is a fair way of evaluating that.

fragmede · 13 hours ago
> production cost is nowhere close to selling price

When has production cost had anything to do with selling price?

YetAnotherNick · 13 hours ago
You can rent the GPUs and everything needed to run the model. Opportunity cost is not a real cost here.

The only thing that matters is whether the users would have paid $5000 if they didn't have the option to buy a subscription. And I highly doubt they would have.

ymaws · 15 hours ago
How confident are you in the Opus 4.6 model size? I've always assumed it was a beefier model with more active params than Qwen 397B (17B active on the forward pass).
Bolwin · 14 hours ago
Yeah, that's a massive assumption they're making. I remember Musk revealed Grok had multiple trillion parameters. I find it likely Opus is larger.

I'm sure Anthropic is making money off the API but I highly doubt it's 90% profit margins.

jychang · 13 hours ago
> I find it likely Opus is larger.

Unlikely. Amazon Bedrock serves Opus at 120 tokens/sec.

If you want to estimate "the actual price to serve Opus", a good rough estimate is to take max(DeepSeek, Qwen, Kimi, GLM) pricing and multiply it by 2-3x. That would be a pretty close guess at the actual inference cost for Opus.

It's impossible for Opus to have something like 10x the active params of the Chinese models. My guess is around 50-100B active params, 800-1600B total. I could be off by a factor of ~2, but I know I'm not off by a factor of 10.
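That rule of thumb is easy to write down. The per-token prices below are illustrative stand-ins (the DeepSeek and Kimi figures roughly match the OpenRouter numbers quoted elsewhere in this thread; the GLM figure is a placeholder), not live quotes:

```python
# Heuristic from above: estimated Opus serving cost ~= max(open-weight price) * 2-3x.
# Prices in $/1M output tokens; illustrative, not live quotes.
open_weight_prices = {"DeepSeek v3.2": 0.40, "Kimi K2.5": 2.25, "GLM 4.7": 0.90}

base = max(open_weight_prices.values())  # most expensive open-weight model
low, high = base * 2, base * 3
print(f"estimated Opus inference cost: ${low:.2f}-${high:.2f} per 1M output tokens")
# -> estimated Opus inference cost: $4.50-$6.75 per 1M output tokens
```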

nbardy · 13 hours ago
You can estimate it from tokens/second.

The trillions-of-parameters claim is about pretraining.

It's most efficient in pretraining to train the biggest models possible: you get a sample-efficiency increase for each parameter increase.

However, those models end up very sparse and incredibly distillable.

And it's way too expensive and slow to serve models that size, so they are distilled down a lot.

wongarsu · 11 hours ago
GPT-4 was rumoured/leaked to be 1.8T. Claude 3.5 Sonnet was supposedly 175B, so around 0.5T-1T seems reasonable for Opus 3.5, and maybe a step up to 1-3T for Opus 4.0.

Since then, inference pricing for new models has come down a lot, despite increasing pressure to be profitable. Opus 4.6 costs 1/3rd what Opus 4.0 (and 3.5) cost, and GPT 5.4 costs 1/4th what o1 cost. You could take that as an indication that inference costs have also come down by at least that degree.

My guess would have been that current frontier models like Opus are in the realm of 1T params with 32B active.

aurareturn · 14 hours ago
Anthropic CEO said 50%+ margins in an interview. I'm guessing 50 - 60% right now.

daemonologist · 15 hours ago
Even if it's larger, OpenRouter has DeepSeek v3.2 (685B/37B active) at $0.26/0.40 and Kimi K2.5 (1T/32B active) at $0.45/2.25 (mentioned in the post).
johndough · 14 hours ago
Opus 4.6 likely has on the order of 100B active parameters. OpenRouter lists the following throughput for Google Vertex:

    42 tps for Claude Opus 4.6 https://openrouter.ai/anthropic/claude-opus-4.6
    143 tps for GLM 4.7 (32B active parameters) https://openrouter.ai/z-ai/glm-4.7
    70 tps for Llama 3.3 70B (dense model) https://openrouter.ai/meta-llama/llama-3.3-70b-instruct
For GLM 4.7, that makes 143 * 32B = 4576B parameters per second, and for Llama 3.3, we get 70 * 70B = 4900B, which makes sense since denser models are easier to optimize. As a lower bound, we get 4576B / 42 ≈ 109B active parameters for Opus 4.6. (This makes the assumption that all three models use the same number of bits per parameter and run on the same hardware.)
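As a sanity check, here is the arithmetic above in a few lines, under the same assumptions (equal bits per parameter, identical hardware across the three models):

```python
# Assumption from above: "active params streamed per second" is roughly
# constant across models on the same hardware and quantization.
glm_tps, glm_active_b = 143, 32      # GLM 4.7: 32B active params
llama_tps, llama_total_b = 70, 70    # Llama 3.3 70B (dense)
opus_tps = 42                        # Claude Opus 4.6 on Google Vertex

glm_budget = glm_tps * glm_active_b       # 4576 B-params/sec
llama_budget = llama_tps * llama_total_b  # 4900 B-params/sec (dense is easier to optimize)

# Lower bound for Opus: divide the sparse-model budget by its throughput.
opus_active_b = glm_budget / opus_tps
print(f"Opus 4.6 active params (lower bound): ~{opus_active_b:.0f}B")  # ~109B
```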

codemog · 15 hours ago
Also curious if any experts can weigh in on this. I would guess in the 1 trillion to 2 trillion range.
Chamix · 14 hours ago
Try 10s of trillions. These days everyone is running 4-bit at inference (the flagship feature of Blackwell+), with the big flagship models running on recently installed NVIDIA 72-GPU Rubin clusters (and equivalent-ish world size for those rented Ironwood TPUs Anthropic also uses). Let's see, Vera Rubin racks come standard with 20 TB of unified memory (Blackwell NVL72 with 10 TB), and NVFP4 fits 2 parameters per byte...

Of course, intense sparsification via MoE (and other techniques ;) ) lets total model size largely decouple from inference speed and cost (within the limits on world size imposed by NVLink/TPU torus caps).

So the real mystery, as always, is the actual parameter count of the activated head(s). You can do various speed benchmarks and TPS tracking across likely hardware fleets, and while an exact number is hard to compute, let me tell you, it is not 17B or anywhere in that particular OOM :)

Comparing Opus 4.6 or GPT 5.4 thinking or Gemini 3.1 Pro to any sort of Chinese model (on cost) is just totally disingenuous when China does NOT have Vera Rubin NVL72 GPUs or Ironwood v7 TPUs in any meaningful capacity, and is forced to target 8-GPU Blackwell systems (and worse!) for deployment.

0xbadcafebee · 6 hours ago
There's a huge difference between the cost of inference and profit margin of the "big" providers, and the cost of inference for cloud-hosted open-weights. It's the same as the R&D cost of the pharmaceutical industry versus the cost of producing generic drugs. One is massively expensive, the other is cheap.

That said, for inference, the margins for OpenAI were estimated at 70% [1] [2], and the margins for Anthropic were estimated at between 40% and 90% [3] [4], last year. They will not be profitable for years.

[1] https://phemex.com/news/article/openais-ai-profit-margin-cli... [2] https://www.saastr.com/have-ai-gross-margins-really-turned-t... [3] https://www.theinformation.com/articles/anthropic-projects-7... [4] https://www.investing.com/news/stock-market-news/anthropic-t...

vessenes · 5 hours ago
Thank you for real data. Please moderate the use of the word "profitable" when talking to engineers! We get the same circle jerk over and over here.

Profit implies a GAAP accrual of some sort. On any accrual schedule tied to reality, the companies are profitable now - that is, inference margin on each given model has more than paid for capital costs of training and deploying those models.

That the companies get to show a loss is a feature of cash-basis accounting: they made $100M net on that last model? Good news, we're spending $1B on the next! Infinite tax losses!

The companies will not be cashflow positive for years. Why does this persnickety difference matter? It matters to me because I care about the engineers here - and they seem collectively likely to either short every AI company that IPOs, quietly ignore AI's impact on their livelihood, or head off into a corner and go catatonic - all based on a worldview that "this is collective insanity and everything here is going to eventually go bankrupt". None of those are good outcomes. Shorting might be, but it should be done judiciously, with an understanding of the financial factors at play. So, anyway, long plea over - allow me to plead: say "cashflow positive" (or negative) if that's the point you want to make.