I did some napkin math on this.
32x H100s rent at 'retail' prices of about $2/hr per GPU. I would hope the big AI companies get it cheaper than that at their scale.
These 32 H100s can probably sustain something north of 40,000 tok/s on a frontier-scale model (~700B params) with proper batching. Potentially a lot more (I'd love to know if someone has real numbers on this).
So that's $64/hr, or just under $50k/month (~$46k).
40k tok/s is a lot of usage, at least for non-agentic use cases (it works out to over 100 billion tokens a month). There is no way you are losing money on paid ChatGPT users at $20/month on these.
You'd still roughly break even supporting ~200 Claude Code-esque agentic users who were using it at full tilt 40% of the day at $200/month.
Now, this doesn't include training costs or staff costs, but on a pure 'opex' basis I don't think inference is anywhere near as unprofitable as people make out.
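For anyone who wants to poke at the napkin math, here's the arithmetic spelled out. All the inputs (32 GPUs, $2/hr, 40k tok/s, 200 users at a 40% duty cycle) are the rough assumptions from above, not measured figures:

```python
# Napkin math on inference opex. Every constant here is an assumption
# from the discussion above, not a measured number.
GPUS = 32
PRICE_PER_GPU_HR = 2.00            # assumed 'retail' H100 rental, $/hr
THROUGHPUT_TOK_S = 40_000          # assumed cluster-wide throughput

cluster_cost_hr = GPUS * PRICE_PER_GPU_HR             # $/hr for the cluster
cluster_cost_month = cluster_cost_hr * 24 * 30        # ~30-day month

tokens_per_month = THROUGHPUT_TOK_S * 3600 * 24 * 30  # total monthly tokens

# Agentic break-even sketch: 200 users active 40% of the day implies
# each active session can draw THROUGHPUT / (200 * 0.4) tok/s.
users = 200
duty_cycle = 0.4
tok_s_per_active_user = THROUGHPUT_TOK_S / (users * duty_cycle)
revenue_month = users * 200                           # $200/month plans

print(f"cluster cost:  ${cluster_cost_hr:.0f}/hr, ${cluster_cost_month:,.0f}/month")
print(f"tokens/month:  {tokens_per_month:,.0f}")
print(f"per active user: {tok_s_per_active_user:.0f} tok/s")
print(f"agentic revenue: ${revenue_month:,}/month")
```

Which gives $64/hr, ~$46k/month in compute, ~104B tokens served, and ~$40k/month in revenue from the agentic cohort, i.e. roughly break-even before training and staff costs.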
It can plan and take actions towards arbitrary goals in a wide variety of mostly text-based domains. It can maintain basic "memory" in text files. It's not smart enough to work on a long time horizon yet, it's not embodied, and it has big gaps in understanding.
But this is basically what I would have expected v1 to look like.
What really occurs to me is that there is still so much that can be done to leverage LLMs with tooling. Small things in Claude Code (plan mode, for example) improve the system more, in my eyes, than (e.g.) the update from Sonnet 3.5 to 4.0 did.