Feel free to challenge these numbers; they're just a starting point. What's not accounted for is the cost of training (compute time, but also salaries and everything else), which has to be amortized over the length of time a model is used. That pushes ChatGPT's costs up significantly, though OpenAI does have the advantage that the hardware is shared across many users.
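To make the amortization point concrete, here's a trivial back-of-the-envelope sketch in Python. Every number in it (training cost, serving lifetime, token volume) is an invented placeholder for illustration, not an estimate of what any provider actually spends or serves.

```python
# Back-of-the-envelope amortization of training cost over a model's serving life.
# All numbers are made-up placeholders; plug in your own estimates.

training_cost_usd = 100e6        # hypothetical: compute + staff + overhead for one model
serving_lifetime_days = 365      # hypothetical: how long the model stays in production
tokens_served_per_day = 1e12     # hypothetical: aggregate tokens generated per day

amortized_per_token = training_cost_usd / (serving_lifetime_days * tokens_served_per_day)
print(f"Amortized training cost: ${amortized_per_token:.2e} per token")
# With these placeholders: ~$2.7e-7 per token, i.e. roughly $0.27 per million tokens,
# on top of the raw inference (hardware + power) cost.
```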
Look at vLLM. It's the leading open-source implementation of this kind of serving. The idea is that a single server can service 5000 or so users in parallel.
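For a sense of what that looks like in practice, here's a minimal sketch using vLLM's offline batch interface; the model name, prompt count, and sampling parameters are just placeholders. You hand it a pile of prompts and it does the batching and scheduling internally.

```python
from vllm import LLM, SamplingParams

# Many prompts submitted at once; vLLM batches them across the GPU internally.
prompts = [f"Explain topic number {i} in one sentence." for i in range(64)]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Model name is only an example; any model vLLM supports works here.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

outputs = llm.generate(prompts, sampling_params)
for out in outputs:
    print(out.outputs[0].text)
```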
You take roughly a 1.5-2x hit on per-token speed for each user, but the server's aggregate throughput goes up by something like 2000x-3000x.
The main insight is that memory bandwidth is the main bottleneck: if you batch requests and manage the KV cache cleverly alongside the batching, you can drastically increase parallel throughput.
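As a rough illustration of that insight, here's a toy roofline-style calculation. This is not vLLM's actual scheduler, and all the hardware and model numbers are assumptions; the point is that if each decode step is limited by bytes moved from GPU memory, the weights only need to be read once per step no matter how many requests are in the batch.

```python
# Toy roofline-style model of batched decoding, where each step is limited by
# bytes moved from GPU memory. All hardware/model numbers are assumptions
# chosen for illustration, not measurements.

WEIGHT_BYTES = 2 * 70e9       # assume a 70B-parameter model in 16-bit weights (~140 GB)
HBM_BANDWIDTH = 3.0e12        # assume ~3 TB/s of GPU memory bandwidth
KV_BYTES_PER_TOKEN = 320e3    # assume ~320 KB of KV cache per token of context
CONTEXT_LEN = 2048            # assume every request carries ~2K tokens of context

def decode_throughput(batch_size: int) -> tuple[float, float]:
    """Return (total tok/s, per-user tok/s) if decoding is memory-bandwidth bound."""
    # Per decode step: the full weights are read once and shared by the whole batch,
    # but each sequence must also stream its own KV cache.
    bytes_per_step = WEIGHT_BYTES + batch_size * CONTEXT_LEN * KV_BYTES_PER_TOKEN
    step_seconds = bytes_per_step / HBM_BANDWIDTH
    total = batch_size / step_seconds
    return total, total / batch_size

for bs in (1, 8, 64, 256):
    total, per_user = decode_throughput(bs)
    print(f"batch {bs:3d}: ~{total:6.0f} tok/s total, ~{per_user:4.1f} tok/s per user")
# Weight reads dominate the traffic, so total throughput scales almost linearly with
# batch size at first; per-user speed only degrades once the KV-cache reads start to
# rival the weight reads. Paged KV-cache management is what lets the server actually
# keep that many caches resident at once.
```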