On the hardware side, you can run benchmarks yourself (or use other people's published numbers) and get an idea of the tokens/second the machine delivers. Normalize that for your usage pattern (and do your best to implement batch processing where you can, which saves money under both approaches), and you have a rough idea of your cost per token.
Then you compare that to the cost of something like GPT-5, which is a bit simpler because the cost per million tokens is something you can grab straight off the pricing page.
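The comparison above can be sketched as a quick back-of-the-envelope calculation. Everything here is a placeholder assumption (hardware price, lifetime, wattage, electricity rate, throughput, and the API price), not a measurement; plug in your own numbers.

```python
def self_host_cost_per_mtok(hw_cost_usd, hw_lifetime_hours, power_watts,
                            electricity_usd_per_kwh, tokens_per_second,
                            utilization=0.5):
    """Amortized cost in USD per million tokens for a local machine.

    utilization: fraction of wall-clock time the box is actually
    generating tokens. Batching requests pushes this up, which is
    exactly why batch processing saves money.
    """
    hourly_hw = hw_cost_usd / hw_lifetime_hours          # amortized hardware
    hourly_power = (power_watts / 1000) * electricity_usd_per_kwh
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return (hourly_hw + hourly_power) / tokens_per_hour * 1_000_000

# Hypothetical rig: $2,500 machine amortized over 3 years, 350 W draw,
# $0.15/kWh electricity, 40 tok/s, half-utilized.
local = self_host_cost_per_mtok(2500, 3 * 365 * 24, 350, 0.15, 40)
api = 10.0  # hypothetical API price, USD per million output tokens
print(f"local: ${local:.2f}/Mtok vs API: ${api:.2f}/Mtok")
```

With these made-up numbers the local box comes out around a fifth of the API price per token, but the answer flips easily if your utilization is low, which is why normalizing for your actual usage pattern matters.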
You'd be surprised how much money running something like DeepSeek (or if you prefer a more established company, Qwen3) will save you over the cloud systems.
That's just one factor though. Another is what hardware you can actually run things on. DeepSeek and Qwen will function on cheap GPUs that other models will simply choke on.
Maybe it's just the kind of work I'm doing (a lot of web development with HTML/SCSS), and Google has crawled the whole web, so they have more training data to work with.
I reckon different models are better at different kinds of work, but Gemini is pretty excellent at UI/UX web development, in my experience.
Very excited to see what 3.0 is like
This is the plan, not a coincidence. China pays huge "grants" to its citizens to come to the US, get educated, work in big tech/science, and then bring it all home.