Reading stuff like this makes me question the entirety of the article.
> In total, the median prompt—one that falls in the middle of the range of energy demand—consumes 0.24 watt-hours of electricity
If they're running on, say, two RTX 6000s with a total draw of ~600 watts, that would mean a response time of 1.44 seconds. So obviously the median prompt isn't going to some high-end thinking model that users have to pay for.
It's a very low number; for comparison, an electric vehicle might consume 82kWh to travel 363 miles. So that 0.24 watt-hours of energy is equivalent to driving 5.6 feet (1.7 meters) in such an EV.
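Back-of-the-envelope check of both numbers, as a quick sketch (the ~600 W draw and the EV figures are the assumptions stated above, not measured values):

    # Assumed hardware from above: two RTX 6000s drawing ~600 W total.
    ENERGY_WH = 0.24       # median prompt, per the article
    GPU_DRAW_W = 600.0     # assumed total draw

    seconds = ENERGY_WH / GPU_DRAW_W * 3600
    print(f"response time at full draw: {seconds:.2f} s")  # ~1.44 s

    # EV comparison: 82 kWh per 363 miles, as stated above.
    wh_per_mile = 82_000 / 363                 # ~226 Wh/mile
    miles = ENERGY_WH / wh_per_mile
    print(f"EV distance: {miles * 5280:.1f} ft "
          f"({miles * 1609.34:.2f} m)")        # ~5.6 ft, ~1.7 m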
When I hear reports that AI power demand is overloading electricity infrastructure, it always makes me think: Even before the AI boom, shouldn't we have a bunch of extra capacity under construction, ready for EV driving, induction stoves and heat-pump heating?
[1] https://cloud.google.com/blog/products/infrastructure/measur...
You're not accounting for batching, which is how you get optimal GPU utilization: maybe a batch takes 30 seconds, but it completes 30 requests.
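Toy arithmetic for that point (the 30 s / 30 requests figures are just illustrative, same assumed 600 W draw as above):

    # With batching, the GPU's draw is shared across concurrent requests.
    GPU_DRAW_W = 600.0    # same assumed total draw as above
    batch_seconds = 30.0  # illustrative batch latency
    batch_size = 30       # illustrative concurrent requests

    wh_total = GPU_DRAW_W * batch_seconds / 3600   # 5.0 Wh for the batch
    wh_per_request = wh_total / batch_size         # ~0.17 Wh each
    print(f"{wh_per_request:.2f} Wh per request")

Which lands in the same ballpark as the article's 0.24 Wh median, even with multi-second batches.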
https://news.ycombinator.com/item?id=44902148
Personally I'm excited that you all have access to this model now, and I hope you get value out of using it.
Well, maybe for you, but not for the millions of people who use this technology daily.
It is hype-compatible so it is good.
It is AI so it is good.
It is blockchain so it is good.
It is cloud so it is good.
It is virtual so it is good.
It is UML so it is good.
It is RPN so it is good.
It is a steam engine so it is good.
Yawn...
It's not.
For each page:
- Extract text as usual.
- Capture the whole page as an image (~200 DPI).
- Optionally extract images/graphs within the page and include them in the same LLM call.
- Optionally add a bit of context from neighboring pages.
Then wrap everything with a clear prompt (structured output + how you want graphs handled), and you’re set.
At this point, models like GPT-5-nano/mini or Gemini 2.5 Flash are cheap and strong enough to make this practical.
Yeah, it’s a bit like using a rocket launcher on a mosquito, but it’s actually very easy to implement, and quite flexible and powerful. It works across almost any format, Markdown is both AI- and human-friendly, and the result is surprisingly maintainable.
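For concreteness, a minimal sketch of the per-page loop, assuming PyMuPDF for rendering and the OpenAI client for the model call (both are my choices, nothing above prescribes them; the model name is just one of those mentioned). The prompt is where you specify structured output and how graphs should be handled, per the steps above:

    import base64
    import fitz  # PyMuPDF
    from openai import OpenAI

    client = OpenAI()
    PROMPT = (
        "Convert this page to Markdown. Use the extracted text for accuracy "
        "and the page image for layout. Describe graphs/figures in place."
    )

    doc = fitz.open("input.pdf")
    pages_md = []
    for page in doc:
        text = page.get_text()                  # plain-text extraction
        pix = page.get_pixmap(dpi=200)          # whole page as an image
        img_b64 = base64.b64encode(pix.tobytes("png")).decode()

        resp = client.chat.completions.create(
            model="gpt-5-mini",  # or Gemini 2.5 Flash etc., per above
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text",
                     "text": f"{PROMPT}\n\nExtracted text:\n{text}"},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{img_b64}"}},
                ],
            }],
        )
        pages_md.append(resp.choices[0].message.content)

    print("\n\n".join(pages_md))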
It all depends on the scale you need it at: with the API it's easy to generate millions of tokens without thinking about it.
I thought this was the start of a joke or something. I guess if you use LLMs you're an "LLM lover" then.