Are you saying that e.g. securely attached children perform better in school because you know that in your capacity as an educated psychologist? Or did an LLM write that based on its best guess? None of your readers can tell. So how should they know whether it's true?
The real heroes are the people that facilitate alternatives, not those who talk.
The burden of proof is on Google here. If they've reduced Gemini 2.5 energy use by 33x, they need to state that clearly. Otherwise we should assume they're fudging the numbers, for example:
A) they've chosen one particular tiny model for this number
or
B) it's a median across all models including the tiny one they use for all search queries
EDIT: I've read over the report and it's B) as far as I can see
Without more info, any other reading of this is a failing on the reader's part, or wishful thinking if they want to feel good about their AI usage.
We should also be ready to change these assumptions if Google or another reputable party does confirm this applies to large models like Gemini 2.5, but should assume the least impressive possible reading until that missing info arrives.
Even more useful info would be how much electricity Google uses per month, and whether that has gone down or continued to grow in the period following this announcement, because total energy use across their whole AI product range, including training, is the only number that really matters.
https://annas-archive.org/donate
I'll also say that once too much money becomes part of this, trouble will increase dramatically. I realize this sort of endeavor costs a lot of time and money, but it's a line we should probably be aware of.
1. You load all the weights of the model into GPU VRAM, plus the context.
2. You construct a data structure called the "KV cache" representing the context, and it stays resident in GPU memory alongside the weights for the rest of the generation.
3. For each token in the response, for each layer of the model, you read the weights of that layer out of VRAM and use them plus the KV cache to compute the inputs to the next layer. After all the layers you output a new token and update the KV cache with it.
Furthermore, my understanding is that the bottleneck of this process is usually memory bandwidth in step 3, where you read each layer's weights out of VRAM.
As a result, this process parallelizes very well when lots of different people are running independent queries at the same time: you can keep all their contexts resident at once and push them through each layer together, reading the weights from VRAM only once per layer rather than once per query.
So once you've got the VRAM, it's much more efficient to serve lots of people's different queries at once than to be one guy running one query at a time.
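To make that batching point concrete, here is a minimal NumPy sketch of one decode step under those assumptions. The layer count, dimensions, and ReLU stand-in are all made up for illustration, and attention/KV-cache math is omitted; this is not any real serving stack's code, just the weight-read arithmetic:

```python
import numpy as np

# Toy decode step for a batch of independent queries (hypothetical sizes).
# The point of the sketch: each layer's weights are read once and applied
# to the whole batch, instead of once per query as in single-user decoding.

n_layers, d_model, batch = 4, 1024, 32
rng = np.random.default_rng(0)

# Stand-in for the model weights resident in VRAM: one matrix per layer.
weights = [rng.standard_normal((d_model, d_model)).astype(np.float32)
           for _ in range(n_layers)]

# One activation vector per in-flight query. (Each query would also keep
# its own KV cache; attention is left out to keep the weight-read math clear.)
acts = rng.standard_normal((batch, d_model)).astype(np.float32)

for W in weights:                     # step 3: one pass over the layers...
    acts = np.maximum(acts @ W, 0.0)  # ...one read of W serves all 32 queries

# Run unbatched, the same work would touch each W 32 times (once per query),
# so the bytes read from VRAM per generated token would be ~32x higher.
```

Compute scales with the batch size, but the weight traffic from VRAM doesn't, which is why batched serving wins whenever step 3 is the bottleneck.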
- "Screen time" is not a natural category. You can watch TV, listen to music, read, socialize, make things, educate yourself, and play games using many kinds of screens and non-screens. Use your common sense to think about how much time is reasonable to do any specific activity. Decide what you think and then enforce it.
- Everything in moderation. Rarely was someone worse off because they did something they enjoyed for half an hour a day.
- Your kid is going to want to imitate you. If you personally aren't happy with how you spend your time, then fix it, and your fixing will do double duty.
- The fundamental question is how you want to balance giving your kid time to do the stuff they enjoy, versus doing stuff that you think educates them, expands their horizons, or otherwise builds character somehow.
If LLM usage is easy, then I can't be left behind, because it's easy: I'll pick it up in a weekend.
If LLM usage is hard, AND I can otherwise do the hard things that LLMs are doing, then I can't be left behind if I just keep doing the hard things.
That leaves only one way I can be left behind: LLM usage is nonsense, or no better than just doing the work yourself, AND the thing that actually matters is telling managers you've been using it for a long time.
Is the superpower bamboozling management with story time?