tzs · 10 months ago
> Another solution is dummy calculations, which run while there are no spikes, to smooth out demand. This makes the grid see a consistent load, but it also wastes energy doing unnecessary work.

Oh god...I can see it now. Someone will try to capitalize on the hype of LLMs and the hype of cryptocurrency and build a combined LLM training and cryptocurrency mining facility that runs the mining between training spikes.

ludicity · 10 months ago
Oh man, I really, really wish that you hadn't said this and also that you were wrong.
permo-w · 10 months ago
you really think this isn't literally the first thing that happened the second hosting these models became commercially viable?
jgalt212 · 10 months ago
There wouldn't be any crypto or LLMs if the Fed hadn't printed $7 trillion.
x-complexity · 10 months ago
Dummy calculations aren't even needed if you allow the LLMs to pre-compute on the given context before inference:

https://arxiv.org/abs/2504.13171

It should be noted that this type of inference is less useful on time-sensitive tasks, but most tasks truthfully don't require such time sensitivity (there exists slack time between when the task is given & when questions are asked).
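A toy sketch of the idea (not the paper's actual method; the functions and caching are purely illustrative): do the heavy context work during slack time, then keep only cheap work for when the question actually arrives.

  # Toy sketch, not the paper's method: do the expensive "read the context"
  # pass during slack time, cache it, and answer cheaply at ask time.
  import time

  context_cache = {}

  def precompute(task_id, context):
      time.sleep(0.1)                    # stand-in for the GPU-heavy prefill / pre-reasoning
      context_cache[task_id] = f"digest({len(context)} chars)"

  def answer(task_id, question):
      return f"{question} -> answered against {context_cache[task_id]}"

  precompute("t1", "a long document ...")    # scheduled whenever there's slack
  print(answer("t1", "what does it say?"))   # cheap when the question arrives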

wongarsu · 10 months ago
There are already some providers offering cheap LLM services that will give you a response within 24 hours instead of within seconds. That allows them to schedule tasks during low-request hours when they have spare capacity and use better batching. For some automated tasks this is perfectly acceptable. It takes a bit of effort to accommodate, but it's easy to justify when it halves your inference costs.
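From the caller's side the deferred pattern is simple: queue work, come back later. Sketch below; the endpoint and field names are made up, though the real batch APIs from the big providers look roughly like this.

  # Sketch of the deferred-batch pattern from the client side. The endpoint
  # and field names are hypothetical; real provider batch APIs look similar.
  import json, time, urllib.request

  BASE = "https://llm-provider.example/v1/batches"   # made-up endpoint

  def submit(prompts):
      body = json.dumps({"prompts": prompts, "deadline_hours": 24}).encode()
      req = urllib.request.Request(BASE, data=body, method="POST")
      with urllib.request.urlopen(req) as resp:
          return json.load(resp)["batch_id"]

  def collect(batch_id):
      while True:
          with urllib.request.urlopen(f"{BASE}/{batch_id}") as resp:
              job = json.load(resp)
          if job["status"] == "done":
              return job["results"]
          time.sleep(600)    # nobody is in a hurry; poll every 10 minutes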
ijustlovemath · 10 months ago
YCW27
candiddevmike · 10 months ago
From the same founders who brought you (or didn't, actually) maritime fusion
DaSHacka · 10 months ago
>implying its not already happening
0cf8612b2e1e · 10 months ago

  One solution is to rely on backup power supplies and batteries to charge and discharge, providing extra power quickly. However, much like a phone battery degrades after multiple recharge cycles, lithium-ion batteries degrade quickly when charging and discharging at this high rate.
Is this really a problem for an industrial installation? I would imagine that a properly sized facility would have adequate cooling + capacity to only run the batteries within optimal spec. Solar plants are already charging/discharging their batteries daily.

jeffbee · 10 months ago
In addition to what you said, nothing is forcing or even encouraging anyone to use lithium-ion batteries in fixed service, such as a rack full of computers.
pixl97 · 10 months ago
Eh, I think part of the problem here is the speed of load switching. From the article it looks like the loads could generate dozens to hundreds of demand spikes per minute. With most battery operated loads that I've ever messed with we're not switching loads like that. It's typically 'oh a fault, switch to battery' then some time later you check the power circuit to see if it's up and switch back.

This looks a whole lot more like high frequency load smoothing. Really it seems to me like a continuation of what a motherboard already does. Even if you have a battery backup on your PC, you still have capacitors on the board for voltage fluctuations.

lstodd · 10 months ago
in a properly designed install you can actually use the compressors and fans for smoothing load spikes. won't be much, but why not.

edit: otherwise I'm not getting what the entire article is about. it's as contrary to what I know about datacenter design as it can get.

it's... just wrong.

touisteur · 10 months ago
I'm thinking of sequences of 'put the sharded dataset through the ten thousand 2kW GPUs, then wait on the network (all-reduce), then spike again' - a mostly-synchronous all-on/all-off loop. Watching how quickly they get to boost frequency, I can see where the worries come from.
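Something like this plain data-parallel loop (a sketch, assuming the model call returns the loss) is exactly what produces that pattern:

  # Sketch of a data-parallel step with the all-on/all-off shape: flat out
  # through forward/backward, then nearly idle while gradients cross the network.
  import torch
  import torch.distributed as dist

  def train_step(model, batch, optimizer):
      loss = model(batch)                  # GPUs jump to boost clocks - the spike
      loss.backward()                      # still pinned near TDP
      for p in model.parameters():
          dist.all_reduce(p.grad)          # network-bound: the GPUs mostly sit idle
          p.grad /= dist.get_world_size()
      optimizer.step()
      optimizer.zero_grad()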
blt · 10 months ago
What is causing demand bursts in AI workloads? I would have expected that AI training is almost the exact opposite. Load a minibatch, take a gradient step, repeat forever. But the article claims that "each step of the computation corresponds to a massive energy spike."
sdenton4 · 10 months ago
Bad input pipelines are a big cause of spikiness - you might have to wait a non-trivial fraction of a second for the next batch of inputs to arrive. If you can run 20+ training steps per second on a decent batch size, it can take some real engineering to get enough data lined up and ready to go fast enough. (I work on audio models, where data is apparently quite heavy compared to images or text...)
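The usual mitigation is to throw CPU workers and prefetching at it so a few batches are always staged ahead of the GPU. Sketch with PyTorch's DataLoader; the dataset and the exact numbers are placeholders.

  # Sketch: keep the GPU fed by decoding on CPU workers and staging batches
  # ahead of time. The dataset and the numbers here are placeholders.
  import torch
  from torch.utils.data import DataLoader, TensorDataset

  dataset = TensorDataset(torch.randn(10_000, 16_000))   # stand-in for raw audio

  loader = DataLoader(
      dataset,
      batch_size=256,
      num_workers=16,           # parallel CPU decode/augmentation
      pin_memory=True,          # faster host-to-device copies
      prefetch_factor=4,        # each worker keeps 4 batches queued
      persistent_workers=True,  # don't re-spawn workers every epoch
  )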
blt · 10 months ago
I can see how such a phenomenon could happen at the level of a single machine, but if we're using a whole data center full of GPU machines it should be possible to spread out those spikes evenly over time. Still weird that the article implies spikiness is a fundamental property of AI workloads rather than a design oversight that can be fixed at the software level.
lstodd · 10 months ago
if that is the case, my, I am appalled.

where do you get those fractions of seconds? network? storage?

wmf · 10 months ago
If the cores go idle (or just much less loaded) in between steps because they're waiting for network communication that would cause the problem.
sonium · 10 months ago
Or you simply use the pytorch.powerplant_no_blow_up operator [1]

[1] https://www.youtube.com/watch?v=vXsT6lBf0X4

janalsncm · 10 months ago
Pretty much. From the article:

> Another solution is dummy calculations, which run while there are no spikes, to smooth out demand.

Animats · 10 months ago
Is that kind of load variation from large data centers really a problem to the power grid? There are much worse intermittent loads, such as an electric furnace or a rolling mill.
toast0 · 10 months ago
I suspect it's more of a problem for the data center's energy bill. My understanding is that large electric customers pay a demand charge in addition to the volumetric charge for the kWh used at whatever time-of-use / wholesale rates apply. The demand charge is based on the maximum kW used (or sometimes just the connection size) and may also carry a penalty rate if the power factor is poor. Smoothing over small duration surges probably makes a lot of things nicer for the rate payer, including helping manage fluctuations from the utility.
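Back of the envelope (made-up tariff numbers) on why the peak matters as much as the energy:

  # Toy bill with made-up tariff numbers: same energy, different peaks.
  energy_rate = 0.06     # $/kWh (made up)
  demand_rate = 15.00    # $/kW of monthly peak (made up)

  def monthly_bill(avg_kw, peak_kw, hours=730):
      return avg_kw * hours * energy_rate + peak_kw * demand_rate

  print(monthly_bill(avg_kw=50_000, peak_kw=50_000))  # flat 50 MW: ~$2.94M
  print(monthly_bill(avg_kw=50_000, peak_kw=80_000))  # spiky to 80 MW: ~$3.39M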

There's probably something that could be done on the individual systems so that they don't modulate power use quite so fast, too; at some latency cost, of course. If you go all the way to the extremes, you might add a zero crossing detector and use it to time clock speed increases.

hinkley · 10 months ago
Large customers pay not by wattage but by… I’m spacing on the word but essentially how much their power draw fucks up the sine waves for voltage and current in the power grid.

I imagine common power rail systems in hyperscaler equipment help a bit with this, but for sure switching PSUs chop up the input voltage and smooth it out. And that leads to very strange power draws.

timewizard · 10 months ago
If you have a working thermometer you can predict when furnaces are going to run.

If you want to smooth out data centers then you need hourly pricing to force them to manage their demand into periods where excess grid capacity is not being used to serve residential loads.

oakwhiz · 10 months ago
There is often a demand flux surcharge as well. Not just demand but delta in demand over some time period.

changoplatanero · 10 months ago
Yes, it's a problem for the grid, and the power companies don't allow large clusters to oscillate their power like this. The workaround during big AI training runs is to fill in the idle time on the GPUs with dummy operations to keep the power load constant. Having capacitors instead would save on power usage.
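Roughly like this (a sketch of one plausible wiring, not any particular lab's implementation):

  # Sketch of the dummy-work trick: while a rank waits on the all-reduce,
  # keep the GPU busy with throwaway matmuls so the facility's draw stays flat.
  import torch
  import torch.distributed as dist

  def all_reduce_keep_busy(grad, filler_dim=4096):
      handle = dist.all_reduce(grad, async_op=True)    # the real, network-bound work
      junk = torch.randn(filler_dim, filler_dim, device=grad.device)
      while not handle.is_completed():
          junk = junk @ junk       # result is thrown away; only the watts matter
      return grad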
nancyminusone · 10 months ago
Inb4 a startup is created to sell power load idle cycle compute time in AI training data centers.
mystified5016 · 10 months ago
Those loads aren't nearly as intermittent. Your furnace likely runs for tens of minutes at a time. These datacenters are looking at second-to-second loads.

Drawing high intermittent loads at high frequency likely makes the utility upset and leads to over-building supply to the customer to cope with peak load. If you can shave down those peaks, you can use a smaller(cheaper) supply connection. A smoother load will also make the utility happy.

Remember that electricity generation cannot ramp up and down quickly. Big transient loads can cause a lot of problems through the whole network.

paulkrush · 10 months ago
Edit: It's interesting that the GPUs are causing issues on the grid before they cause issues with the data center's power.
mystified5016 · 10 months ago
Read the article.
Merrill · 10 months ago
Wouldn't it be better to arrange the network and software to run the GPUs continuously at optimal usage?

Otherwise a lot of expensive GPU capital is idle between bursts of computation.

Didn't DeepSeek do something like this to get more system level performance out of less capable GPUs?

janalsncm · 10 months ago
I am curious about what the load curves look like in these clusters. If the “networking gap” is long enough you might just be able to have a secondary workload that trains intermittently.

Slightly related, you can actually hear this effect depending on your GPU. It’s called coil whine. When your GPU is doing calculations, it draws more power and whines. Depending on your training setup, you can hear when it’s working. In other words, you want it whining all the time.

touisteur · 10 months ago
You might need more memory for this secondary training workload. But yeah, donating/selling the 'network' time for high-intensity, low memory footprint workloads (thinking number crunching, monte-carlo stuff, maybe brute-force through a series of problems...) might end up making sense.
WatchDog · 10 months ago
Sounds like an issue that would be cheaper to address by just adjusting the software.