This is really not the correct approach. https://github.com/intel/thermal_daemon ought to do a better job without ignoring manufacturer thermal limits (I reverse engineered Intel's Dynamic Power and Thermal Framework a few years back, and upstream kernels should have everything needed now: https://mjg59.dreamwidth.org/54923.html)
I installed thermald on my Lenovo T480 with Debian Bookworm and I get 20% better results in stress-ng. The fans are a bit louder now under high load and off under low load.
Without thermald:
$ stress-ng --matrix 0 -t 3m --metrics-brief
stress-ng: info: [3755113] setting to a 180 second (3 mins, 0.00 secs) run per stressor
stress-ng: info: [3755113] dispatching hogs: 8 matrix
stress-ng: info: [3755113] stressor bogo ops real time usr time sys time bogo ops/s bogo ops/s
stress-ng: info: [3755113] (secs) (secs) (secs) (real time) (usr+sys time)
stress-ng: info: [3755113] matrix 2278812 180.00 1437.43 0.27 12660.06 1585.04
With thermald:
$ stress-ng --matrix 0 -t 3m --metrics-brief
stress-ng: info: [3755550] setting to a 180 second (3 mins, 0.00 secs) run per stressor
stress-ng: info: [3755550] dispatching hogs: 8 matrix
stress-ng: info: [3755550] stressor bogo ops real time usr time sys time bogo ops/s bogo ops/s
stress-ng: info: [3755550] (secs) (secs) (secs) (real time) (usr+sys time)
stress-ng: info: [3755550] matrix 2791272 180.00 1404.32 0.57 15507.06 1986.83
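For reference, here is a small Python sketch of the arithmetic behind the "20% better" figure. The numbers are copied from the two `matrix` lines above; the dictionary keys and the `percent_gain` helper are names made up for this illustration. On the real-time bogo ops/s column the gain comes out at a little over 22%.

```python
# Figures copied from the two stress-ng "matrix" result lines above.
WITHOUT_THERMALD = {
    "bogo_ops": 2278812,
    "ops_per_s_real": 12660.06,
    "ops_per_s_cpu": 1585.04,
}
WITH_THERMALD = {
    "bogo_ops": 2791272,
    "ops_per_s_real": 15507.06,
    "ops_per_s_cpu": 1986.83,
}


def percent_gain(before: float, after: float) -> float:
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    # Print the gain for each metric reported by stress-ng.
    for key, before in WITHOUT_THERMALD.items():
        gain = percent_gain(before, WITH_THERMALD[key])
        print(f"{key}: {gain:+.1f}%")
```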
I just installed it using apt and did no extra configuration. My system was anyway configured for balanced power mode.
Why is thermald not installed on desktop installations by default?
> In some newer platforms the auto creation of the config file is done by a companion tool "dptfxtract". This tool can be downloaded from "https://github.com/intel/dptfxtract". It is suggested as parts of the install process, run dptfxtract.
I have had people tell me that they don't care if their computers break, as long as they run faster in the meantime. Some manufacturers genuinely set the limits way too low for their own hardware.
I also use this tool to bypass manufacturer limits in battery mode that are intended to make the system seem like the battery is not undersized for the CPU's power draw. Sometimes I'd rather have more CPU for less time.
The whole point is to ignore them because they're horrible and hold back what these CPUs can really do. Fuck the manufacturers playing these stupid marketing games.
Intel warrants their CPUs at TjMax 24/7, they'll automatically throttle when they hit that limit, and disabling all this other throttling crap makes them run that way for full performance.
The whole point is that the CPU is only a single part of the equation. Yes, you're not going to burn out the CPU itself by unlimiting PL1/2 (although if the system vendor cheaped out on power circuitry because they'd only designed for sustained 20W draw then you might burn that out), but you're now generating more heat than the system is designed to dissipate. This may result in obvious outcomes like the chassis heating up enough to burn your legs, but it may also result in other components being operated outside their thermal limits and their lifetime being shortened as a result.
FWIW, I've managed to make multiple X1C6s thermally trip and shut off without changing from the safe configuration, which rather impressed me because I didn't know that was feasible these days.
Mostly I mention it to say that there's not _no_ reason for thermal limits in modern setups...
Hi Matthew, I've been a huge fan of your work ever since, back in 2010 or 2011, thanks to you I got the Gobi 2000 mobile broadband chip working on Linux on my Thinkpad W510!
As for the pros and cons of the `throttled` project, yes, this might not be the "officially desired" approach but I know several people who have been using it to great success for years. The reality is, unfortunately, that particularly those of us who use Linux machines for work and order the most recent & powerful Thinkpads or Dells they can get their hands on, often realize later on (when the machine arrives) that the default settings handicap our machine to such a degree that we can barely work. Unfortunately, not everyone is a kernel developer, though, or knows how these things work, so quick fixes are often welcome, even if they limit the hardware's lifetime (after all, we'll buy a new device in a few years anyway).
What exacerbates this problem is that it's all very opaque: the "official" way to solve these issues (which, as far as I understand you, is installing thermald?) is not really communicated anywhere, nor does thermald come preinstalled on any of the major distributions AFAIK[0]. What's worse, thermald often doesn't even solve the throttling issues without installing further patches. On top of that, BIOS updates by the manufacturers also seem to play a major role as manufacturers like Lenovo introduce different performance modes and things like "lap mode" etc. To be honest, to this day I haven't quite understood how these things interact and whose responsibility it is to fix things.
In my particular case, I have been using a Thinkpad X1 Carbon Gen9 (which Lenovo says "supports" Linux) and at some point, after installing numerous BIOS updates and working my way through hundreds of posts on the Lenovo forums, I just gave up: my machine still regularly throttles down to 800 MHz per core and 16 W under medium load until I hit the secret Fn + H key chord to tell the BIOS to switch back to high performance mode and set the thermal limit back to the maximum.
Do you happen to have a recommendation for me as to where I should start looking (again) for a solution? Does thermald fix these issues these days? (I know that when I last looked into it, it didn't.)
[0]: (EDIT) I take that back, it looks like thermald does come preinstalled on Fedora and Ubuntu these days. At least it's present on my new Ubuntu 22.04.2 installation. Unfortunately, that doesn't really help me since (and now I remember reading this last time I looked into thermald) according to the changelog[1] for v2.3:
> - thermald will not run on Lenovo platforms with lap mode sysfs entry
Great, so I still have nowhere to go from here it seems.
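For anyone trying to work out which of these mechanisms is active at a given moment, here is a rough Python sketch that only reads state. It assumes the kernel exposes the ACPI platform profile at `/sys/firmware/acpi/platform_profile` and that the `thinkpad_acpi` driver exposes lap mode at `/sys/devices/platform/thinkpad_acpi/dytc_lapmode`; either file may be missing depending on kernel version and model, and the helper names are invented for this example.

```python
from pathlib import Path
from typing import Optional

# Assumed sysfs locations; both depend on the kernel version and on the
# thinkpad_acpi driver being loaded, so either may be absent.
PLATFORM_PROFILE = Path("/sys/firmware/acpi/platform_profile")
LAP_MODE = Path("/sys/devices/platform/thinkpad_acpi/dytc_lapmode")


def read_sysfs(path: Path) -> Optional[str]:
    """Return the stripped contents of a sysfs file, or None if unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def describe(profile: Optional[str], lap_mode: Optional[str]) -> str:
    """Summarise the two values in one human-readable line."""
    profile_text = profile if profile is not None else "not exposed"
    if lap_mode is None:
        lap_text = "not exposed"
    elif lap_mode == "1":
        lap_text = "active"
    else:
        lap_text = "inactive"
    return f"platform profile: {profile_text}; lap mode: {lap_text}"


if __name__ == "__main__":
    print(describe(read_sysfs(PLATFORM_PROFILE), read_sysfs(LAP_MODE)))
```

Running this before and after pressing the Fn key combination should show whether the firmware profile actually changes, which at least narrows down where the throttling comes from.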
This looks like an excellent tool for people repurposing old laptops as servers by putting their motherboard in a different chassis and adding some proper cooling of their own to the board. May need to cool parts close to the CPU as well if the board wasn't designed to transport that much heat.
If you try to do this to your laptop, well... there's a reason you can't legally sell laptops that heat up beyond 40-45℃. Expose yourself to that all you want, but be prepared for hardware damage, overheated skin, or decreased sperm count due to putting an overheated laptop in your lap.
I wouldn't call this a fix in the same way I wouldn't call throwing out your smoke alarm a fix for the constant flat battery beeping.
I want my laptop to be more predictable and reliable and with great battery life instead of having more performance.
But thanks to turbo boost, sometimes my laptop is hot playing a youtube video but cool when compiling code or the other way around. There is no predictability on how long a compilation will take or how long the battery will last, since it would depend on N thermal and power factors.
At least to me, this feels like when marketing designs products instead of product managers. I recently bought an Intel 12th gen i5-1240P laptop (ASUS Zenbook) and this processor boosts from 1.7 GHz to 4.4 GHz, i.e. more than twice the base frequency. That's absurd. I'd rather have a stable ~2 GHz than have the processor boost up to ~4 GHz while surfing the web.
Hence we wouldn't need tools like this if, at least on laptops, Intel released chips with no turbo boost or a smaller turbo boost range.
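If a stable clock matters more than peak speed, turbo can usually be switched off from the OS. A minimal Python sketch, assuming the `intel_pstate` driver is in use and exposes its `no_turbo` knob at the path below (other cpufreq drivers use different files); the function names are made up for this example, and writing the file needs root.

```python
from pathlib import Path

# Assumed location of the intel_pstate turbo switch: "1" means turbo is
# disabled, "0" means turbo is allowed.
NO_TURBO = Path("/sys/devices/system/cpu/intel_pstate/no_turbo")


def parse_no_turbo(raw: str) -> str:
    """Translate the raw file contents into a readable turbo state."""
    value = raw.strip()
    if value == "1":
        return "disabled"
    if value == "0":
        return "enabled"
    return "unknown"


def turbo_state(path: Path = NO_TURBO) -> str:
    """Report the turbo state, or 'unknown' if the knob is not available."""
    try:
        return parse_no_turbo(path.read_text())
    except OSError:
        return "unknown"


def set_turbo(enabled: bool, path: Path = NO_TURBO) -> None:
    """Allow or forbid turbo. Needs root; raises OSError if not writable."""
    path.write_text("0" if enabled else "1")


if __name__ == "__main__":
    print(f"turbo boost is currently {turbo_state()}")
```

Calling `set_turbo(False)` as root would pin the CPU at or below its base frequency until the next reboot or until the file is written again.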
It's honestly so frustrating. I bought an XPS 13 two and a half years ago and it's been a nightmare getting it to perform. I had to do the following things to make it run on non-turbo boosted clockspeeds without throttling:
- Liquid metal TIM
- Thermal pads + heat pipes connected to chassis to dissipate heat (Yes this means the bottom chassis heats up a lot)
- Disable the intel_rapl_msr Linux driver + disable BD_PROCHOT via MSR
Laptop has worked like a charm since. I really don't want a super thin laptop. I want a small laptop. I wouldn't mind a 2 cm thick 13-inch laptop. But I can't handle a 15-inch laptop. I just find it way too large to be seriously portable.
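For readers curious what the BD_PROCHOT step in the list above involves, here is a read-only Python sketch. It assumes the bit lives at bit 0 of MSR 0x1FC, which is the address commonly used for this on recent Intel mobile CPUs but should be checked against the documentation for your processor. It also assumes the `msr` kernel module is loaded and the script runs as root. The function names are invented here, and `clear_bd_prochot` only computes the new value; it writes nothing.

```python
import os
import struct

MSR_POWER_CTL = 0x1FC    # assumed register holding the BD PROCHOT enable bit
BD_PROCHOT_BIT = 1 << 0  # assumed bit position within that register


def bd_prochot_enabled(msr_value: int) -> bool:
    """True if the bidirectional PROCHOT bit is set in the given MSR value."""
    return bool(msr_value & BD_PROCHOT_BIT)


def clear_bd_prochot(msr_value: int) -> int:
    """Return the MSR value with the BD PROCHOT bit cleared (no write)."""
    return msr_value & ~BD_PROCHOT_BIT


def read_msr(register: int, cpu: int = 0) -> int:
    """Read a 64-bit MSR through /dev/cpu/N/msr (root + msr module needed)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The register address is used as the file offset.
        return struct.unpack("<Q", os.pread(fd, 8, register))[0]
    finally:
        os.close(fd)


if __name__ == "__main__":
    try:
        value = read_msr(MSR_POWER_CTL)
    except OSError as exc:
        print(f"cannot read MSR: {exc}")
    else:
        print(f"raw value: {value:#x}")
        print(f"BD PROCHOT enabled: {bd_prochot_enabled(value)}")
```

With the bit cleared, other components can no longer ask the CPU to throttle, which is exactly the kind of protection the replies in this thread warn about removing.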
It seems there is an endless supply of people who know just enough to write some system programs but not enough to learn basic energy accounting. You cannot simply make a CPU run faster by writing MSRs. The current goes in and the heat goes out and the temperature goes up. You can't make it just work under arbitrary parameters.
>You cannot simply make a CPU run faster by writing MSRs
I like such generalised statements. You can read about xeon v3 hack and ThrottleStop PowerCut. Each is just "writing MSRs" with a funny side-effect of your CPU taking more current in.
Manufacturers (rightly or wrongly) believe users want machines that are as thin and light as possible. This makes a bunch of things more complicated, including managing system thermals. Heat generated from the CPU has to go somewhere. As you get thinner, it's hard to get as much airflow and so fans are less effective. As you reduce the amount of material in the chassis, the less heat can be dumped in there without it heating up enough to potentially be uncomfortable for the user. Larger internal batteries become another source of heat while charging. Handling all of this safely becomes difficult, especially because there isn't necessarily a policy that satisfies all your users. But you can't leave it purely up to the OS either, because the OS has no idea of what the thermal characteristics of the platform are. So rather than attempting to encode all of this policy directly into firmware, Intel wrote the Dynamic Power and Thermal Framework (DPTF) spec, providing a mechanism for the firmware to share information about thermal control interfaces, interactions, and desired temperature bounds, and then let the OS make policy control decisions around that. Until the OS indicates it's ready to take over, the firmware imposes a default safe policy that's guaranteed to avoid any thermal issues, albeit at the cost of performance.
Of course, this only works if the OS knows how to do this, and Intel never publicly documented it so I had to reverse engineer it instead.
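To see what the kernel currently knows about the platform's thermal zones and trip points, here is a minimal read-only Python sketch. It assumes the standard thermal sysfs layout under `/sys/class/thermal` (`thermal_zone*/type`, `temp`, `trip_point_N_temp`, `trip_point_N_type`, temperatures in millidegrees Celsius). The helper names are made up for this example, and not every zone exposes every file.

```python
from pathlib import Path
from typing import Optional


def read_text(path: Path) -> Optional[str]:
    """Return stripped file contents, or None if the file is unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def millideg_to_celsius(raw: Optional[str]) -> Optional[float]:
    """Convert a sysfs millidegree string to degrees Celsius."""
    if raw is None:
        return None
    try:
        return int(raw) / 1000.0
    except ValueError:
        return None


def list_thermal_zones(root: Path = Path("/sys/class/thermal")) -> list:
    """Collect type, current temperature and trip points for each zone."""
    zones = []
    if not root.is_dir():
        return zones
    for zone in sorted(root.glob("thermal_zone*")):
        trips = []
        for trip_file in sorted(zone.glob("trip_point_*_temp")):
            # File names look like trip_point_<index>_temp.
            index = trip_file.name.split("_")[2]
            trips.append({
                "type": read_text(zone / f"trip_point_{index}_type"),
                "celsius": millideg_to_celsius(read_text(trip_file)),
            })
        zones.append({
            "zone": zone.name,
            "type": read_text(zone / "type"),
            "celsius": millideg_to_celsius(read_text(zone / "temp")),
            "trips": trips,
        })
    return zones


if __name__ == "__main__":
    for z in list_thermal_zones():
        print(z["zone"], z["type"], z["celsius"], z["trips"])
```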
Another example of how being open source friendly boils down to "it depends on the green paper" even for the companies that do market themselves as such.
This is not the only area where Intel doesn't really support Linux, some of their GPU models also come to mind, like the PowerVR based ones in the past.
I was very surprised by some of the thermal characteristics of my i7-13700k. My previous build was an i7-4790k, so it's been a minute. I had to undervolt this thing and cap its max TDP (disable boost modes -- it has boost modes which are very thirsty) to get it to complete benchmarks while staying under 90 °C (with a top of the line case, very good fans / circulation, and a large AIO). It's great now but undertuning the thing is a total departure from what I recall from '00s and '10s gaming machines.
253 W max turbo power is not that crazy by today's standards.
> top of the line case, very good fans / circulation, and a large AIO
I think you'll find that what people consider good cooling for a desktop has changed somewhat in the last decade. My first GPU didn't even have a fan, but today it's fairly common for enthusiast builds to have an external radiator. I dunno what you consider large, but most AIOs only have slightly more surface area than large air coolers so they really aren't worth it for sustained workloads like gaming or ML training. Custom loops have always been the go-to solution.
What AIO did you use? I just built a new PC with an i9-13900k and an MSI MEG Coreliquid 360 AIO cooler.
It benchmarks really well and I’ve never seen it over 50 °C, the fans are really quiet, and I haven’t changed any of the configuration for it.
On the flip side I’ve got an i9-12900k in a different PC with air cooling and a more compact case and between that and the graphics card, the smaller machine runs super hot and noisy.
The laptop builder likely didn’t spend the extra $2 to properly cool the CPU, so the CPU slows down to prevent burning out or burning your lap? The CPU being smart about its own temp is a good thing.
This isn't Intel's fault... unless you consider them providing things like adjustable power limits a problem. Its CPUs have had automatic thermal throttling and will shut down on catastrophic overheating ever since the Pentium II.
It's all the fault of manufacturers who want to both save cost with inadequate heatsinks and impose arbitrary restrictions on their products. The software in this article looks like the Linux equivalent of ThrottleStop, a Windows application that was the first to expose the truth behind it all.
I'm not sure how failing to publicly document the DPTF specification is anything other than Intel's fault. The CPUs are not running in such a constrained configuration under Windows, for example, because Intel supply drivers to configure them appropriately.
What tools can do the opposite? I have a refurb Thinkpad X1 Carbon, running Deb 11 w/ i3, I use for creative writing (vim/markdown/pandoc). I'd like the battery to last as long as possible.
Check /sys/class/powercap - if you have some RAPL entries there you can set the maximum power draw of the CPU. But in general if you have a fixed workload (ie, your system wants to do a certain amount of work, not use a certain percentage of CPU) then reducing CPU power limits will result in the CPU slowing down enough that it has to stay awake for longer to do that work, and will (counter-intuitively) actually consume more power to do the same amount of work. Running the CPU fast to get the work done quickly means the CPU can then put itself in a low-power state that shuts down a lot of ancillary components, saving more power than running the CPU at half the speed for twice as long.
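A minimal Python sketch of what checking those RAPL entries might look like. It assumes the usual powercap layout (`intel-rapl:*` directories containing `name`, `constraint_N_name` and `constraint_N_power_limit_uw`, with limits in microwatts). The function names are invented for this example; writing a limit needs root and may be rejected or overridden by firmware.

```python
from pathlib import Path
from typing import Optional


def read_text(path: Path) -> Optional[str]:
    """Return stripped file contents, or None if the file is unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def microwatts_to_watts(raw: str) -> float:
    """Convert a sysfs microwatt string to watts."""
    return int(raw.strip()) / 1_000_000


def watts_to_microwatts(watts: float) -> int:
    """Convert watts to the integer microwatt value sysfs expects."""
    return int(round(watts * 1_000_000))


def read_rapl_limits(root: Path = Path("/sys/class/powercap")) -> list:
    """List every RAPL zone with its named power-limit constraints."""
    results = []
    if not root.is_dir():
        return results
    for zone in sorted(root.glob("intel-rapl:*")):
        for limit_file in sorted(zone.glob("constraint_*_power_limit_uw")):
            # File names look like constraint_<index>_power_limit_uw.
            index = limit_file.name.split("_")[1]
            raw = read_text(limit_file)
            if raw is None:
                continue
            results.append({
                "zone": read_text(zone / "name") or zone.name,
                "constraint": read_text(zone / f"constraint_{index}_name"),
                "watts": microwatts_to_watts(raw),
                "path": limit_file,
            })
    return results


def set_power_limit(limit_file: Path, watts: float) -> None:
    """Write a new limit in watts. Needs root; raises OSError on failure."""
    limit_file.write_text(str(watts_to_microwatts(watts)))


if __name__ == "__main__":
    for entry in read_rapl_limits():
        print(f'{entry["zone"]} {entry["constraint"]}: {entry["watts"]:.1f} W')
```

Given the point above about fixed workloads, it is worth measuring battery drain before and after lowering a limit rather than assuming a lower cap saves energy.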
The thermald man page says this:
> In some newer platforms the auto creation of the config file is done by a companion tool "dptfxtract". This tool can be downloaded from "https://github.com/intel/dptfxtract". It is suggested as parts of the install process, run dptfxtract.
The dptfxtract GitHub project (https://github.com/intel/dptfxtract) says Intel discontinued the project.
I have never kept a laptop for a decade, nor have I had a CPU fail. So perhaps faster performance and a shorter lifespan is okay.
The computers don't break on Windows, so why should Linux users take this overly conservative approach that limits the performance of their computers?
Thermal limits like this are really about managing the manufacturer's liabilities, and protecting the expected lifetime of the product.
Trusting Intel to provide accurate info on the actual performance of a chip feels too naive at this point.
It shouldn't be too difficult to correct on Linux. Why is Windows taking the manufacturer's limits into account while Linux basically ignores them?
[1]: https://github.com/intel/thermal_daemon
e.g. ThinkPad T14 Gen 3 AMD, Ryzen 7 PRO 6850U, 32 GB LPDDR5-6400 for around 1000 EUR
Having used an XPS 13 and XPS 15 I was underwhelmed and none of Dell's laptops hit the sweet spot for me.
Yeah. It's that bad. I have a Thinkpad P14s.
What's worse, these things have an accelerometer that causes the same type of throttling if you move your laptop.
Fuck Intel-based clothes-iron laptops so hard.
The benchmarks do not lie.
As noted by the author:
> ===== Notice that undervolt is typically locked from 10th gen onwards! =====
I can't even modify the BIOS due to BootGuard and the keys burned into the CPU.
Hopefully there will be a way to leak/extract the keys someday, as this creates real e-waste for fake security.
Consumers: "Why would Intel do this?"