Readit News
mjg59 · 3 years ago
This is really not the correct approach. https://github.com/intel/thermal_daemon ought to do a better job without ignoring manufacturer thermal limits (I reverse engineered Intel's Dynamic Power and Thermal Framework a few years back, and upstream kernels should have everything needed now: https://mjg59.dreamwidth.org/54923.html)
haukem · 3 years ago
Thank you for your comment.

I installed thermald on my Lenovo T480 with Debian Bookworm and I get 20% better results in stress-ng. The fans are a bit louder now under high load and off under low load.

Without thermald:

  $ stress-ng --matrix 0 -t 3m --metrics-brief
  stress-ng: info:  [3755113] setting to a 180 second (3 mins, 0.00 secs) run per stressor
  stress-ng: info:  [3755113] dispatching hogs: 8 matrix
  stress-ng: info:  [3755113] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s
  stress-ng: info:  [3755113]                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
  stress-ng: info:  [3755113] matrix          2278812    180.00   1437.43      0.27     12660.06        1585.04

With thermald:

  $ stress-ng --matrix 0 -t 3m --metrics-brief
  stress-ng: info:  [3755550] setting to a 180 second (3 mins, 0.00 secs) run per stressor
  stress-ng: info:  [3755550] dispatching hogs: 8 matrix
  stress-ng: info:  [3755550] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s
  stress-ng: info:  [3755550]                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
  stress-ng: info:  [3755550] matrix          2791272    180.00   1404.32      0.57     15507.06        1986.83

I just installed it using apt and did no extra configuration. My system was already configured for balanced power mode anyway. Why is thermald not installed on desktop installations by default?
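For reference, the improvement can be computed from the two `bogo ops/s (real time)` figures; it comes out a little above the 20% quoted. A trivial Python sketch using only the numbers from the runs above:

```python
def percent_gain(before: float, after: float) -> float:
    """Relative improvement of `after` over `before`, in percent."""
    return (after / before - 1.0) * 100.0


# "bogo ops/s (real time)" from the two stress-ng runs above
without_thermald = 12660.06
with_thermald = 15507.06

gain = percent_gain(without_thermald, with_thermald)
print(f"thermald speedup: {gain:.1f}%")  # roughly 22.5%
```

Both runs lasted 180 s, so the ratio of the raw bogo ops counts gives the same figure.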

haukem · 3 years ago
Is dptfxtract still needed?

The thermald man page says this:

> In some newer platforms the auto creation of the config file is done by a companion tool "dptfxtract". This tool can be downloaded from "https://github.com/intel/dptfxtract". It is suggested as parts of the install process, run dptfxtract.

The dptfxtract GitHub project (https://github.com/intel/dptfxtract) says Intel discontinued the project.

ladyanita22 · 3 years ago
What? Doesn't Debian come with thermald by default?
FeepingCreature · 3 years ago
I have had people tell me that they don't care if their computers break, as long as they run faster in the meantime. Some manufacturers genuinely set the limits way too low for their own hardware.

I also use this tool to bypass manufacturer limits in battery mode that are intended to make the system seem like the battery is not undersized for the CPU's power draw. Sometimes I'd rather have more CPU for less time.

dannyw · 3 years ago
I recall hearing CPU Tmax is designed for a 10 year lifespan.

I have never used a laptop for a decade, nor have I had the CPU fail. So perhaps faster performance and shorter lifespan is okay.

ladyanita22 · 3 years ago
I don't understand why this is a problem for Linux and not for Windows.

Computers don't break on Windows, so why should Linux users take this overly conservative approach that limits the performance of their computers?

userbinator · 3 years ago
without ignoring manufacturer thermal limits

The whole point is to ignore them because they're horrible and hold back what these CPUs can really do. Fuck the manufacturers playing these stupid marketing games.

Intel warrants their CPUs at TjMax 24/7, they'll automatically throttle when they hit that limit, and disabling all this other throttling crap makes them run that way for full performance.
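To see how close the CPU actually gets to the point where it throttles itself, the kernel's thermal sysfs interface can be read directly. A minimal Python sketch, assuming the standard `/sys/class/thermal/thermal_zone*/{type,temp}` layout (temperatures in millidegrees Celsius). Which zones exist, and whether one of them is the CPU package (often named `x86_pkg_temp` on Intel), varies by machine:

```python
from pathlib import Path


def read_thermal_zones(root: str = "/sys/class/thermal") -> dict[str, float]:
    """Map "<zone>:<type>" to the zone's current temperature in degrees C.

    Each thermal_zoneN directory has a `type` file (e.g. "x86_pkg_temp")
    and a `temp` file holding millidegrees Celsius.
    """
    base = Path(root)
    temps: dict[str, float] = {}
    if not base.is_dir():
        return temps
    for zone in sorted(base.glob("thermal_zone*")):
        try:
            kind = (zone / "type").read_text().strip()
            millideg = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue  # some zones are unreadable or report errors; skip them
        temps[f"{zone.name}:{kind}"] = millideg / 1000.0
    return temps
```

Polling `read_thermal_zones()` while a stress test runs shows whether the package is sitting near TjMax or being held well below it by other limits.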

mjg59 · 3 years ago
The whole point is that the CPU is only a single part of the equation. Yes, you're not going to burn out the CPU itself by unlimiting PL1/2 (although if the system vendor cheaped out on power circuitry because they'd only designed for sustained 20W draw then you might burn that out), but you're now generating more heat than the system is designed to dissipate. This may result in obvious outcomes like the chassis heating up enough to burn your legs, but it may also result in other components being operated outside their thermal limits and their lifetime being shortened as a result.
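Before raising PL1/PL2 it is worth seeing what the firmware actually programmed; the powercap sysfs interface exposes this per RAPL zone. A minimal Python sketch, assuming the CPU package is zone `intel-rapl:0` (numbering varies). The `constraint_N_*` file names are the standard powercap ones, and on Intel package zones constraint 0 is typically `long_term` (PL1) and constraint 1 `short_term` (PL2):

```python
from pathlib import Path


def read_rapl_constraints(zone: str = "/sys/class/powercap/intel-rapl:0") -> list[dict]:
    """List the power-limit constraints of one RAPL zone.

    Each constraint N exposes constraint_N_name and
    constraint_N_power_limit_uw (microwatts), and usually
    constraint_N_time_window_us (microseconds).
    """
    base = Path(zone)
    constraints = []
    n = 0
    while (base / f"constraint_{n}_name").exists():
        entry = {
            "index": n,
            "name": (base / f"constraint_{n}_name").read_text().strip(),
            "limit_w": int((base / f"constraint_{n}_power_limit_uw").read_text()) / 1e6,
        }
        window = base / f"constraint_{n}_time_window_us"
        if window.exists():  # not every constraint has a time window
            entry["window_s"] = int(window.read_text()) / 1e6
        constraints.append(entry)
        n += 1
    return constraints
```

This only shows the CPU-side limits; it says nothing about what the chassis or power circuitry was designed to handle.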
rincebrain · 3 years ago
FWIW, I've managed to make multiple X1C6s thermally trip and shut off without changing from the safe configuration, which rather impressed me because I didn't know that was feasible these days.

Mostly I mention it to say that there's not _no_ reason for thermal limits in modern setups...

arka2147483647 · 3 years ago
If you remove the thermal limit, and if this breaks your cpu, for example, within one year, should intel then replace your cpu?

Thermal limits like this are really about managing the manufacturer's liabilities and protecting the expected lifetime of the product.

makeitdouble · 3 years ago
I assume people got burned by the artificial segmentation of the CPUs solely for pricing purposes.

Trusting Intel to provide accurate info on the actual performances of a chip feels too naive at this point.

magila · 3 years ago
If Intel wanted to use power/thermal limits for market segmentation they'd be locked down like the turbo frequency tables on non-K CPUs.
ladyanita22 · 3 years ago
But there's no such problem on Windows. Why segment on Linux and not on Windows?
ladyanita22 · 3 years ago
I don't understand why this is a problem for Linux and not for Windows.

It shouldn't be too difficult to correct on Linux. Why does Windows take the manufacturer's limits into account while Linux basically ignores them?

mjg59 · 3 years ago
It is corrected on Linux. Just install thermald.
nubinetwork · 3 years ago
I used to disable turbo boost for the longest time, but if someone finally fixed CPU scaling and thermal controls, I might give thermald a try.
codethief · 3 years ago
Hi Matthew, I've been a huge fan of your work ever since, back in 2010 or 2011, thanks to you I got the Gobi 2000 mobile broadband chip working on Linux on my Thinkpad W510!

As for the pros and cons of the `throttled` project, yes, this might not be the "officially desired" approach, but I know several people who have been using it to great success for years. The reality, unfortunately, is that those of us who use Linux machines for work and order the most recent and powerful Thinkpads or Dells we can get our hands on often realize later on (when the machine arrives) that the default settings handicap the machine to such a degree that we can barely work. Not everyone is a kernel developer, though, or knows how these things work, so quick fixes are often welcome, even if they limit the hardware's lifetime (after all, we'll buy a new device in a few years anyway).

What exacerbates this problem is that it's all very opaque: the "official" way to solve these issues (which, as far as I understand you, is installing thermald?) is not really communicated anywhere, nor does thermald come preinstalled on any of the major distributions AFAIK[0]. What's worse, thermald often doesn't even solve the throttling issues without installing further patches. On top of that, BIOS updates by the manufacturers also seem to play a major role, as manufacturers like Lenovo introduce different performance modes and things like "lap mode". To be honest, to this day I haven't quite understood how these things interact and whose responsibility it is to fix things.

In my particular case, I have been using a Thinkpad X1 Carbon Gen9 (which Lenovo says "supports" Linux), and at some point, after installing numerous BIOS updates and working my way through hundreds of posts on the Lenovo forums, I just gave up: my machine still regularly throttles down to 800 MHz per core and 16W under medium load until I hit the secret Fn + H key chord to tell the BIOS to switch back to high performance mode and set the thermal limit back to the maximum.

Do you happen to have a recommendation for me as to where I should start looking (again) for a solution? Does thermald fix these issues these days? (I know that when I last looked into it, it didn't.)

[0]: (EDIT) I take that back, it looks like thermald does come preinstalled on Fedora and Ubuntu these days. At least it's present on my new Ubuntu 22.04.2 installation. Unfortunately, that doesn't really help me since (and now I remember reading this last time I looked into thermald) according to the changelog[1] for v2.3:

> - thermald will not run on Lenovo platforms with lap mode sysfs entry

Great, so I still have nowhere to go from here it seems.

[1]: https://github.com/intel/thermal_daemon

pritambaral · 2 years ago
You can make thermald ignore that protection by using the `--ignore-cpu-id` flag. See thread leading up to: https://github.com/intel/thermal_daemon/issues/268#issuecomm...
jeroenhd · 3 years ago
This looks like an excellent tool for people repurposing old laptops as servers by putting their motherboard in a different chassis and adding some proper cooling of their own to the board. May need to cool parts close to the CPU as well if the board wasn't designed to transport that much heat.

If you try to do this to your laptop, well... there's a reason you can't legally sell laptops that heat up beyond 40-45℃. Expose yourself to that all you want, but be prepared for hardware damage, overheated skin, or decreased sperm count due to putting an overheated laptop in your lap.

I wouldn't call this a fix in the same way I wouldn't call throwing out your smoke alarm a fix for the constant flat battery beeping.

SpacePortKnight · 3 years ago
I want my laptop to be more predictable and reliable and with great battery life instead of having more performance.

But thanks to turbo boost, sometimes my laptop is hot while playing a YouTube video but cool when compiling code, or the other way around. There is no predicting how long a compilation will take or how long the battery will last, since it depends on N thermal and power factors.

At least to me, this feels like marketing designed the product instead of product managers. I recently bought an Intel 12th gen i5-1240P laptop (Asus Zenbook), and this processor boosts from 1.7GHz to 4.4GHz, i.e. more than twice the base frequency. Isn't that absurd? I'd rather have a stable ~2GHz than have the processor boost up to ~4GHz while surfing the web.

We wouldn't need tools like this if Intel, at least on laptops, released chips with no turbo boost or a smaller turbo range.

winrid · 3 years ago
We'll probably see physical "turbo" switches on laptops at some point as a gimmick, but honestly that would be ideal at this point.
hazaskull · 3 years ago
...and then there is me using

  echo "75000000" | sudo tee /sys/class/powercap/intel-rapl/intel-rapl\:0/constraint_1_power_limit_uw 
to cap my i7-10700 to prevent it from overpowering the system fan by peaking to 200+ watts.
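The same write can be scripted, for instance from a boot-time service. A minimal Python sketch, assuming the package zone is `intel-rapl:0` and that, as in the command above, constraint 1 is the one to cap (on Intel package zones it is usually named `short_term`, i.e. PL2; check `constraint_1_name` first). Limits are in microwatts, so 75 W is 75000000. The backslash in the shell command only escapes the colon; Python needs no escaping. Writing requires root, and the value does not persist across reboots:

```python
from pathlib import Path


def watts_to_uw(watts: float) -> int:
    """powercap limits are expressed in microwatts."""
    return round(watts * 1_000_000)


def set_power_limit(watts: float, constraint: int = 1,
                    zone: str = "/sys/class/powercap/intel-rapl:0") -> None:
    """Write a power limit for one RAPL constraint (needs root).

    Equivalent to: echo <uW> | sudo tee <zone>/constraint_<n>_power_limit_uw
    """
    path = Path(zone) / f"constraint_{constraint}_power_limit_uw"
    path.write_text(f"{watts_to_uw(watts)}\n")
```

Calling `set_power_limit(75)` as root reproduces the 75 W cap from the command above.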

rowanG077 · 3 years ago
It's honestly so frustrating. I bought an XPS 13 two and a half years ago and it's been a nightmare getting it to perform. I had to do the following things to make it run on non-turbo boosted clockspeeds without throttling:

- Liquid metal TIM

- Thermal pads + heat pipes connected to chassis to dissipate heat (Yes this means the bottom chassis heats up a lot)

- Disable the intel_rapl_msr Linux driver + disable BD_PROCHOT via MSR

Laptop has worked like a charm since. I really don't want a super thin laptop. I want a small laptop. I wouldn't mind a 2 cm thick 13-inch laptop. But I can't handle a 15-inch laptop; I just find it way too large to be seriously portable.
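For reference, a sketch of what "disable BD_PROCHOT via MSR" amounts to, in Python via the kernel's `msr` driver (`/dev/cpu/N/msr`, which needs `modprobe msr` and root). The register and bit are assumptions (bit 0 of MSR 0x1FC, MSR_POWER_CTL, as commonly used by throttling and undervolting tools such as ThrottleStop); verify them against Intel's documentation for your CPU generation. BD PROCHOT lets other platform components signal the CPU to throttle, so clearing it removes a safety path, and recent kernels may warn about or block raw MSR writes:

```python
import os
import struct

MSR_POWER_CTL = 0x1FC  # assumption: register commonly used for BD PROCHOT
BD_PROCHOT_BIT = 0     # assumption: bit 0 enables bi-directional PROCHOT


def read_msr(reg: int, cpu: int = 0) -> int:
    """Read a 64-bit MSR through /dev/cpu/<cpu>/msr (needs root)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The msr driver uses the file offset as the register number.
        return struct.unpack("<Q", os.pread(fd, 8, reg))[0]
    finally:
        os.close(fd)


def write_msr(reg: int, value: int, cpu: int = 0) -> None:
    """Write a 64-bit MSR through /dev/cpu/<cpu>/msr (needs root)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_WRONLY)
    try:
        os.pwrite(fd, struct.pack("<Q", value), reg)
    finally:
        os.close(fd)


def without_bd_prochot(value: int) -> int:
    """Return `value` with the assumed BD PROCHOT enable bit cleared."""
    return value & ~(1 << BD_PROCHOT_BIT)
```

Applying it would mean running `write_msr(MSR_POWER_CTL, without_bd_prochot(read_msr(MSR_POWER_CTL, cpu)), cpu)` for each CPU; the block itself only defines the helpers and touches no hardware.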

quesomaster9000 · 2 years ago
14 inch laptops are the sweet spot for me, easily fit in a backpack with a protective travel case and support larger memory & more performant CPUs.

e.g. ThinkPad T14 Gen 3 AMD, Ryzen 7 Pro 6850U, 32 GB LPDDR5-6400 for around 1000 EUR

Having used an XPS 13 and XPS 15 I was underwhelmed and none of Dell's laptops hit the sweet spot for me.

jeffbee · 3 years ago
It seems there is an endless supply of people who know just enough to write some system programs but not enough to learn basic energy accounting. You cannot simply make a CPU run faster by writing MSRs. The current goes in and the heat goes out and the temperature goes up. You can't make it just work under arbitrary parameters.
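The energy-accounting point can be made concrete with the simplest lumped thermal model, T = T_ambient + P × R_θ, where R_θ is the cooler's thermal resistance in °C/W. A sketch with hypothetical numbers; it ignores thermal mass, which is exactly why short PL2 bursts above the sustainable power work for a while before the temperature catches up:

```python
def max_sustained_power(t_max_c: float, t_ambient_c: float, r_theta: float) -> float:
    """Steady-state power (W) the cooler can remove before reaching t_max_c.

    Lumped model T = T_ambient + P * R_theta, solved for P.
    r_theta is the cooler's thermal resistance in degrees C per watt.
    """
    return (t_max_c - t_ambient_c) / r_theta


def steady_state_temp(power_w: float, t_ambient_c: float, r_theta: float) -> float:
    """Temperature (degrees C) reached if power_w were sustained indefinitely."""
    return t_ambient_c + power_w * r_theta
```

With an assumed 2.5 °C/W laptop cooler at 25 °C ambient and a 100 °C limit, `max_sustained_power(100, 25, 2.5)` gives 30 W; sustaining 64 W would need a steady state of 185 °C, so the CPU throttles regardless of what the MSRs say.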
dur-randir · 3 years ago
>You cannot simply make a CPU run faster by writing MSRs

I like such generalised statements. You can read about the Xeon v3 hack and ThrottleStop PowerCut. Each is just "writing MSRs", with the funny side effect of your CPU drawing more current.

jusssi · 3 years ago
Sure you can. It just makes the cooling fan actually start spinning and do its job.

Yeah. It's that bad. I have a Thinkpad P14s.

Avamander · 3 years ago
Can concur, the defaults cause throttling before fans start spinning properly.

What's worse, these things have an accelerometer that causes the same type of throttling if you move your laptop.

Fuck Intel-based clothes-iron laptops so hard.


userbinator · 3 years ago
You can, because they come crippled by default now.

The benchmarks do not lie.

wetpaws · 3 years ago
Why is intel like this =\
mjg59 · 3 years ago
Manufacturers (rightly or wrongly) believe users want machines that are as thin and light as possible. This makes a bunch of things more complicated, including managing system thermals. Heat generated from the CPU has to go somewhere. As you get thinner, it's hard to get as much airflow and so fans are less effective. As you reduce the amount of material in the chassis, the less heat can be dumped in there without it heating up enough to potentially be uncomfortable for the user. Larger internal batteries become another source of heat while charging. Handling all of this safely becomes difficult, especially because there isn't necessarily a policy that satisfies all your users. But you can't leave it purely up to the OS either, because the OS has no idea of what the thermal characteristics of the platform are. So rather than attempting to encode all of this policy directly into firmware, Intel wrote the Dynamic Power and Thermal Framework (DPTF) spec, providing a mechanism for the firmware to share information about thermal control interfaces, interactions, and desired temperature bounds, and then let the OS make policy control decisions around that. Until the OS indicates it's ready to take over, the firmware imposes a default safe policy that's guaranteed to avoid any thermal issues, albeit at the cost of performance.
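The split described above, with firmware publishing limits and trip points and the OS deciding what to do about them, can be illustrated with a toy passive-cooling loop. This is a hypothetical sketch in the spirit of a step-wise governor, not DPTF's actual policy logic: `PassivePolicy`, its fields, and the numbers are all made up, and a real implementation would take them from the firmware's tables:

```python
from dataclasses import dataclass


@dataclass
class PassivePolicy:
    """Toy passive policy: trade CPU power limit against temperature.

    All values are hypothetical; a real policy gets its trip points and
    allowed power range from the platform firmware.
    """
    trip_c: float              # passive trip point the platform asks us to respect
    min_w: float               # lowest power limit we will ever set
    max_w: float               # highest power limit we will ever set
    step_w: float = 2.0        # adjustment applied per evaluation
    hysteresis_c: float = 3.0  # headroom required before giving power back

    def next_limit(self, temp_c: float, current_w: float) -> float:
        if temp_c > self.trip_c:
            proposed = current_w - self.step_w  # too hot: back off
        elif temp_c < self.trip_c - self.hysteresis_c:
            proposed = current_w + self.step_w  # clear headroom: raise the limit
        else:
            proposed = current_w                # near the trip point: hold
        return max(self.min_w, min(self.max_w, proposed))
```

The hysteresis band keeps the limit from oscillating every evaluation when the temperature hovers around the trip point.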

Of course, this only works if the OS knows how to do this, and Intel never publicly documented it so I had to reverse engineer it instead.

pjmlp · 3 years ago
Another example of how being open source friendly boils down to "it depends on the green paper" even for the companies that do market themselves as such.

This is not the only area where Intel doesn't really support Linux, some of their GPU models also come to mind, like the PowerVR based ones in the past.

ladyanita22 · 3 years ago
Is Linux taking a conservative approach because they're ignoring the DPTF? Why is this not a problem on Windows?
scruple · 3 years ago
I was very surprised by some of the thermal characteristics of my i7-13700k. My previous build was an i7-4790k, so it's been a minute. I had to undervolt this thing and cap its max TDP (disable boost modes; it has boost modes which are very thirsty) to get it to complete benchmarks while staying under 90 °C (with a top of the line case, very good fans / circulation, and a large AIO). It's great now, but undertuning the thing is a total departure from what I recall from '00s and '10s gaming machines.
NavinF · 3 years ago
> i7-13700k

253 W max turbo power is not that crazy by today's standards.

> top of the line case, very good fans / circulation, and a large AIO

I think you'll find that what people consider good cooling for a desktop has changed somewhat in the last decade. My first GPU didn't even have a fan, but today it's fairly common for enthusiast builds to have an external radiator. I dunno what you consider large, but most AIOs only have slightly more surface area than large air coolers so they really aren't worth it for sustained workloads like gaming or ML training. Custom loops have always been the go-to solution.

charrondev · 3 years ago
What AIO did you use? I just built a new PC with an i9-13900k and an MSI MEG Coreliquid 360 AIO cooler.

It benchmarks really well and I’ve never seen it over 50 °C, the fans are really quiet, and I haven’t changed any of the configuration for it.

On the flip side I’ve got a i9-12900k in a different PC with air cooling and a more compact case and between that and the graphics card, the smaller machine runs super hot and noisy.

csdvrx · 3 years ago
I wish I could undervolt my laptop.

As noted by the author:

> ===== Notice that undervolt is typically locked from 10th gen onwards! =====

I can't even modify the BIOS due to BootGuard and the keys burned into the CPU.

Hopefully there will be a way to leak/extract the keys someday, as this creates real e-waste for fake security.

brianwawok · 3 years ago
The laptop builder likely didn’t spend the extra $2 to properly cool the CPU, so the CPU slows down to prevent burning out or burning your lap? The CPU being smart about its own temp is a good thing.
wmf · 3 years ago
That's not what's happening in this case. Linux is getting much lower performance than Windows on the same laptop due to a firmware bug.
userbinator · 3 years ago
This isn't Intel's fault... unless you consider them providing things like adjustable power limits a problem. Its CPUs have had automatic thermal throttling and will shutdown on catastrophic overheating ever since the Pentium II.

It's all the fault of manufacturers who want to both save cost with inadequate heatsinks and impose arbitrary restrictions on their products. The software in this article looks like the Linux equivalent of ThrottleStop, a Windows application that was the first to expose the truth behind it all.

mjg59 · 3 years ago
I'm not sure how failing to publicly document the DPTF specification is anything other than Intel's fault. The CPUs are not running in such a constrained configuration under Windows, for example, because Intel supply drivers to configure them appropriately.
moffkalast · 3 years ago
Consumers: keep buying whatever garbage Intel puts out each year

Consumers: "Why would Intel do this?"

wmf · 3 years ago
Why does Intel allow firmware to control power management? That's a long story but it's very boring and hardly evil.
akeck · 2 years ago
What tools can do the opposite? I have a refurb Thinkpad X1 Carbon, running Deb 11 w/ i3, I use for creative writing (vim/markdown/pandoc). I'd like the battery to last as long as possible.
mjg59 · 2 years ago
Check /sys/class/powercap: if you have some RAPL entries there, you can set the maximum power draw of the CPU. But in general, if you have a fixed workload (i.e., your system wants to do a certain amount of work, not use a certain percentage of CPU), then reducing CPU power limits will result in the CPU slowing down enough that it has to stay awake for longer to do that work, and will (counter-intuitively) actually consume more energy to do the same amount of work. Running the CPU fast to get the work done quickly means the CPU can then put itself in a low-power state that shuts down a lot of ancillary components, saving more energy than running the CPU at half the speed for twice as long.
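The race-to-idle argument is easy to check with arithmetic. A hypothetical Python sketch: the wattages are invented for illustration, and the conclusion only holds while the "platform awake" overhead dominates; if slowing the CPU cut its own power by enough, the slow run could come out ahead:

```python
def energy_joules(busy_w: float, busy_s: float, idle_w: float, window_s: float) -> float:
    """Energy over a fixed window: busy for busy_s, then in a low-power state."""
    if busy_s > window_s:
        raise ValueError("the work does not fit in the window")
    return busy_w * busy_s + idle_w * (window_s - busy_s)


# Hypothetical numbers, chosen only to illustrate the argument:
# while the CPU is awake, ancillary components draw ~6 W extra;
# in a deep low-power state the platform draws ~0.5 W.
PLATFORM_AWAKE_W = 6.0
DEEP_IDLE_W = 0.5

# Same work: 1 s at full speed (12 W CPU) vs 2 s at half speed (5 W CPU).
fast = energy_joules(12.0 + PLATFORM_AWAKE_W, 1.0, DEEP_IDLE_W, 2.0)
slow = energy_joules(5.0 + PLATFORM_AWAKE_W, 2.0, DEEP_IDLE_W, 2.0)
print(f"fast: {fast} J, slow: {slow} J")
```

With these assumed figures the fast run uses 18.5 J and the slow run 22.0 J, even though the slow run's CPU draws less than half the power.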