`cgroup.memory=nokmem` avoids this.
I believe the kernel's cgroup writeback accounting features are enabled / disabled based on this code: https://github.com/torvalds/linux/blob/c291c9cfd76a8fb92ef3d...
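If anyone wants to try it, it's a kernel boot parameter, so something along these lines (the GRUB file and `update-grub` step assume a Debian/Ubuntu-style setup; adjust for your distro):

```
# Append the parameter to the kernel command line (Debian/Ubuntu-style GRUB).
# In /etc/default/grub:
#   GRUB_CMDLINE_LINUX_DEFAULT="... cgroup.memory=nokmem"
sudo update-grub
sudo reboot
cat /proc/cmdline   # verify the parameter took effect after reboot
```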
This is really not great in situations where you're bootstrapping a fresh server. Here's what happens:
- you boot up a server with, say, 1TB of RAM
- your default dirty ratio is 10 (best case)
- you quickly write 90GB of files to your server (images, whatever)
- you get mad unblocked throughput as the page cache fills up & the kernel hasn't even tried flushing anything to disk yet
- your application starts, takes up 9GB of memory
- it starts serving requests and writes another 1GB of memory-mapped cache
- the kernel starts to flush, realises the disk is slower than it thought, and starts to trip over itself, aggressively throttling IO until it can catch up
- your app is now IO bound while the kernel thrashes around for a bit
This can be tuned by adjusting the vm.dirty_* defaults, and is well worth doing IMO. The defaults that kernels still ship with are from a long time ago when we didn't have this much memory available.
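For example, something like this caps dirty memory in absolute bytes rather than as a percentage of (huge) RAM - the numbers are purely illustrative, not recommendations from this thread:

```
# Cap dirty page cache in absolute bytes instead of a % of RAM.
# Values are illustrative - size them to what your disks can actually sustain.
sudo sysctl -w vm.dirty_background_bytes=268435456   # start background writeback at ~256MB dirty
sudo sysctl -w vm.dirty_bytes=1073741824             # throttle writers at ~1GB dirty
# Note: setting the *_bytes knobs zeroes the corresponding *_ratio knobs.
# Persist across reboots:
printf 'vm.dirty_background_bytes = 268435456\nvm.dirty_bytes = 1073741824\n' | \
  sudo tee /etc/sysctl.d/99-writeback.conf
```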
My memory of this next bit is flaky at best, so happy to be corrected here, but I remember this also being a big problem with k8s. With cgroups v1, a node would get added to your cluster and a pod would get scheduled there. The pod would be limited to, say, 4GB of memory - way more than it actually used - but it would do a lot of IO. Because the node still had a ton of free memory and was way below its default dirty writeback ratio/bytes, none of that IO would get flushed to disk for ages, but the dirty pages sitting in the page cache would still be counted towards the pod's memory usage even though they weren't 'real' memory - they were something completely out of the control of the pod (or Kubernetes, really).

Before you knew it, bOOM: pod oomkilled for seemingly no reason, and no way to do anything about it. I remember some issues where people skirted around it by looking off into the middle distance and saying the usual things about k8s not being for stateful workloads, but it was really lame and really not talked about enough.
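If you want to see that charging happen on a v1 node, a rough way to eyeball it is something like this (the cgroup path and pod id are made up for illustration; find the real ones under /sys/fs/cgroup/memory/kubepods/):

```
# cgroup v1 charges dirty page cache to the pod's memory cgroup.
# Path and pod id below are placeholders, not from the original discussion.
POD_CG=/sys/fs/cgroup/memory/kubepods/burstable/podXXXX
grep -E '^(total_)?(cache|dirty|writeback) ' "$POD_CG/memory.stat"
cat "$POD_CG/memory.limit_in_bytes"   # the limit those dirty pages count against
```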
This might seem unrelated, but you guessed it, it was fixed in cgroups v2, and I imagine that the fix for that problem either directly or indirectly explains why OP saw a difference in behaviour between cgroups v1 and v2.
Also, slightly related: I remember discovering a while back that for workloads like this, where you've got a high turnover of files & processes, having the `discard` (TRIM) flag set on your SSD mount could really mess you up (definitely on ext4, not sure about xfs). It would prevent the page cache from evicting pages of deleted files without forcing writeback first, which is obviously the opposite of what it was designed to do (protect/trim the SSDs). Not to mention cause all sorts of zombifications when terminated processes still had memory-mapped files that hadn't been flushed to disk, etc.
AFAIK it's still a problem, though it's been years since I profiled this stuff. At peak load with io-intensive workloads, you could end up with SSDs making your app run slower. Try remounting without the `discard` flag (and periodically fstrim manually), or use `discard=async`, and see what difference it makes.
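If you want to experiment, this is the usual shape of it (the mount point is a placeholder; check your own fstab and device names first):

```
# Mount point is a placeholder. Drop inline discard and trim on a schedule instead.
sudo mount -o remount,nodiscard /data
sudo systemctl enable --now fstrim.timer   # util-linux timer, weekly by default
sudo fstrim -v /data                       # or run it by hand to compare behaviour
```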
The problem described in my post was not _directly_ related to the kernel flushing dirty pages to disk. As such, I'm not sure that tweaking these sysctls would have made any difference.
Instead, we were seeing the kernel use too much CPU when it moved inodes from one cgroup to another. This is part of the kernel's cgroup writeback accounting logic. I believe it's a related but slightly different form of writeback problem :)
The realtime API hasn't been implemented in the Opower integration yet. That said, I don't think it would be too hard to implement. See: https://github.com/tronikos/opower/issues/24
This realtime data is also available and graphed on your account page on the Con Ed website and mobile app.
I wrote my own code that uses Con Ed's realtime API and writes the data to Prometheus so that I can view it in Grafana. My code was heavily influenced by Home Assistant's Opower integration code. Here's my code: https://github.com/dasl-/pitools/blob/main/sensors/measure_e...
Email and ask
I submitted this message, feel free to copy the same text and submit yourself also:
-----------------------------
I recently became aware that the Living Computers Museum, which was created by Paul Allen (Microsoft co-founder), is shutting down. As someone in the technology industry, I find that very sad! The museum was really magical. I'm wondering if the Gates Foundation can step up and save the museum from closing?
https://www.geekwire.com/2024/seattles-living-computers-muse...
Thank you for your consideration.
It works for me on Linux; not sure about other OSes. Although I'm now noticing that the article linked in the original post says that Ruby has a pure-Ruby replacement for readline: Reline. So I wonder if it won't work with more recent versions of Ruby that use Reline?
I remember watching Farbrausch's "fr-08 .the .produkt" [0] when it came out and telling myself: "If a computer can do this with 64KB of data, at this speed, my programs should be able to do the same, or at least come close." I was forever poisoned from that point on, and that simple sentence shaped my whole academic life and career.
[0]: https://www.pouet.net/prod.php?which=1221
P.S.: Rewatching it, again, for the nth time. Hats off to chaos, fiver2, kb, doj, ryg & yoda.
P.P.S.: I show people the YouTube version of Elevated (https://www.pouet.net/prod.php?which=52938) and ask them to guess the size of the binary that renders this thing in real time. The answer blows everyone's mind, every time.
If you're ever in the mood to revisit that problem, you should try disabling that discard flag and see if it makes a difference. Also, if it were me, I'd have tried setting LimitNOFILE to whatever it is in my shell and seeing if the rsync still behaved differently.
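Something like this would do it with systemd - the unit name is just a placeholder for whatever service was running the rsync:

```
# Unit name is a placeholder - substitute the service that runs the rsync.
ulimit -n                          # what your interactive shell gets
sudo systemctl edit rsync-job.service
#   [Service]
#   LimitNOFILE=1048576            # e.g. match the shell's value
sudo systemctl restart rsync-job.service
systemctl show rsync-job.service -p LimitNOFILE   # confirm it took
```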
Anyway - thoroughly enjoyed your article. You should write more :)