j_seigh (u/j_seigh) - Readit News

j_seigh commented on Io_uring, kTLS and Rust for zero syscall HTTPS server blog.habets.se/2025/04/io... · Posted by u/guntars

api · 3 days ago

This is impressive but it’s also an amazing amount of complexity and difficult programming to work around the fact that syscalls are so slow.

It seems like there’s these fundamental things in OSes that we just can’t improve, or I suppose can’t without breaking too much backward compatibility, so we are forced to do this.

j_seigh · 3 days ago

I don't think it has to be. Conceptually it's just a couple of queues.

There's a software equivalent of the Peter Principle where software or an API becomes increasingly complex to the point where no one understands it. They then attempt to fix that by adding more functionality (complexity).

j_seigh commented on Without the futex, it's futile h4x0r.org/futex/... · Posted by u/eatonphil

senderista · 5 days ago

Maybe you mean ticket locks? That’s kind of the practical version of the bakery algorithm. It’s not the same because ticket locks require fetch-and-add (or its emulation over another atomic RMW like CAS), while the bakery algorithm uses only reads and writes (which makes it impractical).

Using a futex to allow ticket locks to block in the kernel does have a thundering herd problem. Fortunately you can do “smart sleeps” by measuring the time elapsed since the last wakeup and the change in the turn counter since then, and using that to estimate the time until the current thread’s turn (we don’t want to block the queue by oversleeping, so sleep for half of that estimate, then redo the estimate again, etc). (The same logic can be applied to spinning during the waiter’s timeslice, to avoid unnecessary cache coherence traffic.)

I assume you’re referring to Guy Blelloch’s work on wait-free hazard pointers? That’s very cool although I doubt it’s solving a practical problem. The practical issue with hazard pointers is the StoreLoad barrier on the read side (which can be moved to the GC side via membarrier(2) etc).

j_seigh · 4 days ago

https://groups.google.com/g/comp.programming.threads/c/XU6Bt...https://groups.google.com/g/linux.kernel/c/gk6AUkXR9As/m/-1W... Yes, I am aware of the asymmetric memory barriers trick and the atomic memory move trick also.

There's a couple of other ways to achieve wait freedom on hazard pointer loads (protect) but they haven't been published so they're kind of moot.

j_seigh commented on Without the futex, it's futile h4x0r.org/futex/... · Posted by u/eatonphil

senderista · 5 days ago

Maybe you'd like Anthony Williams's _C++ Concurrency in Action_, which doesn't cover futexes (or how to write your own synchronization primitives in general), but does cover real-world details like memory orderings and SMR for lock-free data structures. If that's still too high-level, then maybe check out Paul McKenney's excellent free monograph "Is Parallel Programming Hard, And, If So, What Can You Do About It?" for a more hardware-focused perspective (which doesn't cover futexes in detail either but does direct you to the canonical reference for implementing futex-based primitives, namely Ulrich Drepper's "Futexes Are Tricky").

I think TAOMPP is fine for what it is (teaching high-level concurrency concepts) and discussing OS-level implementation details would be out of place. The important thing it teaches is how to think about concurrency, not how to write your own synchronization primitives. E.g., the Peterson or bakery locks are useless in the real world (as the book admits), but understanding their proofs of correctness will help you reason about the concurrent algorithms you have to write yourself.

j_seigh · 5 days ago

Bakery locks are good for spin locks. They're more cache friendly. Plus you can do reader/writer spin locks. They're going to be strictly FIFO though.

I guess you could tack on a futex wait for the spin wait in user space but it's going to be really inefficient. You are going to get a lot of spurious wake ups. Not one of the things futex's are designed for.

Lock-free with hazard pointers or RCU* is still kind of tricky. It's going to be data structure specific and you really have to know what you are doing.

Fun fact. You can make hazard pointers wait-free, actual wait free, not the dubious bounded retry loop hack.

* Doing copy on write with RCU is fairly straight forward but probably expensive if updates are frequent.

j_seigh commented on My bank keeps on undermining anti-phishing education moritz-mander.de/blog/my_... · Posted by u/cheesepaint

j_seigh · a month ago

Here's an interesting scheme. Some credit/debit card merchant accounts can arrange to get updated card info if your card expires and/or gets replaced. So if the merchant is a bad actor and doesn't charge your card directly but just tracks your updated card info so it can be used fraudulently elsewhere, you, your bank, and the card company will never know they were the source. And the card is linked to your bank account, you can replace it ad infinitum and the bad actors will get the updated info for the new card every time. The only way to break out of this is to close your bank account and open a new one.

j_seigh commented on Performance of memory reclamation for lockless synchronization [pdf] csng.cs.toronto.edu/publi... · Posted by u/fanf2

j_seigh · a month ago

One of the few places I get a citation. It's where the idea of asymmetric memory barriers came from. RCU is used as the quiescent states are context switches which gives you a memory barrier on the thread doing the context switch. The Linux asymmetric memory barriers, membarrier(), uses IPI to effect that.

Anyway, getting rid of the memory barrier in fast paths, speeds things up considerably.

j_seigh commented on Microsoft's big lie: Your computer is fine, and you don't need to buy a new one technical.ly/civic-news/w... · Posted by u/FlipperPA

j_seigh · 2 months ago

ChromeOS flex. It will run on hardware that even Linux complains about. I've even installed it on a Chromebook that stopped getting updates. Though, you have to replace the firmware, which entails some risk of bricking, so you can do a UEFI boot.

Also you can upgrade windows 10 to windows 11 even on hardware that Microsoft says is unsupported. Google for the workarounds. You only need to download the win11 iso from Microsoft and make a bootable USB stick using Rufus. Don't download anything from anywhere else. I wouldn't trust it.

j_seigh commented on The concurrency trap: How an atomic counter stalled a pipeline conviva.com/platform/the-... · Posted by u/delifue

kccqzy · 2 months ago

Right but isn't that difficult to implement in a library?

j_seigh · 2 months ago

Thread local vars would be used with lazy initialization. Clean up might be a little tricky depending on what implementation of thread local you use. Thread local support is not as great as it should be.

j_seigh commented on The concurrency trap: How an atomic counter stalled a pipeline conviva.com/platform/the-... · Posted by u/delifue

kccqzy · 2 months ago

I haven't seen a pure user space implementation of RCU that works well without kernel support. The RCU methodology is easy to understand, but how does a user space implementation understand when it is safe to dispose of a piece of data structure? Don't you need a primitive to run some code on each CPU?

j_seigh · 2 months ago

You just need to keep track of each thread's quiescent states however they are implemented.

j_seigh commented on Wait-Free Hazard Pointers Using Std Atomics threadnought.wordpress.co... · Posted by u/j_seigh

j_seigh · 2 months ago

A few ways to do wait-free hazard pointers, some portable, semi-portable, or not so much in this an subsequent blog posts. It's pretty high level, I'm assuming one knows how asymetric memory barriers and hazard pointers without the store/load memory barrier work.

Wait-free is interesting more from an academic point of view in that you can use hazard pointers to implement wait-free algorithms and still claim they are wait-free.

While one of them is 50% faster than lock-free hazard pointers, probably due to not having a conditional branch, the latter is pretty fast already.