Readit News logoReadit News
viega commented on RE#: how we built the fastest regex engine in F#   iev.ee/blog/resharp-how-w... · Posted by u/exceptione
cognisent · 11 days ago
I guess _ is trying to be like, "No, really, anything," while . has some limitations?
viega · 10 days ago
.* does NOT match newlines, so always will stop a match at the end of a line.
viega commented on Put a ring on it: a lock-free MPMC ring buffer   h4x0r.org/ring/... · Posted by u/signa11
loeg · 3 months ago
Why would you want an MPMC queue primitive, instead of just using MPSC per consumer? I don't really see a discussion of the cache contention issues. (There is some mention of contention, but it seems the author is using the word in a different way.)

It looks like both enqueue and dequeue are RMWing the "epochs" variable? This can pretty easily lead to something like O(N^2) runtime if N processors are banging this cache line at the same time.

viega · 3 months ago
For me, I’ve got use cases where it’s valuable to keep event data interleaved because it will also get used in flight. It works well enough I also use it for things where it’s not necessary like in memory debug rings (which requires a bit of additional work).

The epoch isn’t CAS’d; it is FAA. The epoch is then used to determine if there is contention due to the tail meeting the head, or due to a wrap-around due to slow writes.

There’s also a back-off scheme to ease contention for a full queue.

Though, I did originally have a variant that adds a fairly straightforward ‘help’ mechanism that makes the algorithm wait-free and reduces the computational complexity.

However, the extra overhead didn’t seem worth it, so I took it out pretty quickly. Iirc, the only place where the ring in the queue wouldn’t out-perform it are on tiny queues with a huge write imbalance.

If you go run the tests in the repo associated w the article, you probably will see that a ring with only 16 entries will tend to start being non-performant at about a 4:1 writer to reader ratio. But iirc that effect goes away before 128 slots in the ring.

There, the ring still fits in a page, and even with a big imbalance I can’t remember seeing less than 1m ops per second on my Mac laptop.

Real world observational data beats worst case analysis, and I’ve never seen an issue for scenarios I consider reasonable.

But, if my unrealistic is realistic to you, let me know and I can break out the wait free version for you.

viega commented on Put a ring on it: a lock-free MPMC ring buffer   h4x0r.org/ring/... · Posted by u/signa11
viega · 3 months ago
You can use a 64-bit CAS if you want to use a 32-bit epoch and a pointer compression scheme of any kind, or just a 32-bit index into regions that are thread specific. I think I did the later when I did the original work, using the core primitive to build ring buffers that have arbitrary sized slots instead of 64-bit slots (which requires a bit of additional gymnastics, but the basic trick is to have the ring index into a bigger ring that you can FAA into, where the bigger ring has more slots by at least the max number of threads (I use this primitive heavily still for in-memory debug logging). Maybe at some point I'll do an article on that too.
viega · 3 months ago
BTW, should be noted that the need to issue a cache line lock on x86 does seem to slow down 128-bit CAS quite a bit on x86-64 platforms. On arm64, there's no reason to skimp with a 64-bit CAS.
viega commented on Put a ring on it: a lock-free MPMC ring buffer   h4x0r.org/ring/... · Posted by u/signa11
xavierxwang · 3 months ago
Is there any chance to modify Vyukov's MPMC queue implement (https://www.1024cores.net/home/lock-free-algorithms/queues/b...) to support drop handler? That work doesn't need 128 bit CAS.
viega · 3 months ago
You can use a 64-bit CAS if you want to use a 32-bit epoch and a pointer compression scheme of any kind, or just a 32-bit index into regions that are thread specific. I think I did the later when I did the original work, using the core primitive to build ring buffers that have arbitrary sized slots instead of 64-bit slots (which requires a bit of additional gymnastics, but the basic trick is to have the ring index into a bigger ring that you can FAA into, where the bigger ring has more slots by at least the max number of threads (I use this primitive heavily still for in-memory debug logging). Maybe at some point I'll do an article on that too.
viega commented on Put a ring on it: a lock-free MPMC ring buffer   h4x0r.org/ring/... · Posted by u/signa11
atq2119 · 3 months ago
Looks like a good write up, but I'd caution that some of the statements about memory models aren't completely accurate.

The terms relaxed, acquire, and release refer to how an atomic operation is ordered against other accesses to memory.

Counter to what the article states, a relaxed atomic is still atomic, meaning that it cannot tear and, for RMW atomic, no other access can go between the read and the write. But a relaxed atomic does not order other accesses, which can lead to unintuitive outcomes.

By contrast, once you've observed another thread's release store with an acquire load, you're guaranteed that your subsequent memory accesses "happen after" all of the other thread's accesses from before that release store -- which is what you'd intuitively expect, it's just that in modern systems (which are really highly distributed systems even on a single chip) there's a cost to establishing this kind of guarantee, which is why you can opt out of it with relaxed atomics if you know what you're doing.

viega · 3 months ago
Yes, I meant to clarify the memory model discussion; I had tried to simplify and did a poor job; I got similar feedback after it was published, and never remembered to get to it. Will try to do it soon, though it's about the worst time for this to have hit, not sure when I'll be able to sit down for it, but will try to get it done in the next day. Hopefully it doesn't wait until next time it gets some views.
viega commented on Put a ring on it: a lock-free MPMC ring buffer   h4x0r.org/ring/... · Posted by u/signa11
auxym · 3 months ago
viega · 3 months ago
So that work came after mine, and seems to be a FIFO not a ring buffer. The library I built at the time also had FIFOs and LIFOs that were tweaks on previous algorithms, but nothing earth shaking. I'll check this one out when I can though.
viega commented on Put a ring on it: a lock-free MPMC ring buffer   h4x0r.org/ring/... · Posted by u/signa11
WeaselNo7 · 3 months ago
Strange to see a lock-free ring buffer without seeing mention of LMAX/Martin Thompson's Java Disruptor (https://github.com/LMAX-Exchange/disruptor)
viega · 3 months ago
I don't think I even saw this until I published the article. I don't think it was academically published, or googled well back in 2000. Nor did it match my needs for a ring buffer at the time, which was to drop stale data (I think in that algorithm, write operations fail when the buffer is full), so if I did see it, I wouldn't have payed enough attention to notice if it even had users. It's good work for sure, but that's why it didn't get mentioned.
viega commented on Put a ring on it: a lock-free MPMC ring buffer   h4x0r.org/ring/... · Posted by u/signa11
withinboredom · 3 months ago
This isn't that new. (see: FASTER: A Concurrent Key-Value Store with In-Place Updates. 2018 ACM SIGMOD International Conference on Management of Data and related papers)

However, this is well written and very easy to read.

viega · 3 months ago
Well, when I was doing the original work on it (about 5 years ago now), I spent a lot of time trying to find something else in the literature. I couldn't find anything that wasn't SPMC or MPSC, unless it had severe limitations, like not actually having a drop policy when full.

However, I definitely did not see the paper you've sited, but just spent a few minutes with the paper you cited. Section 5.2 seems to cover their version. It's far from clear what their algorithm is, but they are talking about page eviction; it doesn't seem like they're using even a single fixed array for the ring, but I'm not 100% sure because it's pretty light on any detail. a

viega commented on Fun-reliable side-channels for cross-container communication   h4x0r.org/funreliable/... · Posted by u/viega
restlake · 4 months ago
super interesting pseudo IPC channel and at least mildly concerning from a security perspective. saw it on your site first and am shocked there is not a single other comment yet here

was hoping to find at least one “cmon this is easy to avoid with X thing in the kernel/OS” info nugget dropped

viega · 4 months ago
Agreed that this is not a critical problem, and the cooperative side channel can be useful in otherwise uncooperative environments.

The article does mention wanting to coordinate across multiple identical processes running on the same node in a wide variety of environments as the motivator.

So maybe it should be a feature, not a bug :)

u/viega

KarmaCake day94April 30, 2024
About
25 years of security + development, but really I never stopped being a compiler person. Co-author of AES-GCM. Co-founder of Crash Override.
View Original