Readit News
burntsushi commented on Arch shares its wiki strategy with Debian   lwn.net/SubscriberLink/10... · Posted by u/lemper
wolvesechoes · 11 days ago
I have no issue with GNU, Linux or Debian. The opposite, I am postulating that we would all be better if every one worked on those instead of creating yet another distro or grep clone, even if they provide their creators with satisfaction.

As for ensuring - how it is, that in 2025 AD we have more FOSS projects than ever, yet your typical computer user has less freedom and privacy than, let's say, in 2000 AD?

burntsushi · 11 days ago
Bad take. If you can only ever improve what's there, there is no opportunity to try something new. For grep specifically, you can't change much about its defaults, which makes "innovating" on its user experience very difficult.
burntsushi commented on Faster substring search with SIMD in Zig   aarol.dev/posts/zig-simd-... · Posted by u/todsacerdoti
ww520 · 14 days ago
Excellent write up. I really appreciate the clear and detailed explanation of the algorithm.

The nice thing about Zig’s SIMD operation is the register size support is transparent. You can just declare a 64-byte vector as the Block, and Zig would use one AVX-512 register or two AVX2 (32-byte) registers behind the scenes. All other SIMD operations on the type are transparently done with regard to the registers when compiled to the targeted platform.

Even using two AVX2 registers for 64 bytes of data is a win due to instruction pipelining. Most CPUs have multiple SIMD execution units, and the two 32-byte data chunks are independent, so the CPU can process them in parallel.

The next optimization is to align the data to 64 bytes, matching the L1/L2 cache line size. Memory access is slow in general.

burntsushi · 14 days ago
Another challenge may arise from using a portable SIMD API instead of specific ISA instructions. I'm specifically thinking about computing the mask, which on x86-64 I assume is implemented via movemask. But aarch64 lacks such an instruction, so you need to do other shenanigans for the best codegen: https://github.com/BurntSushi/memchr/blob/ceef3c921b5685847e...
burntsushi commented on Faster substring search with SIMD in Zig   aarol.dev/posts/zig-simd-... · Posted by u/todsacerdoti
teo_zero · 14 days ago
How strange! I was about to add a comment that I would probably stick to SSE2 or something like that to be sure my code suits as large an audience as possible, including CPUs from more than 10 years ago, ARM, etc.

Case in point: I've been very disappointed lately when I wanted to try Ghostty on my laptop and the binary compiled for Debian failed to run due to an invalid instruction. I don't want to force the same experience on others.

burntsushi · 14 days ago
This is sort of a category error. I don't know what ghostty is doing (or perhaps its distributor), but lots of widely used software (including my own ripgrep, but also even things like glibc) will query what the CPU supports at runtime and dispatch to the correct routine based on that. So things like, "I'm only going to stick to SSE2 for maximal compatibility" have a false assumption baked into them.
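The pattern described here can be sketched in Rust. This is a minimal, illustrative sketch of runtime feature detection and dispatch, not ripgrep's or glibc's actual code; the function names are made up for the example.

```rust
/// Illustrative sketch of runtime CPU feature dispatch: query what the
/// CPU supports once at runtime and pick a routine accordingly, instead
/// of limiting the whole binary to a baseline like SSE2.
fn find_byte(haystack: &[u8], needle: u8) -> Option<usize> {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // The check above guarantees the AVX2 instructions exist on
            // this CPU, so calling the specialized path is sound.
            return unsafe { find_byte_avx2(haystack, needle) };
        }
    }
    // Portable fallback for older x86-64 CPUs and other architectures.
    haystack.iter().position(|&b| b == needle)
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn find_byte_avx2(haystack: &[u8], needle: u8) -> Option<usize> {
    // A real implementation would use 32-byte AVX2 loads here; this
    // scalar body just stands in for it.
    haystack.iter().position(|&b| b == needle)
}

fn main() {
    assert_eq!(find_byte(b"hello world", b'w'), Some(6));
    assert_eq!(find_byte(b"hello", b'z'), None);
    println!("ok");
}
```

The compiled-in baseline stays conservative, but any CPU with AVX2 still gets the fast path, which is why "stick to SSE2 for compatibility" trades away more than it needs to.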
burntsushi commented on Faster substring search with SIMD in Zig   aarol.dev/posts/zig-simd-... · Posted by u/todsacerdoti
expenses3 · 14 days ago
How is it quadratic? You do 1000 checks every character in the haystack but that's still O(n)
burntsushi · 14 days ago
The worst case is that `std.mem.eql`[1] is executed at every position in the haystack, which gives you `O(n*m)` time complexity. Several substring search algorithms are `O(n)`.

[1]: https://github.com/aarol/substr/blob/9392f9557de735929dfb79e...
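The worst case is easy to exhibit with an instrumented naive search (a hedged sketch, not the Zig standard library's actual code): a needle of many `a`s ending in `b`, run against a haystack of all `a`s, forces the inner comparison to scan nearly the whole needle at every position.

```rust
/// Naive substring search instrumented to count byte comparisons,
/// demonstrating the O(n*m) worst case described above.
fn naive_find(haystack: &[u8], needle: &[u8], cmps: &mut usize) -> Option<usize> {
    if needle.is_empty() || needle.len() > haystack.len() {
        return None;
    }
    for start in 0..=haystack.len() - needle.len() {
        let mut i = 0;
        while i < needle.len() {
            *cmps += 1;
            if haystack[start + i] != needle[i] {
                break;
            }
            i += 1;
        }
        if i == needle.len() {
            return Some(start);
        }
    }
    None
}

fn main() {
    // Pathological input: needle "aaa...ab" against haystack "aaa...a".
    let haystack = vec![b'a'; 10_000];
    let needle = [vec![b'a'; 99], vec![b'b']].concat();
    let mut cmps = 0;
    assert_eq!(naive_find(&haystack, &needle, &mut cmps), None);
    // Roughly n * m comparisons (~990,100 here), versus the n + m an
    // O(n+m) algorithm like Two-Way would need.
    println!("{} comparisons for n={} m={}", cmps, haystack.len(), needle.len());
    assert!(cmps > 900_000);
}
```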

burntsushi commented on Faster substring search with SIMD in Zig   aarol.dev/posts/zig-simd-... · Posted by u/todsacerdoti
burntsushi · 14 days ago
Good write-up. This is a very popular approach to substring search! It is still worst case `O(m*n)` though. Do you have a fallback implementation like the `memchr` crate has to guarantee `O(m+n)`?

I'll also push back on some bits in the end:

    > But if it’s so much better, then why haven’t I made a pull request to
    > change std.mem.indexOf to use SIMD? Well, the reason is that
    >
    > std.mem.indexOf is generic over element size, and having a size
    > larger than u8 makes the algorithm much slower
    >
    > The algorithm used in std.mem.indexOf is cross-platform, while the
    > SIMD code wouldn’t be. (not all platforms have SIMD registers at all,
    > Arm has only 128-bit)
Does Zig not have a way to specialize this for sequences of unsigned 8-bit integers? If not, and you're therefore forced to use a more generic algorithm, that seems pretty unfortunate.

    > Substring searching is rarely the bottleneck in programs,
    > especially ones written in a fast language like Zig. That’s why
    > I don’t personally think it would be worth it to add it to the
    > standard library.
Oh I'm not sure I buy this at all! Substring search is a primitive operation and easily can be a bottleneck. There's a reason why widely used substring search implementations tend to be highly optimized.

burntsushi commented on Faster substring search with SIMD in Zig   aarol.dev/posts/zig-simd-... · Posted by u/todsacerdoti
codethief · 14 days ago
I suppose generalizing the approach to UTF-32 should be straightforward, but variable-length encodings like UTF-8 and UTF-16 might be more involved(?) Either way, I'm sure BurntSushi found a solution and built it into ripgrep.
burntsushi · 14 days ago
ripgrep always deals with UTF-8. When it sees a different encoding, like UTF-16, ripgrep first transcodes to UTF-8 and then searches.

This is absolutely in part because of all of the byte oriented optimizations that are baked into ripgrep (and its regex engine). Note that I said a part. Making ripgrep (and its regex engine) work on things other than a sequence of bytes is far more difficult than just porting a bunch of SIMD algorithms. There are also many optimizations and architectural constraints in the code based on the alphabet size. That is, with 8-bit integers, its alphabet size is 256. With 16-bit integers, the alphabet size is 65,536.
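The transcode-then-search idea can be sketched as follows. This is only the shape of the approach, assuming std-only conversion; ripgrep's real transcoding (via the `encoding_rs` machinery) is streaming and tolerant of invalid input.

```rust
/// Sketch of transcode-then-search: convert UTF-16 input to UTF-8 once,
/// then run an ordinary byte-oriented search on the result, so every
/// byte-level optimization applies unchanged.
fn search_utf16(haystack_utf16: &[u16], needle_utf8: &str) -> Option<usize> {
    // Transcode the haystack to UTF-8 up front (None on invalid UTF-16).
    let haystack = String::from_utf16(haystack_utf16).ok()?;
    // The returned offset is a byte offset into the UTF-8 transcoding.
    haystack.find(needle_utf8)
}

fn main() {
    // "hello δ world" encoded as UTF-16 code units.
    let utf16: Vec<u16> = "hello δ world".encode_utf16().collect();
    // "δ" is 2 bytes in UTF-8, so "world" starts at byte offset 9.
    assert_eq!(search_utf16(&utf16, "world"), Some(9));
    println!("ok");
}
```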

burntsushi commented on Faster substring search with SIMD in Zig   aarol.dev/posts/zig-simd-... · Posted by u/todsacerdoti
codethief · 14 days ago
> If you just encoded your string to bytes naïvely

By "naïvely" I assume you mean you would just plug in UTF-8 bytestrings for haystack & needle, without adjusting the implementation?

Wouldn't the code still need to take into account where characters (code points) begin and end, though, in order to prevent incorrect matches?

burntsushi · 14 days ago
IDK what "encoded your string to bytes naively" means personally. There is only one way to correctly UTF-8 encode a sequence of Unicode scalar values.

In any case, no, this works because UTF-8 is self synchronizing. As long as both your needle and your haystack are valid UTF-8, the byte offsets returned by the search will always fall on a valid codepoint boundary.

In terms of getting "combining characters wrong," this is a reference to different Unicode normalization forms.

To be more precise... Consider a needle and a haystack, represented by a sequence of Unicode scalar values (typically represented by a sequence of unsigned 32-bit integers). Now encode them to UTF-8 (a sequence of unsigned 8-bit integers) and run a byte level search as shown by the OP here. That will behave as if you've executed the search on the sequence of Unicode scalar values.

So semantically, a "substring search" is a "sequence of Unicode scalar values search." At the semantic level, this may or may not be what you want. For example, if you always want `office` to find substrings like `oﬃce` (with the `ﬃ` ligature, U+FB03) in your haystack, then this byte level search will not do what you want.

The standard approach for performing a substring search that accounts for normalization forms is to convert both the needle and haystack to the same normal form and then execute a byte level search.

(One small caveat is when the needle is an empty string. If you want to enforce correct UTF-8 boundaries, you'll need to handle that specially.)
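Both points above are easy to check in Rust with std only (a small demonstration, not ripgrep code; a real normalization step would use something like the `unicode-normalization` crate):

```rust
// 1) Byte-level search on valid UTF-8 always returns offsets that fall
//    on codepoint boundaries, because UTF-8 is self synchronizing.
// 2) But it matches sequences of scalar values, not normalized forms.
fn main() {
    let haystack = "naïve résumé";
    // `str::find` is effectively a byte search here; the offset it
    // returns is guaranteed to be a valid char boundary.
    let off = haystack.find("résumé").unwrap();
    assert!(haystack.is_char_boundary(off));

    // Normalization matters: "é" as one scalar (U+00E9) does NOT match
    // "e" followed by a combining acute accent (U+0301) at the byte level.
    let precomposed = "caf\u{e9}";   // "café" in NFC
    let decomposed = "cafe\u{301}";  // "café" in NFD
    assert_ne!(precomposed.as_bytes(), decomposed.as_bytes());
    assert_eq!(decomposed.find(precomposed), None);
    println!("ok");
}
```

Normalizing both needle and haystack to the same form first, as described above, makes the two spellings agree again at the byte level.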

burntsushi commented on My bytecode optimizer beats Copilot by 2x   deviantabstraction.com/20... · Posted by u/top256
dhosek · 22 days ago
Anything built to purpose (by a competent dev) will usually beat out a general purpose tool. I remember burntsushi being surprised that my purpose-built unicode segmentation code so dramatically outperformed the unicode segmentation he had in bytestring which was based on regular expressions, but personally I would be surprised if it were any different.
burntsushi · 22 days ago
Do you have a link to my surprise? I would be surprised if I were surprised by a purpose built thing beating something more general purpose. :P
burntsushi commented on RE#: High performance derivative-based regular expression matching (2024)   arxiv.org/abs/2407.20479... · Posted by u/fanf2
kazinator · a month ago
> The first industrial implementation of derivatives for standard regexes in an imperative language (C#) materialized a decade later [Saarikivi et al. 2019]

Nope; I did it in TXR in early 2010:

  b839b5a212fdd77c5dc95b684d7e6790292bb3dc    Wed Jan 13 12:24:00 2010 -0800    Impelement derivative-based regular expressions.

burntsushi · a month ago
What is TXR? Was this implementation really "industrial"? Did it have the caching present in RE# to avoid worst case exponential compile times? Did it support Unicode? Did it have prefilters? What kind of match semantics did it support?

"industrial" in this context to me means something like, "suitable for production usage in a broad number of scenarios."

IDK if RE# lives up to that, but their benchmark results are impressive. This paper is a year old. In which production systems is RE# used?

burntsushi commented on RE#: High performance derivative-based regular expression matching (2024)   arxiv.org/abs/2407.20479... · Posted by u/fanf2
def-lkb · a month ago
https://sourceforge.net/projects/libre/ dates back to 2001. (One could object it is not imperative enough, whatever that means :))
burntsushi · a month ago
The claim here wasn't just "first implementation of derivatives." It was a far more precise "first industrial implementation of derivatives for standard regexes in an imperative language."
