Good write-up. This is a very popular approach to substring search! It is still worst case `O(m*n)` though. Do you have a fallback implementation like the `memchr` crate has to guarantee `O(m+n)`?
I'll also push back on some bits in the end:
> But if it’s so much better, then why haven’t I made a pull request to
> change std.mem.indexOf to use SIMD? Well, the reason is that
>
> std.mem.indexOf is generic over element size, and having a size
> larger than u8 makes the algorithm much slower
>
> The algorithm used in stdmem.indexOf is cross-platform, while the
> SIMD code wouldn’t be. (not all platforms have SIMD registers at all,
> Arm has only 128-bit)
Does Zig not have a way to specialize this for sequences of unsigned 8-bit integers? If not, and you're thereforce force to used a more generic algorithm, that seems pretty unfortunate.
> Substring searching is rarely the bottleneck in programs,
> especially ones written in a fast language like Zig. That’s why
> I don’t personally think it would be worth it to add it to the
> standard library.
Oh I'm not sure I buy this at all! Substring search is a primitive operation and easily can be a bottleneck. There's a reason why widely used substring search implementations tend to be highly optimized.
I'm the author of the post, thanks for your feedback! I was inspired by your comment on HN a while back and started learning about this stuff, reading the source code of `memchr` was especially great.
You're totally right about the first part there was a serious consideration to add this to zig's standard library, there would definitely need to be a fallback to avoid the `O(m*n)` situation.
I'll admit that there are a lot of false assumptions at the end, you could totally specialize it for u8 and also get the block size according to CPU features at compile time with `std.simd.suggestVectorSize()`
I'll also push back on some bits in the end:
Does Zig not have a way to specialize this for sequences of unsigned 8-bit integers? If not, and you're thereforce force to used a more generic algorithm, that seems pretty unfortunate. Oh I'm not sure I buy this at all! Substring search is a primitive operation and easily can be a bottleneck. There's a reason why widely used substring search implementations tend to be highly optimized.You're totally right about the first part there was a serious consideration to add this to zig's standard library, there would definitely need to be a fallback to avoid the `O(m*n)` situation.
I'll admit that there are a lot of false assumptions at the end, you could totally specialize it for u8 and also get the block size according to CPU features at compile time with `std.simd.suggestVectorSize()`