Faster IndexOf for Substrings in .NET

Is it faster than Boyer Moore?

ygra · 4 years ago

I guess the answer is ... complex. As explained in the linked source for the algorithm (http://0x80.pl/articles/simd-strfind.html#introduction):

The main problem with these standard algorithms is a silent assumption that comparing a pair of characters, looking up in an extra table and conditions are cheap, while comparing two substrings is expansive.

But current desktop CPUs do not meet this assumption, in particular:

• There is no difference in comparing one, two, four or 8 bytes on a 64-bit CPU. When a processor supports SIMD instructions, then comparing vectors (it means 16, 32 or even 64 bytes) is as cheap as comparing a single byte.

• Thus comparing short sequences of chars can be faster than fancy algorithms which avoids such comparison.

• Looking up in a table costs one memory fetch, so at least a L1 cache round (~3 cycles). Reading char-by-char also cost as much cycles.

• Mispredicted jumps cost several cycles of penalty (~10-20 cycles).

• There is a short chain of dependencies: read char, compare it, conditionally jump, which make hard to utilize out-of-order execution capabilities present in a CPU.

There probably is a trade-off to be made. The more complex algorithms may make sense when the initial setup cost can be amortized and perhaps more things are known about the input and search strings. The dead-simple approach of simply running linearly through the string has no setup cost, can be trivially vectorized and may perform very well for most haystack/needle pairs and thus is probably a good idea for the general algorithm in the library. And in fact, the information that's required to decide whether the more complex algorithms are worth it is perhaps not known to string.IndexOf, so that's probably something that application developers have to decide.

magicalhippo · 4 years ago

> And in fact, the information that's required to decide whether the more complex algorithms are worth it is perhaps not known to string.IndexOf, so that's probably something that application developers have to decide.

It might not even be known to the application developers or, potentially worse, they might think they know but the program usage evolves and it becomes wrong knowledge.

For a JIT'ed language though one could imagine more smartness. In theory the runtime could use a sampling profiler in the background to determine any eligible library functions that are taking a lot of time based on caller.

For example, this specific call to IndexOf is taking a lot of time. The runtime could then replace that call to IndexOf with a variant that does some instrumentation, to for example build crude histograms of input lengths and hit locations. Based on this the runtime could replace that call to IndexOf with a version more optimized to the expected parameters.

At least this is possible for a JIT'ed language, though surely would be a non-trivial amount of work.

nickysielicki · 4 years ago

One of the most interesting things about the emphasis on algorithms in both CS education and in tech interviews to me is that when you start to look at how people really make high performance software, it rarely comes down to figuring out how to jump from O(n) to O(logn), more often than not it has to do with thinking about memory locality and caching or avoiding branches or fitting the problem into something can be vectorized. Picking the correct algorithm with minimal algorithmic complexity tends to be the easy part. But outside of roles where performance actually matters, you’re probably not going to get asked about these things. And that’s totally reasonable, but why do we ask people about algorithms, then?

Deleted Comment

SMART (fastest substring search algorithms) https://smart-tool.github.io/smart/ is up for several years. EPSM, by S. Faro and O. M. Kulekci still the fastest known strstr method, and they still come up with their own.

.NET turning more and more into Linux.

Someone · 4 years ago

Various links on that page (e.g. https://www.dmi.unict.it/~faro/smart/howto.php) are dead for me, and browsing the source isn’t easy because they release .zip/.tar.gz. files (https://github.com/smart-tool/smart/releases), so I didn’t make that effort, but theoretical algorithms often count character or byte comparisons, not nanoseconds. For the latter, code size and cache-friendliness matter, and the best code almost always must be tuned for one specific CPU, nowadays one specific state of a CPU (do you assume the SIMD hardware is powered up? If it isn’t, is using SIMD for short strings worth it?)

There probably also is a choice to be made as to what cases to optimize most for: length of search string, length of to-be-searched for string, case-sensitivity, diacritics-sensitivity, distribution of results (if, most of the time, the string is found close to the start, decreasing algorithm set up time at the cost of some speed may be worth it). ¿Are there corner cases where locale matters?

Const-me · 4 years ago

> the best code almost always must be tuned for one specific CPU

There’re very few relevant differences between CPUs of the same instruction set. The author of that PR implemented fast paths for SSE2, AVX2, and NEON. Combined, they cover like 90% of CPUs; the only notable missing one is the 32-bit ARMv7, C# doesn’t support vector intrinsics for that architecture.

> do you assume the SIMD hardware is powered up?

Modern AMD64 CPUs never power off just the SIMD. For AMD64 ISA, SSE1 and SSE2 are included in the base instruction set. Modern C++ compilers compile FP32 and FP64 math into SSE1 and SSE2 respectively, e.g. double y=x*3.0 compiles into MULSD instruction from SSE2 SIMD subset. Powering down that hardware would be too expensive for many programs.

Some early Intel’s AVX CPUs, like Sandy Bridge from 2011, powered down half of their SIMD hardware. On these CPUs, first time a program uses AVX instructions which process 32-byte vectors, the instructions run at half the throughput and +1 cycle of latency. The performance penalty lasts until the state transition is complete, like ~100 nanoseconds or something. AFAIK, for 32-byte AVX vectors, that got fixed years ago in the newer processors.

orf · 4 years ago

You can of course browse the source on GitHub: https://github.com/smart-tool/smart/tree/master/source

The tarballs are just a feature of GitHub, not something they specifically release.

pjmlp · 4 years ago

Having a search algorithm implemented in C has nothing to do with Linux.

oaiey · 4 years ago

I am not deep into the topic. Is that maybe because of a SIMD variant is possible in one algorithm and not the other. .NET is doing a lot of processor specific optimization right now and getting from that it's current speed. On pure paper approach, SIMD is not a thing but on processors it is.

Just waking up. Maybe I am completely off.

charcircuit · 4 years ago

That project includes implementations which use simd. I don't know if those are the fastest or how it compares to this proposed one.

tester34 · 4 years ago

What does algorithm (approach) has to SIMD / Hardware intrinsics?

tsimionescu · 4 years ago

Some algorithms are amenable to SIMD, others are not. To utilize SIMD, you need the same set of operations on big chunks of data stored one next to to the other in memory. This fits for example with algorithms which use arrays to store data, and doesn't work at all for alogrithms which rely on linked lists.

manigandham · 4 years ago

Interesting submission considering it's still in draft status.

As a user of .NET since version 2.0 though, I'm impressed and excited by how the platform has evolved and how much performance has become a focus.

secondaryacct · 4 years ago

As a new joiner from Java on applications that are decades old, I wish it would be more backward compatible.

I cant forgive they didnt find a way to fix WinForms, that we cant find the budget to move away from and limp along trying to upgrade out of.

But at least unlike Java, their GUIs do work well enough for a fraction of the maintenance cost of what I know. So meh, begging the sky to find anyone willing to do .NET in 2022 so I can go back to my cave lol. All the OG experts we interview are interested by our path to Java because they want out and Im like "but dude we'll teach you Java but we really need someone who has a clue in C#/.NET and likes it" :s

hvidgaard · 4 years ago

You can run your WinForms application on the latest LTS version of dotnet, .NET6. As always, upgrading to a new version of the framework, means you have to migrate some parts of the codebase, especially from an older .net3-4. That isn't really that different from Java

If I'd have candidates that want out from .NET I'd be very intrigued and want to know why. Specifically, I don't want any that want out because the tech was changing rapidly. I can understand the overhead is somewhat annoying, but it's for the better and it seems to be a lot more stable again.

Quarrelsome · 4 years ago

No idea where this Java elitism is coming from, dotnet and Java are ridiculously similar. I'd do dotnet for you in 2022 if you're willing to pay me more than my current corp with the same benefits, the language/platform is just fine.

Winform is also fine its just bad at transparency and most applications that are written using it are poorly written.

talos2110 · 4 years ago

What do you mean about WinForms? Look at this https://devblogs.microsoft.com/dotnet/whats-new-in-windows-f... from a couple of months ago, you can still use it pretty well

There are millions of .NET developers ready and able to do whatever you need. I do find that the market for .NET is very underpaid though compared to the more popular stacks used at FAANG companies.

Are you really having that much trouble finding devs?

MauroIksem · 4 years ago

Where are you hiring? I'm primarily a .net dev at a uni in the states. I'm surprised you're having trouble finding someone.

We do both stacks, on the backend we have the problem that most shops don't really buy into the whole .NET Core stack for Linux workloads story unless they happen to be Azure shops.

Beltiras · 4 years ago

I have a clue about C#/.NET. Did it professionally for almost 2 years. That's why I don't like it.

lifthrasiir · 4 years ago

The replacement algorithm: http://0x80.pl/articles/simd-strfind.html#algorithm-1-generi... (basically, match the first and last character and extract possible candidates using vector predicates)

The original algorithm by comparison was doing IndexOf for the first character then match the rest naively and was very slow even compared to strstr and std::string::find compared in the Wojciech Muła's page.

AtNightWeCode · 4 years ago

Nice improvement. I think one of the mistakes in .NET was to not have ordinal string comparison as the default for all string operations. Sometimes people forget it and it causes security and performance issues.

|--------------- |----------:|---------:|---------:|

| IndexOfOrdinal | 47.71 ns | 0.963 ns | 2.344 ns |

| IndexOfDefault | 201.93 ns | 3.893 ns | 4.923 ns |

afdsafsfasd · 4 years ago

What is an "ordinal string comparison"? To contrast, what is a non-ordinal string comparison?

ChrisSD · 4 years ago

A string is encoded in memory as an array of numbers so in this case "ordinal" means that it simply compares the two arrays number by number.

A non-ordinal string comparison may take into account other things such as iterating by actual characters and comparing them in language and culture-aware ways. This is useful for ordering things by name when showing a list to users.

Ordinal is plain binary comparison. The default comparison takes language specifics into account. For instance, in German there is a letter called Eszett (ß) that stands for "ss". IndexOf("ss") will find that.

Thiez · 4 years ago

Ordinal compares the literal bytes. The other comparison variants have additional features, e.g. case-insensitivity.

Sammi · 4 years ago

Such a Anglo centrist view. Rest of the world be damned? Luckily a not all the .Net creators were English speakers.

alkonaut · 4 years ago

The default comparison should be a culture neutral comparison. Just like the default number formatting unless specified should default to "NumberFormatInfo.InvariantInfo" rather than "Thread.CurrentThread.CurrentCulture".

If I want to compare or print in my own language or the system language (I very rarely do) then I'm happy to specify that in code. And I wish the API forced me to, because I some times forget and I make code that breaks when it's run on a different machine. Which is almost never what I want.

Denvercoder9 · 4 years ago

How is defaulting to byte-comparing strings Anglocentrist?

torginus · 4 years ago

Not an Anglo. Still think it's dumb. Most of the strings I manipulate are either english, or not even natural languages. Having to write Ordinal for Regex and string comparisons everywhere sucks. Especially considering that if you don't, it will depend on the current thread's Culture setting, which can lead to odd bugs when you call code from different contexts.

It's probably an unfortunate decision either way: For interacting with things the user types the ordinal approach is wrong, but for low-level handling of text the ordinal approach is faster. In both cases you basically have to know your exact needs and choose the correct variant. You could have the parameter not being optional and thus force every user of that API to make that decision, but that is probably fairly quickly perceived as annoying.

The default and recommended way of comparing strings in .NET is with ordinal comparison. It is just some functions that use CurrentCulture instead for some reason. We are also talking about Unicode strings here so you can compare strings in any language.

valleyjo · 4 years ago

The .Net analyzers have a feature to check for StringComparison use when available. I tuned it on for my team after a few bugs occurred from forgetting it.

_3u10 · 4 years ago

ryanmccullagh · 4 years ago

the c# code is looking a lot like C with the goto’s. Great to see this style in c#.

andix · 4 years ago

If you want high performance C# code, it needs to look like this. Very much like C, rather unreadable.

But form my experience, only a very tiny fraction of your code needs to be like that, as in most of the code maintainability and readability is much more important than performance.

rawling · 4 years ago

3 gotos replacing while(true) loops, and one... just reducing nesting?

dubiousconst281 · 4 years ago

I'm guessing the reason is because of code gen issues, there's a lot of workarounds throughout the BCL code base.

rurban · 4 years ago