Improving the performance of WAT parser

thramp · a month ago

We (the rust-analyzer team) have been aware of the slowness in Rowan for a while, but other things always took priority. Beyond allocation, Rowan is structured internally as a doubly-linked list to support mutating trees, but: 1. Mutation isn’t really worth it; the API isn’t user-friendly. 2. In most cases, it’s straight up faster to create a new parse tree and replace the existing one. Cache effects of a linked list vs. an arena!

In fairness, I don’t think we predicted just how large L1/L2 caches would get over the coming years.

testdelacc1 · a month ago

Is a rewrite of Rowan on the horizon?

thramp · 24 days ago

One of the rust-analyzer co-maintainers (Chayim Friedman) already rewrote it, but we can’t integrate it yet, as about 40 assists (the little lightbulbs?) still rely on mutation. If you want something Rowan-like, I think syntree and Biome’s rowan fork are good options to look into.

vjerancrnjak · a month ago

It’s funny how there is continuous reinvention of parsing approaches.

Why isn’t there already some parser generator with vector instructions, pgo, low stack usage. Just endless rewrites of recursive descent with caching optimizations sprinkled when needed.

embedding-shape · a month ago

Hardware also changes across time, so while something that was initially fast, people with new hardware tries it, finds it now so fast for them, then create their own "fast X". Fast forward 10 more years, someone with new hardware finds that, "huh why isn't it using extension Y" and now we have three libraries all called "Fast X".

zahlman · a month ago

Because you have to learn how to use any given parser generator, naive code is easy to write, and there are tons of applications for parsing that aren't really performance critical.

high_na_euv · a month ago

I'd say because parsing is very specific kind of work heavily dependent on the grammar you're dealing with

munificent · a month ago

A parser spends time:

1. Consuming tokens.

2. Recognizing the grammar.

3. Producing AST nodes.

Steps 1 and 3 are heavily dependent on the data types that make the most sense for the previous (lexing) and next (semantic analysis) phases of the compiler. There is no one Token type that works for every language, nor one AST type.

The recognizing the grammar part is relatively easy, but since so much of the code is consuming and producing datatypes that are unique to a given implementation, it's hard to have very high performance reusable libraries.

mgaunard · a month ago

There are good parser generators, but potentially not as Rust libraries.

westurner · a month ago

I decided to look; just found these:

chumsky (parser combinator): https://github.com/zesterer/chumsky

LALRPOP (LR(1)): https://github.com/lalrpop/lalrpop

grmtools (YACC) https://github.com/softdevteam/grmtools/ re: Other parsers: https://softdevteam.github.io/grmtools/master/book/othertool...

antlr4rust: https://github.com/rrevenantt/antlr4rust

writebetterc · a month ago

So it went from parsing at 25MiB/s to 115MiB/s. I feel like 115MiB/s is very slow for a Rust program, I wonder what it's up to that makes it so slow now. No diss to the author, good speedup, and it might be good enough for them.

mananaysiempre · a month ago

115 MiB/s is something like 20 to 30 cycles per byte on a laptop, 50 on a desktop. That’s definitely quite slow as far as a CPU’s capacity to ingest bytes, but unfortunately about as fast as it gets for scalar (machine) code that does meaningful work per byte. There may be another factor of 2 or 3 to be had somewhere, or there may not be. If you want to go meaningfully faster, as in at least at the speed of your disk[1], you need to stop doing work per byte and start vectorizing. For parsers, that is possible but hard.

[1] https://www.youtube.com/watch?v=p6X8BGSrR9w

zozbot234 · a month ago

A quick rule of thumb is that one or two bytes per peak clock cycle per core or so (not unlike an old 8 bit or 16 bit machine!) is the worst case for memory bandwidth when running highly multithreaded workloads that heavily access main RAM outside cache. So there's a lot of gain to be had before memory bandwidth is truly saturated, and even then one can plausibly move to GPU-based compute and speed things up further. (Unified memory+HBM may potentially add a 2x or 3x multiplier to this basic figure, but either way it's in the ballpark.)

high_na_euv · a month ago

"for Rust program"?

Isnt it more about the grammar than the prog lang?

writebetterc · a month ago

The grammar matters also, of course. A pure Python program is going to be much slower than the equivalent Rust program, just because CPython is so slow.

I don't know if this does semantic analysis of the program as well.

dfajgljsldkjag · a month ago

The performance gain from using a single shared vector for the nodes is pretty crazy. It just goes to show how much allocation overhead can slow things down if you are not careful.

epage · a month ago

> Use hand-written parser > > The old parser was written with winnow which is a parser combinator library. While it’s easy to create a parser with parser combinators, it’s generally slower than a hand-written parser, so the first step is to write the parser by hands. Hand-written parser is not only faster but also allows to do more optimizations in the future.

Maintainer of Winnow here. I wish there were more details on this. I switched `toml` / `toml_edit` to being hand written and got some performance boost but I feel like the other things listed would have dwarfed the gains that I got. I wonder if there were sub optimal patterns they employeed that we could find ways to help better guide people.

For anyone going on about "hand written is always better", I disagree. Parser combinators offer a great way to map things back to grammar definitions which makes them much easier to maintainer. Only in extreme circumstances of features and/or performance does it seem worth going hand-written to me.

gplane · a month ago

> "hand written is always better", I disagree. - Yep. As far as I know, winnow provides SIMD in some cases, while for hand written parsers, writing SIMD can be very hard.

taylorallred · a month ago

It seems to me like parser combinators are always more trouble than they're worth. People often have the impression that parsing is difficult and should be outsourced to another library, but often it's pretty simple to hand-roll and usually it makes faster code.