The Shunting-Yard Algorithm (2011)

A few months ago I wanted to add arithmetic parsing to my game's debug console, and went with this approach.

While researching the area, I came across this exact blog post, and thought it was perfect. It explains things clearly and concisely (without even needing code/pseudocode). Granted, it's a simple algorithm, but this post was just so much nicer than any of the (higher ranked) results that Google gave me. And it has no ads!

It made me sad. Years ago, it seemed like Google helped me find lots of nice little pages like this one. It's much harder to find web articles of this quality anymore.

Back to this article specifically, I particularly appreciated the "Advanced Usage" section, where it briefly (and clearly!) goes over some useful extensions to the algorithm. These were very much on my mind (since I wanted parentheses and function calls). They are simple extensions to a simple algorithm, but it's nice to have them summarized for you before you dive in.

It was a breezy joy to implement (did it even take an hour?). And I can think of other easy ways to extend its capabilities further. This style of arithmetic parsing would be a wonderful thing for beginner programmers to cut their teeth on. It exposes them to a number of important computer/programming ideas, yet is simple to implement while also being directly useful.

Some limitations of this particular article:

* it doesn't mention that operator-precedence (usually: Pratt) parsing is another alternative

* it doesn't mention that you don't have to evaluate the expression directly; it's perfectly possible to output an AST, RPN, or some kind of minimal bytecode.

* unless you do a bit of optimization, you can't reduce the number of "registers" needed à la Schneider. While generally less relevant for interpreters unless you do some careful platform-dependent design, interpreters can easily benefit from the one-register case.

* you're allowed to use an ahead-of-time LR machine generator, you don't have to encode its tables by hand (but for simple grammars it's not actually that hard to work out with comments).

* it doesn't sufficiently mention that you really want to work with IDs in a lot of cases; among other things this avoids further unary/binary problems (though I suppose synthesizing an extra argument also works).

kybernetikos · 2 years ago

I think your second point actually is covered:

> Note that “applying” an operator can mean a couple of different things in this context. You could actually execute the operators, in which case the operands would be numerical values of all the terms and subexpressions; you could also build a syntax tree, in which case the operands would be subtrees. The algorithm works the same way in either case.

Syzygies · 2 years ago

As for "you don't have to evaluate", from the article:

> you could also build a syntax tree, in which case the operands would be subtrees.

beautron · 2 years ago

o11c · 2 years ago

thu · 2 years ago

I've used that algorithm and extended it to handle distfix expressions[1]. For instance it can parse the expression

    1 + if true then 1 else a b + c

as (represented as an s-expr using some weird angle brackets):

    ⟨+ 1 ⟨if␣then␣else␣ true 1 ⟨+ ⟨a b⟩ c⟩⟩⟩

I thought it was cool and could be useful to a lot of simple use cases.

[1]: https://github.com/noteed/syntactical

eythian · 2 years ago

I remember covering that at university, I don't think in too much depth, just enough that a few years later at my job I needed to do some parsing of equations and immediately knew the name of what I needed to use.

I then proceeded to extend it a little for my purposes (supporting functions iwth variable numbers of arguments) which was fun, and got a surprising amount of people showing up in the blog post comments: https://blog.kallisti.net.nz/2008/02/extension-to-the-shunti...

kitd · 2 years ago

Back in the early 90s, I learned C from a book called "Illustrated C" (I forget the author) that taught C programming & internals pictorially, which strongly appealed to my brain.

One of the sections was an implementation of this algorithm, and it made me fall in love with programming and the power of well-designed algorithms.

DeathArrow · 2 years ago

A more succint presentation here: https://www.andreinc.net/2010/10/05/converting-infix-to-rpn-...

vlowrian · 2 years ago

Besides being a relatively easy to understand parsing algorithm, it's also pretty efficient. We're using the Shunting-Yard for Sonic, our in-house .NET expression evaluation engine used in all of our simulation products.

Since performance is a major concern, we compared a lot of algorithms and frameworks and Shunting Yard really hits the spot being both fast and easy to grasp, producing maintainable code.

For anyone interested in seeing code, we recently open sourced Sonic: https://github.com/adletec/sonic

kaladin-jasnah · 2 years ago

Interestingly, I've heard of this algorithm before as "Dijkstra's two-stack parsing algorithm," but not as the seemingly more common name of the "Shunting-yard algorithm."