Instead of an interpreter, if one has less of a "must be a full-featured programming language" mentality and a fast compiler like Go or Nim [1] (or is willing to wait for slow optimizing compiles when the result will run against big data sets), then an end-to-end simpler design for "one-liners" (or similarly simple programs) is the whole-program generator. Maybe those are big IFs, but maybe not.
To back up my simplicity claim, consider rp [2] -- like 60 non-comment/import/signature lines of code for the generator. Generated programs are even smaller. But, you can deploy gcc or clang or whatever against them and make fast libraries in the host language.
Why, if you are willing to write those little generation command options in C99, then you can compile the harness with tcc [3] in about 1 millisecond, which is faster than most interpreter start-up times (bytecode or otherwise), and still link against gcc -O3 (or whatever) helper libraries.
Anyway, I only write this because in my experience few people realize how much development cost they buy into when they insist on a full-featured programming language, not to criticize Ben's work. You also make users learn the quirks of a new language instead of the quirks of a "harness", which may be fewer or easier. (EDIT: Awk is fairly well established, of course.)

[1] https://forum.nim-lang.org/

[2] https://github.com/c-blake/cligen/blob/master/examples/rp.ni...

[3] https://repo.or.cz/w/tinycc.git
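To make "whole program generator" concrete, here is a toy sketch in Go (hypothetical and much cruder than rp, which is Nim): splice the user's one-liner into a fixed per-line harness, write it out, and hand the result to the compiler.

    // gen.go: a toy whole-program generator (invented, not rp).
    // It wraps a user-supplied Go statement in a per-line harness,
    // writes main.go, and shells out to `go build`.
    package main

    import (
        "fmt"
        "os"
        "os/exec"
    )

    const harness = `package main

    import (
        "bufio"
        "fmt"
        "os"
        "strings"
    )

    func main() {
        sc := bufio.NewScanner(os.Stdin)
        for sc.Scan() {
            fields := strings.Fields(sc.Text())
            _ = fields // keep "fields" used even if the one-liner ignores it
            %s
        }
        _ = fmt.Sprint // keep "fmt" used likewise
    }
    `

    func main() {
        src := fmt.Sprintf(harness, os.Args[1]) // user's one-liner goes in %s
        if err := os.WriteFile("main.go", []byte(src), 0o644); err != nil {
            panic(err)
        }
        out, err := exec.Command("go", "build", "-o", "oneliner", "main.go").CombinedOutput()
        if err != nil {
            panic(string(out))
        }
    }

Run it as `go run gen.go 'fmt.Println(len(fields))'` and the resulting ./oneliner binary filters stdin with no interpreter in sight.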
Unfortunately, yes. My hope was that it would be compact/small enough to be "quasi-self-documenting" to the likes of "compiler writer types". Probably not to ordinary users. (The extra asterisks are to get nice ANSI SGR escape highlighting and/or rST markup out of the auto-documentation system.) {EDIT2: Also, I would be happy to add some rp.README if you want to contribute one. You seem like a great explainer.}
FWIW, I think of this kind of generation as part of the Go mentality more generally, but I am not a Go user nor in that community. So maybe that is speaking out of turn.
I also have a C version of this that I call `crp` I could provide if anyone wants (and yes, short for "crap"). C is a kind of higher-ceremony language for such things { EDIT1: but even more established than awk... :-) }
I should have said: those help strings at the very bottom of the file are also part of the "documentation", to the extent the variable names in the proc signature are not self-explanatory.
The problem with that approach is that it tends to create the false belief, among developers inexperienced with compiler development, that C is a must-have for writing compilers.
Then one needs to go around explaining that, no, any programming language could have been chosen, and it was just a matter of convenience that C was picked.
This is fair, but an argument in favor of the approach is that it is so simple it really could be done with any backend language {EDIT: by a 1st-year CS student in an afternoon, um, maybe.. ;-)}: Go, Pascal, Ada, even slow-running ones like Python/Perl/etc. or slow-compiling ones like Rust. So, were the design more common, people might not be so prone to make that assumption. OTOH, getting people to not make wrong assumptions is an eternal challenge. :-)
My own view is that awk was built more or less for these one-liner/simple purposes, but by people like Aho for whom full-featured languages are barely more thinking than my 60 lines (in either writing the parser or using it). :-)
The binary search in a case statement is annoying for stack machines like this. How would implementing your own jump table as a hashmap to functions perform here vs a case statement? I'm guessing the function calls make it a lot slower as you mention in the comparison with the tree walking.
Yeah, I discuss something similar in the article, except I use an array of functions (search for "array of functions"), which is significantly faster than hashing and a hash table lookup. It was very slightly faster than switch, but I decided not to use it:
> This only gave me a 1-2% speed increase on GoAWK’s microbenchmarks (see results and code). In the end I decided I’d rather stick with the simpler switch code and find other ways to improve the speed. And when the Go compiler supports jump tables for switch, I’ll get a 10% improvement by doing nothing!
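For the curious, the array-of-functions version looks roughly like this (a simplified, self-contained sketch with invented names, not GoAWK's actual code):

    // Sketch of array-of-functions dispatch for a tiny stack VM.
    package main

    import "fmt"

    type vm struct {
        stack []float64
        code  []byte
        pc    int
    }

    type opFunc func(*vm)

    const (
        opNum byte = iota // push literal (operand: next code byte)
        opAdd
        opPrint
        opHalt
    )

    var dispatch = [...]opFunc{
        opNum: func(v *vm) {
            v.stack = append(v.stack, float64(v.code[v.pc]))
            v.pc++
        },
        opAdd: func(v *vm) {
            n := len(v.stack)
            v.stack[n-2] += v.stack[n-1]
            v.stack = v.stack[:n-1]
        },
        opPrint: func(v *vm) {
            fmt.Println(v.stack[len(v.stack)-1])
        },
        opHalt: nil, // handled in the loop
    }

    func (v *vm) run() {
        for {
            op := v.code[v.pc]
            v.pc++
            if op == opHalt {
                return
            }
            dispatch[op](v) // one indexed load + indirect call per instruction
        }
    }

    func main() {
        // Equivalent in spirit to: BEGIN { print 2+3 }
        v := &vm{code: []byte{opNum, 2, opNum, 3, opAdd, opPrint, opHalt}}
        v.run() // prints 5
    }

The point is that dispatch[op](v) is an indexed load plus an indirect call, with no binary search over opcode values.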
I wrote a niche grep library (in C++) that uses the bytecode approach. I'm not a Go programmer, and I'm gobsmacked at how switch works there: the jump-table switch in C/C++ is a huge part of the benefit of a bytecode VM. Also, my guess is that you may see a disproportionate performance increase (>18%) between the AST interpreter and the bytecode backend with larger AWK programs, due to the latter's better cache characteristics.
YMMV with computed goto. It was an important technique, but on modern processors with their crazy branch predictors it can yield disappointing results today. It didn't help my VM.
I would -not- write your own regex library. Doesn’t RE2 have a Go port? Or maybe something else? It’s a world of pain.
Given AWK’s usage as a string processor, I’d sell out hard for string processing. Maybe an opcode specifically for “hardcoded string op”, with a secondary operand denoting the string op, which you dispatch through an array? Then just load that up with beefy functions and the VM is used to glue together the string operations.
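Concretely, something like this hypothetical Go sketch (names invented):

    // One CallStr opcode; the secondary operand picks the operation
    // from a table of "beefy" string functions.
    package main

    import (
        "fmt"
        "strings"
    )

    // Secondary operands for the hypothetical CallStr opcode.
    const (
        strUpper = iota
        strLower
        strTrim
    )

    var strOps = [...]func(string) string{
        strUpper: strings.ToUpper,
        strLower: strings.ToLower,
        strTrim:  strings.TrimSpace,
    }

    func main() {
        // In the VM loop this would be: case opCallStr: push(strOps[operand](pop()))
        s := "  hello  "
        fmt.Printf("%q\n", strOps[strTrim](s))  // "hello"
        fmt.Printf("%q\n", strOps[strUpper](s)) // "  HELLO  "
    }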
Also, not sure how possible this is in Go, but if it allows for unions, packed structs, and bitfields, you can get some benefit from having 32 bit instructions where the opcodes are a byte and then have the other three bytes available for operands, and additionally allowing for some instructions to be multiword (so a jump could have a second word following that’s the target address, for a full 32 bits). Condensing your byte code without incurring other costs (bit access, compression) is key to improving your L1 cache rate. A cache line can then fit 16 instructions.
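Go has no unions, packed structs, or bitfields, but the same layout falls out of a uint32 plus shifts and masks; a hypothetical sketch:

    // 32-bit instruction word: 1 opcode byte + three 8-bit operands,
    // with multiword instructions for full 32-bit values.
    package main

    import "fmt"

    type instr uint32

    func pack(op, a, b, c byte) instr {
        return instr(op) | instr(a)<<8 | instr(b)<<16 | instr(c)<<24
    }

    func (i instr) op() byte { return byte(i) }
    func (i instr) a() byte  { return byte(i >> 8) }
    func (i instr) b() byte  { return byte(i >> 16) }
    func (i instr) c() byte  { return byte(i >> 24) }

    const opJump = 7 // invented opcode number

    func main() {
        code := []instr{
            pack(opJump, 0, 0, 0),
            42, // multiword: the next word is the full 32-bit jump target
        }
        if code[0].op() == opJump {
            fmt.Println("jump to", uint32(code[1]))
        }
    }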
Preallocating as much memory as possible is helpful, too. Even if there are necessary conditionals for growing arrays and whatnot, they’re likely to be false and thus handled by branch prediction with no cost.
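A trivial Go illustration of the idea (the capacity is a made-up number, not GoAWK's):

    package main

    import "fmt"

    func main() {
        // Preallocate the stack: the grow branch inside append is then
        // almost always false and essentially free after branch prediction.
        stack := make([]float64, 0, 4096) // 4096 is a guess
        for i := 0; i < 10; i++ {
            stack = append(stack, float64(i)) // never reallocates here
        }
        fmt.Println(len(stack), cap(stack))
    }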
I was looking at this earlier and noticed the tree walking interpreter. Nice work with the upgrade!
For folks not familiar: basically every major interpreter today compiles to bytecode and runs on a bytecode VM. People will squabble about the term "compiler", saying it only applies to generating binary files directly, but I still think it's fun that almost every major interpreter is a compiler in the general sense.
The only major interpreters that do tree walking AFAIK are shells like bash.
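A toy illustration of what "compiles to bytecode" means (hypothetical Go, not modeled on any particular interpreter): flatten the AST for 2+3 into postfix stack-machine code.

    package main

    import "fmt"

    const (
        opNum byte = iota // operand: literal in the next byte
        opAdd
    )

    type node struct {
        val         byte  // used when this is a leaf
        left, right *node // nil for leaves
    }

    func compile(n *node, code []byte) []byte {
        if n.left == nil { // leaf: emit a push
            return append(code, opNum, n.val)
        }
        code = compile(n.left, code)
        code = compile(n.right, code)
        return append(code, opAdd) // postorder traversal = postfix = stack code
    }

    func main() {
        ast := &node{left: &node{val: 2}, right: &node{val: 3}} // 2 + 3
        fmt.Println(compile(ast, nil))                          // [0 2 0 3 1]
    }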
Awesome blog post Ben! Question: how did you come up with your list of opcodes?
I ask because, while some opcodes are obviously needed, others are not so obvious, and coming up with a balanced instruction set is a somewhat difficult design problem.
Good question. A bit of experience, some guesswork, and a lot of testing and benchmarking along the way. I separated out global and local opcodes originally because I wanted individual variable accesses to be fast (why do at runtime what you can do at compile time?). As I mention in the article, originally I had all the builtin functions as separate opcodes, but that was slower due to the sheer number of opcodes and Go's current binary-tree approach to "switch". I also used Go's profiling tools a bunch to see where the hotspots were. I'm sure there's significant room for improvement, but it's hard, because sometimes when you improve one benchmark, something else suffers.
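To make the builtin trade-off concrete, here's a simplified sketch (invented names and opcode numbers, not GoAWK's actual instruction set) of the two designs:

    package main

    import (
        "fmt"
        "math"
    )

    // Design A: every builtin gets its own opcode, so the main
    // interpreter switch grows by dozens of cases.
    const (
        opSqrtA byte = iota
        opSinA
        opLengthA
        // ...and so on, one switch case each
    )

    // Design B: a single CallBuiltin opcode; the next code byte
    // picks the builtin from a table, keeping the switch small.
    const opCallBuiltin byte = 100 // invented opcode number

    const (
        builtinSqrt byte = iota
        builtinSin
    )

    var builtins = [...]func(float64) float64{
        builtinSqrt: math.Sqrt,
        builtinSin:  math.Sin,
    }

    func main() {
        code := []byte{opCallBuiltin, builtinSqrt}
        switch code[0] {
        case opCallBuiltin: // one case, table lookup inside
            fmt.Println(builtins[code[1]](2.0)) // 1.4142135623730951
        }
    }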
Is it possible to write an interpreter for this bytecode language using RPython, so as to get a JIT speedup?
I think if you write an interpreter in RPython, and annotate it with some directives, you can get a JIT compiler for free. But it wouldn't be "GoAWK" anymore!

https://rpython.readthedocs.io/en/latest/examples.html
Looks like much of the effort is working around basic Go language limitations. Coding the virtual machine interpreter in Go seems like a mistake.
Compiling straight to Go, and then compiling that the normal way, seems like it would do better, and be simpler. Assuming you really wanted to keep it all in Go.
Go isn't the most efficient language for these sorts of things, but it's also not bad (and Go itself is not really the bottleneck for real-world AWK scripts). That said, GoAWK is useful mainly when you want to use AWK scripts in a Go program (the Benthos stream processor does this: https://www.benthos.dev/docs/components/processors/awk/).
I also did a project recently, creatively called AWKGo, that compiles (a subset of) AWK scripts to Go source code: https://benhoyt.com/writings/awkgo/
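For a flavor of the output, an AWK program like { sum += $1 } END { print sum } might come out roughly like this (a hand-written illustration in the same spirit, not actual AWKGo output):

    package main

    import (
        "bufio"
        "fmt"
        "os"
        "strconv"
        "strings"
    )

    func main() {
        var sum float64
        sc := bufio.NewScanner(os.Stdin)
        for sc.Scan() {
            fields := strings.Fields(sc.Text())
            if len(fields) > 0 {
                f, _ := strconv.ParseFloat(fields[0], 64) // AWK-style lax numeric coercion
                sum += f
            }
        }
        fmt.Println(sum) // END block
    }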