I was surprised to receive a reply from you, the author. Thank you :)
Since I'm a novice with both compilers and databases, could you tell me what the advantages and disadvantages are of using a VM with SQLite?
It is difficult to summarize the advantages and disadvantages of byte-code versus AST for SQL in a tweet. I need to write a new page on this topic for the SQLite documentation. Please remind me if something does not appear in about a week.
I am amazed that the author (D Richard Hipp) made an effort to find and respond to a tweet that was neither (1) directed/"@tted" at him nor (2) written in his native language of English (the original tweet is in Japanese[1]).
Performance analysis indicates that SQLite spends very little time doing bytecode decoding and dispatch. Most CPU cycles are consumed in walking B-Trees, doing value comparisons, and decoding records - all of which happens in compiled C code. Bytecode dispatch is using less than 3% of the total CPU time, according to my measurements.
So at least in the case of SQLite, compiling all the way down to machine code might provide a performance boost of 3% or less. That's not very much, considering the size, complexity, and portability costs involved.
A key point to keep in mind is that SQLite bytecodes tend to be very high-level (create a B-Tree cursor, position a B-Tree cursor, extract a specific column from a record, etc). You might have had prior experience with bytecode that is lower level (add two numbers, move a value from one memory location to another, jump if the result is non-zero, etc). The ratio of bytecode overhead to actual work done is much higher with low-level bytecode. SQLite also does these kinds of low-level bytecodes, but most of its time is spent inside of high-level bytecodes, where the bytecode overhead is negligible compared to the amount of work being done.
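You can see these high-level opcodes from any client by prefixing a statement with EXPLAIN. A quick sketch using Python's bundled sqlite3 module (exact opcodes and their order vary between SQLite versions, so treat the output as illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t(a INTEGER PRIMARY KEY, b TEXT)")

# EXPLAIN returns the bytecode program as rows:
# (addr, opcode, p1, p2, p3, p4, p5, comment)
rows = con.execute("EXPLAIN SELECT b FROM t WHERE a = 1").fetchall()
for row in rows:
    addr, opcode = row[0], row[1]
    print(addr, opcode)
```

Typical listings mix high-level opcodes such as OpenRead and Column with low-level ones like Integer and Halt.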
4. lots of small building blocks of static machine code precompiled/shipped with DB software binary, later iterated & looped through based on the query plan the optimizer came up with. Oracle does this with their columnar/vector/SIMD processing In-Memory Database option (it's not like LLVM as it doesn't compile/link/rearrange the existing binary building blocks, just jumps & loops through the existing ones in the required order)
Edit: It's worth clarifying that the entire codebase does not run like that, not even the entire plan tree - just the scans and tight vectorized aggregation/join loops on the columns/data ranges that happen to be held in RAM in a columnar format.
You can also compile to bytecode (when building), and then compile that bytecode to the machine code (when running). That way, you can take advantage of the exact processor that is running your program.
This is the approach taken by .NET and Java, and I presume most other runtime environments that use bytecode.
You can also parse the code into an augmented syntax tree or code DOM and then directly interpret that. This approach eliminates the intermediate code and bytecode machine at the cost of slower interpretation. It's slower due to memory cache and branch prediction issues rather than algorithmic ones.
For simple functions which are not called repeatedly you have to invert this list - interpreted code is fastest and compiling the code is slowest.
The compiled code would still be faster if you excluded the compilation time. It's just that the overhead of compiling it is sometimes higher than the benefit.
APL interpreters are tree-walking, but they're backed by C procedures, which GCC optimizes quite well - so you get excellent performance, including things like vector instructions.
Getting on par with GCC/Clang with your own JIT is pretty hefty.
optimizing runtimes (usually bytecode) can beat statically compiled code, because they can profile the code over time, like the JVM.
... which isn't going to close the cold startup execution gap, which all the benchmarks/lies/benchmarks will measure, but it is a legitimate thing.
I believe Intel CPUs actually sort of are #2. All your x86 is converted to microcode ops, sometimes optimized on the fly (I remember Intel discussing "micro ops fusion" a decade ago I think).
The problem of rendering a tree-of-objects as a table is sufficiently difficult that nobody does it, as far as I know. Hence, no tree-of-objects database engine provides the level of detail in their "EXPLAIN" output that SQLite provides.
I believe Microsoft SQL Server uses an object tree internally, and yet its query plan output is a table:
https://learn.microsoft.com/en-us/sql/t-sql/statements/set-s...
It can also execute a query incrementally. In fact, certain features rely on this behavior. “Watch live data” (in SSMS) for extended events is actually an infinite table-valued function. It’s not exactly rocket science to do in a database, you just need to model data sources and operators as iterators.
Same with PostgreSQL. You can choose between text output (one table row per line), XML or JSON. But none of them are ordered according to a strict execution order like bytecode would be. To me that makes perfect sense though, because it represents what can be parallelized in the plan. The bytecode representation in SQLite seems to have the inherent limitation that it is single-threaded.
I love how the example SQL query uses abbreviated columns that sound like runes:
select
-- count(*)
a.mandt, b.rldnr, a.bukrs, a.gjahr, a.belnr, b.docln, a.buzei
-- *
from BSEG as a
inner join ACDOCA as b
on b.rclnt = a.mandt
and b.rbukrs = a.bukrs
and b.gjahr = a.gjahr
and b.belnr = a.belnr
where a.mandt = '715'
and b.rldnr = '0L'
and b.docln = '000001'
and a.buzei = '001'
and a.gjahr = '2018'
--and a.gjahr = '2017'
--and a.gjahr = '2019'
--order by a.mandt, b.rldnr, a.bukrs, a.gjahr, a.belnr, b.docln, a.buzei
limit 200;
Just realised this is probably from German too, "jahr" being year.
You get XML for estimated execution plans if you use SET SHOWPLAN_XML ON. You get the actual execution plan XML along with your results if you use SET STATISTICS XML. You can see cached execution plans in the sys.dm_exec_query_plan dynamic management view.
The estimated/actual execution plan feature in SQL Server Management Studio was changed to use XML in SQL Server 2005. The SQL Server team are only adding new information to the XML representation, not the text representation.
The name "estimated" execution plan is a bit misleading. If you executed the statement(s) with the current indexes and statistics, that's how it would be executed. I think the "estimated" part of the name is because the number of rows returned from each node, and the execution time of each node, is an estimate.
Assuming that the indexes and statistics don't change, you should expect the "actual" execution plan to have the same shape. It will just be annotated with the number of rows that were actually retrieved as well as the estimated numbers. SQL Server doesn't re-plan if it turns out the number of rows was greater (or fewer) than expected.
As a SQLite and SQL Server user, I find that I would like something more detailed from SQLite than EXPLAIN QUERY PLAN (which just tells you the join order), but less detailed than the bytecode. Particularly I want to know why it chose that join order or to use that index.
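For comparison, here are the two levels of detail side by side, sketched with Python's sqlite3 module (table and index names are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t(a INTEGER PRIMARY KEY, b TEXT)")
con.execute("CREATE INDEX t_b ON t(b)")

# High level: one row per plan step (join order, index choice)...
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t WHERE b = 'x'").fetchall()

# ...but not *why* that plan was chosen. Low level: the full program.
code = con.execute("EXPLAIN SELECT * FROM t WHERE b = 'x'").fetchall()

print(len(plan), "plan step(s) vs", len(code), "bytecode instructions")
```

The middle level being asked for - the planner's costing and rejected alternatives - is exactly what neither listing shows.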
I don’t doubt the author, but what is it that makes rendering a tree of objects to a table a difficult problem? Is that not what browsers do when they render a table element?
"A tree-of-objects representation is more difficult to publish in a human-readable form. The objects that comprise the tree tend to all be very different, and thus it is tricky to come up with a consistent and simple table representation with which to display the objects. And any such table representation that you do come up with would almost certainly have more than six columns, probably many more. The problem of rendering a tree-of-objects as a table is sufficiently difficult that nobody does it"
To further elaborate on this important point.
There is an 'impedance mismatch' (a conceptual difficulty mapping between the two logical models) between the tree abstract data type and the table abstract data type. Specifically, there are four key differences between the simple table data structure and the more complex tree data structure that make mapping between them a non-trivial operation.
Hierarchical: A table has a flat representation; a tree has a hierarchical representation.
Order: The order of the rows in a table typically does not matter (they may have a unique rowid). The order of branches (internal nodes) and leaves in a tree is important, and the ordering itself is an encoding of valuable information.
Semi-structured: A table has a fixed structure (rows by columns). A tree has a flexible structure - an arbitrary combination of branches (internal nodes) and leaves (terminal nodes). Semi-structured data has a structure that may not be known in advance; the tree has an irregular and variable formation, and may have branches with missing or supplementary nodes.
Meta-data: The information describing the meaning of the data in a table is typically stored separately from the table - consequently a schema is often mandatory. A schema is optional for a tree abstract data type.
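To make the mismatch concrete, here is a minimal sketch of the usual workaround: flattening a tree into an adjacency-list table whose columns (id, parent_id, position, label) carry the hierarchy and the sibling ordering explicitly. The node labels are invented for illustration:

```python
# Flatten a nested (label, children) tree into (id, parent_id, position,
# label) rows. Hierarchy and sibling order, implicit in the tree,
# become explicit columns in the table.
def flatten(node, rows, parent_id=None, position=0):
    node_id = len(rows)
    label, children = node
    rows.append((node_id, parent_id, position, label))
    for i, child in enumerate(children):
        flatten(child, rows, node_id, i)
    return rows

tree = ("SELECT", [("SCAN t1", []), ("SEARCH t2 USING INDEX", [])])
rows = flatten(tree, [])
# rows[0] == (0, None, 0, 'SELECT')
```

Note how every difference listed above shows up: the depth becomes a parent_id chain, the ordering becomes a position column, and the variable shape of each node has to be squeezed into one fixed set of columns.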
As an aside, I have been visiting hacker news almost daily since 2010. This is my first comment on hacker news. I want to say thank you to the community for the many valuable insights over the years.
I'm wondering if one could write this bytecode directly (or with a higher level imperative language) instead of SQL. Often, the programmer knows exactly which index lookups need to happen in a loop, while it seems like a burden to express that in SQL. This might also be an opportunity to create a different type safe dsl for database access.
I was advocating for this for decades, but everyone dismisses it with "you don't know better than the rdbms". That's almost a religion - despite the fact that most tables most people have are never that big (except for a few oltps that you ought to leave to specialist sql server wizards anyway), and you usually do have an idea of how it should work. SQL is a cool solution for a set of problems that has a very non-zero xor with the set of problems we actually have. And most of our complex queries are trivially expressible as imperative loops. Index lookup and complex plan building are the key features of an rdbms; everything else it steals from you and forces you to use an arcane, inconvenient language that in practice isn't even a standard. Make plan building and indexing core ops and leave everything else to some sort of general-purpose bytecode, and everyone will be happier and will create dozens of DSLs for all their needs.
> I was advocating for it for decades, but everyone dismisses it with “you don’t know better than rdbms”. That’s almost a religion.
Which is simply not true. Query optimization is done heuristically, for the simple reason that you usually need to run the query to get the information required for "perfect" optimization. If the RDBMS really knew better, it wouldn't offer query hints.
I'm wondering the opposite - could RDBMS know better than programmer how to structure the program? The other day I had this realization, that if I squint, quite a lot of code I see and write could be seen as database tables, prepared statements, and specialized indexes. In particular, every time I use an associative table (or several) to speed up access to data along particular dimension (like X/Y coordinates in the world of a videogame), that's equivalent to making an index in SQL.
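That equivalence is easy to make concrete. In this sketch (all names invented), building a dict keyed on position plays the role of CREATE INDEX, and a lookup in it replaces a full scan:

```python
# Entities stored "row-wise"; a dict keyed by position acts as an index.
entities = [
    {"name": "orc",    "pos": (1, 2)},
    {"name": "chest",  "pos": (4, 4)},
    {"name": "goblin", "pos": (1, 2)},
]

# Building the "index": analogous to CREATE INDEX ... ON entities(pos).
by_pos = {}
for e in entities:
    by_pos.setdefault(e["pos"], []).append(e)

# Point lookup via the index instead of scanning every entity.
hits = by_pos.get((1, 2), [])
```

Like a real index, the dict has to be kept in sync whenever an entity moves - which is exactly the bookkeeping an RDBMS automates for you.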
>SQL is a cool solution for a set of problems that has a very non-zero xor with a set of problems that we actually have.
My usual problems are:
1) Running queries manually, where SQL is much more friendly than writing bytecode by hand.
2) Running queries using a clueless ORM that has no idea how to optimize them, so leaving this job to the database makes sense.
I believe the actually rare problem is "writing hyper-optimized complex queries tailored to the current state of the database", where bytecode would help. But that is a very unusual use case.
Maybe there are shops that use databases very differently from me, but it makes sense that SQL is optimized for the common use case.
And since db internals are very different, it's hard to make a single bytecode standard that makes sense across different databases. You can probably write something specialized for sqlite or postgres, but since nobody has, it's probably not that useful for other people either.
I was wondering the same thing. And in particular if a new query language that avoided many of the pitfalls of SQL could compile down to that bytecode and avoid having to deal with SQL as an intermediate representation. Also, if you can compile to bytecode ahead of time, then that could save the time needed to parse the text of a sql query to bytecode at runtime.
> Also, if you can compile to bytecode ahead of time, then that could save the time needed to parse the text of a sql query to bytecode at runtime.
I think we already kind of have that; one can prepare a query/statement and then use it multiple times. Regex engines also do that, except that in the case of SQLite one can bind different parameter values.
Programmers worried about SQL lexing/parsing times can compile their most-used queries once and for all at program startup. Plus, one can sometimes break a query with annoyingly variable parts into smaller queries glued together with SQLite API calls.
> Also, if you can compile to bytecode ahead of time, then that could save the time needed to parse the text of a sql query to bytecode at runtime.
That's exactly how those "prepared statements" work in SQLite - you get a handle to a piece of bytecode, and you can call it with different parameters.
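Python's sqlite3 module leans on exactly this: it keeps a per-connection cache of prepared statements (128 by default), so repeating the same parameterized SQL text reuses the compiled bytecode with fresh bindings. A small sketch:

```python
import sqlite3

con = sqlite3.connect(":memory:")  # cached_statements=128 by default
con.execute("CREATE TABLE t(a INTEGER)")

# Same SQL text each time: compiled to bytecode once, re-bound per call.
query = "INSERT INTO t(a) VALUES (?)"
for v in range(5):
    con.execute(query, (v,))

total = con.execute("SELECT sum(a) FROM t").fetchone()[0]
print(total)  # sum of 0..4
```

The cache is keyed on the exact SQL string, which is one more reason to use `?` placeholders rather than formatting values into the query text.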
You can do an automatic upgrade thing, where you recognise old formats and upgrade them on the fly. Possibly with a window on it where at sufficient age people need to run their elderly code through multiple upgrade cycles.
Or you can declare it an unstable interface and it's on the user to deal with it changing over time.
Or you can just leave it accessible without excessive work, but not document it, and then it's very obviously on the external tool to deal with the impedance matching.
Something akin to this is available[1] in DuckDB, itself started as a "shell" over SQLite. Using Python, you can compose your SQL plans as you like.
I recall a previous discussion about the SQLite bytecode and potential alternatives, where the main idea was that, if SQLite had gone the other way, you could have a much more general engine, where you could achieve something like DuckDB itself just as a set of extensions.
Reading the draft, it doesn't seem that extensibility of SQLite was a main factor in deciding. Maybe this trade-off also covers extensions somehow, which would be nice to see in the final document.
1) I think a simple new kind of query hint could go a long way: Declare a table as being "infinitely large" from the perspective of the query planner.
So, if a query caused a full table scan or similar against the table, it's an error, regardless of the actual content.
A lot of "typical" backend code would then simply require you to make the right indices you need to support your queries, and the query planner would be forced to use those indices in all situations.
2) You can already sort of do what you say; that "higher level imperative language" is available simply by breaking your query up into smaller chunks.
At least in MS-SQL (which is what I know), simply do
declare @tmp1 as table (...)
insert into @tmp1 ...
declare @tmp2 as table (...)
insert into @tmp2 ... from @tmp1 join ...
and so on; if you make each of those steps small enough you pretty much have the imperative language.
3) That said, I think SQL is a wonderful idea, and I am super-productive implementing business logic in it; but it is a horrible language: the learning curve is so much higher than it needs to be, and there are so many silly pitfalls. So I fully support pushes for better languages.
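The stepwise style from point 2 translates to SQLite almost directly with temp tables; a sketch in Python (schema and names invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders(id INTEGER, amount INTEGER, region TEXT);
    INSERT INTO orders VALUES (1, 10, 'EU'), (2, 20, 'EU'), (3, 30, 'US');

    -- step 1: filter into an intermediate temp table
    CREATE TEMP TABLE eu_orders AS
        SELECT * FROM orders WHERE region = 'EU';

    -- step 2: aggregate the intermediate result
    CREATE TEMP TABLE eu_total AS
        SELECT sum(amount) AS total FROM eu_orders;
""")
total = con.execute("SELECT total FROM eu_total").fetchone()[0]
```

Each step is small enough to inspect on its own, at the cost of materializing intermediates the planner might otherwise have streamed.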
Definitely yes. Databases and compilers have a lot in common. This bytecode is exactly equivalent to a compiler IR. It's called out as such in the OP where the decoupling between front end and backend is noted. Therefore you can construct the bytecode from outside of the database and feed it to the backend (may require patching sqlite to make the entry points accessible).
By analogy, this is really easy in LLVM. You can write the IR you want in a text file and feed it back into the infrastructure. This is used a lot in testing and debugging LLVM. It's much harder in GCC for reasons that never made much sense to me.
There's probably a parser for the format of the EXPLAIN text dump in sqlite somewhere intended for modifying the debug output then feeding it back into the engine. Maybe not documented, but I expect it works fine in practice.
You should watch this Asianometry video on the birth of SQL - very interesting. In fact, a functional approach based on relational algebra and tuple relational calculus was originally what the father of the concept (E. F. Codd) intended for interacting with the db. Other IBM engineers formulated the SQL language on top of that.
The precursors to sql rdbms, and the war over competing concepts, were also quite thought provoking.
I think most people associate bytecode VMs / interpreters with general-purpose programming languages, but it's a surprisingly useful concept in other contexts.
Sometimes bytecode VMs appear in unexpected places! A few that I'm aware of:
- eBPF, an extension mechanism in the Linux kernel
- DWARF expression language, with interpreters in debuggers like GDB and LLDB
- The RAR file format includes a bytecode encoding for custom data transformation
Bytecode is great for tiny domain languages and I use them in many projects.
Ultima IV used one to animate sprites on its title screen map. For the next version of XU4 I implemented three bytecode interpreters to script the entire title sequence. There's a high level presentation interpreter, a GPU rendering interpreter, and one for the Ultima IV bytecode.
The original TeX fonts stored their metrics in TFM (short for TeX font metrics) files, which contain a bytecode program for calculating ligatures and kerning between characters. I learned about that when I tried reading the files myself.
From what I can tell, modern fonts using OpenType just have tables to accomplish something similar now, in the form of the GSUB and GPOS tables?
That’s awesome, I didn’t know SQLite had a bytecode.
Based on my experience writing interpreters, I know that bytecode is faster to interpret.
(Source: I wrote JavaScriptCore’s interpreter and its optimizing JITs. I’ve also worked on other language implementations. I’ve seen the AST vs bytecode thing play out more than once.)
Maybe I'm missing something, but the bytecode approach seems really obviously better, just from a memory usage and locality point of view. Scanning an array of bytes or words is obviously going to be faster than traversing a tree of pointers.
So I'd be surprised to find a serious language implementation using a pure AST in its interpreter, without at the very least flattening it to an array!
Edit to add: of course there are some cases where you might actually want the AST approach, as the Sqlite doc is careful to point out. But not very many.
It’s fine to use an AST interpreter if you pair it with really great JITs.
That has its own problems though - using an AST as a shared source of truth for your JIT compilers is cumbersome for implementing stuff like OSR. But from a perf standpoint, it’s probably fine.
Also - say you want to optimize just for startup time, like if the code you’re running has minimal looping or reexecution. Then AST is better because you skip a step before first execution.
SQLite's design docs were the first time I had seen a database use a virtual machine instead of walking a tree. I later noticed VMs in libraries, embedded DSLs, and other applications outside of large general-purpose programming languages. That really drove home for me that VMs could be anywhere and were often a useful step in handling a user's expressions.
Stack-based VMs, like SQLite's (I think), ARE trees. A stack-based VM's bytecode (without DUP and POP) is just the post-order depth-first traversal of the corresponding expression tree. With DUP you have a connected DAG. With POP you have a DAG with one or more components. With loops you have a general (possibly cyclic) graph.
When looked at this way, a VM makes the most sense actually because a pointer-heavy tree implementation is just terrible for caching and is super wasteful. Also, most SQL plans are trees (Until you get to WITH RECURSIVE).
I'm working on a CEP in C right now and a stack-based bytecode is simply the most 'obvious' way to represent a tree when you don't want to have to deal with lots of memory allocations. Just pre-allocate a buffer large enough and increment and go.
Either way, try taking something like the following and printing an expression tree like the one below
LITERAL 2
LITERAL 3
LITERAL 4
MUL
PLUS
now, write something to produce on a terminal, the following:
Plus
  Mul
    3
    4
  2
You should be able to do this in a simple iteration through the ops. Try it! Then try it with the variations above.
Now, try writing ops to manipulate and rewrite it. It's actually really a fun way to represent trees.
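A sketch of that exercise: replay the ops with a stack of subtrees, then pretty-print the result. (Children print left-to-right here, so the layout differs slightly from the listing above.)

```python
# Replay stack-machine ops into an expression tree, then print it.
OPS = [("LITERAL", 2), ("LITERAL", 3), ("LITERAL", 4), ("MUL",), ("PLUS",)]

def to_tree(ops):
    stack = []
    for op in ops:
        if op[0] == "LITERAL":
            stack.append((str(op[1]), []))           # leaf node
        else:                                        # binary operator
            right, left = stack.pop(), stack.pop()
            stack.append((op[0].capitalize(), [left, right]))
    return stack.pop()

def render(node, depth=0, out=None):
    out = [] if out is None else out
    label, children = node
    out.append("  " * depth + label)
    for child in children:
        render(child, depth + 1, out)
    return out

print("\n".join(render(to_tree(OPS))))
```

Extending to_tree to handle DUP (push a shared reference) and POP (discard a finished subtree) covers the DAG variations mentioned above.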
> Also, most SQL plans are trees (Until you get to WITH RECURSIVE).
WITH RECURSIVE can also generally be implemented tree-style, you just loop a bunch in one of the iterators.
There are some databases, like Umbra, which can have DAG query plans for the benefit of more efficient groupjoin (GROUP BY pushed down into a join). IIRC some of the unnesting patterns can also benefit from non-tree plans.
VMs really can be. They have a long history in code portability. In the old days it really wasn't uncommon to use an approach centered around some interpretable byte code running in a vm, where the reusable vm was all that needed porting to different architectures and operating systems. This all happened well before Java.
It was really big in gaming, Zork, Sierra games, LucasArts games, and even a few more "action" games like Flashback were all designed around VMs to make porting somewhat sane. Running the byte code is usually not the performance bottleneck in these cases but drawing stuff to the screen is, and the VM had to handle that.
Wozniak found that 16-bit arithmetic was much easier to perform on a 16-bit machine than on the 8-bit 6502. So he wrote SWEET16, a VM with its own bytecode, to work with 16-bit pointers in Apple's Integer BASIC.
Similarly, the TI-99/4 series of computers had a cumbersome way of accessing memory, because most of the RAM in the machine was exclusively controlled by the video chip and could only be accessed by poking a few registers. So the designers implemented a bytecode VM in ROM called GPL (Graphics Programming Language) that would make accesses to this video RAM seem like straightforward RAM accesses. TI encouraged the use of GPL to develop games and other applications for the system. Unfortunately, the fact that it was interpreted by the CPU, plus the fact that the CPU could only access video-chip memory during the horizontal and vertical retrace intervals, made GPL execution very slow. And since the machine's in-ROM BASIC interpreter was written in GPL, you now know why BASIC programs on the TI-99/4 and /4A crept along at the most leisurely of paces.
So VMs were definitely a thing used to make certain kinds of tradeoffs, even way back in the 8-bit days when systems had mere kilobytes of memory. Who knows how many might be lurking in a modern system!
VMs are a persistently underrated tool by people who think the conversation begins and ends at VirtualBox. They are a phenomenal way to constrain the surface area of what one's own code has to target, and that's always a plus.
Doesn't MS Word internally run a Forth-like VM? I remember reading an article by someone who decompiled an early MS-DOS version of Word only to discover that there was a VM inside.
Or indeed any time complexity needs to be layered. A VM design doesn't even have to have an explicit bytecode compilation stage -- you can write an interpreter that runs as the instructions are issued.
MonetDB is another database that uses a VM [1], using an instruction set called the MonetDB Assembly Language (MAL). I believe it pioneered the technique [2]. The VM is basically designed to express the query as vectorized functions on columns.
> Tree-Of-Objects → The input SQL is translated in a tree of objects that represent the processing to be done. The SQL is executed by walking this tree. This is the technique used by MySQL and PostgreSQL.
Yes, postgres for example. It maps pretty closely to what you see from `explain`, where the ops are pretty high level, reducing interpreter overhead. JIT's big gain is speeding up reading values out of row data.
I recently implemented my own expression evaluator in Java for in-memory data frames, and once you think deeply about doing that, you very quickly understand the need for bytecode. If you directly evaluate the expression using a tree representation, you basically have to do a whole lot of branching (either via switch statements or polymorphism) for every single unit of useful work. Yes, the branch predictor kicks in, and it means that your code isn't as slow as it would be otherwise, but it is still measurably slower than if you converted the expression into bytecode once and just ran that on all rows instead.
Bytecode will only impact packing, making things more efficient RAM-, cache- and CPU-wise. But I don't understand how it would help with branching? You still have to make the same decisions? As in, the bytecode executor still needs to do different things based on the opcode; it's not in hardware.
Consider evaluating an expression like "col1 + col2 == col3"
A tree based evaluator has to do something like -
for (int i = 0; i < df.size(); i++) {
// switch on expression type (column, binary operator, unary operator etc)
// switch on expression operator type (add, subtract, equals etc)
// switch on column type (int, long, float etc)
// actual evaluation
}
whereas a bytecode based evaluator is able to run something equivalent to -
for (int i = 0; i < df.size(); i++) {
result[i] = (df.getLong(column1Index, i) + df.getLong(column2Index, i)) == df.getLong(column3Index, i)
}
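A rough Python analogue of the difference (the expression and row format here are invented for illustration): the tree walker re-dispatches on node type for every row, while "compiling" once folds the tree into a single function.

```python
# Node format: ("col", name) | (op, left, right), where op is "+" or "==".

# Tree-walking: dispatch on the node type for every row.
def eval_tree(node, row):
    kind = node[0]
    if kind == "col":
        return row[node[1]]
    left = eval_tree(node[1], row)
    right = eval_tree(node[2], row)
    return left + right if kind == "+" else left == right

# "Compiling" once: fold the tree into nested closures, so per-row work
# no longer inspects node types. (Real bytecode or codegen fuses this
# further still, eliminating the per-node calls too.)
def compile_tree(node):
    kind = node[0]
    if kind == "col":
        name = node[1]
        return lambda row: row[name]
    left, right = compile_tree(node[1]), compile_tree(node[2])
    if kind == "+":
        return lambda row: left(row) + right(row)
    return lambda row: left(row) == right(row)

# col1 + col2 == col3, evaluated over two rows
expr = ("==", ("+", ("col", "c1"), ("col", "c2")), ("col", "c3"))
rows = [{"c1": 1, "c2": 2, "c3": 3}, {"c1": 1, "c2": 1, "c3": 3}]
fn = compile_tree(expr)
results = [fn(row) for row in rows]
# results == [True, False]
```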
There is threaded bytecode as well, which uses direct jumping vs a switch for dispatch. This can improve branch prediction, though it is a debated topic and may not offer much improvement for modern processors.
Running bytecode is much lower latency than compiling into native code. If you're not bottlenecked by running the bytecode (as opposed to memory or disk speed), you don't really have to JIT it any further into native code.
Which is why JavaScript engines (and JIT compilers for other languages) are typically designed to:
1. start interpreting the code as soon as it has been parsed, and start JITting a function once it is actually called;
2. generate naive native code for the function - direct equivalents of the run actions of the AST (including some fast-path code for simple cases such as adding two 32-bit integers, and falling back to function calls to perform the add on more complex types);
3. generate more optimal code in the background for hot sequences of instructions, such as entire for/while loops, then patch in calls to those optimized versions when ready.
That way you can start running the code immediately after parsing it, and can switch to the faster versions when they are ready if the code takes longer to run in the slower version.
Depends how much compilation you want to do. For example converting threaded code into native direct jumps need not be more expensive than plain bytecode. Of course the gains are also limited.
Yeah, but nobody is seriously considering that unless maybe for huge prepared statements. The argument is usually bytecode vs parser and associated data structures.
I don't have personal experience with it, but I've read that in practice it's not worth the effort - at least not yet. Apparently there are some issues with it and it barely speeds up queries (except perhaps certain ones?). I imagine this could be in large part because LLVM is not really a good fit for JIT scenarios where you want to spend very little time on the compilation itself.
[1] https://twitter.com/gorilla0513/status/1784623660193677762
1. interpreted code
2. compiled then interpreted bytecode
3. compiled machine code
The further up, the simpler.
The further down, the faster.
So at least in the case of SQLite, compiling all the way down to machine code might provide a performance boost 3% or less. That's not very much, considering the size, complexity, and portability costs involved.
A key point to keep in mind is that SQLite bytecodes tend to be very high-level (create a B-Tree cursor, position a B-Tree cursor, extract a specific column from a record, etc). You might have had prior experience with bytecode that is lower level (add two numbers, move a value from one memory location to another, jump if the result is non-zero, etc). The ratio of bytecode overhead to actual work done is much higher with low-level bytecode. SQLite also does these kinds of low-level bytecodes, but most of its time is spent inside of high-level bytecodes, where the bytecode overhead is negligible compared to the amount of work being done.
Edit: It's worth clarifying that the entire codebase does not run like that, not even the entire plan tree - just the scans and tight vectorized aggregation/join loops on the columns/data ranges that happen to be held in RAM in a columnar format.
You can also compile to bytecode (when building), and then compile that bytecode to the machine code (when running). That way, you can take advantage of the exact processor that is running your program.
This is the approach taken by .NET and Java, and I presume most other runtime environments that use bytecode.
The complied code would still be faster if you excluded the compilation time. It's just that the overhead of compiling it is sometimes higher than the benefit.
Getting on par with GCC/Clang with your own JIT is pretty hefty.
... which isn't going to close the cold startup execution gap, which all the benchmarks/lies/benchmarks will measure, but it is a legitimate thing.
I believe Intel CPUs actually sort of are #2. All your x86 is converted to microcode ops, sometimes optimized on the fly (I remember Intel discussing "micro ops fusion" a decade ago I think).
I believe Microsoft SQL Server uses an object tree internally, and yet its query plan output is a table:
https://learn.microsoft.com/en-us/sql/t-sql/statements/set-s...
The estimated/actual execution plan feature in SQL Server Management Studio was changed to use XML in SQL Server 2005. The SQL Server team are only adding new information to the XML representation, not the text representation.
The name "estimated" execution plan is a bit misleading. If you executed the statement(s) with the current indexes and statistics, that's how it would be executed. I think the "estimated" part of the name is because the number of rows returned from each node, and the execution time of each node, is an estimate.
Assuming that the indexes and statistics don't change, you should expect the "actual" execution plan to have the same shape. It will just be annotated with the number of rows that were actually retrieved as well as the estimated numbers. SQL Server doesn't re-plan if it turns out the number of rows was greater (or fewer) than expected.
As a SQLite and SQL Server user, I find that I would like something more detailed from SQLite than EXPLAIN QUERY PLAN (which just tells you the join order), but less detailed than the bytecode. Particularly I want to know why it chose that join order or to use that index.
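Both levels of detail are easy to poke at from a REPL, for what it's worth. A small sketch using Python's built-in sqlite3 module (the table and index names here are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t(a INTEGER PRIMARY KEY, b TEXT)")
con.execute("CREATE INDEX t_b ON t(b)")

# High level: EXPLAIN QUERY PLAN only reveals join order / index choice.
for row in con.execute("EXPLAIN QUERY PLAN SELECT a FROM t WHERE b = ?", ("x",)):
    print(row)

# Low level: EXPLAIN dumps the full bytecode program, one opcode per row.
for addr, opcode, *operands in con.execute("EXPLAIN SELECT a FROM t WHERE b = ?", ("x",)):
    print(addr, opcode)
```

Neither output explains *why* the planner chose what it chose, which is exactly the middle level that seems to be missing.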
"A tree-of-objects representation is more difficult to publish in a human-readable form. The objects that comprise the tree tend to all be very different, and thus it is tricky to come up with a consistent and simple table representation with which to display the objects. Any any such table representation that you do come up with would almost certainly have more than six columns, probably many more. The problem of rendering a tree-of-objects as a table is sufficiently difficult that nobody does it"
To further elaborate on this important point.
There is an 'impedance mismatch' (conceptual difficulty mapping between the two logic models) between the tree abstract data type and the table abstract data type. Specifically, there are four key differences between the simple table data structure and the more complex tree data structure that make mapping between them a non-trivial operation.
Hierarchical: A table has a flat representation; a tree has a hierarchical representation.
Order: The order of the rows in a table typically does not matter (they may have a unique rowid). The order of branches (nodes) and leaves in a tree is important, and the ordering itself is an encoding of valuable information.
Semi-structured: A table has a fixed structure (rows by columns). A tree has a flexible structure - an arbitrary combination of branches (internal nodes) and leaves (terminal nodes). Semi-structured data has a structure that may not be known in advance; the tree has an irregular, variable formation and may have branches with missing or supplementary nodes.
Meta-data: The information describing the meaning of the data in a table is typically stored separately from the table - consequently a schema is often mandatory. A schema is optional for a tree abstract data type.
As an aside, I have been visiting Hacker News almost daily since 2010. This is my first comment on Hacker News. I want to say thank you to the community for the many valuable insights over the years.
Deleted Comment
Which is simply not true. Query optimization is done heuristically, for the simple reason that you usually need to run the query to get the information required for "perfect" optimization. If the RDBMS really knew better, it wouldn't offer query hints.
My usual problems are:
1) Running queries manually, where SQL is much more friendly than writing bytecode by hand.

2) Running queries using a clueless ORM that has no idea how to optimize them, so leaving this job to the database makes sense.
I believe the actually rare problem is "writing hyper-optimized complex queries tailored to the current state of the database", where bytecode would help. But that is a very unusual use case.
Maybe there are shops that use databases very differently from me, but it makes sense that SQL is optimized for the common use case.
And since db internals are very different, it's hard to make a single bytecode standard that makes sense to use across different databases. You can probably write something specialized for SQLite or Postgres, but since nobody has, it's probably not that useful for other people either.
- be able to put the projection in a variable and reuse it (and I think ORM people might love it)
- have a quick way to forward the non-aggregated fields of a projection to a group by (maybe with the aforementioned variables)
I think we already kind of have that; one can prepare a query/statement and then use it multiple times. Regex engines also do that, except that in the case of SQLite one can bind different parameter values.
Programmers worried about SQL lexing/parsing times can compile their most used queries once and for all at program startup. Plus, one can sometimes break a query with annoyingly variable parts into smaller queries glued together with SQLite API calls.
Maybe SQLite could have a similar mechanism, but the cache stays on disk or an external memory cache.
That's exactly how those "prepared statements" work in SQLite - you get a handle to a piece of bytecode, and you can call it with different parameters.
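A quick illustration with Python's sqlite3 module, which keeps a per-connection cache of prepared statements keyed by SQL text (the `cached_statements` parameter):

```python
import sqlite3

# sqlite3 caches compiled statements per connection, so repeating the
# same SQL text re-binds parameters instead of reparsing/recompiling.
con = sqlite3.connect(":memory:", cached_statements=128)
con.execute("CREATE TABLE kv(k TEXT PRIMARY KEY, v TEXT)")

for k, v in [("a", "1"), ("b", "2"), ("c", "3")]:
    con.execute("INSERT INTO kv VALUES (?, ?)", (k, v))  # compiled once, run three times

print(con.execute("SELECT v FROM kv WHERE k = ?", ("b",)).fetchone())  # ('2',)
```

Under the hood each cached entry is exactly such a handle to a piece of bytecode.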
Or you can declare it an unstable interface and it's on the user to deal with it changing over time.
Or you can just leave it accessible without excessive work, but not document it, and then it's very obviously on the external tool to deal with the impedance matching.
I recall a previous discussion about the SQLite bytecode and potential alternatives, where the main idea was that, if SQLite had gone the other way, you could have a much more general engine, where you could achieve something like DuckDB itself just as a set of extensions.
Reading the draft, it doesn't seem that extensibility of SQLite was a main factor in deciding. Maybe this trade-off also covers extensions somehow, which would be nice to see in the final document.
[1] https://duckdb.org/docs/api/python/expression
So, if a query caused a full table scan or similar against the table, it's an error, regardless of the actual content.
A lot of "typical" backend code would then simply require you to make the right indices you need to support your queries, and the query planner would be forced to use those indices in all situations.
2) You can already sort of do what you say; that "higher level imperative language" is available simply by breaking your query up into smaller chunks.
At least in MS-SQL (which is what I know), simply do
and so on; if you make each of those steps small enough, you pretty much have the imperative language.

3) That said, I think SQL is a wonderful idea, and I am super-productive in implementing business logic in it; but it is a horrible language; the learning curve is so much higher than it needs to be, and there are so many silly pitfalls. So I fully support pushes for better languages.
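The same step-by-step style carries over to SQLite via TEMP tables. A small sketch of the idea (schema and data invented), where each statement materializes one intermediate result, much like SELECT ... INTO #temp in MS-SQL:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders(id INTEGER PRIMARY KEY, customer TEXT, total REAL);
    INSERT INTO orders VALUES (1,'ann',10),(2,'bob',20),(3,'ann',30);
""")

# Step 1: materialize an intermediate result into a TEMP table.
con.execute("""CREATE TEMP TABLE big AS
               SELECT * FROM orders WHERE total > 15""")

# Step 2: the next statement works off the intermediate, not the base table.
rows = con.execute("""SELECT customer, SUM(total) FROM big
                      GROUP BY customer ORDER BY customer""").fetchall()
print(rows)  # [('ann', 30.0), ('bob', 20.0)]
```

Each step is small enough to inspect on its own, which is most of what people want from an "imperative" query language.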
By analogy, this is really easy in LLVM. You can write the IR you want in a text file and feed it back into the infrastructure. This is used a lot in testing and debugging LLVM. It's much harder in GCC for reasons that never made much sense to me.
There's probably a parser for the format of the EXPLAIN text dump in sqlite somewhere intended for modifying the debug output then feeding it back into the engine. Maybe not documented, but I expect it works fine in practice.
The precursors to SQL RDBMSes, and the war over competing concepts, were also quite thought-provoking.
https://www.youtube.com/watch?v=z8L202FlmD4
Sometimes bytecode VMs appear in unexpected places! A few that I'm aware of:
- eBPF, an extension mechanism in the Linux kernel

- DWARF expression language, with interpreters in debuggers like GDB and LLDB

- The RAR file format includes a bytecode encoding for custom data transformations
More here: https://dubroy.com/blog/bytecode-vms-in-surprising-places/
Ultima IV used one to animate sprites on its title screen map. For the next version of XU4 I implemented three bytecode interpreters to script the entire title sequence. There's a high level presentation interpreter, a GPU rendering interpreter, and one for the Ultima IV bytecode.
From what I can tell, modern fonts using OpenType just have tables to accomplish something similar now, in the form of the GSUB and GPOS tables?
Documentation for the TFM format here: https://tug.org/TUGboat/Articles/tb02-1/tb02fuchstfm.pdf (search for lig/kern)
https://en.wikipedia.org/wiki/Interpreter_pattern
https://gameprogrammingpatterns.com/bytecode.html
https://wiki.c2.com/?InterpreterPattern
I first read about it from the "Design Patterns" book, which I would recommend to everyone.
Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.
Based on my experience writing interpreters, I know that bytecode is faster to interpret.
(Source: I wrote JavaScriptCore’s interpreter and it’s optimizing JITs. I’ve also worked on other language implementations. I’ve seen the AST vs bytecode thing play out more than once.)
So I'd be surprised to find a serious language implementation using a pure AST in its interpreter, without at the very least flattening it to an array!
Edit to add: of course there are some cases where you might actually want the AST approach, as the Sqlite doc is careful to point out. But not very many.
That has its own problems though - using an AST as a shared source of truth for your JIT compilers is cumbersome for implementing stuff like OSR. But from a perf standpoint, it’s probably fine.
Also - say you want to optimize just for startup time, like if the code you’re running has minimal looping or reexecution. Then AST is better because you skip a step before first execution.
When looked at this way, a VM actually makes the most sense, because a pointer-heavy tree implementation is just terrible for caching and is super wasteful. Also, most SQL plans are trees (until you get to WITH RECURSIVE).
I'm working on a CEP in C right now and a stack-based bytecode is simply the most 'obvious' way to represent a tree when you don't want to have to deal with lots of memory allocations. Just pre-allocate a buffer large enough and increment and go.
Either way, try taking an ops array like that and printing the expression tree it encodes on a terminal. You should be able to do this in a simple iteration through the ops. Try it! Then try it with some variations. Now, try writing ops to manipulate and rewrite the tree. It's actually a really fun way to represent trees.
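Taking up the invitation, here's a minimal sketch of the idea in Python: a flat postfix ops array that can be both evaluated and pretty-printed in a single pass over the ops, no tree allocation in sight (the opcodes are made up):

```python
# Flat postfix program for the expression (1 + 2) * (3 - 4).
PROG = [("push", 1), ("push", 2), ("add",),
        ("push", 3), ("push", 4), ("sub",), ("mul",)]

OPS = {"add": "+", "sub": "-", "mul": "*"}

def evaluate(prog):
    """One forward pass: operands accumulate on a stack, operators pop two."""
    stack = []
    for op, *args in prog:
        if op == "push":
            stack.append(args[0])
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op == "add" else a - b if op == "sub" else a * b)
    return stack.pop()

def render(prog):
    """Same single pass rebuilds the infix text instead of a value."""
    stack = []
    for op, *args in prog:
        if op == "push":
            stack.append(str(args[0]))
        else:
            b, a = stack.pop(), stack.pop()
            stack.append("(" + a + " " + OPS[op] + " " + b + ")")
    return stack.pop()

print(evaluate(PROG))  # -3
print(render(PROG))    # ((1 + 2) * (3 - 4))
```

Rewriting is the same trick again: a pass that pops subexpressions and pushes transformed ones, appending to a fresh ops buffer.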
WITH RECURSIVE can also generally be implemented tree-style, you just loop a bunch in one of the iterators.
There are some databases, like Umbra, which can have DAG query plans for the benefit of more efficient groupjoin (GROUP BY pushed down into a join). IIRC some of the unnesting patterns can also benefit from non-tree plans.
It was really big in gaming, Zork, Sierra games, LucasArts games, and even a few more "action" games like Flashback were all designed around VMs to make porting somewhat sane. Running the byte code is usually not the performance bottleneck in these cases but drawing stuff to the screen is, and the VM had to handle that.
Similarly, the TI-99/4 series of computers had a cumbersome way of accessing memory, because most of the RAM in the machine was exclusively controlled by the video chip and could only be accessed by poking a few registers. So the designers implemented a bytecode VM in ROM called GPL (Graphics Programming Language) that would make accesses to this video RAM seem like straightforward RAM accesses. TI encouraged the use of GPL to develop games and other applications for the system. Unfortunately, the fact that it was interpreted by the CPU, plus the fact that the CPU could only access video-chip memory during the horizontal and vertical retrace intervals, made GPL execution very slow. And since the machine's in-ROM BASIC interpreter was written in GPL, you now know why BASIC programs on the TI-99/4 and /4A crept along at the most leisurely of paces.
So VMs were definitely a thing used to make certain kinds of tradeoffs, even way back in the 8-bit days when systems had mere kilobytes of memory. Who knows how many might be lurking in a modern system!
https://casadevall.pro/articles/2020/11/compiling-word-for-w...
David Parnas talks a lot about this as a way to reliable modularisation: https://two-wrongs.com/deliberate-abstraction.html
[1] https://stratos.seas.harvard.edu/files/MonetDebull2012.pdf
[2] https://www.researchgate.net/publication/220538804_Database_...
I think it's a special case of the fact that finite state machines are everywhere you look.
> Tree-Of-Objects → The input SQL is translated into a tree of objects that represent the processing to be done. The SQL is executed by walking this tree. This is the technique used by MySQL and PostgreSQL.
A tree based evaluator has to do something like -
whereas a bytecode based evaluator is able to run a simple flat loop over the ops instead.

1. start interpreting the code once it has been parsed, and start JITting a function when it is first called;
2. generate naive bytecode for the function that generates native code equivalents of the run actions of the AST (including some fast-path code for simple cases such as adding two 32-bit integers, and falling back to function calls to perform the add on more complex types);
3. generate more optimal code for sequences of instructions in the background, such as entire for/while loops, then patch in calls to those fast-path versions when ready.
That way you can start running the code immediately after parsing it, and can switch to the faster versions when they are ready if the code takes longer to run in the slower version.
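A toy sketch of the tiering idea in Python, using eval/compile as a stand-in for a real JIT (everything here - the threshold, the expression-as-text "bytecode", the helper names - is illustrative):

```python
THRESHOLD = 3  # hypothetical hotness threshold before "JITting"

def make_tiered(src_expr):
    """src_expr: a one-expression 'program' kept as text, e.g. 'x * x + 1'."""
    state = {"calls": 0, "fast": None}
    def call(x):
        state["calls"] += 1
        if state["fast"] is None and state["calls"] >= THRESHOLD:
            # Tier up: compile the expression once, reuse the compiled
            # version for every later call (stand-in for emitting machine code).
            state["fast"] = eval(compile("lambda x: " + src_expr, "<jit>", "eval"))
        if state["fast"] is not None:
            return state["fast"](x)
        # Tier 0: slow generic path, re-evaluating the text every time.
        return eval(src_expr, {"x": x})
    return call

f = make_tiered("x * x + 1")
print([f(i) for i in range(5)])  # [1, 2, 5, 10, 17]
```

The first two calls take the slow path; from the third call on, the compiled version is patched in, which is the same start-fast-then-optimize trade-off described above.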
I don't have personal experience with it, but I've read that in practice it's not worth the effort, at least not yet. Apparently there are some issues with it, and it barely speeds up queries (except perhaps certain ones?). I imagine this could be in large part because LLVM is not really a good fit for a JIT where you want to spend very little time on the compilation itself.