> Figure 2 shows the experimental results, and GenDB outperforms all baselines on every query in both benchmarks. On TPC-H, GenDB achieves a total execution time of 214 ms across five representative queries.
> This result is 2.8× faster than DuckDB (594 ms) and Umbra (590 ms), which are the two fastest baselines, and 11.2× faster than ClickHouse.
> On SEC-EDGAR, GenDB achieves 328 ms, which is 5.0× faster than DuckDB and 3.9× faster than Umbra.
> The performance gap increases with query complexity. For example, on TPC-H Q9, which is a five-way join with a LIKE filter, GenDB completes in 38 ms, which is 6.1× faster than DuckDB. GenDB uses iterative optimization with early stopping criteria.
> On TPC-H, Q6 reaches a near-optimal time of 17 ms at iteration 0 with zone-map pruning and a branchless scan, and does not require further optimization. In contrast, Q18 starts at 12,147 ms and decreases to 74 ms by iteration 1, which is a 163× improvement. This gain comes from replacing a cache-thrashing hash aggregation with an index-aware sequential scan.
> On SEC-EDGAR, Q4 decreases from 1,410 ms to 106 ms over three iterations, which is a 13.3× improvement, and Q6 decreases from 1,121 ms to 88 ms over four iterations, which is a 12.7× improvement. In Q6, the optimizer gradually fuses scan, compact, and merge operations into a single OpenMP parallel region, which removes three thread-spawn overheads. By iteration 1, GenDB already outperforms all baselines
And knowing typical LLM latency, it's outside the realm of OLTP and probably even OLAP. You can't wait tens of seconds to minutes until an LLM generates some optimal code for you that you then compile and execute.
The problems related to PostgreSQL are pretty much all described here. It's very difficult to do low-latency queries if you cannot cache the compiled code and run it over and over again. And once your JIT is slow, you need logic to decide whether to interpret or compile.
I think it would be best to start interpreting the query and start compilation in another thread. Once the compilation is finished, if the interpreter is still running, stop the interpreter and run the JIT-compiled code. This would give you the best latency, because there would be no waiting for the JIT compiler.
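To make the idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: `interpret_chunk` stands in for the interpreter, `compile_query` for a JIT compiler with simulated latency, and the switch happens only at chunk boundaries, which is one possible choice of safe point rather than how any particular engine does it.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def interpret_chunk(chunk, threshold):
    """Slow path: evaluate filter + sum row by row (stand-in for an interpreter)."""
    total = 0
    for value in chunk:
        if value > threshold:
            total += value
    return total


def compile_query(threshold, compile_delay_s):
    """Stand-in for a JIT compiler: takes a while, then returns a fast callable."""
    time.sleep(compile_delay_s)  # simulated compilation latency (assumption)
    return lambda chunk: sum(v for v in chunk if v > threshold)


def run_query(rows, threshold, chunk_size=1024, compile_delay_s=0.01):
    """Start interpreting immediately; switch to compiled code once it is ready."""
    total = 0
    chunks_interpreted = 0
    chunks_compiled = 0
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        # Compilation runs in the background while the interpreter makes progress.
        compiled = pool.submit(compile_query, threshold, compile_delay_s)
        for start in range(0, len(rows), chunk_size):
            chunk = rows[start:start + chunk_size]
            if compiled.done():
                # Compiled code is available: use it for the remaining chunks.
                total += compiled.result()(chunk)
                chunks_compiled += 1
            else:
                total += interpret_chunk(chunk, threshold)
                chunks_interpreted += 1
    finally:
        # Do not block on the compiler if the interpreter finished first.
        pool.shutdown(wait=False)
    return total, chunks_interpreted, chunks_compiled
```

The result is the same whichever path processes a chunk, so a short query never pays for compilation and a long one spends most of its time in compiled code.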
But... I consider SLJIT to be for a different use-case than AsmJit. It's more portable, but its scope is much more limited.
If this function is optimized, or switched to some other implementation when there are tens of thousands of virtual registers, you would get orders of magnitude faster compilation.
But realistically, which query requires tens of megabytes of machine code? These are pathological cases. For example, we are talking about 25 ms for a single function with 1 MB of machine code, and sub-ms times when you generate tens of KB of machine code.
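A quick back-of-the-envelope check of those two numbers, assuming compile time scales linearly with code size and that 1 MB means 1,000,000 bytes. Both are my assumptions for illustration, and `codegen_time_ms` is a made-up helper:

```python
def codegen_time_ms(code_bytes, ref_bytes=1_000_000, ref_ms=25.0):
    """Linear estimate from the reference point: 1 MB of machine code in 25 ms."""
    return code_bytes / ref_bytes * ref_ms


# 30 KB of generated code under the linear assumption.
small = codegen_time_ms(30_000)
print(f"{small:.2f} ms")
```

That works out to about 40 MB/s of generated code, so tens of KB lands under a millisecond.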
So from my perspective, the ability to generate SIMD code that the CPU would execute fast in inner loops is much more valuable than anything else. Any workload that is CPU-bound just deserves this. The question is how CPU-bound the workload is. I would imagine databases like Postgres would be more memory-bound if you are processing huge rows and accessing only a very tiny part of each row. That's why columnar databases are so popular, but of course they have different problems.
I worked on one project which tried to deal with this by using buckets and hashing: there would be 16 buckets, and each column would be hashed into one of them to keep the columns closer to each other, so the query engine needs to load only the buckets used in the query. But we are talking about gigabytes of raw throughput per core in this case.
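Roughly what that layout looks like, as a toy Python sketch. The hash function (`zlib.crc32`), the dict-based storage, and all function names are my assumptions for illustration, not what that project actually used:

```python
import zlib

NUM_BUCKETS = 16


def bucket_of(column_name):
    """Map a column to one of 16 buckets (crc32 is stable across runs)."""
    return zlib.crc32(column_name.encode("utf-8")) % NUM_BUCKETS


def split_row(row):
    """Partition one row (dict of column -> value) into per-bucket dicts."""
    buckets = {}
    for col, val in row.items():
        buckets.setdefault(bucket_of(col), {})[col] = val
    return buckets


def buckets_needed(query_columns):
    """Only these buckets have to be loaded to answer the query."""
    return {bucket_of(c) for c in query_columns}


def read_columns(stored_row, query_columns):
    """Touch only the buckets that hold the requested columns."""
    out = {}
    for b in buckets_needed(query_columns):
        bucket = stored_row.get(b, {})
        for c in query_columns:
            if c in bucket:
                out[c] = bucket[c]
    return out
```

With a 100-column row, a query touching two columns loads at most two of the 16 buckets instead of the whole row.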
Deleted Comment
Yeah, not a gotcha at all, Mr. Teacher. I think you should stop posting low-effort responses and examine your own opportunities for education that may have been missed here.
Let's get this straight: prepared statements should not be conflated with caching, yet the only way to cache a plan and avoid a full parse is to use a prepared statement. That is by far the biggest reason to use one, and it's why many poolers and libraries try to prepare statements.
Do you realize how ridiculous this is? Here are PG's own docs on the purpose of preparing:
"Prepared statements potentially have the largest performance advantage when a single session is being used to execute a large number of similar statements. The performance difference will be particularly significant if the statements are complex to plan or rewrite"
"Although the main point of a prepared statement is to avoid repeated parse analysis and planning of the statement, PostgreSQL will force re-analysis and re-planning of the statement before using it whenever database objects used in the statement have undergone definitional (DDL) changes or their planner statistics have been updated since the previous use of the prepared statement."
The MAIN POINT of preparing is what I am conflating with it, yes...
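A toy model of what those two quotes describe: plan once per prepared statement, reuse the plan, and re-plan only when the catalog changes. This is a sketch, not PostgreSQL internals; `Catalog`, `PreparedStatement`, and the version counter are all hypothetical names.

```python
class Catalog:
    """Hypothetical catalog; the version is bumped on DDL or statistics updates."""

    def __init__(self):
        self.version = 0

    def invalidate(self):
        self.version += 1


class PreparedStatement:
    """Caches one plan per statement and re-plans when the catalog changed."""

    def __init__(self, sql, catalog, planner):
        self.sql = sql
        self.catalog = catalog
        self.planner = planner      # callable: sql -> plan (stand-in for parse + plan)
        self._plan = None
        self._planned_at = None     # catalog version the cached plan was built against
        self.plan_count = 0         # how many times planning actually ran

    def plan(self):
        if self._plan is None or self._planned_at != self.catalog.version:
            self._plan = self.planner(self.sql)
            self._planned_at = self.catalog.version
            self.plan_count += 1
        return self._plan


catalog = Catalog()
stmt = PreparedStatement("SELECT 1", catalog, lambda sql: ("plan", sql))
for _ in range(3):
    stmt.plan()                     # planned once, reused twice
catalog.invalidate()                # e.g. a DDL change
stmt.plan()                         # forces one re-plan
```

Note the cache lives on the statement object, which mirrors the per-session scope of a prepared statement rather than a global plan cache.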
If PG cached plans automatically and globally, then settings like constraint_exclusion and enable_partition_pruning would not need to exist, or would at least be on by default, because the added overhead of the optimizations during planning would be meaningless.
Seriously, this whole thread is Brandolini's law in action. You obviously can't articulate how PG is better because it does not have a global plan cache, and you act like I don't know how PG works? Get real, buddy.
Are you going to post another couple sentences with no content or are you done here?
> A prepared statement can be executed with either a generic plan or a custom plan. A generic plan is the same across all executions, while a custom plan is generated for a specific execution using the parameter values given in that call.
here https://www.postgresql.org/docs/current/sql-prepare.html
You're also mixing up parsing and planning for some reason. Query parsing costs like 1/100 of planning; it's not nothing, but pretty close to it.
Even though you're just a rude nobody, it still may be useful for others, who may read this stupid conversation...