vladich commented on Better JIT for Postgres   github.com/vladich/pg_jit... · Posted by u/vladich
menaerus · 10 days ago
If you look into the results, you will see that they are able to execute 5 TPC-H queries in ~200 ms (total). The dataset is not large, it's rather small (10 GB), but nonetheless you wouldn't be able to run 5 queries in such a small amount of time if you had to analyze the workload, generate the code, build indices, start the agents/engine and retrieve the results. I didn't read the whole paper, but this is why I think your understanding is wrong.
vladich · 9 days ago
If they count only query execution time, not everything else, it would make sense, though. It could also be practical if your system runs just a few predefined and heavily optimized queries.
vladich commented on Better JIT for Postgres   github.com/vladich/pg_jit... · Posted by u/vladich
menaerus · 10 days ago
No, that's not how I believe they intended it to work. They generate the workload-specific engine up-front and not when the query arrives.
vladich · 10 days ago
Considering it's just a single PhD student doing this work, I don't believe such a task can realistically be accomplished, even as a PoC / research project.
vladich commented on Better JIT for Postgres   github.com/vladich/pg_jit... · Posted by u/vladich
menaerus · 10 days ago
No, that's not how I believe they intended it to work. They generate the workload-specific engine up-front and not when the query arrives.
vladich · 10 days ago
Then why do they write the opposite?
vladich commented on Better JIT for Postgres   github.com/vladich/pg_jit... · Posted by u/vladich
menaerus · 11 days ago
It has different limitations but inefficiency doesn't seem likely to be one of them. Did you read the Experimental Results section?

> Figure 2 shows the experimental results, and GenDB outperforms all baselines on every query in both benchmarks. On TPC-H, GenDB achieves a total execution time of 214 ms across five representative queries.

> This result is 2.8× faster than DuckDB (594 ms) and Umbra (590 ms), which are the two fastest baselines, and 11.2× faster than ClickHouse.

> On SEC-EDGAR, GenDB achieves 328 ms, which is 5.0× faster than DuckDB and 3.9× faster than Umbra.

> The performance gap increases with query complexity. For example, on TPC-H Q9, which is a five-way join with a LIKE filter, GenDB completes in 38 ms, which is 6.1× faster than DuckDB. GenDB uses iterative optimization with early stopping criteria.

> On TPC-H, Q6 reaches a near-optimal time of 17 ms at iteration 0 with zone-map pruning and a branchless scan, and does not require further optimization. In contrast, Q18 starts at 12,147 ms and decreases to 74 ms by iteration 1, which is a 163× improvement. This gain comes from replacing a cache-thrashing hash aggregation with an index-aware sequential scan.

> On SEC-EDGAR, Q4 decreases from 1,410 ms to 106 ms over three iterations, which is a 13.3× improvement, and Q6 decreases from 1,121 ms to 88 ms over four iterations, which is a 12.7× improvement. In Q6, the optimizer gradually fuses scan, compact, and merge operations into a single OpenMP parallel region, which removes three thread-spawn overheads. By iteration 1, GenDB already outperforms all baselines

vladich · 11 days ago
That's all great, but sadly impractical. I looked at one of the first statements:

> GenDB is an LLM-powered agentic system that decomposes the complex end-to-end query processing and optimization task into a sequence of smaller and well-defined steps, where each step is handled by a dedicated LLM agent.

And knowing typical LLM latency, that's outside the realm of OLTP and probably even OLAP. You can't wait tens of seconds to minutes for an LLM to generate some optimal code that you then compile and execute.

vladich commented on Better JIT for Postgres   github.com/vladich/pg_jit... · Posted by u/vladich
Asm2D · 12 days ago
Many SQL engines have JIT compilers.

The problems related to PostgreSQL are pretty much all described here. It's very difficult to do low-latency queries if you cannot cache the compiled code and reuse it over and over again. And once your JIT is slow, you need logic to decide whether to interpret or compile.

I think it would be best to start interpreting the query and kick off compilation in another thread; once compilation finishes while the interpreter is still running, stop the interpreter and switch to the JIT-compiled code. This would give you the best latency, because there would be no waiting for the JIT compiler.
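The switch-over described above can be sketched as a toy model (all names here are hypothetical, not from pg_jitter or any real engine): the interpreter processes rows one at a time while a background thread "compiles" a faster batch version; as soon as compilation finishes, the remaining rows are handed to the compiled function.

```python
# Toy sketch of racing an interpreter against background JIT
# compilation. The result is identical no matter when (or whether)
# the switch happens, which is what makes this scheme safe.
import threading

def interpret_row(row):
    # slow path: evaluate the expression row by row
    return row * 2 + 1

def compile_query():
    # stand-in for real codegen; returns a batch-oriented function
    def compiled_batch(rows):
        return [r * 2 + 1 for r in rows]
    return compiled_batch

def run_query(rows):
    compiled = {}
    done = threading.Event()

    def compile_worker():
        compiled["fn"] = compile_query()
        done.set()

    threading.Thread(target=compile_worker, daemon=True).start()

    results = []
    i = 0
    while i < len(rows):
        if done.is_set():
            # compilation finished: the compiled code takes over
            results.extend(compiled["fn"](rows[i:]))
            return results
        results.append(interpret_row(rows[i]))
        i += 1
    return results

print(run_query(list(range(5))))  # [1, 3, 5, 7, 9]
```

The key property is that correctness never depends on the race: the interpreter and the compiled code must compute the same answer, so timing only affects latency.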

vladich · 11 days ago
The idea of parallel compilation is interesting and worth considering in some cases. The only problem with it is the same as with too much parallelization in general: you can exhaust your CPU resources much faster. But with some sort of smart scheduling it should work. I'll think about it, thanks!
vladich commented on Better JIT for Postgres   github.com/vladich/pg_jit... · Posted by u/vladich
larodi · 12 days ago
sadly, no windows version yet AFAICT
vladich · 11 days ago
Added Windows (x86_64 for now) support
vladich commented on Better JIT for Postgres   github.com/vladich/pg_jit... · Posted by u/vladich
Asm2D · 11 days ago
Indeed, but this also means that you would get drastically different performance on platforms that have more physical registers than on platforms that have fewer. For example, x86_64 has only 16 GP registers, while AArch64 has 31: if you use 25 registers without any analysis and just spill 10 of them to the stack, the difference could be huge.

But... I consider SLJIT to be for a different use-case than AsmJit. It's more portable, but its scope is much more limited.

vladich · 11 days ago
It's definitely different, and for Postgres specifically they may complement each other. SLJIT can be used for low-latency queries where codegen time matters more than optimizations, and also for other platforms like s390x / PPC / SPARC, etc. AsmJit can be used for SIMD optimizations on x86_64 and ARM64. MIR is kind of in the middle: it does automatic register allocation and doesn't support SIMD, but it's also multi-platform. The only thing that doesn't fit well here is LLVM :). It has some advantages in some edge cases, but... it really needs a separate provider; the current one is bad. I'll probably create another LLVM backend for pg_jitter in the future to utilize it properly...
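The backend split described above could be expressed as a simple dispatch heuristic. This is purely a hypothetical sketch (the function name, thresholds, and inputs are invented for illustration, not pg_jitter's actual logic): pick a codegen backend per query based on the platform and how much the query would benefit from heavier optimization.

```python
# Hypothetical per-query backend selection: sljit for portability
# and minimal codegen latency, AsmJit for SIMD on mainstream ISAs,
# MIR as the middle ground. Thresholds are made up for the sketch.
def pick_backend(platform, est_code_kb, simd_friendly):
    if platform not in ("x86_64", "aarch64"):
        return "sljit"      # widest portability (s390x, PPC, SPARC, ...)
    if est_code_kb > 100:
        return "sljit"      # codegen latency dominates on huge queries
    if simd_friendly:
        return "asmjit"     # SIMD codegen on x86_64 / ARM64
    return "mir"            # automatic register allocation, portable

print(pick_backend("s390x", 10, False))   # sljit
print(pick_backend("x86_64", 10, True))   # asmjit
```

The point of such a tiered design is that no single backend wins on all three axes (compile speed, code quality, platform coverage) at once.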
vladich commented on Better JIT for Postgres   github.com/vladich/pg_jit... · Posted by u/vladich
Asm2D · 11 days ago
AsmJit has only one place where a lot of time is spent: bin-packing. It's the least optimized part, with quadratic complexity (at the moment), which starts to show when you have hundreds of thousands of virtual registers. There is even a benchmark in AsmJit called `asmjit_bench_regalloc`, which shows that a single function with 16 MB of machine code, 65k labels and 200k virtual registers takes 2.2 seconds to generate (and 40 ms of that is just the time to call `emit()`).

If this function is optimized, or switched to some other implementation when there are tens of thousands of virtual registers, you would get orders-of-magnitude faster compilation.

But realistically, which query requires tens of megabytes of machine code? These are pathological cases. For example, we are talking about 25 ms for a single function with 1 MB of machine code, and sub-ms times when you generate tens of KB of machine code.

So from my perspective, the ability to generate SIMD code that the CPU executes fast in inner loops is much more valuable than anything else. Any CPU-bound workload just deserves this. The question is how CPU-bound the workload is. I would imagine databases like Postgres are more memory-bound if you are processing huge rows and accessing only a tiny part of each row; that's why columnar databases are so popular, though of course they have different problems.

I worked on one project that tried to deal with this by hashing columns into 16 buckets, so that related columns end up closer to each other and the query engine only needs to load the buckets used in the query. But we are talking about gigabytes of raw throughput per core in that case.

vladich · 11 days ago
I have a test with a query that compiles to ~200 KB of machine code; AsmJit takes 7 seconds on it (that's not too bad: both LLVM and MIR take ~20 s), while sljit does it in 50 ms. 200 KB is a pathological case, but it's not unheard of in the area I'm working on; realistic, although rare. Over the last 10-15 years most OLTP workloads have become CPU-bound, because the active datasets of most real databases fully fit in memory. There are exceptions, of course.

vladich commented on Better JIT for Postgres   github.com/vladich/pg_jit... · Posted by u/vladich
SigmundA · 11 days ago
>So, Mr Big Boy, now we can get to what a prepared statement in Postgres is.

Yeah, not a gotcha at all, mr teacher. I think you should stop posting low-effort responses and examine your own missed opportunities for education here.

Let's get this straight: prepared statements should not be conflated with caching, yet the only way to cache a plan and avoid a full parse is to use a prepared statement. That is by far the biggest reason to use them, and why many poolers and libraries try to prepare statements.

Do you realize how ridiculous this is? Here are PG's own docs on the purpose of preparing:

"Prepared statements potentially have the largest performance advantage when a single session is being used to execute a large number of similar statements. The performance difference will be particularly significant if the statements are complex to plan or rewrite"

"Although the main point of a prepared statement is to avoid repeated parse analysis and planning of the statement, PostgreSQL will force re-analysis and re-planning of the statement before using it whenever database objects used in the statement have undergone definitional (DDL) changes or their planner statistics have been updated since the previous use of the prepared statement."

The MAIN POINT of preparing is what I am conflating with it, yes...

If PG cached plans automatically and globally, then settings like constraint_exclusion and enable_partition_pruning would not need to exist, or would at least be on by default, because the added overhead of those optimizations during planning would be meaningless.

Seriously, this whole thread is Brandolini's law in action: you obviously can't articulate how PG is better off without a global plan cache, and you act like I don't know how PG works? Get real, buddy.

Are you going to post another couple sentences with no content or are you done here?

vladich · 11 days ago
You can't get a plan cache without a prepared statement, but you can get a prepared statement without a plan cache. It's not the same thing, and in most cases in Postgres prepared statements _do_not_ give you plan caching, because they use custom plans. "Custom plan" is a misnomer: having a "custom plan" means the query is replanned on each execution. It's a common misconception; even a sizeable portion of the articles you can find on the internet miss this. But with good reading comprehension you can read, and possibly even understand, this:

> A prepared statement can be executed with either a generic plan or a custom plan. A generic plan is the same across all executions, while a custom plan is generated for a specific execution using the parameter values given in that call.

here https://www.postgresql.org/docs/current/sql-prepare.html
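The choice between the two plan kinds follows a documented heuristic (with the default plan_cache_mode = auto): the first five executions of a prepared statement always get custom plans, and after that the cached generic plan is used only if it isn't estimated to be more expensive than the average custom plan. A simplified toy model of that rule (function name and costs invented for illustration; the real logic also charges replanning overhead to the custom path):

```python
# Toy model of Postgres's generic-vs-custom plan choice for
# prepared statements. Costs are arbitrary planner estimates.
def choose_plan(exec_count, custom_costs, generic_cost):
    if exec_count < 5:
        return "custom"              # always replan for the first 5 calls
    avg_custom = sum(custom_costs) / len(custom_costs)
    if generic_cost <= avg_custom:
        return "generic"             # cached plan is reused from now on
    return "custom"                  # parameter-specific plans stay cheaper

print(choose_plan(2, [10.0, 12.0], 11.0))                    # custom
print(choose_plan(7, [10.0, 12.0, 9.0, 11.0, 10.0], 10.0))   # generic
```

So a prepared statement only starts paying off as a plan cache once the planner decides the generic plan is good enough; until then it is replanned every time.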

You're also mixing up parsing and planning for some reason. Query parsing costs something like 1/100th of planning: it's not nothing, but pretty close to it.

Even though you're just a rude nobody, this may still be useful for others who read this stupid conversation...
