Postgres feature you're not using – CTEs a.k.a. WITH clauses

> CTEs won’t always be quite as performant as optimizing your SQL to be as concise as possible. In most cases I have seen performance differences smaller than a 2X difference, this tradeoff for readability is a nobrainer as far as I’m concerned. And with time the Postgres optimizer should continue to get better about such performance.

So, my knowledge here might be out of date, but this behavior isn't "the optimizer isn't smart enough": using a CTE in PostgreSQL causes an explicit boundary in the optimizer which prevents some optimizations from being performed. In some cases you really need/want this behavior, and it might increase the performance of your query; in other cases, this is the last thing you would want. People keep asking for optimizer hints, which PostgreSQL refuses to add, and yet they leave this quirk in explicitly as an escape hatch.

hyperman1 · a year ago

This changed in pg 12.

https://www.postgresql.org/docs/release/12.0/ : Automatic (but overridable) inlining of common table expressions (CTEs)

camgunz · a year ago

Yep, it's a huge improvement for complicated pipelines.

One thing to be aware of is "NOT MATERIALIZED". Here's what the docs say:

A useful property of WITH queries is that they are normally evaluated only once per execution of the parent query, even if they are referred to more than once by the parent query or sibling WITH queries. Thus, expensive calculations that are needed in multiple places can be placed within a WITH query to avoid redundant work. Another possible application is to prevent unwanted multiple evaluations of functions with side-effects. However, the other side of this coin is that the optimizer is not able to push restrictions from the parent query down into a multiply-referenced WITH query, since that might affect all uses of the WITH query's output when it should affect only one. The multiply-referenced WITH query will be evaluated as written, without suppression of rows that the parent query might discard afterwards. (But, as mentioned above, evaluation might stop early if the reference(s) to the query demand only a limited number of rows.)

However, if a WITH query is non-recursive and side-effect-free (that is, it is a SELECT containing no volatile functions) then it can be folded into the parent query, allowing joint optimization of the two query levels. By default, this happens if the parent query references the WITH query just once, but not if it references the WITH query more than once. You can override that decision by specifying MATERIALIZED to force separate calculation of the WITH query, or by specifying NOT MATERIALIZED to force it to be merged into the parent query. The latter choice risks duplicate computation of the WITH query, but it can still give a net savings if each usage of the WITH query needs only a small part of the WITH query's full output.
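To make the quoted passage concrete, here is a minimal sketch of the two hints on a CTE that is referenced twice. It assumes SQLite (via Python's standard `sqlite3` module) as a stand-in for PostgreSQL 12+; SQLite accepts the same `MATERIALIZED` / `NOT MATERIALIZED` keywords from version 3.35 on, so the sketch drops the hint on older builds. The `events` table and its rows are hypothetical.

```python
import sqlite3

# SQLite stands in for PostgreSQL here (an assumption made for runnability).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT, amount INTEGER)"
)  # hypothetical table
conn.executemany(
    "INSERT INTO events (kind, amount) VALUES (?, ?)",
    [("sale", 10), ("refund", 3), ("sale", 25), ("sale", 7)],
)

# The hint keywords only parse on SQLite 3.35 or newer.
supports_hints = sqlite3.sqlite_version_info >= (3, 35, 0)


def totals(hint):
    """Run the same twice-referenced CTE with the given hint keyword."""
    keyword = hint if supports_hints else ""  # fall back to default behaviour
    sql = f"""
        WITH sales AS {keyword} (
            SELECT id, amount FROM events WHERE kind = 'sale'
        )
        SELECT
            (SELECT SUM(amount) FROM sales) AS total,
            (SELECT MAX(amount) FROM sales) AS largest
    """
    return conn.execute(sql).fetchone()


materialized = totals("MATERIALIZED")    # compute the CTE once, reuse it
inlined = totals("NOT MATERIALIZED")     # fold the CTE into each reference
print(materialized, inlined)
```

Both calls return the same row; the hint only changes how the planner is allowed to evaluate the CTE, not the result.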

I sure have been using them for many years now. My girlfriend uses them now and loves them, and my topcoder friend finally started using them 10 years ago.

So, no, we are using them. But y’all know what? Unless they're recursive, they're really nothing special…

yas_hmaheshwari · a year ago

You are right about recursive CTEs: that's functionality that isn't possible without CTEs.

However, I would argue that an equally great benefit is code readability. Being able to give good names to pieces of an SQL block is like having small functions in normal programming.
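Both points can be sketched in one query: a recursive CTE that walks a hierarchy, followed by a second, plainly named CTE that acts like a small helper function. This is only an illustration, assuming SQLite through Python's `sqlite3` module as a stand-in for PostgreSQL; the `staff` table, its rows, and the CTE names `chain` and `non_roots` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT, boss_id INTEGER)"
)  # hypothetical table
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?)",
    [(1, "ada", None), (2, "bob", 1), (3, "cyd", 2), (4, "dee", 1)],
)

rows = conn.execute(
    """
    WITH RECURSIVE chain(id, name, depth) AS (
        -- anchor: people with no boss sit at depth 0
        SELECT id, name, 0 FROM staff WHERE boss_id IS NULL
        UNION ALL
        -- recursive step: attach each person below their boss
        SELECT s.id, s.name, c.depth + 1
        FROM staff AS s
        JOIN chain AS c ON s.boss_id = c.id
    ),
    non_roots AS (
        -- a named intermediate step, purely for readability
        SELECT name, depth FROM chain WHERE depth > 0
    )
    SELECT name, depth FROM non_roots ORDER BY depth, name
    """
).fetchall()
print(rows)
```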

larodi · a year ago

Indeed it is, although a cascade of CREATE TEMP VIEW statements within a transaction works almost the same way for readability. Besides, not everyone can count the opening/closing braces for the CTE parts.

CTEs also cannot have indexes on the intermediate results, while views/mviews can, and this greatly benefits cascaded analysis in SQL.

Speaking from experience: we had to optimize a particular network analysis for urban planning, which was initially impossible in QGIS/ArcGIS. The first version ran for 3-10 min, and with intermediate indices the whole thing went down to 45 sec-1.5 min. To my knowledge this is not possible with CTEs, even though I've searched for ways to tell it to build intermediate indices.
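The general pattern of persisting an intermediate result and indexing it before the next step might look like the sketch below. It is not the pipeline described above: it assumes SQLite through Python's `sqlite3` module as a stand-in for PostgreSQL, uses a temporary table for the intermediate step, and the `edges` / `cheap_edges` tables and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src INTEGER, dst INTEGER, cost REAL)")  # hypothetical
conn.executemany(
    "INSERT INTO edges VALUES (?, ?, ?)",
    [(1, 2, 1.0), (2, 3, 2.5), (1, 3, 9.0), (3, 4, 1.5)],
)

# Step 1: persist the intermediate result instead of keeping it in a CTE.
conn.execute(
    "CREATE TEMP TABLE cheap_edges AS "
    "SELECT src, dst, cost FROM edges WHERE cost < 5"
)

# Step 2: index the intermediate result, which a CTE does not allow.
conn.execute("CREATE INDEX cheap_edges_src ON cheap_edges (src)")

# Step 3: the next stage joins against the indexed intermediate table.
two_hop = conn.execute(
    """
    SELECT a.src, b.dst, a.cost + b.cost AS total
    FROM cheap_edges AS a
    JOIN cheap_edges AS b ON b.src = a.dst
    ORDER BY a.src, b.dst
    """
).fetchall()

# Confirm the index really exists on the temporary table.
index_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_temp_master WHERE type = 'index'"
    )
]
print(two_hop, index_names)
```

Whether the planner actually uses the index depends on the data and the database, so checking the query plan on real data is still needed.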

saurik · a year ago

muhrizqiardi · a year ago

I'm not using? I'm using that feature almost all the time!

mosselman · a year ago

Postgres feature you aren’t using: SELECT

teaearlgraycold · a year ago

This post was written for juniors

rkachowski · a year ago

in 2013

dveeden2 · a year ago

The good thing here is that other databases like MySQL, TiDB, SQLite, etc. also support this. And even recursive CTEs are well supported.

https://modern-sql.com/caniuse/with_(non-recursive,_top_leve...

hansvm · a year ago

Used judiciously, a macro system in front of your SQL is often a good approach here. You gain the "linguistically common subtable" capabilities of a CTE, the ability to name constants, and the ability to name/parameterize common sub-expressions, even in databases not supporting CTEs. With LSP support being what it is, you can even whip up editor integration for your new language in a day or less.

Compared to CTEs (assuming you're not using recursion and don't _want_ the materialization), that extra syntactic sugar helps with readability, and the "macro" nature of the thing ensures you don't have any slowdowns from running the thing due to a different query plan.

Downsides include the ability for juniors to turn your SQL into an awful mess, the fact that you'll still occasionally _want_ materialized CTEs for performance and have to write them anyway, plus if you implement it wrong you'll have runtime overhead and very few introspection capabilities (the simplest version I've seen that's decent to work with is creating SQL files as artifacts from the templates using the build system, obviously depending on how much of a rube goldberg your particular builds are).
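As a rough illustration of the macro idea (not the system described above), the sketch below expands named fragments into plain SQL before the database ever sees the query. It assumes Python's standard `string.Template` as the "macro" engine and SQLite as the target; the macro names `ACTIVE_USERS` and `MIN_ORDERS`, the `expand` helper, and the tables are all hypothetical.

```python
import sqlite3
from string import Template

# Hypothetical named fragments: a reusable subquery and a named constant.
MACROS = {
    "ACTIVE_USERS": "(SELECT id, name FROM users WHERE active = 1)",
    "MIN_ORDERS": "2",
}

QUERY_TEMPLATE = Template(
    """
SELECT u.name, COUNT(*) AS orders
FROM $ACTIVE_USERS AS u
JOIN orders AS o ON o.user_id = u.id
GROUP BY u.name
HAVING COUNT(*) >= $MIN_ORDERS
ORDER BY u.name
"""
)


def expand(template, macros):
    """Expand every macro; substitute() raises KeyError on an unknown name."""
    return template.substitute(macros).strip()


# The expanded text is ordinary SQL and could be written out as a build artifact.
sql = expand(QUERY_TEMPLATE, MACROS)

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, active INTEGER);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users VALUES (1, 'ann', 1), (2, 'ben', 0), (3, 'cat', 1);
    INSERT INTO orders (user_id) VALUES (1), (1), (2), (2), (3);
    """
)
rows = conn.execute(sql).fetchall()
print(sql)
print(rows)
```

Plain text substitution like this is only safe for trusted, developer-written fragments; user-supplied values should still go through bound parameters.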

felixyz · a year ago

Any particular macro system you have used or created? If so, did you use an existing pre-processor, or just code something up from scratch?

stareatgoats · a year ago

(2013)

It was also discussed around that time, 78 comments: https://news.ycombinator.com/item?id=7023907

lovasoa · a year ago

It's strange to call CTEs a "postgres feature". It's standard SQL, it's in all databases.

sherburt3 · a year ago

Not in MySQL 5.7, which is officially my least favorite database to maintain.

solatic · a year ago

This is the key trade-off you need to keep in the back of your head. Premature performance optimization is the root of all evil; prefer readability first. But the fact remains that you should be setting timeouts and other time budgets, tracing calls from API through to the database, seeing what's taking the most time. If you get to the point where you need to optimize a CTE-based query, be prepared to rewrite it.