Another amazing release, congrats to all the contributors. There are simply too many things to call out - just a few highlights:
Massive improvements to vacuum operations:
"PostgreSQL 17 introduces a new internal memory structure for vacuum that consumes up to 20x less memory."
Much needed features for backups:
"pg_basebackup, the backup utility included in PostgreSQL, now supports incremental backups and adds the pg_combinebackup utility to reconstruct a full backup"
I'm a huge fan of FDWs and think they are an untapped gem in Postgres, so I love seeing these improvements:
"The PostgreSQL foreign data wrapper (postgres_fdw), used to execute queries on remote PostgreSQL instances, can now push EXISTS and IN subqueries to the remote server for more efficient processing."
A bit off topic, but can someone suggest how I can learn more about using databases (postgres specifically) in real-world applications? I am familiar with SQL and common ORMs, but I feel the internet is full of beginner-level tutorials that lack this depth.
I've been using duckdb to import data into postgres (especially CSVs and JSON) and it has been really effective.
Duckdb can run SQL across the different data formats and insert or update directly into postgres. I run duckdb with python and Prefect for batch jobs, but you can use whatever language or scheduler you prefer.
I can't recommend this setup enough. The only weird thing I've run into: a really complex join across multiple postgres tables and parquet files hit a bug reading a postgres column type. I simplified the query (which was a good idea anyway) and it hums away.
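For reference, the core of the setup is only a few lines of DuckDB SQL (connection details and table names here are made up, so check the DuckDB postgres extension docs for the exact syntax):

    -- in DuckDB: load the postgres extension and attach the target database
    INSTALL postgres;
    LOAD postgres;
    ATTACH 'dbname=analytics host=localhost user=etl' AS pg (TYPE postgres);

    -- read a CSV (or JSON/parquet) and write straight into a Postgres table
    INSERT INTO pg.public.raw_events
    SELECT * FROM read_csv_auto('events.csv');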
- If the foreign server is close (latency-wise), that’s great
- If your query is complex, it helps if the postgres planner can “push down” work to mssql. That will usually happen if you aren’t doing joins to local data
I personally like to set up the foreign tables, then materialize the data into a local postgres table using pg_cron. It’s like a basic ETL pipeline completely built into postgres
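Roughly, the pattern looks like this (all names are invented, and the foreign server in this sketch is another Postgres; for MSSQL you'd swap in something like tds_fdw):

    -- one-time setup of the foreign server and an imported foreign table
    CREATE EXTENSION IF NOT EXISTS postgres_fdw;
    CREATE SERVER src FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'src-db.internal', dbname 'app');
    CREATE USER MAPPING FOR CURRENT_USER SERVER src
        OPTIONS (user 'reporting', password '...');
    CREATE SCHEMA IF NOT EXISTS staging;
    IMPORT FOREIGN SCHEMA public LIMIT TO (orders)
        FROM SERVER src INTO staging;

    -- materialize into a local table every 15 minutes with pg_cron
    -- (assumes local_orders exists and has a unique index on id)
    SELECT cron.schedule('sync-orders', '*/15 * * * *', $$
        INSERT INTO local_orders
        SELECT * FROM staging.orders
        ON CONFLICT (id) DO NOTHING
    $$);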
If your company doesn't have an internal tool for storing credentials, you can always store them in the cloud provider's secrets management tool. E.g. Secrets Manager or Secure String in Parameter Store on AWS. Your CI/CD pipeline can pull the secrets from there.
in supabase we have a “vault” utility for this (for example: https://fdw.dev/catalog/clickhouse/#connecting-to-clickhouse). Sorry I can’t make recommendations for other platforms because i don’t want to suggest anything that could be considered unsafe - hopefully others can chime in
Very cool with the JSON_TABLE. The style of putting a json response (from an API, created from scraping, etc.) into a jsonb column and then writing a view on top to parse/flatten it is something I've been doing for a few years now. I've found it really great to put the json into a table, somewhere safe, and then do the parsing, rather than dealing with possible errors on the scripting-language side. I haven't seen this style used in other places before, and seeing it in the docs as a feature of the new postgres makes me feel a bit more sane. Will be cool to try this out and see how it differs from what I was doing before!
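For anyone who hasn't tried the pattern, a minimal sketch (table and field names are invented): store the raw response untouched, then flatten it with a view, either with the ->/->> operators or, in 17, with JSON_TABLE:

    CREATE TABLE api_responses (
        id         bigserial PRIMARY KEY,
        fetched_at timestamptz NOT NULL DEFAULT now(),
        body       jsonb NOT NULL
    );

    -- flatten with JSON_TABLE (new in 17); an ->>-based view works the same way
    CREATE VIEW api_items AS
    SELECT r.id, jt.*
    FROM api_responses r,
         JSON_TABLE(r.body, '$.items[*]' COLUMNS (
             sku   text    PATH '$.sku',
             price numeric PATH '$.price'
         )) AS jt;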
A personal rule of mine is to always separate data receipt+storage from parsing. The retrieval is comparatively very expensive and has few possible failure modes. Parsing can always fail in new and exciting ways.
Disk space to store the returned data is cheap and can be periodically flushed only when you are certain the content has been properly extracted.
I'm a huge fan of views as the v1 solution to a problem before we need to optimize our approach, and the lack of incremental maintenance is the main blocker in those discussions. If only v2 of the approach could be an IVM view, we could leverage views much more widely.
Agreed, it's a big disappointment that "incremental view maintenance" is taking so long to get into core, despite several IVM extensions existing. For me this is by far the most important capability missing from Postgres.
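For context, the workaround today is a plain materialized view recomputed on a schedule, which is exactly the full-recompute cost IVM would avoid (a minimal sketch with invented names; the cron.schedule call assumes pg_cron is installed):

    CREATE MATERIALIZED VIEW order_totals AS
        SELECT customer_id, sum(amount) AS total
        FROM orders
        GROUP BY customer_id;

    -- full recompute every 10 minutes; IVM would instead apply only the deltas
    SELECT cron.schedule('refresh-order-totals', '*/10 * * * *',
                         'REFRESH MATERIALIZED VIEW order_totals');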
In my understanding it was a timing issue with the UUIDv7 RFC not being finalized before the Postgres 17 feature freeze in early April. Shouldn't be an issue to get this in for Postgres 18, I think.
My assumption is that because you can easily do this in software when using UUIDs (and a lot of people probably do), the pressure to support it natively is a lot lower than expected.
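For example, a commonly shared pure-SQL sketch overlays a millisecond timestamp on a random v4 UUID and flips the version bits to 7; double-check the bit positions against the RFC before relying on it:

    CREATE FUNCTION uuid_generate_v7() RETURNS uuid AS $$
        SELECT encode(
            set_bit(
                set_bit(
                    overlay(uuid_send(gen_random_uuid())
                            PLACING substring(int8send((extract(epoch FROM clock_timestamp()) * 1000)::bigint) FROM 3)
                            FROM 1 FOR 6),
                    52, 1),   -- set the two low bits of the version nibble...
                53, 1),       -- ...turning v4 (0100) into v7 (0111)
            'hex')::uuid
    $$ LANGUAGE sql VOLATILE;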
A number of features stood out to me in this release:
1. Chipping away more at vacuum. Fundamentally, Postgres doesn't have an undo log and therefore has to have vacuum. It's a trade-off: fast recovery vs., well, having to vacuum. The unfortunate part about vacuum is that it adds load to the system exactly when the system needs all the resources. I hope one day people stop knowing that vacuum exists; we are one step closer, but not there.
2. Performance gets better, not worse. Mark Callaghan blogs about MySQL and Postgres performance changes over time, and MySQL keeps regressing while Postgres keeps improving.
3. JSON. Postgres keeps improving QoL for interop with JS and TS.
4. Logical replication is becoming a super robust way of moving data in and out. This is very useful when you move data from one instance to another, especially if version numbers don't match (see the sketch after this list). Recently we have been using it to move data at 1 Gb/s.
5. Optimizer. The better the optimizer, the less you think about the optimizer. According to the research community, SQL Server has the best optimizer. It's very encouraging that the PG optimizer gets better with every release.
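On point 4, the moving parts are pleasantly small (a minimal sketch with invented names; the target tables must already exist with a matching schema):

    -- on the source instance
    CREATE PUBLICATION move_pub FOR TABLE orders, customers;

    -- on the target instance (which can be a newer major version)
    CREATE SUBSCRIPTION move_sub
        CONNECTION 'host=old-primary dbname=app user=replicator'
        PUBLICATION move_pub;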
MySQL can be faster in certain circumstances (mostly range selects), but only if your schema and queries are designed to exploit InnoDB’s clustering index.
But even then, in some recent tests I did, Postgres was less than 0.1 msec slower. And if the schema and queries were not designed with InnoDB in mind, Postgres had little to no performance regression, whereas MySQL had a 100x slowdown.
I love MySQL for a variety of reasons, but it’s getting harder for me to continue to defend it.
I ran into a lot of that 20 years ago, surprised to hear it's still a thing at all given how it's basically common knowledge that most of the Internet and Cloud run on open source software.
I once met an older gentleman who was doing IT work for a defense contractor. He seemed nice enough. We were making small talk and I happened to mention that I had recently installed Linux on my computer at home. His tone changed almost immediately and he started ranting about how Linux was pirated source code, stolen from Microsoft, all of it contains viruses, etc. He was talking about the SCO vs Linux lawsuits but of course got absolutely ALL of the details wrong, like which companies were even involved in the lawsuits. He was so far off the deep end that I didn't even try to correct him, I just nodded and smiled and said I was actually running late to be somewhere else...
Well, from one VERY expensive vendor, to another considerably less expensive vendor
Also, MSSQL has a few things going for it, and surprisingly no one seems to even be trying to catch up:
- Their BI Stacks (PowerBI, SSAS)
- Their Database Development (SDK) ( https://learn.microsoft.com/en-us/sql/tools/sql-database-projects/sql-database-projects?view=sql-server-ver16 )
The MSSQL BI stack is unmatched; SSAS is the top star of BI cubes and the second option is not even close.
SSRS is OK and SSIS is passable, but both are still very decent.
PowerBI and family is also the best option for mid-to-large (not FAANG large, just normal large) companies.
And finally, the GEM that is database projects: you can define your DB changes declaratively. There is nothing like it on the market and, again, no one is even trying.
The easiest platform to do evolutionary DB development on is MS SQL.
I really wish someone would implement DB Projects (dacpac) for PostgreSQL.
Exactly. Supposedly the paid solution ensures long term support. The most fun part is that our customers need to buy these database licenses, so it directly reduces our own pay. Say no to non-technical (or rational) managers :<
What your boss doesn't realize is your business already depends on FOSS. Here are a few examples:
- Do you use any cloud provider? Those platforms are built on top of open source software: Linux, nginx (e.g. Cloudflare's edge servers before the Rust rewrite), HAProxy (AWS ELB), etc.
- The software your business builds, or depends on, probably uses open source libraries (e.g. libc)
- The programming languages your business uses directly or indirectly are probably open source
My point is that folks who make these kinds of statements have no clue how their software is built or what kind of software their business actually depends on.
Well, you can't necessarily trust open source for business software.
The more deeply your business depends on something, the more careful you need to be when selecting the source for that something. (And the db is often very deeply depended on.)
You want to see that their long-term incentives align with your own needs.
But a revenue stream is just one way to do this, and not a perfect one. (Case in point: Oracle.)
In my experience, SQL Server isn't bad though. I know a couple commercial products that started with SQL Server in the late '90s and remain happy with it now. The cost hasn't been a problem and they like the support and evolution of the product. They find they can adopt newer versions and features when they need to without too much pain/cost/time.
(Not that I don't think Postgres is generally a better green-field pick, though, and even more so porting away from Oracle.)
Wow, yea, the performance gains and new UX features (JSON_TABLE, MERGE improvements, etc) are huge here, but these really stand out to me:
> PostgreSQL 17 supports using identity columns and exclusion constraints on partitioned tables.
> PostgreSQL 17 also includes a built-in, platform independent, immutable collation provider that's guaranteed to be immutable and provides similar sorting semantics to the C collation except with UTF-8 encoding rather than SQL_ASCII. Using this new collation provider guarantees that your text-based queries will return the same sorted results regardless of where you run PostgreSQL.
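Both are easy to picture in DDL (a sketch with invented names; I believe the new builtin provider is exposed as a pg_c_utf8 collation, but check the 17 docs):

    CREATE TABLE events (
        id         bigint GENERATED ALWAYS AS IDENTITY,
        created_at timestamptz NOT NULL,
        label      text COLLATE pg_c_utf8
    ) PARTITION BY RANGE (created_at);

    CREATE TABLE events_2024 PARTITION OF events
        FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');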
I use this for my Lemmy instance & lemmy-ansible and it's been great! No longer having to support upgrade scripts and write a complete upgrade process[1] for people to follow has made my life a lot easier! Amazing product
I heard about that project, but it's still somewhat convoluted. Imagine being able to simply use "postgres:latest", or better yet "postgres:15", and switch to "postgres:16" and have it just update (like any minor version does, or like any other db, e.g. mysql, does).
pg_upgrade is a bit manual at the moment. If the database could just be pointed at a data directory and upgrade it automatically on startup, that would be great.
Being able to simply switch from "postgres:15" to "postgres:16" in docker, for example (I'm aware of pg_autoupdate, but it's external and I'm a bit iffy about using it).
What's more, even outside of docker, running `pg_upgrade` requires both versions to be present (or having the older binary handy). Honestly, keeping the previous version's logic for loading and processing the database seems like it would be little hassle but would improve upgrading significantly...
I'm pretty sure I could read them when needed with fdw. Is it a good idea?
I think it can be slow but maybe I could use materialized views or something.
Do you have any recommendations on how to manage credentials for `CREATE USER MAPPING ` within the context of cloud hosted dbs?
It is so nice having json functionality in a relational db - even if you never actually store json in your database, it's useful in so many situations.
Being able to generate json in a query from your data is a big deal too.
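Even something as simple as this (invented names) saves a round of serialization code on the app side:

    SELECT jsonb_agg(jsonb_build_object('id', id, 'name', name)) AS users_json
    FROM users;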
Looking forward to really learning json_table
That’s a very good idea!
Bit sad the UUIDv7 PR didn't make the cut just yet:
https://commitfest.postgresql.org/49/4388/
Where is the iceberg complexity?
Mark Callaghan's blog:
https://x.com/MarkCallaghanDB
https://smalldatum.blogspot.com/
What’s the rationale? What do you gain?
Besides, managing Microsoft licensing is bliss compared to Oracle's. And yeah, MSSQL is much better than Oracle in almost every way.
If you only compare those two, it's a no-brainer.
The part about your boss not trusting Postgres is hilarious, of course.
- [1] https://github.com/LemmyNet/lemmy-ansible/blob/main/UPGRADIN...