If I may hijack the thread with some more general complaints though, I wish the Postgres team would someday prioritize migration. Like make it easier to make all kinds of DB changes on a live DB, make it easier to upgrade between postgres versions with zero (or low) downtime, etc etc.
Warnings when the migration you're about to do is likely to take ages because for some reason it's going to lock the entire table, instant column aliases to make renames easier, instant column aliases with runtime typecasts to make type migrations easier, etc etc etc. All this stuff is currently extremely painful for, afaict, no good reason (other than "nobody coded it", which is of course a great reason in OSS land).
I feel like there's a certain level of stockholm syndrome in the sense that to PG experts, these things aren't that painful anymore because they know all the pitfalls and gotchas and it's part of why they're such valued engineers.
We currently use MongoDB and while Postgres is attractive for so many reasons, even with Amazon Aurora's Postgres we still need legacy "database maintenance windows" in order to achieve major version upgrades.
With MongoDB, you're guaranteed single-prior-version replication compatibility within a cluster. This means you spin up an instance with the updated version of MongoDB, and it catches up to the cluster. Zero downtime, seamless transition. There may be fewer than a handful of cancelled queries (which are retryable), but no loss of writes thanks to their retryable writes and write concern preferences. e.g. MongoDB 3.6 can be upgraded to MongoDB 4.0 without downtime.
Edit: Possibly misinformed but the last deep dive we did indicated there was not a way to use logical replication for seamless upgrades. Will have to research.
> Edit: Possibly misinformed but the last deep dive we did indicated there was not a way to use logical replication for seamless upgrades. Will have to research.
Migrations are quite highly prioritized in PostgreSQL. PostgreSQL has by far the best migration support of any database I have worked with. There is of course a lot of work left to do, but that is true for many other areas in PostgreSQL.
Also I can't agree at all with "nobody coded it"; patches in this area are generally welcome. My first real patch for PostgreSQL was about improving migrations. And there has been talk about several of the ideas you propose, but nobody is working on any of them right now. So I mostly think it is a lack of resources in general.
Sorry, I didn't mean offense.
What I meant with "nobody coded it" is that the migration DX features that don't exist yet, likely don't exist simply because they haven't been made yet (and not because eg they're architecturally impossible or because the postgres team are stupid or sth).
It's hard to complain about OSS without attacking the creators; I tried to do that right but clearly I failed nevertheless :-) Thanks for your contributions!
I love postgresql, as long as I don't have to do upgrades. I have yet to see a successful zero downtime upgrade. That being said, other databases aren't that much better. Maybe except SQLite.
They are slowly getting there. For example, postgres 12 added REINDEX CONCURRENTLY. Under the hood it's just recreating the index and then name swapping and dropping the old one. Basically what pg_repack was doing.
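For anyone who hasn't used it yet, the before/after looks roughly like this (hypothetical users table and index names, just a sketch):

    -- Pre-12 workaround: rebuild by hand without blocking writes, then swap names
    CREATE INDEX CONCURRENTLY users_email_idx_new ON users (email);
    DROP INDEX CONCURRENTLY users_email_idx;
    ALTER INDEX users_email_idx_new RENAME TO users_email_idx;

    -- Postgres 12+: one command that does roughly the same dance internally
    REINDEX INDEX CONCURRENTLY users_email_idx;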
There's a huge collection of tricks out there that just need to become formal features. The trick I'm working with today is adding a check constraint with NOT VALID and then immediately calling VALIDATE, because otherwise it takes a very aggressive lock that blocks writes. That could easily become ALTER TABLE CONCURRENTLY or something.
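Roughly, for anyone who hasn't seen the trick (hypothetical orders table, purely illustrative):

    -- NOT VALID only enforces the constraint for new writes,
    -- so it needs just a brief lock instead of a long full-table scan
    ALTER TABLE orders
        ADD CONSTRAINT orders_amount_positive CHECK (amount > 0) NOT VALID;

    -- VALIDATE scans the existing rows, but under a weaker lock
    -- (SHARE UPDATE EXCLUSIVE), so normal reads and writes keep flowing
    ALTER TABLE orders VALIDATE CONSTRAINT orders_amount_positive;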
Do you know a good resource with these tricks? I often struggle to predict exactly which schema migrations will lock tables aggressively and what the smartest workaround is.
An orthogonal migration issue which I'm hitting right now: we need to migrate from heroku postgres to aws rds postgres, and I'm stressed about the risk and potential downtime in doing so. If there was a way to make a replica in rds based on heroku, promote the rds replica to be the primary, hard switch our apps over to rds, that'd be a lifesaver.
I'm working through this blog post [1] now, but there is still a bit to be defined (including a dependency on heroku's support team) to get this rolling.
Why is the migration required? Heroku postgres doesn't support logical replication, and logical replication is required for any ELT vendor (Fivetran, Stitch, Airbyte) to use Change Data Capture to replicate data from postgres to snowflake (while replicating deleted rows efficiently).
Note: I've also read this ebook [2], but this approach requires downtime.
Note 2: I reached out to heroku support and asked if logical replication was on their short term roadmap. They said they've heard this quite a bit, but nothing tangible is on the roadmap.
If anyone has any thoughts on the above migration, I'd be all ears. :)
1) https://vericred.com/how-we-migrated-a-1tb-database-from-her...
2) https://pawelurbanek.com/heroku-migrate-postgres-rds
I did that exact migration. Unfortunately, to my knowledge, there's no way to do it with zero downtime. You need to make your app read only until the RDS instance has ingested your data, then you can cut over. For me, that was roughly one gigabyte of data and took about forty seconds.
My best advice is to automate the whole thing. You can automate it with the Heroku and AWS CLIs. Test on your staging site until you can run through the whole process end to end a few times with no interruptions.
We've moved a number of customers from Heroku over to Crunchy Bridge with essentially no down time, am currently helping one customer with 7TB through that process. It's not over to RDS, but would be happy to talk through process if helpful. And we do support logical replication and have many people using wal2json/logical replication with us.
> Why is the migration required? Heroku postgres doesn't support logical replication
You could possibly hack together some form of higher-layer logical replication via postgres_fdw and database triggers. A comment ITT references this as a known technique.
One possible solution for the ETL stuff might be to use Heroku Kafka for the Change Data Capture and then from that Kafka you can move it someplace else.
Adding to your list of options that still require _some_ downtime: we used Bucardo [0] in lieu of logical replication. It was a bit of a pain, since Bucardo has some rough edges, but we made it work. Database was ~2 TiB.
[0] https://bucardo.org/
> I feel like there's a certain level of stockholm syndrome in the sense that to PG experts, these things aren't that painful anymore
I don't know if I’m a PG expert, but I just prefer “migration tool” to be a separate thing, and for the DB server engine to focus on being an excellent DB server engine, with the right hooks to support a robust tooling ecosystem, rather than trying to be the tooling ecosystem.
As a developer I fully support the notion of splitting the tools out from the server engine, like things are today.
But, realistically, pg_upgrade's functionality would need to be integrated into the server itself if we're ever going to have zero-downtime upgrades, right?
I don't know how other RDBMSs handle this, if at all
> If I may hijack the thread with some more general complaints though, I wish the Postgres team would someday prioritize migration.
A thing I'm interested in is a 'simple' replication setup to reduce single points of failure. We currently use Galera with My/MariaDB/Percona and it's quite handy for HA-ish needs: we can have two DBs and the garbd running on the web app server.
Pointers to tutorials for Debian/Ubuntu to accomplish something similar would be appreciated. (We run things on-prem in a private cloud.)
There's no such thing as "simple" when it comes to HA setups, the requirements are simply too varied. PostgreSQL has great documentation for their HA featureset, but when it comes to systems-level concerns (detecting that a primary is down and arranging promotion of a replica to primary) you're expected to address those on your own.
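To be fair, the promotion primitive itself is trivial these days (PG 12+); it's the failure detection, fencing and client failover around it that those external tools exist for. Something like:

    -- On the chosen replica, once external tooling has decided
    -- the primary is really gone (that decision is the hard part):
    SELECT pg_promote(wait => true, wait_seconds => 60);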
My complaint would be that there is no standard multi-master solution for Postgres, whereas MySQL now has group replication as a native multi-master solution.
PostgreSQL is one of the most powerful and reliable pieces of software I've seen run at large scale, major kudos to all the maintainers for the improvements that keep being added.
> PostgreSQL 14 extends its performance gains to the vacuuming system, including optimizations for reducing overhead from B-Trees. This release also adds a vacuum "emergency mode" that is designed to prevent transaction ID wraparound
Dealing with transaction ID wraparounds in Postgres was one of the most daunting but fun experiences for me as a young SRE. Each time a transaction modifies rows in a PG database, it increments the transaction ID counter. This counter is stored as a 32-bit integer and it's critical to the MVCC transaction semantics - a transaction with a higher ID should not be visible to a transaction with a lower ID. If the value hits 2 billion and wraps around, disaster strikes as past transactions now appear to be in the future. If PG detects it is reaching that point, it complains loudly and eventually stops further writes to the database to prevent data loss.
Postgres avoids getting anywhere close to this situation in almost all deployments by performing routine "auto-vacuums" which mark old row versions as "frozen" so they are no longer using up transaction ID slots. However, there are a couple situations where vacuum will not be able to clean up enough row versions. In our case, this was due to long-running transactions that consumed IDs but never finished. Also it is possible but highly inadvisable to disable auto-vacuums. Here is a postmortem from Sentry who had to deal with this leading to downtime: https://blog.sentry.io/2015/07/23/transaction-id-wraparound-...
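If you want to keep an eye on this yourself, the xid age per database is easy to query; alerting well below autovacuum_freeze_max_age (200 million by default) leaves plenty of time to react:

    -- How far each database is from needing anti-wraparound freezing.
    -- age() here is measured in transactions, not time.
    SELECT datname, age(datfrozenxid) AS xid_age
    FROM pg_database
    ORDER BY xid_age DESC;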
It looks like the new vacuum "emergency mode" functionality starts vacuuming more aggressively when getting closer to the wraparound event, and as with every PG feature highly granular settings are exposed to tweak this behaviour (https://www.postgresql.org/about/featurematrix/detail/360/)
> Also it is possible but highly inadvisable to disable auto-vacuums.
When I was running my first Postgres cluster (the reddit databases), I had no idea what vacuuming was for. All I knew was that every time it ran it slowed everything down. Being dumb, I didn't bother to read the docs, I just disabled the auto vacuum.
Eventually writes stopped and I had to take a downtime to do a vacuum. Learned a few important lessons that day. I also then set it up to do an aggressive vacuum every day at 3am, which was the beginning of low traffic time, so that the auto-vacuuming didn't have as much work to do during the day.
> Each time a transaction modifies rows in a PG database, it increments the transaction ID counter.
It's a bit more subtle than that: each transaction that modifies, deletes or locks rows will update the txID counter. Row updates don't get their own txID assigned.
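You can watch that happen in psql (some_table and its touched column are hypothetical; txid_current_if_assigned() has been around since 9.6):

    BEGIN;
    SELECT txid_current_if_assigned();      -- NULL: no transaction ID assigned yet
    SELECT count(*) FROM some_table;        -- plain reads still don't assign one
    UPDATE some_table SET touched = now();  -- the first write forces assignment
    SELECT txid_current_if_assigned();      -- now returns the assigned ID
    COMMIT;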
> It looks like the new vacuum "emergency mode" functionality starts vacuuming more aggressively when getting closer to the wraparound
When close to wraparound, the autovacuum daemon stops cleaning up the vacuumed tables' indexes, yes. That saves time and IO, at the cost of index and some table bloat, but both are generally preferred over a system-blocking wraparound vacuum.
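If I'm reading the docs right, the trigger point is a new setting, vacuum_failsafe_age (1.6 billion by default), which can be lowered if you want the failsafe to kick in earlier:

    SHOW vacuum_failsafe_age;                        -- 1600000000 by default

    -- make the emergency behaviour kick in earlier than the default
    ALTER SYSTEM SET vacuum_failsafe_age = '1200000000';
    SELECT pg_reload_conf();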
It would increase disk usage by a significant amount, since transaction IDs appear twice in tuple headers (xmin/xmax). Essentially they are overhead on every database row. This submission has a discussion on it: https://news.ycombinator.com/item?id=19082944
I believe it is mostly a question of how much code needs to be changed, which they are working towards slowly, but there is also more overhead (memory / disk) associated with those larger data types, which are used everywhere.
I know this isn't even a big enough deal to mention in the news release, but I am massively excited about the new multirange data types. I work with spectrum licensing and range data types are a godsend (for representing spectrum ranges that spectrum licenses grant). However, there are so many scenarios where you want to treat multiple ranges like a single entity (say, for example, an uplink channel and a downlink channel in an FDD band). And there are certain operations like range differences (e.g. '[10,100)' - '[50,60)'), that aren't possible without multirange support. For this, I am incredibly grateful.
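To make that concrete, here's the kind of thing that becomes a one-liner with multiranges in 14 (illustrative values, not real spectrum data):

    -- a range difference can now produce a "hole", which needs a multirange
    SELECT '{[10,100)}'::int4multirange - '{[50,60)}'::int4multirange;
    -- => {[10,50),[60,100)}

    -- an FDD license's paired uplink + downlink channels as a single value
    SELECT int4multirange(int4range(1710, 1785), int4range(1805, 1880));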
Also great is the parallel query support for materialized views, connection scalability, query pipelining, and jsonb accessor syntax.
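The jsonb subscripting in particular reads much more nicely than chains of -> and #>> operators (hypothetical settings table with a jsonb details column):

    -- reading nested values with subscripts
    SELECT details['notifications']['email'] FROM settings;

    -- updating a nested key in place
    UPDATE settings SET details['notifications']['email'] = 'true';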
Multiranges are one of the lead items in the news release :) I do agree that they are incredibly helpful and will help to reduce the complexity of working with ranges.
I second this, upgraded 100 odd instances using pg_upgrade. And it's lightning fast, takes less than a min whether the db cluster is a couple GB or a few TB.
Just make sure that you always run the check mode to catch incompatibilities between versions.
I think you can use -o '-D /etc/postgresql/9.5/main' -O '-D /etc/postgresql/13/main' to avoid making the symlink to postgresql.conf. This should also make it easier if postgresql.conf tries to include files from a conf.d subdirectory.
Looks like the best approach might be to use the in-place upgrade tool that ships with Postgres to upgrade to v10 (use Postgres v10 to perform this upgrade). From there you'd be able to create a fresh v14 instance and use logical replication to sync over the v10 database, before briefly stopping writes while you swap over to the v14 database.
EDIT: Looks like pg_upgrade supports directly upgrading multiple major versions. So maybe just use that if you can afford the downtime.
With downtime, I guess the pg_upgrade tool works fine.
Without downtime / near-zero downtime is more interesting though. Since this is an old version, something like Bucardo maybe? It can keep another pg14 instance in sync with your old instance by copying the data over and keeping it in sync. Then you switch your app over to the new DB and kill the old one.
Newer versions make this even easier with logical replication - just add the new DB as a logical replica of the old one, and kill the old one and switch in under 5 seconds when you're ready.
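For reference, the moving parts on the logical replication route are pretty small (names and the connection string are placeholders; the schema has to be loaded on the new side first, e.g. with pg_dump --schema-only, since logical replication doesn't copy DDL):

    -- on the old (source) cluster, which needs wal_level = logical
    CREATE PUBLICATION upgrade_pub FOR ALL TABLES;

    -- on the new cluster, after loading the schema
    CREATE SUBSCRIPTION upgrade_sub
        CONNECTION 'host=old-db dbname=app user=replicator password=secret'
        PUBLICATION upgrade_pub;

    -- once it has caught up: stop writes on the old side, repoint the app,
    -- then clean up
    DROP SUBSCRIPTION upgrade_sub;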
I think I remember a talk where someone manually set up logical replication with triggers and postgres_fdw to upgrade from an old server with zero downtime.
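I haven't seen that talk, but the bare-bones version of the idea presumably looks something like this on the old server (all names are placeholders, and a real setup still has to deal with updates, deletes, conflicts and the initial backfill):

    CREATE EXTENSION IF NOT EXISTS postgres_fdw;

    CREATE SERVER new_db FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'new-host', dbname 'app', port '5432');
    CREATE USER MAPPING FOR CURRENT_USER SERVER new_db
        OPTIONS (user 'replicator', password 'secret');

    -- local stand-in for the table on the new server
    CREATE FOREIGN TABLE orders_remote (
        id     bigint,
        amount numeric
    ) SERVER new_db OPTIONS (schema_name 'public', table_name 'orders');

    -- forward new rows as they are written locally
    CREATE FUNCTION forward_order() RETURNS trigger AS $$
    BEGIN
        INSERT INTO orders_remote VALUES (NEW.*);
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER orders_forward
        AFTER INSERT ON orders
        FOR EACH ROW EXECUTE PROCEDURE forward_order();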
If you can afford a short downtime I would recommend just running pg_upgrade and upgrade directly to 13. Preferably test it out first before doing it in production.
Yeah this is how we do it... Even with terabytes of data it seems to be pretty efficient and robust, and easy to just keep up the replicating version until you're satisfied it all looks good.
Depends on whether you can shut down the server for some time or not. If you can live with some downtime, just use pg_upgrade if you install the new server on the same machine: https://www.postgresql.org/docs/current/pgupgrade.html
We are currently using pg_upgrade to go from 9.6 to 13 in our systems. It's supported and works well. We are using the 'hardlinks' feature that makes it extremely fast as well.
Do be aware that using the "hardlinks" feature will mean that restarting the old database version will not work properly and may not be safe. You should make sure to copy the old directory beforehand if you might need that, or simply restore from backup.
This is one of the major pain points with PostgreSQL.
Unless you absolutely need any of the new features, and if uptime is important, you can just continue using PostgreSQL 9.6 even though it is EOL. https://www.postgresql.org/support/versioning/
It will most likely work great for many more years.
I wish future versions of PostgreSQL would have some backwards compatibility for old system/data tables/datastructures and be able to do live migration when running a newer release.
This is not really true for a number of reasons. pg_upgrade is absolutely the preferred method if you are able to spare small downtime. It will absolutely work to upgrade your existing data structures to be compatible with the new Postgres version, so not sure what you even mean.
It doesn't automatically re-create indexes (to take advantage of new features) for example, but that is likely something you don't want to do right away (incurring extra downtime) when you are doing an upgrade anyways. You can easily just REINDEX after the upgrade is complete.
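And since 12 even that reindex doesn't have to block anything (database name is a placeholder):

    -- rebuild the database's indexes without blocking writes (PG 12+)
    REINDEX DATABASE CONCURRENTLY myapp;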
I converted from MySQL (before the whole MariaDB fork), and I've been happier with every new version. My biggest moment of joy was JSONB and it keeps getting better. Can we please make the connections lighter so that I don't have to use stuff like pgbouncer in the middle? I would love to see that in future versions.
FWIW MySQL 8 has gotten a lot better in standards compliance and ironing out legacy quirks, with some config tweaks. While my heart still belongs to PostgreSQL, things like no query hints, dead tuple bloat (maybe zheap will help?), less robust replication (though getting better!), and costly connections dampen my enthusiasm.
The benchmark linked in the comments shows 7-38% improvements. Nothing to scoff at, but if you need PGBouncer that probably won't make enough of a difference.
I'd be curious to see if the concurrency improvements in PostgreSQL 14 help with increasing the threshold for when you need to introduce a connection pooler.
Lighter connections would finally allow for using lambda functions that access a Postgres database without needing a dedicated pgbouncer server in the middle.
PostgreSQL is one of those tools I know I can always rely on for a new use-case. There are very few cases where it can't do exactly what I need (large scale vector search/retrieval).
Congrats on the 14.0 release.
The pace of open source has me wondering what we'll be seeing 50 years from now.
Also there's Pinecone (https://www.pinecone.io) which can sit alongside Postgres or any other data warehouse and ingest vector embeddings + metadata for vector search/retrieval.
I need a self managed solution, so I'm not sure Pinecone is feasible and I don't think pgvector scales well enough for my use-case (hundreds of millions of vectors).
So far I think I'm going to go with Milvus[1], ideally I'd just have a foreign data wrapper for Milvus or FAISS.
[1] https://github.com/milvus-io/milvus
> With MongoDB, you're guaranteed single-prior-version replication compatibility within a cluster. [...] e.g. MongoDB 3.6 can be upgraded to MongoDB 4.0 without downtime.
> Edit: Possibly misinformed but the last deep dive we did indicated there was not a way to use logical replication for seamless upgrades. Will have to research.
It has been possible since PG10:
https://severalnines.com/database-blog/how-upgrade-postgresq...
I'm not gonna choose MongoDB if I need a relational model… even if it offers zero downtime upgrades out-of-the-box.
> PostgreSQL has by far the best migration support of any database I have worked with.
BS, have you never worked with mysql or sqlite?
> One possible solution for the ETL stuff might be to use Heroku Kafka for the Change Data Capture
See https://blog.heroku.com/streaming-data-connectors-beta - Heroku's own Kafka seems to have slightly more native support than if you use a 3rd party like Confluent.
We've not yet tried any of this, but it's been bookmarked as a possible solution to explore.
> Pointers to tutorials for Debian/Ubuntu to accomplish something similar would be appreciated.
https://github.com/zalando/patroni
https://clusterlabs.github.io/PAF/
https://repmgr.org/
https://github.com/sorintlab/stolon
Also PG is one of the few that support schema/DDL statements inside a transaction.
https://wiki.postgresql.org/wiki/FullTransactionId
https://en.wikipedia.org/wiki/Site_reliability_engineering
It's amazing what improvements we can get through public collaboration.
Is there a reason to do this rather than putting the key in /etc/apt/trusted.gpg.d/?
> Newer versions make this even easier with logical replication - just add the new DB as a logical replica of the old one, and kill the old one and switch in under 5 seconds when you're ready.
IMO, that counts as "near-zero downtime".
> I wish future versions of PostgreSQL would have some backwards compatibility for old system/data tables/datastructures and be able to do live migration when running a newer release.
pg_upgrade will not work if the internal data structure for your data changes. It only recreates the system tables. https://www.postgresql.org/docs/14/pgupgrade.html
> dead tuple bloat (maybe zheap will help?)
According to Robert Haas, the zheap project is dead.
https://www.depesz.com/2020/08/25/waiting-for-postgresql-14-...
https://techcommunity.microsoft.com/t5/azure-database-for-po...