skrebbel · 4 years ago
These changes look fantastic.

If I may hijack the thread with some more general complaints though, I wish the Postgres team would someday prioritize migration. Like making it easier to apply all kinds of DB changes on a live DB, to upgrade between Postgres versions with zero (or low) downtime, etc etc.

Warnings when the migration you're about to do is likely to take ages because for some reason it's going to lock the entire table, instant column aliases to make renames easier, instant column aliases with runtime typecasts to make type migrations easier, etc etc etc. All this stuff is currently extremely painful for, afaict, no good reason (other than "nobody coded it", which is of course a great reason in OSS land).

I feel like there's a certain level of stockholm syndrome in the sense that to PG experts, these things aren't that painful anymore because they know all the pitfalls and gotchas and it's part of why they're such valued engineers.

bmcahren · 4 years ago
We currently use MongoDB and while Postgres is attractive for so many reasons, even with Amazon Aurora's Postgres we still need legacy "database maintenance windows" in order to achieve major version upgrades.

With MongoDB, you're guaranteed single-prior-version replication compatibility within a cluster. This means you spin up an instance with the updated version of MongoDB and it catches up to the cluster. Zero downtime, seamless transition. There may be fewer than a handful of cancelled queries (which are retryable), but no loss of writes thanks to retryable writes and write concern preferences. e.g. MongoDB 3.6 can be upgraded to MongoDB 4.0 without downtime.

Edit: Possibly misinformed but the last deep dive we did indicated there was not a way to use logical replication for seamless upgrades. Will have to research.

TedShiller · 4 years ago
The problem with MongoDB though is that you're on MongoDB
outworlder · 4 years ago
> Edit: Possibly misinformed but the last deep dive we did indicated there was not a way to use logical replication for seamless upgrades. Will have to research.

It has been possible since PG 10:

https://severalnines.com/database-blog/how-upgrade-postgresq...

ggregoire · 4 years ago
MongoDB and Postgres are like apples and oranges tho.

I'm not gonna choose MongoDB if I need a relational model… even if it offers zero downtime upgrades out-of-the-box.

zozbot234 · 4 years ago
Logical replication across major releases for seamless upgrades has been supported and documented since pgSQL 10.

Deleted Comment

jeltz · 4 years ago
Migrations are quite highly prioritized in PostgreSQL. PostgreSQL has by far the best migration support of any database I have worked with. There is of course a lot of work left to do, but that is true for many other areas in PostgreSQL.

Also, I can't agree at all with "nobody coded it"; patches in this area are generally welcome. My first real patch for PostgreSQL was about improving migrations. And there has been talk about several of the ideas you propose, but nobody is working on any of them right now. So I mostly think it is a lack of resources in general.

skrebbel · 4 years ago
Sorry, I didn't mean offense. What I meant by "nobody coded it" is that the migration DX features that don't exist yet likely don't exist simply because they haven't been built yet (and not because, e.g., they're architecturally impossible or because the Postgres team is stupid or something).

It's hard to complain about OSS without attacking the creators. I tried to get that right but clearly I failed nevertheless :-) Thanks for your contributions!

oauea · 4 years ago
DDL migrations are amazing. Migrations between different postgres versions not so much.
aynyc · 4 years ago
I love PostgreSQL, as long as I don't have to do upgrades. I have yet to see a successful zero downtime upgrade. That being said, other databases aren't that much better. Except maybe SQLite.
nix23 · 4 years ago
> by far the best migration support of any database I have worked with

BS, have you never worked with MySQL or SQLite?

phamilton · 4 years ago
They are slowly getting there. For example, postgres 12 added REINDEX CONCURRENTLY. Under the hood it's just recreating the index and then name swapping and dropping the old one. Basically what pg_repack was doing.

There's a huge collection of tricks out there that just need to become formal features. The trick I'm working with today is adding a check constraint with NOT VALID and then immediately calling VALIDATE, because otherwise it takes a very aggressive lock that blocks writes. That could easily become ALTER TABLE CONCURRENTLY or something.
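Roughly, the trick looks like this (table, column and constraint names are made up):

    -- Step 1: add the constraint without validating existing rows. This still
    -- takes an ACCESS EXCLUSIVE lock, but only very briefly, because it skips
    -- the full table scan.
    ALTER TABLE orders
        ADD CONSTRAINT orders_amount_positive CHECK (amount > 0) NOT VALID;

    -- Step 2: validate in a separate statement. This scans the table but only
    -- holds a SHARE UPDATE EXCLUSIVE lock, so reads and writes keep flowing.
    ALTER TABLE orders
        VALIDATE CONSTRAINT orders_amount_positive;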

skrebbel · 4 years ago
Do you know a good resource with these tricks? I often struggle to predict exactly which schema migrations will lock tables aggressively and what the smartest workaround is.
tomhallett · 4 years ago
An orthogonal migration issue which I'm hitting right now: we need to migrate from Heroku Postgres to AWS RDS Postgres, and I'm stressed about the risk and potential downtime in doing so. If there were a way to make a replica in RDS based on Heroku, promote the RDS replica to be the primary, and hard-switch our apps over to RDS, that'd be a lifesaver.

I'm working through this blog post [1] now, but there is still a bit to be defined (including a dependency on heroku's support team) to get this rolling.

Why is the migration required? Heroku Postgres doesn't support logical replication, and logical replication is required for any ELT vendor (Fivetran, Stitch, Airbyte) to use Change Data Capture to replicate data from Postgres to Snowflake (while replicating deleted rows efficiently).

Note: I've also read this ebook [2], but this approach requires downtime.

Note 2: I reached out to heroku support and asked if logical replication was on their short term roadmap. They said they've heard this quite a bit, but nothing tangible is on the roadmap.

If anyone has any thoughts on the above migration, I'd be all ears. :)

1) https://vericred.com/how-we-migrated-a-1tb-database-from-her...

2) https://pawelurbanek.com/heroku-migrate-postgres-rds

bastawhiz · 4 years ago
I did that exact migration. Unfortunately, to my knowledge, there's no way to do it with zero downtime. You need to make your app read only until the RDS instance has ingested your data, then you can cut over. For me, that was roughly one gigabyte of data and took about forty seconds.

My best advice is to automate the whole thing. You can automate it with the Heroku and AWS CLIs. Test on your staging site until you can run through the whole process end to end a few times with no interruptions.

craigkerstiens · 4 years ago
We've moved a number of customers from Heroku over to Crunchy Bridge with essentially no downtime, and I'm currently helping one customer with 7TB through that process. It's not over to RDS, but I would be happy to talk through the process if helpful. And we do support logical replication and have many people using wal2json/logical replication with us.
zozbot234 · 4 years ago
> Why is the migration required? Heroku Postgres doesn't support logical replication

You could possibly hack together some form of higher-layer logical replication via postgres_fdw and database triggers. A comment ITT references this as a known technique.

sidmitra · 4 years ago
One possible solution for the ETL stuff might be to use Heroku Kafka for the Change Data Capture and then from that Kafka you can move it someplace else.

See https://blog.heroku.com/streaming-data-connectors-beta. Heroku's own Kafka seems to have slightly more native support than a 3rd party like Confluent.

We've not yet tried any of this, but it's been bookmarked as a possible solution to explore.

another · 4 years ago
We faced this migration, too. My sympathies.

Adding to your list of options that still require _some_ downtime: we used Bucardo [0] in lieu of logical replication. It was a bit of a pain, since Bucardo has some rough edges, but we made it work. Database was ~2 TiB.

[0] https://bucardo.org/

tomc1985 · 4 years ago
When you subscribe to managed services instead of running the software yourself, these are the kinds of trade-offs that get made
smileysteve · 4 years ago
Logical replication, but this is one of the walls that Heroku creates.
dragonwriter · 4 years ago
> I feel like there's a certain level of stockholm syndrome in the sense that to PG experts, these things aren't that painful anymore

I don't know if I’m a PG expert, but I just prefer “migration tool” to be a separate thing, and for the DB server engine to focus on being an excellent DB server engine, with the right hooks to support a robust tooling ecosystem, rather than trying to be the tooling ecosystem.

JohnBooty · 4 years ago
As a developer I fully support the notion of splitting the tools out from the server engine, like things are today.

But, realistically, pg_upgrade's functionality would need to be integrated into the server itself if we're ever going to have zero-downtime upgrades, right?

I don't know how other RDBMSs handle this, if at all.

throw0101a · 4 years ago
> If I may hijack the thread with some more general complaints though, I wish the Postgres team would someday prioritize migration.

A thing I'm interested in is a 'simple' replication setup to reduce single points of failure. We currently use Galera with MySQL/MariaDB/Percona and it's quite handy for HA-ish needs: we can have two DBs and the garbd running on the web app server.

Pointers to tutorials for Debian/Ubuntu to accomplish something similar would be appreciated. (We run things on-prem in a private cloud.)

zozbot234 · 4 years ago
There's no such thing as "simple" when it comes to HA setups; the requirements are simply too varied. PostgreSQL has great documentation for its HA featureset, but when it comes to systems-level concerns (detecting that a primary is down and arranging promotion of a replica to primary) you're expected to address those on your own.
harikb · 4 years ago
Doesn't PG already support in-place version upgrades?

Also PG is one of the few that support schema/DDL statements inside a transaction.
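For anyone who hasn't seen transactional DDL in action, a tiny illustration (table and column are made up):

    BEGIN;
    ALTER TABLE accounts ADD COLUMN nickname text;
    -- Changed our mind? The schema change rolls back atomically
    -- along with everything else in the transaction.
    ROLLBACK;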

skrebbel · 4 years ago
"one of the few" is a pretty low bar though, I don't know a DB that doesn't suck at this.
znpy · 4 years ago
My complaint would be that there is no standard multi-master solution for Postgres, whereas MySQL now has Group Replication as a native multi-master solution.
brightball · 4 years ago
There’s a Ruby gem called strong_migrations that does this. It’s fantastic and I include it on all of my Ruby projects.
why-el · 4 years ago
The gem does not solve these issues, merely tells you about them (and even then it can't catch all of them, only the ones that Rails defines).
hyper_reality · 4 years ago
PostgreSQL is one of the most powerful and reliable pieces of software I've seen run at large scale, major kudos to all the maintainers for the improvements that keep being added.

> PostgreSQL 14 extends its performance gains to the vacuuming system, including optimizations for reducing overhead from B-Trees. This release also adds a vacuum "emergency mode" that is designed to prevent transaction ID wraparound

Dealing with transaction ID wraparounds in Postgres was one of the most daunting but fun experiences for me as a young SRE. Each time a transaction modifies rows in a PG database, it increments the transaction ID counter. This counter is stored as a 32-bit integer and it's critical to the MVCC transaction semantics - a transaction with a higher ID should not be visible to a transaction with a lower ID. If the value hits 2 billion and wraps around, disaster strikes as past transactions now appear to be in the future. If PG detects it is reaching that point, it complains loudly and eventually stops further writes to the database to prevent data loss.

Postgres avoids getting anywhere close to this situation in almost all deployments by performing routine "auto-vacuums" which mark old row versions as "frozen" so they are no longer using up transaction ID slots. However, there are a couple situations where vacuum will not be able to clean up enough row versions. In our case, this was due to long-running transactions that consumed IDs but never finished. Also it is possible but highly inadvisable to disable auto-vacuums. Here is a postmortem from Sentry who had to deal with this leading to downtime: https://blog.sentry.io/2015/07/23/transaction-id-wraparound-...
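If you want to check how close a cluster is to that point, a query along these lines against the built-in pg_database catalog is a common health check:

    -- Age is measured in transactions; autovacuum's freezing normally keeps
    -- this far below the roughly 2 billion limit where writes stop.
    SELECT datname, age(datfrozenxid) AS xid_age
    FROM pg_database
    ORDER BY xid_age DESC;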

It looks like the new vacuum "emergency mode" functionality starts vacuuming more aggressively when getting closer to the wraparound event, and, as with every PG feature, highly granular settings are exposed to tweak this behaviour (https://www.postgresql.org/about/featurematrix/detail/360/).

jedberg · 4 years ago
> Also it is possible but highly inadvisable to disable auto-vacuums.

When I was running my first Postgres cluster (the reddit databases), I had no idea what vacuuming was for. All I knew was that every time it ran it slowed everything down. Being dumb, I didn't bother to read the docs, I just disabled the auto vacuum.

Eventually writes stopped and I had to take a downtime to do a vacuum. Learned a few important lessons that day. I also then set it up to do an aggressive vacuum every day at 3am, which was the beginning of low traffic time, so that the auto-vacuuming didn't have as much work to do during the day.

williamdclt · 4 years ago
Every time I've seen people having "vacuuming too expensive" problems, the solution was "more vacuum"!
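Concretely, "more vacuum" usually means making autovacuum more aggressive on the hot tables, something like this (table name and numbers are only illustrative):

    -- Vacuum after ~1% of rows are dead instead of the default 20%,
    -- and remove the cost-based throttling delay for this table.
    ALTER TABLE busy_table SET (
        autovacuum_vacuum_scale_factor = 0.01,
        autovacuum_vacuum_cost_delay = 0
    );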
mattashii · 4 years ago
> Each time a transaction modifies rows in a PG database, it increments the transaction ID counter.

It's a bit more subtle than that: each transaction that modifies, deletes, or locks rows gets one ID from the txID counter. Individual row updates don't get their own txID assigned.

> It looks like the new vacuum "emergency mode" functionality starts vacuuming more aggressively when getting closer to the wraparound

When close to wraparound, the autovacuum daemon stops cleaning up the vacuumed tables' indexes, yes. That saves time and IO, at the cost of index and some table bloat, but both are generally preferred over a system-blocking wraparound vacuum.

darksaints · 4 years ago
What is wrong with using a 64 bit, or even 128 bit transaction id?
hyper_reality · 4 years ago
It would increase disk usage by a significant amount, since transaction IDs appear twice in tuple headers (xmin/xmax). Essentially they are overhead on every database row. This submission has a discussion on it: https://news.ycombinator.com/item?id=19082944
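You can see those system columns on any table if you select them explicitly (table name here is just an example):

    -- xmin/xmax are hidden system columns carried on every row version.
    SELECT xmin, xmax, * FROM some_table LIMIT 5;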
SigmundA · 4 years ago
I believe it's mostly how much code needs to be changed, which they are working towards slowly, but there is also more overhead (memory/disk) associated with those larger data types, which are used everywhere:

https://wiki.postgresql.org/wiki/FullTransactionId

cutler · 4 years ago
SRE?
grzm · 4 years ago
MrWiffles · 4 years ago
Thank you for this explanation!
mattashii · 4 years ago
Once again, thanks to all the contributors that provided these awesome new features, translations and documentation.

It's amazing what improvements we can get through public collaboration.

postgresapp · 4 years ago
If you want to test the new features on a Mac, we've just uploaded a new release of Postgres.app: https://postgresapp.com/downloads.html
abdusco · 4 years ago
I love this app on Mac, but I wonder if there is a similar app for Windows (i.e. portable Postgres)?
TedShiller · 4 years ago
This app is amazing. Highly recommend it.
darksaints · 4 years ago
I know this isn't even a big enough deal to mention in the news release, but I am massively excited about the new multirange data types. I work with spectrum licensing, and range data types are a godsend (for representing the spectrum ranges that spectrum licenses grant). However, there are so many scenarios where you want to treat multiple ranges as a single entity (say, for example, an uplink channel and a downlink channel in an FDD band). And there are certain operations, like range differences (e.g. '[10,100)' - '[50,60)'), that aren't possible without multirange support. For this, I am incredibly grateful.
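For example, that kind of difference now just works, because the result can be a multirange (values are illustrative):

    -- PostgreSQL 14: subtracting a sub-range yields a multirange with two parts.
    SELECT int4multirange(int4range(10, 100)) - int4multirange(int4range(50, 60));
    -- => {[10,50),[60,100)}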

Also great is the parallel query support for materialized views, connection scalability, query pipelining, and jsonb accessor syntax.
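The jsonb accessor (subscripting) syntax in particular is a nice quality-of-life change; a quick sketch with made-up table/column names:

    -- PostgreSQL 14 jsonb subscripting: read and write nested values directly.
    SELECT details['address']['city'] FROM profiles WHERE id = 1;
    UPDATE profiles SET details['address']['city'] = '"Lisbon"' WHERE id = 1;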

jkatz05 · 4 years ago
Multiranges are one of the lead items in the news release :) I do agree that they are incredibly helpful and will help to reduce the complexity of working with ranges.

Deleted Comment

buro9 · 4 years ago
Suppose I had a "friend" with a PostgreSQL 9.6 instance (a large single node)... what's the best way to upgrade to PostgreSQL 14?
dylanz · 4 years ago
If you're on a single node, probably something like this. This is what my "friend" is doing later this week. He's migrating from 9.5 to 13:

    # Add the PGDG apt repository and its signing key, then install PostgreSQL 13
    sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
    wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
    sudo apt-get update
    sudo apt install postgresql-13
    sudo service postgresql stop
    # Put the new cluster's data directory on /secure and symlink it into place
    sudo mkdir -p /secure/pgsql13/data
    sudo chown -R postgres:postgres /secure/pgsql13/data
    sudo -u postgres /usr/lib/postgresql/13/bin/initdb -D /secure/pgsql13/data
    sudo rm -rf /var/lib/postgresql/13/main
    sudo ln -s /secure/pgsql13/data /var/lib/postgresql/13/main
    sudo chown -hR postgres:postgres /var/lib/postgresql/13/main
    # pg_upgrade expects postgresql.conf in the old data directory (Debian keeps it in /etc)
    sudo ln -s /etc/postgresql/9.5/main/postgresql.conf /var/lib/postgresql/9.5/main/postgresql.conf
    sudo chown -hR postgres:postgres /etc/postgresql/9.5/main/postgresql.conf
    cd /tmp
    # Dry run with --check first, then the real upgrade with --link (hardlinks, no data copy)
    sudo -u postgres time /usr/lib/postgresql/13/bin/pg_upgrade --old-bindir /usr/lib/postgresql/9.5/bin --new-bindir /usr/lib/postgresql/13/bin --old-datadir /var/lib/postgresql/9.5/main --new-datadir /var/lib/postgresql/13/main --link --check
    sudo -u postgres time /usr/lib/postgresql/13/bin/pg_upgrade --old-bindir /usr/lib/postgresql/9.5/bin --new-bindir /usr/lib/postgresql/13/bin --old-datadir /var/lib/postgresql/9.5/main --new-datadir /var/lib/postgresql/13/main --link
    # Start the new cluster, rebuild planner statistics, then remove the old cluster
    sudo service postgresql start 13
    sudo -u postgres "/usr/lib/postgresql/13/bin/vacuumdb" --all -j 32 --analyze-only
    sudo -u postgres ./delete_old_cluster.sh

throw0101a · 4 years ago
> wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -

Is there a reason to do this rather than putting the key in /etc/apt/trusted.gpg.d/?

sharadov · 4 years ago
I second this; I've upgraded 100-odd instances using pg_upgrade, and it's lightning fast: it takes less than a minute whether the db cluster is a couple of GB or a few TB. Just make sure that you always run the check mode first to catch incompatibilities between versions.
GrayShade · 4 years ago
I think you can use -o '-D /etc/postgresql/9.5/main' -O '-D /etc/postgresql/13/main' to avoid making the symlink to postgresql.conf. This should also make it easier if postgresql.conf tries to include files from a conf.d subdirectory.
nicoburns · 4 years ago
Looks like the best approach might be to use the in-place upgrade tool that ships with Postgres to upgrade to v10 (use Postgres v10 to perform this upgrade). From there you'd be able to create a fresh v14 instance and use logical replication to sync over the v10 database, before briefly stopping writes while you swap over to the v14 database.

EDIT: Looks like pg_upgrade supports directly upgrading multiple major versions. So maybe just use that if you can afford the downtime.

yarcob · 4 years ago
Why would you first upgrade to PG 10?
sudhirj · 4 years ago
With downtime, I guess the pg_upgrade tool works fine.

Without downtime / near-zero downtime is more interesting though. Since this is an old version, something like Bucardo maybe? It can keep another pg14 instance in sync with your old instance by copying the data over and keeping it in sync. Then you switch your app over to the new DB and kill the old one.

Newer versions make this even easier with logical replication - just add the new DB as a logical replica of the old one, and kill the old one and switch in under 5 seconds when you're ready.
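In sketch form, that flow is roughly the following; names and the connection string are placeholders, and you'd copy the schema to the new server first (e.g. with pg_dump --schema-only):

    -- On the old (source) cluster, with wal_level = logical:
    CREATE PUBLICATION upgrade_pub FOR ALL TABLES;

    -- On the new PostgreSQL 14 cluster, after restoring the schema:
    CREATE SUBSCRIPTION upgrade_sub
        CONNECTION 'host=old-db dbname=mydb user=replicator'
        PUBLICATION upgrade_pub;

    -- Once the subscriber has caught up, stop writes briefly, let it drain,
    -- then point the application at the new cluster and drop the subscription.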

yarcob · 4 years ago
I think I remember a talk where someone manually set up logical replication with triggers and postgres_fdw to upgrade from an old server with zero downtime.
Diggsey · 4 years ago
pg_upgrade has a `--link` option that, when used, reduces the time taken to ~15 seconds.

IMO, that counts as "near-zero downtime".

jeltz · 4 years ago
If you can afford a short downtime I would recommend just running pg_upgrade and upgrade directly to 13. Preferably test it out first before doing it in production.
jeremy_sikich · 4 years ago
pglogical: replicate to a new database server. There will be minimal downtime, and it allows you to test the process thoroughly.
bananaoomarang · 4 years ago
Yeah, this is how we do it... Even with terabytes of data it seems to be pretty efficient and robust, and easy to just keep the replicating version up until you're satisfied it all looks good.
yarcob · 4 years ago
Depends on whether you can shut down the server for some time or not. If you can live with some downtime, just use pg_upgrade if you install the new server on the same machine: https://www.postgresql.org/docs/current/pgupgrade.html
briffle · 4 years ago
We are currently using pg_upgrade to go from 9.6 to 13 in our systems. It's supported and works well. We are using the 'hardlinks' feature, which makes it extremely fast as well.
zozbot234 · 4 years ago
Do be aware that using the "hardlinks" feature will mean that restarting the old database version will not work properly and may not be safe. You should make sure to copy the old directory beforehand if you might need that, or simply restore from backup.
olavgg · 4 years ago
This is one of the major pain points with PostgreSQL. If uptime is important and you don't absolutely need the new features, you can just continue using PostgreSQL 9.6 even though it is EOL. https://www.postgresql.org/support/versioning/ It will most likely work great for many more years.

I wish future versions of PostgreSQL would have some backwards compatibility for old system/data tables and data structures, and be able to do a live migration when running a newer release.

pg_upgrade will not work if the internal data structure for your data changes. It only recreates the system tables. https://www.postgresql.org/docs/14/pgupgrade.html

Tostino · 4 years ago
This is not really true, for a number of reasons. pg_upgrade is absolutely the preferred method if you are able to spare a small downtime. It will absolutely work to upgrade your existing data structures to be compatible with the new Postgres version, so I'm not sure what you mean.

It doesn't automatically re-create indexes (to take advantage of new features) for example, but that is likely something you don't want to do right away (incurring extra downtime) when you are doing an upgrade anyways. You can easily just REINDEX after the upgrade is complete.
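On recent versions, even that can be done without blocking writes, e.g.:

    -- Rebuild the indexes in the current database without blocking concurrent
    -- writes (available since PostgreSQL 12; the database name is a placeholder).
    REINDEX DATABASE CONCURRENTLY mydb;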

maxpert · 4 years ago
I converted from MySQL (before the whole MariaDB fork), and I've been happier with every new version. My biggest moment of joy was JSONB, and it keeps getting better. Can we please make connections lighter so that I don't have to use stuff like pgbouncer in the middle? I would love to see that in future versions.
paulryanrogers · 4 years ago
FWIW, MySQL 8 has gotten a lot better at standards compliance and ironing out legacy quirks, with some config tweaks. While my heart still belongs to PostgreSQL, things like no query hints, dead tuple bloat (maybe zheap will help?), less robust replication (though it's getting better!), and costly connections dampen my enthusiasm.
assface · 4 years ago
> maybe zheap will help?

According to Robert Haas, the zheap project is dead.

darksaints · 4 years ago
There has been a lot of improvement in this release.

https://www.depesz.com/2020/08/25/waiting-for-postgresql-14-...

the_duke · 4 years ago
The benchmark linked in the comments shows 7-38% improvements. Nothing to scoff at, but if you need PGBouncer that probably won't make enough of a difference.
jkatz05 · 4 years ago
I'd be curious to see if the concurrency improvements in PostgreSQL 14 help with increasing the threshold for when you need to introduce a connection pooler.
darksaints · 4 years ago
Check out the Azure team's benchmarks. Pretty damn impressive.

https://techcommunity.microsoft.com/t5/azure-database-for-po...

xwdv · 4 years ago
Lighter connections would finally allow for using lambda functions that access a Postgres database without needing a dedicated pgbouncer server in the middle.
jeltz · 4 years ago
Yes, but this patch does not make startup cheaper, it only decreases the performance cost of concurrent open connections.
plasma · 4 years ago
Not sure if you’re using this, but AWS has RDS Proxy as a service, in case you’re hosting your own
TOMDM · 4 years ago
PostgreSQL is one of those tools I know I can always rely on for a new use-case. There are very few cases where it can't do exactly what I need (large scale vector search/retrieval).

Congrats on the 14.0 release.

The pace of open source has me wondering what we'll be seeing 50 years from now.

gk1 · 4 years ago
I recall seeing some library that adds vector search to Postgres. Maybe https://github.com/ankane/pgvector?

Also there's Pinecone (https://www.pinecone.io) which can sit alongside Postgres or any other data warehouse and ingest vector embeddings + metadata for vector search/retrieval.

gavinray · 4 years ago
ElasticSearch has "dense_vector" datatype and vector-specific functions.

  https://www.elastic.co/guide/en/elasticsearch/reference/current/dense-vector.html
  https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-script-score-query.html#vector-functions
ZomboDB integrates ElasticSearch as a PG extension, written in Rust:

  https://github.com/zombodb/zombodb
I dunno what exactly a "dense_vector" is, but if you can't use the native "tsvector" maybe you could use this?

TOMDM · 4 years ago
I need a self managed solution, so I'm not sure Pinecone is feasible and I don't think pgvector scales well enough for my use-case (hundreds of millions of vectors).

So far I think I'm going to go with Milvus[1], ideally I'd just have a foreign data wrapper for Milvus or FAISS.

[1] https://github.com/milvus-io/milvus