The largest table was over 100 million rows. Some migrations were painful, however. At that time, some of them would lock the whole table and we'd need to run them overnight. Fortunately, this was for an internal app so we could do that.
I did a double take at this. At the outset of the article, the fact that they're using a distributed database and the mention of a "mid 6 figure" DB bill made me assume they have some obscenely large database that's far beyond what a single node could do. They don't detail the Postgres setup that replaced it, so I assume it's a pretty standard single primary, and a 100 million row table is well within the abilities of that. I have a 150 million row table happily plugging along on a 2 vCPU + 16 GB instance. Apples and oranges, perhaps, but people shouldn't underestimate what a single modern server can do.
So if the snapshot violation is happening inside Multi-AZ instances, can it also happen in a single-region setup with multiple read replicas? And is it perhaps just easier to observe in Multi-AZ setups because the lag is higher?
Two replicas in a “semi synchronous” configuration, as AWS calls it, is to my knowledge not available in base Postgres. AWS must be using some bespoke replication strategy, which would have different bugs than synchronous replication and is less battle-tested.
But as nobody except AWS knows the implementation details of RDS, this is all idle speculation that doesn’t mean much.
Multi-AZ instances are a long-standing feature of RDS in which the primary DB is synchronously replicated to a secondary DB in another AZ. On failure of the primary, RDS fails over to the secondary.
Multi-AZ clusters have two secondaries, and transactions are synchronously replicated to at least one of them. This is more robust than Multi-AZ instances if a secondary fails or is degraded. It also allows read-only access to the secondaries.
Multi-AZ clusters no doubt have more "magic" under the hood, as it's not a vanilla Postgres feature as far as I'm aware. I imagine this is why it's failing the Jepsen test.
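To make the "at least one of two secondaries" rule concrete, here's a toy Python model of the acknowledgment behavior described above. It's only a sketch: `Replica`, `commit`, and `required_acks` are names I made up for illustration, and it says nothing about how RDS actually implements replication.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical stand-in for a secondary DB in another AZ."""
    name: str
    healthy: bool = True
    log: list = field(default_factory=list)

    def replicate(self, txn: str) -> bool:
        """Store the transaction and acknowledge it, unless the replica is down."""
        if not self.healthy:
            return False
        self.log.append(txn)
        return True


def commit(txn: str, replicas: list, required_acks: int = 1) -> bool:
    """Return True if at least `required_acks` replicas acknowledged `txn`."""
    # Every replica is offered the transaction; only healthy ones acknowledge.
    acks = sum(1 for replica in replicas if replica.replicate(txn))
    return acks >= required_acks


# Cluster-style setup: two secondaries, one of them degraded.
good = Replica("secondary-1")
bad = Replica("secondary-2", healthy=False)
cluster_ok = commit("txn-1", [good, bad])

# Instance-style setup: a single secondary that happens to be down.
instance_ok = commit("txn-1", [Replica("only-secondary", healthy=False)])
```

In this toy model `cluster_ok` is `True` and `instance_ok` is `False`: the commit still gets its one acknowledgment when one of two secondaries is degraded, which is the robustness argument. The flip side is that the secondary that didn't acknowledge can be behind, which matters once you allow reads from it.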
This is because a ton of them are in the Chinese equivalent of like Boise, Idaho where demand is fairly low. People want to live where the high paying jobs are.
Ahead of its time, I guess.
This is the main reason I think alternative sites have a hard time competing. Play anything on YouTube from anywhere, and if it's buffering or slow, then it's probably your internet connection that's the problem. By contrast, do the same on competing streaming sites and buffering is more or less expected, especially if you aren't in certain geographic areas.
Monetization on YouTube is mostly just a carrot on a stick. The overwhelming majority of content creators will never make anything more than pocket change off of it. That carrot might still work as an incentivization system, but I don't think it's necessarily the driving force.
[1] https://support.google.com/interconnect/answer/9058809?hl=en