Imagine you are starting a new project, and your backing datastore options are restricted to MySQL or Postgres (or their cloud-tailored equivalents). What sort of functional requirements would cause you to choose one over the other?
Postgres. Fast, full-featured, rock-solid, and a great community.
I think many of us can’t be bothered to go over (again) the issues we’ve had with MySQL in the past. The last straw for me was about ten years ago, when I caught MySQL merrily making up nonsense results for a query I’d issued that accidentally didn’t make any sense.
Very likely this particular issue, and others like it, have been fixed in the meantime. But I just got the sense that MySQL was developed by people who didn't quite know what they were doing, and that the people who really did know what they were doing were never likely to be drawn to the project to fix it.
Good, you shouldn't be using them. You shouldn't be using foreign keys either. It just makes working with data harder, and it doesn't help with constraining it if your data modifications are inside transactions and properly written statements.
Having used postgres for the past decade, I tried MySQL for a side project to see what's changed with it. The sad answer is that it feels like nothing has changed - Oracle seems to have let what used to be a core technology of the industry languish.
I'm sure there are use cases where MySQL will be the better choice over postgres, but the future for the stack looks bleak.
> Oracle seems to have let what used to be a core technology of the industry languish
I think slowly squeezing the life from MySQL was a very explicit goal for them. After the big wins (Wal-Mart, etc.) MySQL had 15-20 years ago, I think it was very clear MySQL was going to chip away at more and more of Oracle's business.
I wonder how much Oracle spends on MySQL every year? They're spending a lot of money to keep MySQL at kind of a "not quite good enough" state. But they can't kill it outright - it'd be like boiling a frog fast instead of slow.
In the end, I wonder what extinguishing MySQL really accomplished for them. It might have bought them some breathing room but Postgres quickly filled MySQL's old segment.
Creating a series of connections very quickly is cheaper in MySQL and MariaDB than in PostgreSQL. Typically, a connection pooler is placed in front of PostgreSQL to support connection scalability.
I'm not sure if there has been a recent breakthrough that has changed that. I think that still applies today. Correct me if I'm wrong.
Actually I would argue that there isn't a single reason to use MySQL over Postgres (barring the obvious: it's what the team already knows, or the company already uses, etc.)
> I'm sure there are use cases where MySQL will be the better choice over postgres, but the future for the stack looks bleak.
see, I'm pretty sure there basically weren't. It lucked out at the right moment in the late 1990s. Also, Slashdot used it.
The only use case I can think of is when you want an application, and it requires or is highly optimised to MySQL. Otherwise, it should actively be avoided.
Yep. The real question here is: it's 2023, why would you choose MySQL over PostgreSQL?
Not that there aren't reasons. There are some. But for starting out with a new app without a very, very good reason to do something different? PostgreSQL every day of the week.
I work with MySQL (MariaDB, but it doesn't make much difference) and never get any issues that couldn't be explained by architectural or development mistakes.
Just as an example - how do you create a read-only user (SELECT only) in Postgres? In MySQL it's extremely simple and it works, while in Postgres it's a nightmare to create and maintain
> Just as an example - how do you create a read-only user (SELECT only) in Postgres? In MySQL it's extremely simple and it works, while in Postgres it's a nightmare to create
Isn’t that
GRANT SELECT ON ALL TABLES IN SCHEMA foo TO bar;
?
> and maintain
If you mean you want to grant a user select rights to whatever table gets created in the future (a somewhat questionable idea from a security viewpoint):
ALTER DEFAULT PRIVILEGES IN SCHEMA foo GRANT SELECT ON TABLES TO bar;
I think both are possible in PostgreSQL 9 and later (https://www.postgresql.org/docs/9.0/sql-grant.html , https://www.postgresql.org/docs/9.0/sql-alterdefaultprivileg...). That version is from 2010.
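Putting it together, a minimal sketch of a read-only role (schema foo and role bar are placeholders; the schema USAGE grant is the step that usually trips people up):

    CREATE ROLE bar LOGIN PASSWORD 'change-me';
    GRANT USAGE ON SCHEMA foo TO bar;                 -- without this, table grants are useless
    GRANT SELECT ON ALL TABLES IN SCHEMA foo TO bar;  -- existing tables
    ALTER DEFAULT PRIVILEGES IN SCHEMA foo
        GRANT SELECT ON TABLES TO bar;                -- future tables created by the current role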
I guess the lesson is that both these systems evolve fairly rapidly. You can’t use quirks you remember from over 5 years ago to judge their current versions.
It’s really unfair because a lot may have changed in 10 years so it might be worth reconsidering.
But I’m like you, MySQL did some nonsense once that took me hours to work out. So now I really can’t be bothered with any potential quirks it may still have. This is not an SNL sketch.
- It's less featureful, and I'd consider that a strong virtue in the YAGNI camp - less to go wrong, less mental overhead.
- Maintenance is simpler and far less necessary in my general experience.
- Replication is simpler and more reliable.
- You can tell the query optimizer what to do. When this is needed, you'll be thankful. It's a godsend.
That said, I wouldn't run Oracle MySQL. I opt for MariaDB on smaller projects and AWS Aurora MySQL for larger projects. Aurora scales insanely well, and replication lag is almost non-existent.
In my general experience MySQL was always significantly faster but it's been a number of years since I've worked with Postgres and the comments here seem to indicate that that may no longer be the case. YMMV
> It's less featureful, and I'd consider that a strong virtue in the YAGNI camp - less to go wrong, less mental overhead.
This doesn't really hold water in my opinion.
It's not like PostgreSQL is some minefield of misfeatures and quirky behavior. Some of these features exist, but have zero impact on you unless you actually opt to use them. But if you end up needing to: they're there, and you can just start using them.
Compare this to MySQL where they simply don't exist no matter how much you may need them. Need to apply an index to the result of a function to quickly fix a performance issue in prod? Sorry, you can't. Need window functions to accurately compute some analytics in a sane period of time? Sorry, you can't. The list of things you can do in PostgreSQL that you simply can't with MySQL is massive and grows every day.
The odds that you'll want, need, or greatly benefit from at least one of these features are not small. Having the flexibility of knowing these features exist should you ever have a use-case for them is massive.
I had the misfortune of inheriting a MySQL 5.6 database and I had to manage it for a year and a half. It wasn't very big (less than 100GB), but:
- it hung periodically and had to be restarted during the night, for no apparent reason
- compared to Postgres, the tooling is garbage (both for backups and even more for general database administration)
- essential features are missing, the most important one of which, for me, was proper CSV import/export. CSV-related functionality is so broken and terribly inconvenient to use. In a specific case I had to write a program to export millions of records manually since MySQL could not generate correct CSV export due to some columns containing text with special characters, quotes, newlines. Any combination of the export parameters ("ENCLOSED BY", "ESCAPED BY" and all the other garbage options) failed in one way or another. I even tried to use non-standard characters like \x15 and \r for column and row separation but even that failed. With Postgres, "with csv header" is simple and works every time.
I also managed bigger Postgres databases (up to tens of terabytes) and never had the issues I encountered with MySQL.
> It's less featureful, and I'd consider that a strong virtue in the YAGNI camp - less to go wrong, less mental overhead.
Imagine when you actually need any of the features that PostgreSQL provides like pub/sub, logical replication, JSONB etc. With MySQL you might have to hack a solution that is much more complex or you have to set up an entirely separate tool. What I find nice with PostgreSQL is that for simple cases you can get away without a dedicated key/value store or a queue or a full text search engine. You can do a lot of these kinds of tasks with just a single database.
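The pub/sub piece, for instance, is just LISTEN/NOTIFY, with no extra infrastructure; a quick sketch (channel name and payloads invented):

    LISTEN jobs;                               -- run in the consuming session
    NOTIFY jobs, 'job 42 ready';               -- run in the producing session
    SELECT pg_notify('jobs', 'job 43 ready');  -- same thing, callable from functions and triggers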
> features that PostgreSQL provides like [...] logical replication
MySQL has offered logical replication for considerably longer than Postgres, and it's a substantially more configurable and mature implementation. MySQL's built-in replication has always been logical replication. It's a feature MySQL has had for 23 years -- built-in logical replication is literally one of the top reasons why all the biggest MySQL users chose MySQL originally!
The problem is that when you need to tell the planner what to do, you can do it in MySQL, but not in Postgres. Imagine you've got a production database with lots of traffic which suddenly can't handle anything, because one extra inserted row tipped the balance and a common query now takes seconds to process.
Do you know how to fix the table statistics quickly? Do you know how to change that query to force the execution plan you want? Do you know how long the solution will last until the stats change again?
MySQL is a bit more predictable for this case and if things go really bad for some unexpected reason, one comment (an optimizer hint) can fix it.
I'm looking at it from ops perspective. The ratio during development doesn't matter that much - all issues are solvable at that stage. For me it's rather "which situation would I rather be in at 3am".
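For what it's worth, the 3am toolbox looks roughly like this on each side (table, column, and index names invented):

    -- MySQL: pin the plan in the query itself
    SELECT *
      FROM orders FORCE INDEX (idx_customer_created)
     WHERE customer_id = 42
     ORDER BY created_at DESC
     LIMIT 50;

    -- Postgres: no hints; the usual moves are refreshing and widening statistics
    ANALYZE orders;
    ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;  -- bigger sample
    ANALYZE orders;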
Postgres query planner suddenly deciding to do something silly in the middle of the night has taken Notion down a few times. It's quite annoying, and it's very frustrating to have no recourse.
There is almost no operational reason to choose MySQL over PostgreSQL. I did a deep dive many moons ago (before major improvements in Postgres performance), when people were saying that MySQL was faster. I found that not to be true, and the differences have only moved further in Postgres's favour.
also, I assume you mean MariaDB as MySQL is owned by Oracle and I would greatly implore anyone and everyone to avoid Oracle as if it has herpes.
There are a lot of historic problems with MySQL accepting invalid data, committing data even when there are constraint issues, and having very poor transactional isolation, I am not sure if these have improved.
Truthfully, the only benefits you gain from using MariaDB or MySQL are:
* Memory tables
* Having inconsistent replicas (which can be useful when you want your downstream to have less data than your upstream and you know it won’t get updated.)
herpes isn't that bad. most people will get it in their lifetime. 1 in 6 people have hsv-2, the less common variant. trying to avoid herpes is like trying to avoid chickenpox (although herpes isn't nearly as harmful as chickenpox).
you should avoid Oracle like it's a blood pathogen.
I think that the only reasons to choose MySQL (or Maria) over Postgres for a new project are operational. Postgres is probably the better database in almost all respects, but major version upgrades are much much more of a pain on Postgres than on almost any other system I have ever used. That being said, I would choose Postgres pretty much every time for a new project. The only reason I would use Maria or MySQL would be if I thought I later would want to have something like Vitess, for which I think there isn't really an equivalent for Postgres.
> but major version upgrades are much much more of a pain on Postgres than on almost any other system I have ever used.
This is a thread comparing MySQL and Postgres and your claim is that postgres is harder to do major version upgrades than anything you have used??
Context is important here, have you honestly actually upgraded a MySQL node? It’s a lesson in pain and “major” version changes happen on minor versions, like the entire query planner completely trashing performance in 5.6->5.7
Postgres has two forms of updates:
1) in place binary upgrade.
Fast, clean, simple, requires that you have the binaries for the old and the new database.
2) dump/restore.
Serialise the database into text files, load a new database and deserialise those files into it.
Slow, but works flawlessly & consistently with relatively low danger.
MySQL can only do option 2.
You can sort of fake an “update” by abusing the fact that MySQL's replication offers no guarantees, so you can make a new server a replica, then roll over. But it is impossible to know what data was lost in that transition, and MySQL will happily continue without ever telling you.
I have experienced this behaviour in large e-commerce retailers. MySQL was very popular for a very long time and I am intimately aware of operational best practices and how they are merely patching over an insane system.
Is there a good way to do case-insensitive accent-insensitive collations yet in postgresql? It’s been a holdup for using that for some use cases like searching for data, like a person’s name, in pgsql when the casing or accents don’t match perfectly.
MSSQL has had this forever, and I'm pretty sure MySQL has it as well.
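Postgres 12+ can do at least part of this with nondeterministic ICU collations; a sketch, assuming my reading of the docs is right (collation name invented):

    -- primary strength ('level1') ignores both case and accents
    CREATE COLLATION ci_ai (provider = icu, locale = 'und-u-ks-level1', deterministic = false);
    CREATE TABLE people (name text COLLATE ci_ai);
    INSERT INTO people VALUES ('José');
    SELECT * FROM people WHERE name = 'jose';  -- matches
    -- caveat: LIKE and pattern matching don't work with nondeterministic collations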
We ran a Galera cluster for 4 days and it died on the first ALTER TABLE we did. Only option was to reduce it down to a Primary / Replica setup. I really can't recommend anyone using this.
Postgres is >50x slower for range queries (example below) and is akin to using array-of-pointers (i.e. Java) whereas MySQL supports array-of-struct (C). Illustration from Dropbox scaling talk below.
Sneak peek photo [1] (from [2]). Just imagine it's literally 500-1000x more convoluted per B-tree leaf node. That's every Postgres table unless you CLUSTER periodically.
[1]: https://josipmisko.com/img/clustered-vs-nonclustered-index.w...
[2]: https://josipmisko.com/posts/clustered-vs-non-clustered-inde...
Mind boggling how many people aren't aware of clustered primary indexes in MySQL, which are not supported at all in Postgres. For certain data layouts, Postgres pays either 2x storage (a covering index containing every single column), >50x worse performance by effectively N+1 bombing the disk for range queries, or blocking your table periodically (CLUSTER).
In Postgres, the messiness of loading primary data after reaching the B-tree leaf nodes pollutes caches and takes longer. This is because you need to load one 8kb page for every row you want, instead of one 8kb page with 20-30 rows packed together.
Example: Dropbox file history table. They initially used an autoinc id as the primary key in MySQL. This causes everybody's file changes to be mixed together in chronological order on disk in a B-tree. The first optimization they made was to change the primary key to (ns_id, latest, id) so that each user's (ns_id) latest versions would be grouped together on disk.
Dropbox scaling talk: https://youtu.be/PE4gwstWhmc?t=2770
If a dropbox user has 1000 files and you can fit 20 file-version rows on each 8kb disk page (400bytes/row), the difference in performance for querying across those 1000 files is 20 + logN disk reads (MySQL) vs 1000 + logN disk reads (Postgres). AKA 400KiB data loaded (MySQL) vs 8.42MiB loaded (Postgres). AKA >50x improvement in query time and disk page cache utilization.
In Postgres you get two bad options for doing this: 1) Put every column of the table in the index, making it a covering index, and pay to store all data twice (index and PG heap). There is no way to disable the heap primary storage. 2) Take your DB offline every day and CLUSTER the table.
Realistically, PG users pay that 50x cost without thinking about it. Any time you query a list of items in PG even using an index, you're N+1 querying against your disk and polluting your cache.
This is why MySQL is faster than Postgres most of the time. Hopefully more people become aware of disk data layout and how it affects query performance.
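To make the layout difference concrete, a sketch of both sides (column names invented, not Dropbox's actual schema):

    -- MySQL/InnoDB: the table itself is the primary-key B-tree, so one
    -- user's (ns_id's) rows sit physically adjacent on disk
    CREATE TABLE file_history (
        ns_id    BIGINT  NOT NULL,
        latest   TINYINT NOT NULL,
        id       BIGINT  NOT NULL,
        blob_ref VARCHAR(64),
        PRIMARY KEY (ns_id, latest, id)
    );

    -- Postgres approximation: duplicate the needed columns into a covering
    -- index so range scans can skip the heap (visibility map permitting)
    CREATE INDEX file_history_cover
        ON file_history (ns_id, latest, id) INCLUDE (blob_ref);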
There is a hack for Postgres where you store data in an array within the row. This puts the data contiguously on disk. It works pretty well, sometimes, but it’s hacky. This strategy is part of the Timescale origin story.
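Something like this, with a made-up schema:

    -- one row per user; versions share heap pages because they share a row
    CREATE TABLE file_history_packed (
        ns_id    BIGINT PRIMARY KEY,
        versions JSONB[]
    );
    -- appending rewrites the whole array, which is the hacky part
    UPDATE file_history_packed
       SET versions = versions || '{"file": 1, "rev": 7}'::jsonb
     WHERE ns_id = 42;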
Open to db perf consulting. email is in my profile.
I made the same comment elsewhere before finding this comment. I can vouch for this speed-up in the ratio of records per page. It is very real. It only applies for small records where you can pack many rows into a page, and where you can cleanly partition by user/tenant, but that's common enough.
I will say: we kept every table we didn’t have to migrate for perf reasons in Postgres, and never regretted it.
Edit: and the index “fix” for Postgres often doesn't work. Postgres will validate the row on disk, even if it's in the index, if the page's visibility map isn't set. If your data isn't write-once, there's a decent chance your page is dirty and it will still make the “heap” fetch.
You are confusing two concepts here. In InnoDB, the tables are always ordered by the primary key when written to actual disk storage.
This is not the same as "having a primary key", Postgres also has primary keys. It just stores the PK index separately from the bulk of the data.
Oracle also has primary keys, even if the order of the rows is different to the key order. In Oracle, when the rows are stored in the same order as the keys in the primary index, it is a special case and these tables are called IOTs, index-organized tables.
The disadvantages of IOT are that inserts are slower, because in a normal table, the data is appended to the table, which is the fastest way to add data, and only the index needs to be reordered. In an IOT, the entire table storage is reordered to take the new data into account.
Select queries, OTOH, are much faster when using IOT, for obvious reasons, and this is what you describe in your comment.
If you use TEXT, BLOB, or JSON fields, even in MySQL, the actual data is stored separately.
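For reference, the Oracle DDL for the two layouts is a one-clause difference (sketch, with invented tables):

    -- heap-organized (the default): fast appends, PK index stored separately
    CREATE TABLE events_heap (id NUMBER PRIMARY KEY, payload VARCHAR2(200));

    -- index-organized: the rows live inside the primary-key B-tree
    CREATE TABLE events_iot (id NUMBER PRIMARY KEY, payload VARCHAR2(200))
    ORGANIZATION INDEX;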
When using clustered indexes, one tradeoff is that if a non-clustered index isn't covering for a query, it will need to perform B-tree traversals to find rows in the clustered index. This can significantly increase the amount of (at least logical) IO compared to heap tables, where the non-clustered indexes can refer directly to the row id.
Because you can only have a single clustered index, you're effectively paying for efficient range queries on a single clustering key by making all other queries slower.
This tradeoff may or may not be worth it depending on your query patterns. In my experience, you can often get away with adding some subset of columns to a non-clustered index to make it covering, and get efficient range queries without making a copy of the entire dataset.
And even with clustered indexes, as soon as you want a range query that's not supported by your clustered index, you're faced with the exact same choices, except that you have to pay the cost of the extra B-tree traversals.
(Admitting bias: I've only ever worked with postgres in production with update-heavy tables so I've dealt with more of its problems than MySQL's)
Postgres also has other gotchas with indexes - MVCC row visibility isn't stored in the index for obvious performance reasons (writes to non-indexed columns would mean always updating all indexes instead of HOT updates [1]) so you have to hope the version information is cached in the visibility map or else don't really get the benefit of index only scans.
[1] https://www.postgresql.org/docs/current/storage-hot.html
But OTOH, I've read that secondary indexes cause other performance penalties with having to refer back to the data in clustered indexes? Never looked into the details because no need to for postgres which we've been very happy with at our scale :)
I've definitely run in to these kind of issues and learned about them the hard way, but I found that in PostgreSQL it's quite a bit easier to understand what is actually going on due to better documentation and tooling, and I've found this very valuable. Maybe it's just because I've spent more time with PostgreSQL than MariaDB, but MariaDB has often left me quite a bit more confused (on performance, but also other topics).
If your pg database improves with CLUSTER, you can use pg_repack instead to achieve the same effect without downtime. Besides reordering the heap, it will also clear out bloat from the heap and indexes. I highly recommend partitioning if you have heavy write traffic on large tables since that will keep overhead low and make it complete faster.
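Roughly (table and index names invented; the pg_repack flag per its documentation):

    -- built-in, but holds an exclusive lock for the whole rewrite:
    CLUSTER file_history USING file_history_pkey;
    -- pg_repack does the equivalent online, from the shell rather than SQL:
    --   pg_repack --table=public.file_history mydb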
The only reason I'd consider MariaDB, which I'm surprised I don't see mentioned currently, is its bitemporal features. They solve whole classes of problems that are almost universally done badly in schema/business logic instead.
Granted I haven’t had to make a decision like this for several years, I’ve hardly even touched a database except to debug some stuff on localhost that’s outside of my explicit purview. So maybe Postgres solutions have narrowed the gap on this without my knowing it.
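For reference, MariaDB's system-versioned tables (the system-time half of bitemporal) look like this; a sketch with an invented table:

    -- MariaDB 10.3+: every row version is kept automatically
    CREATE TABLE prices (
        sku   VARCHAR(32) PRIMARY KEY,
        price DECIMAL(10,2)
    ) WITH SYSTEM VERSIONING;

    UPDATE prices SET price = 9.99 WHERE sku = 'ABC';
    -- ask what the table looked like at a point in time
    SELECT * FROM prices FOR SYSTEM_TIME AS OF TIMESTAMP '2023-01-01 00:00:00';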
I would be wary putting too much stock in a seven year old post on the topic; lots of stuff has changed. Specifically, at least one person claimed that a patch greatly improved PostgreSQL for this use case: https://news.ycombinator.com/item?id=26285452
If you say what you’re trying to actually achieve I can help with a solution, but asking if it supports an arbitrary feature is not going to get the answer you want because depending on what you’re actually using an archive table for, Postgres might have something already built in but it will almost assuredly not be exactly like an archive table storage type.
PostgreSQL every time, unless you have a specific reason, or as already pointed out, you're sure you don't just need SQLite.
PSQL in my experience has vastly better tooling, community, and is ahead of the curve tech wise. Same with extensions availability. Or perhaps you need to move away from it to say CockroachDB, or similar which is much easier.
Whichever one you start out with, you will be annoyed if you switch to the other one 5 years later. I started out with mysql, and when I started working on a postgres project, I was shocked at some of the ways it was lacking (how am I supposed to store email addresses in a database without collations?).
But when postgres folks grouse about stuff in mysql, I'm usually nodding along and saying "yeah, that would be nice".
They're both great options. If anybody on your team is already an expert at one of them, use that one.
> when I started working on a postgres project, I was shocked at some of the ways it was lacking (how am I supposed to store email addresses in a database without collations?)
> how am I supposed to store email addresses in a database without collations?
Not familiar with MySQL so trying to look that up, but with a constraint? Or just don't do that? The SO answer I found says 'it's much more useful to have johndoe@ and JohnDoe@ treated as the same than it is to support case sensitive email addresses'. OK, it's also non-compliant, but whatever's 'more useful' I guess!
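The usual Postgres answer for the email case is the citext extension rather than a collation; a minimal sketch:

    CREATE EXTENSION IF NOT EXISTS citext;
    CREATE TABLE users (
        email citext UNIQUE  -- 'JohnDoe@x.com' and 'johndoe@x.com' compare equal
    );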
Having used both in production, I agree with the above. It's not going to make or break your business or project.
I will also add that there are giant companies relying on both databases with great success. Facebook still runs on MySQL, and contributes back to it. YouTube I'm not sure about, but it did run on MySQL for a long time, well after it got massive. I'm sure examples exist for Postgres (Amazon, since they moved off Oracle?)
Use MySQL if you're expecting to have to do large operational database tasks re: migrations and maintenance, with no ability to go offline. gh-ost, percona-osc, the new INSTANT DDL stuff, is all quite far ahead in MySQL-land. Additionally, Vitess and Planetscale are making huge strides in MySQL performance. There are more people and guides in the world to help recover even the most mutilated of MySQL databases. MySQL gets more love in extremely large enterprise-level organizations, and it shows.
Use Postgres if you need some of the quite-good extensions, most notably PostGIS, or if you just want things to work; most documentation will be postgres flavored. Postgres gets more love from web-developers, and it shows.
With a sane migration tool like pt-osc or gh-ost you can absolutely abort in the middle. What's more, you can pause in the middle or even slow down in the middle based on arbitrary logic (ie, pause migration if replication delay rises above a certain value). Postgres is nice and transactional DDL has its place but postgres stopped halfway through IMO. Vanilla Postgres > vanilla MySQL, but the migration story of MySQL + tooling is so far beyond Postgres + tooling that it's not even funny.
That said, if you don't expect to have tables with 100m+ rows, even vanilla postgres will be good enough.
Having answered this a ton over the years, don't want to really take shots at MySQL. But Postgres stands in pretty unique ground.
1. It's solid as a really rich data platform (more than just a relational database). Its extension framework is quite unique compared to others. Its JSONB support was the first among relational databases and is feature-rich and performant. Multiple index types. Transactional DDL. The list goes on.
2. No central owner. A lot of open source is open in source code only, but maintained by a central company.
3. I mentioned extensions, but really that is understated. It can do really advanced geospatial, full text search, time series, the list goes on.
Having explained this a ton of times, first 10 years ago - https://www.craigkerstiens.com/2012/04/30/why-postgres/ - and then again 5 years later with an updated version, I most recently tried to capture more of this in an updated form on the Crunchy Data blog - https://www.crunchydata.com/why-postgres
> Very likely this particular issue, and others like it, have been fixed in the meantime.
Here is an 18-year-old bug: DELETE triggers don't fire for foreign key cascades: https://bugs.mysql.com/bug.php?id=11472
That makes the entire feature mostly worthless. Reported in 2005, last updated in 2008.
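A hypothetical repro of what the report describes (all names invented):

    CREATE TABLE parent (id INT PRIMARY KEY);
    CREATE TABLE child (
        id INT PRIMARY KEY,
        parent_id INT,
        FOREIGN KEY (parent_id) REFERENCES parent(id) ON DELETE CASCADE
    );
    CREATE TRIGGER child_delete_audit BEFORE DELETE ON child
        FOR EACH ROW SET @fired = IFNULL(@fired, 0) + 1;
    INSERT INTO parent VALUES (1);
    INSERT INTO child VALUES (10, 1);
    DELETE FROM parent WHERE id = 1;  -- cascades to child
    SELECT @fired;                    -- per the bug, NULL: the trigger never fired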
---
While I would choose PostgreSQL every time, MySQL has the following advantages:
1. Write-performance, due to fundamental design tradeoffs. [1]
2. Per-connection resources, due to single-process design.
3. Related to #1, no vacuum requirement.
[1] https://www.uber.com/blog/postgres-to-mysql-migration/
> Need to apply an index to the result of a function to quickly fix a performance issue in prod? Sorry, you can't. Need window functions to accurately compute some analytics in a sane period of time? Sorry, you can't.
Need to apply an index to the result of a function? No problem, use a functional index, supported since October 2018: https://dev.mysql.com/doc/refman/8.0/en/create-index.html#cr...
Need to use a window function? No problem, supported since April 2018: https://dev.mysql.com/doc/refman/8.0/en/window-functions.htm...
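Both per the linked 8.0 docs; a quick sketch against invented tables (note the extra parentheses MySQL requires around a functional key part):

    -- functional index (8.0.13+)
    CREATE INDEX idx_users_email_lower ON users ((LOWER(email)));

    -- window function (8.0.2+)
    SELECT order_id,
           SUM(amount) OVER (PARTITION BY customer_id ORDER BY created_at) AS running_total
      FROM orders;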
I do remember getting corrupted MyISAM tables a decade ago, sometimes after a bad shutdown.
Not saying it can't happen, but I don't think it's a common occurrence.
> MySQL is a bit more predictable for this case and if things go really bad for some unexpected reason, one comment (an optimizer hint) can fix it.
And that one case I remember was perfectly solvable the regular way, with a little more time.
> herpes isn't that bad. most people will get it in their lifetime. 1 in 6 people have hsv-2, the less common variant.
The worst part about having it is having to talk about having it. It's really not bad as a condition separate from societal concern.
Citation needed.
> Is there a good way to do case-insensitive accent-insensitive collations yet in postgresql?
My gut tells me that I would do it in the query itself though, and not rely on the collation. Maybe I am misunderstanding.
Galera is the main one I can think of:
* https://galeracluster.com/library/documentation/tech-desc-in...
* https://mariadb.com/kb/en/what-is-mariadb-galera-cluster/
* https://packages.debian.org/search?keywords=galera
I'm not aware of any multi-master, active-active(-active) replication system that is open source for PostgreSQL.
In the Postgres space there is citusdb which provides multi master.
There is also BDR from 2ndquadrant if you want a paid/supported solution. https://www.enterprisedb.com/products/edb-postgres-distribut...
> MySQL's built-in replication has always been logical replication.
it is the only official, in-binary replication mechanism.
> * Memory tables
PostgreSQL has memory tables too.
> PSQL in my experience has vastly better tooling, community, and is ahead of the curve tech wise.
https://www.postgresql.org/docs/15/app-psql.html
> I started out with mysql, and when I started working on a postgres project, I was shocked at some of the ways it was lacking (how am I supposed to store email addresses in a database without collations?)
How long ago was this? :)
Always use PostgreSQL. It's more logical, more extensible, saner, and more predictable, and it supports many extensions.
MySQL is inconsistent crap that trades away consistency, correctness and stability for a little bit of performance in standard use cases.
Do yourself a favor and always use PostgreSQL. I switched 15 years ago and never looked back. I've done 15-50 projects since in psql.
From a former MySQL developer:
> let me point out something that I've been saying both internally and externally for the last five years (although never on a stage—which explains why I've been staying away from stages talking about MySQL): MySQL is a pretty poor database, and you should strongly consider using Postgres instead.