Imagine you are starting a new project, and your backing datastore options are restricted to MySQL or Postgres (or their cloud-tailored equivalents). What sort of functional requirements would cause you to choose one over the other?
Postgres. Fast, full-featured, rock-solid, and a great community.
I think many of us can’t be bothered to go over (again) the issues we’ve had with MySQL in the past. The last straw for me was about ten years ago, when I caught MySQL merrily making up nonsense results for a query I’d issued that accidentally didn’t make any sense.
Very likely this particular issue, and others like it, have been fixed in the meantime. But I just got the sense that MySQL was developed by people who didn't quite know what they were doing, and that the people who really did know what they were doing were never likely to be drawn to the project to fix it.
Good, you shouldn't be using them. You shouldn't be using foreign keys either. It just makes working with data harder, and it doesn't help with constraining it if your data modifications are inside transactions and properly written statements.
Having used postgres for the past decade, I tried MySQL for a side project to see what's changed with it. The sad answer is that it feels like nothing has changed - Oracle seems to have let what used to be a core technology of the industry languish.
I'm sure there are use cases where MySQL will be the better choice over postgres, but the future for the stack looks bleak.
> Oracle seems to have let what used to be a core technology of the industry languish
I think slowly squeezing the life from MySQL was a very explicit goal for them. After the big wins (Wal-Mart, etc.) MySQL had 15-20 years ago, I think it was very clear MySQL was going to chip away at more and more of Oracle's business.
I wonder how much Oracle spends on MySQL every year? They're spending a lot of money to keep MySQL at kind of a "not quite good enough" state. But they can't kill it outright - it'd be like boiling a frog fast instead of slow.
In the end, I wonder what extinguishing MySQL really accomplished for them. It might have bought them some breathing room but Postgres quickly filled MySQL's old segment.
Creating a series of connections very quickly is cheaper in MySQL and MariaDB than in PostgreSQL. Typically, a connection pooler is placed in front of PostgreSQL to support connection scalability.
I'm not sure if there has been a recent breakthrough that has changed that. I think that still applies today. Correct me if I'm wrong.
Actually I would argue that there isn't a single reason to use MySQL over Postgres (barring the obvious: it's what the team already knows, or the company already uses, etc.)
> I'm sure there are use cases where MySQL will be the better choice over postgres, but the future for the stack looks bleak.
see, I'm pretty sure there basically weren't. It lucked out at the right moment in the late 1990s. Also, Slashdot used it.
The only use case I can think of is when you want an application, and it requires or is highly optimised to MySQL. Otherwise, it should actively be avoided.
Yep. The real question here is: it's 2023, why would you choose MySQL over PostgreSQL?
Not that there aren't reasons. There are some. But for starting out with a new app without a very, very good reason to do something different? PostgreSQL every day of the week.
I work with MySQL (MariaDB, but it doesn't make much difference) and never get any issues that couldn't be explained by architectural or development mistakes.
Just as an example - how do you create a read-only user (SELECT only) in Postgres? In MySQL it's extremely simple and it works, while in Postgres it's a nightmare to create and maintain
> Just as an example - how do you create a read-only user (SELECT only) in Postgres? In MySQL it's extremely simple and it works, while in Postgres it's a nightmare to create
Isn’t that
GRANT SELECT ON ALL TABLES IN SCHEMA foo TO bar;
?
> and maintain
If you mean you want to grant a user select rights to whatever table gets created in the future (a somewhat questionable idea from a security viewpoint):
ALTER DEFAULT PRIVILEGES IN SCHEMA foo GRANT SELECT ON TABLES TO bar;
I think both are possible in PostgreSQL 9 and later (https://www.postgresql.org/docs/9.0/sql-grant.html , https://www.postgresql.org/docs/9.0/sql-alterdefaultprivileg...). That version is from 2010.
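Putting it together, a minimal sketch of a read-only role (schema foo and role bar are placeholders; the schema USAGE grant is the step that usually trips people up):

    CREATE ROLE bar LOGIN PASSWORD 'change-me';
    GRANT USAGE ON SCHEMA foo TO bar;                 -- without this, table grants are useless
    GRANT SELECT ON ALL TABLES IN SCHEMA foo TO bar;  -- existing tables
    ALTER DEFAULT PRIVILEGES IN SCHEMA foo
        GRANT SELECT ON TABLES TO bar;                -- future tables created by the current role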
I guess the lesson is that both these systems evolve fairly rapidly. You can’t use quirks you remember from over 5 years ago to judge their current versions.
It’s really unfair because a lot may have changed in 10 years so it might be worth reconsidering.
But I’m like you, MySQL did some nonsense once that took me hours to work out. So now I really can’t be bothered with any potential quirks it may still have. This is not an SNL sketch.
- It's less featureful, and I'd consider that a strong virtue in the YAGNI camp - less to go wrong, less mental overhead.
- Maintenance is simpler and far less necessary in my general experience.
- Replication is simpler and more reliable.
- You can tell the query optimizer what to do. When this is needed, you'll be thankful. It's a godsend.
That said, I wouldn't run Oracle MySQL. I opt for MariaDB on smaller projects and AWS Aurora MySQL for larger projects. Aurora scales insanely well, and replication lag is almost non-existent.
In my general experience MySQL was always significantly faster but it's been a number of years since I've worked with Postgres and the comments here seem to indicate that that may no longer be the case. YMMV
> It's less featureful, and I'd consider that a strong virtue in the YAGNI camp - less to go wrong, less mental overhead.
This doesn't really hold water in my opinion.
It's not like PostgreSQL is some minefield of misfeatures and quirky behavior. Some of these features exist, but have zero impact on you unless you actually opt to use them. But if you end up needing to: they're there, and you can just start using them.
Compare this to MySQL where they simply don't exist no matter how much you may need them. Need to apply an index to the result of a function to quickly fix a performance issue in prod? Sorry, you can't. Need window functions to accurately compute some analytics in a sane period of time? Sorry, you can't. The list of things you can do in PostgreSQL that you simply can't with MySQL is massive and grows every day.
The odds that you'll want, need, or greatly benefit from at least one of these features are not small. Having the flexibility of knowing these features exist should you ever have a use-case for them is massive.
I had the misfortune of inheriting a MySQL 5.6 database and I had to manage it for a year and a half. It wasn't very big (less than 100GB), but:
- it hung periodically and had to be restarted during the night, for no apparent reason
- compared to Postgres, the tooling is garbage (both for backups and even more for general database administration)
- essential features are missing, the most important one of which, for me, was proper CSV import/export. CSV-related functionality is so broken and terribly inconvenient to use. In a specific case I had to write a program to export millions of records manually since MySQL could not generate correct CSV export due to some columns containing text with special characters, quotes, newlines. Any combination of the export parameters ("ENCLOSED BY", "ESCAPED BY" and all the other garbage options) failed in one way or another. I even tried to use non-standard characters like \x15 and \r for column and row separation but even that failed. With Postgres, "with csv header" is simple and works every time.
I also managed bigger Postgres databases (up to tens of terabytes) and never had the issues I encountered with MySQL.
> It's less featureful, and I'd consider that a strong virtue in the YAGNI camp - less to go wrong, less mental overhead.
Imagine when you actually need any of the features that PostgreSQL provides like pub/sub, logical replication, JSONB etc. With MySQL you might have to hack a solution that is much more complex or you have to set up an entirely separate tool. What I find nice with PostgreSQL is that for simple cases you can get away without a dedicated key/value store or a queue or a full text search engine. You can do a lot of these kinds of tasks with just a single database.
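The pub/sub piece, for instance, is just LISTEN/NOTIFY, with no extra infrastructure; a quick sketch (channel name and payloads invented):

    LISTEN jobs;                               -- run in the consuming session
    NOTIFY jobs, 'job 42 ready';               -- run in the producing session
    SELECT pg_notify('jobs', 'job 43 ready');  -- same thing, callable from functions and triggers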
> features that PostgreSQL provides like [...] logical replication
MySQL has offered logical replication for considerably longer than Postgres, and it's a substantially more configurable and mature implementation. MySQL's built-in replication has always been logical replication. It's a feature MySQL has had for 23 years -- built-in logical replication is literally one of the top reasons why all the biggest MySQL users chose MySQL originally!
The problem is that when you need to tell the planner what to do, you can do it in MySQL, but not in Postgres. Imagine you've got a production database with lots of traffic which suddenly can't handle anything, because one extra inserted row tipped the balance and a common query now takes seconds to process.
Do you know how to fix the table statistics quickly? Do you know how to change that query to force the execution plan you want? Do you know how long the solution will last until the stats change again?
MySQL is a bit more predictable for this case and if things go really bad for some unexpected reason, one comment (an optimizer hint) can fix it.
I'm looking at it from ops perspective. The ratio during development doesn't matter that much - all issues are solvable at that stage. For me it's rather "which situation would I rather be in at 3am".
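For what it's worth, the 3am toolbox looks roughly like this on each side (table, column, and index names invented):

    -- MySQL: pin the plan in the query itself
    SELECT *
      FROM orders FORCE INDEX (idx_customer_created)
     WHERE customer_id = 42
     ORDER BY created_at DESC
     LIMIT 50;

    -- Postgres: no hints; the usual moves are refreshing and widening statistics
    ANALYZE orders;
    ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;  -- bigger sample
    ANALYZE orders;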
Postgres query planner suddenly deciding to do something silly in the middle of the night has taken Notion down a few times. It's quite annoying, and it's very frustrating to have no recourse.
There is almost no operational reason to choose MySQL over PostgreSQL. I did a deep dive many moons ago (before major improvements in Postgres performance), when people were saying that MySQL was faster. I found that not to be true, and the differences have only moved further in Postgres's favour.
also, I assume you mean MariaDB as MySQL is owned by Oracle and I would greatly implore anyone and everyone to avoid Oracle as if it has herpes.
There are a lot of historic problems with MySQL accepting invalid data, committing data even when there are constraint issues, and having very poor transactional isolation, I am not sure if these have improved.
Truthfully, the only benefits you gain from using MariaDB or MySQL are:
* Memory tables
* Having inconsistent replicas (which can be useful when you want your downstream to have less data than your upstream and you know it won’t get updated.)
herpes isn't that bad. most people will get it in their lifetime. 1 in 6 people have hsv-2, the less common variant. trying to avoid herpes is like trying to avoid chickenpox (although herpes isn't nearly as harmful as chickenpox).
you should avoid Oracle like it's a blood pathogen.
I think that the only reasons to choose MySQL (or Maria) over Postgres for a new project are operational. Postgres is probably the better database in almost all respects, but major version upgrades are much much more of a pain on Postgres than on almost any other system I have ever used. That being said, I would choose Postgres pretty much every time for a new project. The only reason I would use Maria or MySQL would be if I thought I later would want to have something like Vitess, for which I think there isn't really an equivalent for Postgres.
> but major version upgrades are much much more of a pain on Postgres than on almost any other system I have ever used.
This is a thread comparing MySQL and Postgres and your claim is that postgres is harder to do major version upgrades than anything you have used??
Context is important here, have you honestly actually upgraded a MySQL node? It’s a lesson in pain and “major” version changes happen on minor versions, like the entire query planner completely trashing performance in 5.6->5.7
Postgres has two forms of updates:
1) in place binary upgrade.
Fast, clean, simple, requires that you have the binaries for the old and the new database.
2) dump/restore.
Serialise the database into text files, load a new database and deserialise those files into it.
Slow, but works flawlessly & consistently with relatively low danger.
MySQL can only do option 2.
You can sort of fake an “update” by abusing the fact that MySQL's replication offers no guarantees, so you can make a new server a replica, then roll over. But it is impossible to know what data was lost in that transition, and MySQL will happily continue without ever telling you.
I have experienced this behaviour in large e-commerce retailers. MySQL was very popular for a very long time and I am intimately aware of operational best practices and how they are merely patching over an insane system.
Is there a good way to do case-insensitive accent-insensitive collations yet in postgresql? It’s been a holdup for using that for some use cases like searching for data, like a person’s name, in pgsql when the casing or accents don’t match perfectly.
MSSQL has had this forever, and I'm pretty sure MySQL has it as well.
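Postgres 12+ can do at least part of this with nondeterministic ICU collations; a sketch, assuming my reading of the docs is right (collation name invented):

    -- primary strength ('level1') ignores both case and accents
    CREATE COLLATION ci_ai (provider = icu, locale = 'und-u-ks-level1', deterministic = false);
    CREATE TABLE people (name text COLLATE ci_ai);
    INSERT INTO people VALUES ('José');
    SELECT * FROM people WHERE name = 'jose';  -- matches
    -- caveat: LIKE and pattern matching don't work with nondeterministic collations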
We ran a Galera cluster for 4 days and it died on the first ALTER TABLE we did. Only option was to reduce it down to a Primary / Replica setup. I really can't recommend anyone using this.
Postgres is >50x slower for range queries (example below) and is akin to using array-of-pointers (i.e. Java) whereas MySQL supports array-of-struct (C). Illustration from Dropbox scaling talk below.
Sneak peek photo [1] (from [2]). Just imagine it's literally 500-1000x more convoluted per B-tree leaf node. That's every Postgres table unless you CLUSTER periodically.
[1]: https://josipmisko.com/img/clustered-vs-nonclustered-index.w...
[2]: https://josipmisko.com/posts/clustered-vs-non-clustered-inde...
Mind boggling how many people aren't aware of clustered primary indexes in MySQL, which are not supported at all in Postgres. For certain data layouts, Postgres pays either 2x storage (a covering index containing every single column), >50x worse performance by effectively N+1 bombing the disk for range queries, or blocking your table periodically (CLUSTER).
In Postgres, the messiness of loading primary data after reaching the B-tree leaf nodes pollutes caches and takes longer. This is because you need to load one 8kb page for every row you want, instead of one 8kb page with 20-30 rows packed together.
Example: Dropbox file history table. They initially used an autoinc id as the primary key in MySQL. This causes everybody's file changes to be mixed together in chronological order on disk in a B-tree. The first optimization they made was to change the primary key to (ns_id, latest, id) so that each user's (ns_id) latest versions would be grouped together on disk.
Dropbox scaling talk: https://youtu.be/PE4gwstWhmc?t=2770
If a dropbox user has 1000 files and you can fit 20 file-version rows on each 8kb disk page (400bytes/row), the difference in performance for querying across those 1000 files is 20 + logN disk reads (MySQL) vs 1000 + logN disk reads (Postgres). AKA 400KiB data loaded (MySQL) vs 8.42MiB loaded (Postgres). AKA >50x improvement in query time and disk page cache utilization.
In Postgres you get two bad options for doing this: 1) Put every column of the table in the index, making it a covering index, and pay to store all data twice (index and PG heap). There is no way to disable the heap primary storage. 2) Take your DB offline every day and CLUSTER the table.
Realistically, PG users pay that 50x cost without thinking about it. Any time you query a list of items in PG even using an index, you're N+1 querying against your disk and polluting your cache.
This is why MySQL is faster than Postgres most of the time. Hopefully more people become aware of disk data layout and how it affects query performance.
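To make the layout difference concrete, a sketch of both sides (column names invented, not Dropbox's actual schema):

    -- MySQL/InnoDB: the table itself is the primary-key B-tree, so one
    -- user's (ns_id's) rows sit physically adjacent on disk
    CREATE TABLE file_history (
        ns_id    BIGINT  NOT NULL,
        latest   TINYINT NOT NULL,
        id       BIGINT  NOT NULL,
        blob_ref VARCHAR(64),
        PRIMARY KEY (ns_id, latest, id)
    );

    -- Postgres approximation: duplicate the needed columns into a covering
    -- index so range scans can skip the heap (visibility map permitting)
    CREATE INDEX file_history_cover
        ON file_history (ns_id, latest, id) INCLUDE (blob_ref);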
There is a hack for Postgres where you store data in an array within the row. This puts the data contiguously on disk. It works pretty well, sometimes, but it’s hacky. This strategy is part of the Timescale origin story.
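Something like this, with a made-up schema:

    -- one row per user; versions share heap pages because they share a row
    CREATE TABLE file_history_packed (
        ns_id    BIGINT PRIMARY KEY,
        versions JSONB[]
    );
    -- appending rewrites the whole array, which is the hacky part
    UPDATE file_history_packed
       SET versions = versions || '{"file": 1, "rev": 7}'::jsonb
     WHERE ns_id = 42;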
Open to db perf consulting. email is in my profile.
I made the same comment elsewhere before finding this comment. I can vouch for this speed-up in the ratio of records per page. It is very real. It only applies for small records where you can pack many rows into a page, and where you can cleanly partition by user/tenant, but that's common enough.
I will say: we kept every table we didn’t have to migrate for perf reasons in Postgres, and never regretted it.
Edit: and the index “fix” for Postgres often doesn't work. Postgres will validate the row on disk, even if it's in the index, if the page's visibility map isn't set. If your data isn't write-once, there's a decent chance your page is dirty and it will still make the “heap” fetch.
You are confusing two concepts here. In InnoDB, the tables are always ordered by the primary key when written to actual disk storage.
This is not the same as "having a primary key", Postgres also has primary keys. It just stores the PK index separately from the bulk of the data.
Oracle also has primary keys, even if the order of the rows is different to the key order. In Oracle, when the rows are stored in the same order as the keys in the primary index, it is a special case and these tables are called IOTs, index-organized tables.
The disadvantages of IOT are that inserts are slower, because in a normal table, the data is appended to the table, which is the fastest way to add data, and only the index needs to be reordered. In an IOT, the entire table storage is reordered to take the new data into account.
Select queries, OTOH, are much faster when using IOT, for obvious reasons, and this is what you describe in your comment.
If you use TEXT, BLOB, or JSON fields, even in MySQL, the actual data is stored separately.
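For reference, the Oracle DDL for the two layouts is a one-clause difference (sketch, with invented tables):

    -- heap-organized (the default): fast appends, PK index stored separately
    CREATE TABLE events_heap (id NUMBER PRIMARY KEY, payload VARCHAR2(200));

    -- index-organized: the rows live inside the primary-key B-tree
    CREATE TABLE events_iot (id NUMBER PRIMARY KEY, payload VARCHAR2(200))
    ORGANIZATION INDEX;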
When using clustered indexes, one tradeoff is that if a non-clustered index isn't covering for a query, it will need to perform B-tree traversals to find rows in the clustered index. This can significantly increase the amount of (at least logical) IO compared to heap tables, where the non-clustered indexes can refer directly to the row id.
Because you can only have a single clustered index, you're effectively paying for efficient range queries on a single clustering key by making all other queries slower.
This tradeoff may or may not be worth it depending on your query patterns. In my experience, you can often get away with adding some subset of columns to a non-clustered index to make it covering, and get efficient range queries without making a copy of the entire dataset.
And even with clustered indexes, as soon as you want a range query that's not supported by your clustered index, you're faced with the exact same choices, except that you have to pay the cost of the extra B-tree traversals.
(Admitting bias: I've only ever worked with postgres in production with update-heavy tables so I've dealt with more of its problems than MySQL's)
Postgres also has other gotchas with indexes - MVCC row visibility isn't stored in the index for obvious performance reasons (writes to non-indexed columns would mean always updating all indexes instead of HOT updates [1]) so you have to hope the version information is cached in the visibility map or else don't really get the benefit of index only scans.
[1] https://www.postgresql.org/docs/current/storage-hot.html
But OTOH, I've read that secondary indexes cause other performance penalties with having to refer back to the data in clustered indexes? Never looked into the details because no need to for postgres which we've been very happy with at our scale :)
I've definitely run in to these kind of issues and learned about them the hard way, but I found that in PostgreSQL it's quite a bit easier to understand what is actually going on due to better documentation and tooling, and I've found this very valuable. Maybe it's just because I've spent more time with PostgreSQL than MariaDB, but MariaDB has often left me quite a bit more confused (on performance, but also other topics).
If your pg database improves with CLUSTER, you can use pg_repack instead to achieve the same effect without downtime. Besides reordering the heap, it will also clear out bloat from the heap and indexes. I highly recommend partitioning if you have heavy write traffic on large tables since that will keep overhead low and make it complete faster.
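Roughly (table and index names invented; the pg_repack flag per its documentation):

    -- built-in, but holds an exclusive lock for the whole rewrite:
    CLUSTER file_history USING file_history_pkey;
    -- pg_repack does the equivalent online, from the shell rather than SQL:
    --   pg_repack --table=public.file_history mydb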
The only reason I'd consider MariaDB, which I'm surprised I don't see mentioned currently, is its bitemporal features. They solve whole classes of problems that are almost universally done badly in schema/business logic instead.
Granted I haven’t had to make a decision like this for several years, I’ve hardly even touched a database except to debug some stuff on localhost that’s outside of my explicit purview. So maybe Postgres solutions have narrowed the gap on this without my knowing it.
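For reference, MariaDB's system-versioned tables (the system-time half of bitemporal) look like this; a sketch with an invented table:

    -- MariaDB 10.3+: every row version is kept automatically
    CREATE TABLE prices (
        sku   VARCHAR(32) PRIMARY KEY,
        price DECIMAL(10,2)
    ) WITH SYSTEM VERSIONING;

    UPDATE prices SET price = 9.99 WHERE sku = 'ABC';
    -- ask what the table looked like at a point in time
    SELECT * FROM prices FOR SYSTEM_TIME AS OF TIMESTAMP '2023-01-01 00:00:00';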
I would be wary putting too much stock in a seven year old post on the topic; lots of stuff has changed. Specifically, at least one person claimed that a patch greatly improved PostgreSQL for this use case: https://news.ycombinator.com/item?id=26285452
If you say what you’re trying to actually achieve I can help with a solution, but asking if it supports an arbitrary feature is not going to get the answer you want because depending on what you’re actually using an archive table for, Postgres might have something already built in but it will almost assuredly not be exactly like an archive table storage type.
PostgreSQL every time, unless you have a specific reason, or as already pointed out, you're sure you don't just need SQLite.
PSQL in my experience has vastly better tooling, community, and is ahead of the curve tech wise. Same with extensions availability. Or perhaps you need to move away from it to say CockroachDB, or similar which is much easier.
Whichever one you start out with, you will be annoyed if you switch to the other one 5 years later. I started out with mysql, and when I started working on a postgres project, I was shocked at some of the ways it was lacking (how am I supposed to store email addresses in a database without collations?).
But when postgres folks grouse about stuff in mysql, I'm usually nodding along and saying "yeah, that would be nice".
They're both great options. If anybody on your team is already an expert at one of them, use that one.
> when I started working on a postgres project, I was shocked at some of the ways it was lacking (how am I supposed to store email addresses in a database without collations?)
> how am I supposed to store email addresses in a database without collations?
Not familiar with MySQL so trying to look that up, but with a constraint? Or just don't do that? The SO answer I found says 'it's much more useful to have johndoe@ and JohnDoe@ treated as the same than it is to support case sensitive email addresses'. OK, it's also non-compliant, but whatever's 'more useful' I guess!
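The usual Postgres answer for the email case is the citext extension rather than a collation; a minimal sketch:

    CREATE EXTENSION IF NOT EXISTS citext;
    CREATE TABLE users (
        email citext UNIQUE  -- 'JohnDoe@x.com' and 'johndoe@x.com' compare equal
    );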
Having used both in production, I agree with the above. It's not going to make or break your business or project.
I will also add that there are giant companies relying on both databases with great success. Facebook still runs on MySQL, and contributes back to it. YouTube I'm not sure about, but it did run on MySQL for a long time, well after it got massive. I'm sure examples exist for Postgres (Amazon, since they moved off Oracle?)
Use MySQL if you're expecting to have to do large operational database tasks re: migrations and maintenance, with no ability to go offline. gh-ost, percona-osc, the new INSTANT DDL stuff, is all quite far ahead in MySQL-land. Additionally, Vitess and Planetscale are making huge strides in MySQL performance. There are more people and guides in the world to help recover even the most mutilated of MySQL databases. MySQL gets more love in extremely large enterprise-level organizations, and it shows.
Use Postgres if you need some of the quite-good extensions, most notably PostGIS, or if you just want things to work; most documentation will be postgres flavored. Postgres gets more love from web-developers, and it shows.
With a sane migration tool like pt-osc or gh-ost you can absolutely abort in the middle. What's more, you can pause in the middle or even slow down in the middle based on arbitrary logic (ie, pause migration if replication delay rises above a certain value). Postgres is nice and transactional DDL has its place but postgres stopped halfway through IMO. Vanilla Postgres > vanilla MySQL, but the migration story of MySQL + tooling is so far beyond Postgres + tooling that it's not even funny.
That said, if you don't expect to have tables with 100m+ rows, even vanilla postgres will be good enough.
Having answered this a ton over the years, don't want to really take shots at MySQL. But Postgres stands in pretty unique ground.
1. It's solid as a really rich data platform (more than just a relational database). Its extension framework is quite unique compared to others. Its JSONB support was the first among relational databases and is feature-rich and performant. Multiple index types. Transactional DDL. The list goes on.
2. No central owner. A lot of open source is open in source code only, but maintained by a central company.
3. I mentioned extensions, but really that is understated. It can do really advanced geospatial, full text search, time series, the list goes on.
Having explained this a ton of times, first 10 years ago - https://www.craigkerstiens.com/2012/04/30/why-postgres/ - and then again 5 years later with an updated version, I most recently tried to capture more of this in an updated form on the Crunchy Data blog - https://www.crunchydata.com/why-postgres
> Very likely this particular issue, and others like it, have been fixed in the meantime.
Here is an 18-year-old bug: DELETE triggers don't fire for foreign key cascades: https://bugs.mysql.com/bug.php?id=11472
That makes the entire feature mostly worthless. Reported in 2005, last updated in 2008.
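A hypothetical repro of what the report describes (all names invented):

    CREATE TABLE parent (id INT PRIMARY KEY);
    CREATE TABLE child (
        id INT PRIMARY KEY,
        parent_id INT,
        FOREIGN KEY (parent_id) REFERENCES parent(id) ON DELETE CASCADE
    );
    CREATE TRIGGER child_delete_audit BEFORE DELETE ON child
        FOR EACH ROW SET @fired = IFNULL(@fired, 0) + 1;
    INSERT INTO parent VALUES (1);
    INSERT INTO child VALUES (10, 1);
    DELETE FROM parent WHERE id = 1;  -- cascades to child
    SELECT @fired;                    -- per the bug, NULL: the trigger never fired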
---
While I would choose PostgreSQL every time, MySQL has the following advantages:
1. Write-performance, due to fundamental design tradeoffs. [1]
2. Per-connection resources, due to single-process design.
3. Related to #1, no vacuum requirement.
[1] https://www.uber.com/blog/postgres-to-mysql-migration/
> Need to apply an index to the result of a function to quickly fix a performance issue in prod? Sorry, you can't. Need window functions to accurately compute some analytics in a sane period of time? Sorry, you can't.
Need to apply an index to the result of a function? No problem, use a functional index, supported since October 2018: https://dev.mysql.com/doc/refman/8.0/en/create-index.html#cr...
Need to use a window function? No problem, supported since April 2018: https://dev.mysql.com/doc/refman/8.0/en/window-functions.htm...
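Both per the linked 8.0 docs; a quick sketch against invented tables (note the extra parentheses MySQL requires around a functional key part):

    -- functional index (8.0.13+)
    CREATE INDEX idx_users_email_lower ON users ((LOWER(email)));

    -- window function (8.0.2+)
    SELECT order_id,
           SUM(amount) OVER (PARTITION BY customer_id ORDER BY created_at) AS running_total
      FROM orders;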
I do remember getting corrupted MyISAM tables a decade ago, sometimes after a bad shutdown.
Not saying it can't happen, but I don't think it's a common occurrence.
> MySQL is a bit more predictable for this case and if things go really bad for some unexpected reason, one comment (an optimizer hint) can fix it.
And that one case I remember was perfectly solvable the regular way, with a little more time.
> herpes isn't that bad. most people will get it in their lifetime. 1 in 6 people have hsv-2, the less common variant.
The worst part about having it is having to talk about having it. It's really not bad as a condition separate from societal concern.
Citation needed.
> Is there a good way to do case-insensitive accent-insensitive collations yet in postgresql?
My gut tells me that I would do it in the query itself though, and not rely on the collation. Maybe I am misunderstanding.
Galera is the main one I can think of:
* https://galeracluster.com/library/documentation/tech-desc-in...
* https://mariadb.com/kb/en/what-is-mariadb-galera-cluster/
* https://packages.debian.org/search?keywords=galera
I'm not aware of any multi-master, active-active(-active) replication system that is open source for PostgreSQL.
In the Postgres space there is citusdb which provides multi master.
There is also BDR from 2ndquadrant if you want a paid/supported solution. https://www.enterprisedb.com/products/edb-postgres-distribut...
> MySQL's built-in replication has always been logical replication.
it is the only official, in-binary replication mechanism.
> * Memory tables
PostgreSQL has memory tables too.
> PSQL in my experience has vastly better tooling, community, and is ahead of the curve tech wise.
https://www.postgresql.org/docs/15/app-psql.html
> I started out with mysql, and when I started working on a postgres project, I was shocked at some of the ways it was lacking (how am I supposed to store email addresses in a database without collations?)
How long ago was this? :)
Always use PostgreSQL. It's more logical, more extensible, saner, and more predictable, and it supports many extensions.
MySQL is inconsistent crap that trades away consistency, correctness and stability for a little bit of performance in standard use cases.
Do yourself a favor and always use PostgreSQL. I switched 15 years ago and never looked back. I've done 15-50 projects since in psql.
From a former MySQL developer:
> let me point out something that I've been saying both internally and externally for the last five years (although never on a stage—which explains why I've been staying away from stages talking about MySQL): MySQL is a pretty poor database, and you should strongly consider using Postgres instead.