At one point the change was made from 6 to 3 bytes, and I expect this made the on-disk structure incompatible with prior versions. Why were they able to make that change but unable to make a later one, say from 3 to 4 bytes?
I am not as familiar with MySQL, but note that Postgres major versions are not disk compatible with each other (you have to dump and restore). I expect this is the exact reason: you want a system where it is possible to make changes.
See also: the drama around Linux ABI changes vs. OpenBSD ABI changes.
To meme off of Raul Julia: OpenBSD to Linux: "For you, the day your ABI changed was the most important day of your life. But for us, it was Tuesday."
It's actually quite hard to fix bugs in charsets/collations, because any change to the sort order could implicitly affect the on-disk format (indexes are stored in sorted order).
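To illustrate why a sort-order fix touches the on-disk format, here is a minimal sketch. The two "collations" are hypothetical sort-key functions made up for this example, not MySQL's actual rules, and the "index" is just a sorted Python list. It shows that an index built under one ordering can no longer be binary-searched correctly under another.

```python
import bisect

# Hypothetical collations, modelled as sort-key functions.
# These are illustrative assumptions, not real MySQL collations.
def collation_v1(s: str) -> str:
    # Code-point ordering: uppercase letters sort before lowercase.
    return s

def collation_v2(s: str) -> str:
    # A "fixed" ordering that compares case-insensitively.
    return s.casefold()

def index_lookup(index, key, collation) -> bool:
    """Binary search; only valid if `index` is sorted under `collation`."""
    keys = [collation(k) for k in index]
    target = collation(key)
    pos = bisect.bisect_left(keys, target)
    return pos < len(keys) and keys[pos] == target

values = ["apple", "Banana", "cherry", "Zebra"]

# The index is built (sorted) under the old collation and persisted.
on_disk_index = sorted(values, key=collation_v1)

# Same persisted index, searched under the old and the new ordering.
found_old = index_lookup(on_disk_index, "apple", collation_v1)
found_new = index_lookup(on_disk_index, "apple", collation_v2)

# Is the persisted order still a valid order under the new collation?
still_sorted = on_disk_index == sorted(values, key=collation_v2)

print(on_disk_index, found_old, found_new, still_sorted)
```

Under the new ordering the lookup misses a row that is present, which is why such a change would in practice require rebuilding every affected index.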
(It makes use of skeema, which allows you to track your schema in a declarative way vs. ALTER TABLE statements.)
The article is MariaDB specific.
Developers who are specifically targeting systems with limited memory will try to produce small binaries. If you're talking about a distribution like Ubuntu, though, it's simply not a concern. At all. Applications are built to do whatever it is that they need to do, and whatever size they end up being is how big they are, almost without exception.
The reason CLI binaries are small is that ALL binaries are small. They are compiled, and any resources they need are stored externally in other files. They use shared libraries, making the code even smaller through re-use.
I have 1785 programs in /usr/bin on my Ubuntu server, and all but 10 of them are under 5M. The ones that are larger are only big for unusual reasons (e.g. mysql_embedded).
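If you want to check this on your own machine, something like the following sketch would do it. The helper names `file_sizes` and `larger_than` are made up for this example, and skipping symlinks is my own assumption about how to count, so the totals may not match the numbers above exactly.

```python
from pathlib import Path

def file_sizes(directory: str) -> dict[str, int]:
    """Map each regular file in `directory` to its size in bytes."""
    sizes = {}
    for entry in Path(directory).iterdir():
        # Assumption: skip symlinks so aliases are not counted twice.
        if entry.is_file() and not entry.is_symlink():
            sizes[entry.name] = entry.stat().st_size
    return sizes

def larger_than(sizes: dict[str, int], threshold: int) -> list[str]:
    """Return the names of files strictly larger than `threshold` bytes."""
    return sorted(name for name, size in sizes.items() if size > threshold)

# 5 MiB; the "5M" cut-off above may have been measured differently.
THRESHOLD = 5 * 1024 * 1024
```

Calling `larger_than(file_sizes("/usr/bin"), THRESHOLD)` lists the outliers, and `len(file_sizes("/usr/bin"))` gives the total to compare against.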
I'm not sure what you're referring to when you talk about usability. Are you saying that my 1.1MB nginx lacks some utility? And that it lacks it because someone was worried about the size of the binary? That's simply false to the point of being nonsensical.
One of the biggest binaries I use is the Postgres server, at a hefty 6.5MB. Is Postgres missing features that affect its usability?
I was the product manager, and we did have complaints about the size. So it was removed:
https://mysqlserverteam.com/mysql-8-0-retiring-support-for-l...
I don't believe there have been any regrets.
Redefining `utf8` to mean 4-byte would break the upgrade since existing tables would not be able to join against newly created tables.
This is discussed here: https://mysqlserverteam.com/sushi-beer-an-introduction-of-ut...
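For anyone wondering what the 3-byte vs. 4-byte distinction means in practice, here is a small sketch using Python's own UTF-8 encoder. The helper names are made up for this example. It shows that characters outside the Basic Multilingual Plane (emoji, for instance) need 4 bytes, so they do not fit in an encoding capped at 3 bytes per character.

```python
def utf8_width(ch: str) -> int:
    """Number of bytes UTF-8 needs to encode a single character."""
    return len(ch.encode("utf-8"))

def fits_three_byte_limit(text: str) -> bool:
    """True if every character encodes to at most 3 bytes in UTF-8."""
    return all(utf8_width(ch) <= 3 for ch in text)

samples = [
    "A",           # U+0041, ASCII
    "\u00e9",      # U+00E9, e with acute accent
    "\u5bff",      # U+5BFF, a CJK character inside the BMP
    "\U0001F363",  # U+1F363, sushi emoji, outside the BMP
]

widths = {ch: utf8_width(ch) for ch in samples}

for ch, width in widths.items():
    print(f"U+{ord(ch):04X} needs {width} byte(s)")
```

A column limited to 3 bytes per character can hold the first three samples but not the last one, which is the gap the 4-byte character set closes.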
https://mysqlserverteam.com/new-defaults-in-mysql-8-0/
https://dev.mysql.com/doc/refman/5.7/en/added-deprecated-rem...
https://dev.mysql.com/doc/refman/5.6/en/server-default-chang...
One detail that is not always obvious is how much work goes into limiting regressions. The work to switch to utf8mb4 really started in MySQL 5.6 by not allocating the sort buffer in full (and then further improved in 5.7). 8.0 then added a new temptable storage engine for variable length temp tables.
These are not small cases either: compared to latin1, the _profile_ of queries could change from all in memory to on disk, so we could be talking about 10x regressions. In MySQL 8.0 it is more like 11%: https://www.percona.com/blog/2019/02/27/charset-and-collatio...
Edit: Also forgot to mention, switching the default character set broke over 600 tests. It's not as easy as it sounds!
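As a rough back-of-the-envelope sketch of why allocating the sort buffer "in full" hurts with wider character sets: if every value is padded to its declared length times the maximum bytes per character, fewer rows fit in memory before spilling to disk. The buffer size and column length below are illustrative assumptions, and the model ignores per-row overhead and everything else the real server does.

```python
# Maximum bytes per character for each character set.
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8mb3": 3, "utf8mb4": 4}

def rows_per_buffer(buffer_bytes: int, declared_chars: int, charset: str) -> int:
    """Rows that fit if each sort key is padded to its worst-case width."""
    row_bytes = declared_chars * MAX_BYTES_PER_CHAR[charset]
    return buffer_bytes // row_bytes

# Illustrative numbers only: a 256 KiB buffer and a 255-character column.
BUFFER_BYTES = 256 * 1024
DECLARED_CHARS = 255

rows = {
    charset: rows_per_buffer(BUFFER_BYTES, DECLARED_CHARS, charset)
    for charset in MAX_BYTES_PER_CHAR
}

for charset, count in rows.items():
    print(f"{charset}: {count} rows fit in the buffer")
```

In this simplified model the same buffer holds roughly a quarter as many rows under utf8mb4 as under latin1, even if the actual data is plain ASCII. Sizing by actual value length instead of worst-case width avoids that penalty.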
Furthermore, column-stores are so different than row-stores in their ideal implementation that they are effectively completely different systems with the same outer interface (SQL). Daniel Abadi wrote a fantastic paper [1] in which he showed that fast column stores must use completely different strategies at every layer of query processing.
You should be very skeptical of any database claiming to be good at both transaction processing and analytical workloads. Such a system would effectively be two different databases hiding behind the same SQL interface, and would not be as useful as you might expect, for devops reasons.
I agree that column and row stores have very different characteristics, but what I think is worth mentioning is that some hybrid solutions actually store data in both row and columnar form and have a query optimizer that can pick between them. For example: Oracle DB In-Memory, SQL Server Columnstore index.
At the same event as this announcement, we also announced that we are working on TiFlash, which will do something similar. Stay tuned for a blog post with more details :-)
However one of the perks of being a male is that society only ever expects me to take a shower to be presentable. So I'm totally cool if a colleague wants to leave the camera off even if it's a 1 on 1.