remon · a year ago
Impressive numbers at a glance, but that boils down to ~140 QPS, which is one to two orders of magnitude below what you'd expect a typical MySQL node to serve. Obviously average execution time is mostly a function of query complexity, but based on Uber's business I can't really see what sort of non-normative queries they'd run at volume (e.g. for their customer-facing apps). Uber's infra runs on Amazon AWS afaik, and even taking some level of volume discount into account, they're burning many millions of USD on some combination of overcapacity and suboptimal querying/caching strategies.
aseipp · a year ago
Dividing the fleet QPS by the number of nodes is completely meaningless, because it assumes queries are distributed evenly across every part of the system and that every part of the system is uniform (e.g. it is unclear what the read/write patterns are, what proportion of these nodes are read replicas or hot standbys, or whether their sizing and configuration are the same). That isn't realistic at all. I would guess it is extremely likely that hot subsets of these clusters, depending on the use case, see anywhere from 1 to 4 orders of magnitude higher QPS than your estimate, probably on a near-constant basis.

Don't get me wrong, a lot of people have talked about Uber doing overengineering in weird ways, maybe they're even completely right. But being like "Well, obviously x/y = z, and z is rather small, therefore it's not impressive, isn't this obvious?" is the computer programming equivalent of the "econ 101 student says supply and demand explain everything" phenomenon. It's not an accurate characterization of the system at all and falls prey to the very thing you're alluding to ("this is obvious.")
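A toy sketch of the skew point, with made-up numbers (only the 2,100-cluster count and the ~140 QPS average come from this thread; the Zipf-like load distribution is my own illustration, not Uber's actual traffic shape):

```python
# Made-up illustration: a uniform per-node average says nothing about
# hot spots when load is skewed across the fleet.
nodes = 2100                 # cluster count mentioned in the thread
avg_qps = 140                # the per-node average from the comment above
total_qps = nodes * avg_qps  # ~294k fleet-wide QPS

# Zipf-like weights: a handful of hot shards take most of the traffic.
weights = [1.0 / rank for rank in range(1, nodes + 1)]
scale = total_qps / sum(weights)
per_node_qps = [w * scale for w in weights]

print(f"uniform average: {avg_qps} QPS/node")
print(f"hottest node:    {per_node_qps[0]:.0f} QPS")   # far above the average
print(f"coldest node:    {per_node_qps[-1]:.0f} QPS")  # far below it
```

Under these assumptions the hottest node sees hundreds of times the fleet average, which is why the back-of-envelope division doesn't say much.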

0cf8612b2e1e · a year ago
Simple enough just to think about localities and time of day. New York during Tuesday rush hour could be more load than all of North Dakota sees in a month. Even busy cities probably drop down to nothing on a weekday at 3am.
Twirrim · a year ago
They're not on AWS. They use on-prem and are migrating to Google and Oracle clouds.

https://www.forbes.com/sites/danielnewman/2023/02/21/uber-go...

Jgrubb · a year ago
See, the problem is that the people who care about cost performance and the people who care about UX performance are rarely the same people, and often neither side is empowered with the data or experience they need to bridge the gap.
bushbaba · a year ago
Hardware is cheap relative to salaries. It might take 1 engineer 1 quarter to optimize. Compare that to a few thousand per server.
nunez · a year ago
Didn't realize their entire MySQL data layer runs in AWS. Given that they went with basically a blue-green update strategy, this was essentially a "witness our cloud spend" kind of post.
pocket_cheese · a year ago
They're not. Almost all of their infra was on prem when I worked there 3 years ago.
remon · a year ago
It's sort of funny how you can immediately tell it's LLM-sanitized/rewritten.
jdbdndj · a year ago
It reads like any of those tech blogs, using big words where not strictly necessary but also not wrong

Don't know about your LLM feeling

est31 · a year ago
It contains the word "delve", a word that got way more popular in use since the introduction of LLMs.

Also this paragraph sounds a lot like it has been written by LLMs, it's over-expressive:

    We systematically advanced through each tier, commencing from tier 5 and descending to tier 0. At every tier, we organized the clusters into manageable batches, ensuring a systematic and controlled transition process. Before embarking on each stage of the version upgrade, we actively involved the on-call teams responsible for each cluster, fostering collaboration and ensuring comprehensive oversight.
The paragraph uses "commencing from" together with "descending to". People would probably write something like "starting with". It shows how the LLM has no spatial understanding: tier 0 is not below or above tier 5, especially as the text has not introduced any such spatial ordering previously. And it gets worse: there is no prior mention of the word "tier" in the blog post. The earlier text speaks of stages, and lists 5 steps (without giving them any name, but the standard term is more like "step" instead of "tier").

There are more signs, like "embark", or that specific use of "fostering collaboration", which goes beyond corporate-speak; it also sounds a lot like what an LLM would say. Apparently "safeguard" is also a word LLMs write very often.
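The word-frequency tell can be checked mechanically. A rough sketch (the word list is my own guess at common LLM tells drawn from this thread, not a rigorous detector, and prefix matching is a crude stand-in for stemming):

```python
# Crude stylometric check: count words that spiked in frequency
# after LLM-assisted writing became common. Not a real detector.
import re
from collections import Counter

LLM_TELLS = {"delve", "embark", "commencing", "fostering",
             "seamless", "paramount", "safeguard", "compelling"}

def tell_counts(text: str) -> Counter:
    # Prefix match so "embarking" counts as "embark", etc.
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words
                   if any(w.startswith(t) for t in LLM_TELLS))

sample = ("Before embarking on each stage of the version upgrade, we "
          "actively involved the on-call teams, fostering collaboration.")
print(tell_counts(sample))
```

A high hit rate per paragraph is suggestive, not conclusive; plenty of humans write "compelling" too.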

maeil · a year ago
This [1] is a good piece on it. Here's [2] another good one.

We don't just carry out a MySQL upgrade, oh no. We embark on a significant journey. We don't have reasons, but compelling factors. And then, we use compelling again soon after when describing how "MySQL v8.0 offered a compelling proposition with its promise of substantial performance enhancements", just as any human meatbag would.

[1] https://www.latimes.com/socal/daily-pilot/opinion/story/2024...

[2] https://english.elpais.com/science-tech/2024-04-25/excessive...

remon · a year ago
Nah this isn't a big word salad issue. The content is fine. It's just clearly a text written by humans and then rewritten by an LLM, potentially due to the original author(s) not being native speakers. If you feel it's natural English that's fine too ;)
exe34 · a year ago
I always thought 90% of what management wrote/said could be replaced by an RNN, and nowadays LLMs do even better!
1f60c · a year ago
I got that feeling as well. In addition, I suspect it was originally written for an internal audience and adapted for the blog, because the references to SLOs and SLAs don't really make sense in the context of external Uber customers.
aprilthird2021 · a year ago
Let's delve into why you think that
fs0c13ty00 · a year ago
It's simple. Human writing is short and to the point (either because they're lazy or want to save the reader's time), yet still manages to capture your attention. AI writing tends to be too elaborate and lacks a sense of "self".

I feel like this article challenges my patience and attention too much; there is really no need to focus on the pros of upgrading here. We readers just want to know how they managed an upgrade at that scale, the challenges they faced, and how they solved them. Not to mention any sane tech writer who values their time wouldn't write this much.

blackenedgem · a year ago
I'm enjoying the replies to this not getting that it's a joke
Starlevel004 · a year ago
every section is just a list in disguise, and GPTs LOVE lists
l5870uoo9y · a year ago
AI has a preference for dividing everything into sections, especially "Introduction" and "Conclusion" sections.
anitil · a year ago
It was hard to read in places because of that, I need to work out a reverse prompt to make it clearer
msoad · a year ago
Yeah, I kinda stopped reading when I felt this. Not sure why? The substance is still interesting and worth learning from but knowing LLM wrote it made me feel icky a little bit
greenavocado · a year ago
Scroll to the bottom to see a list of those who claimed to have authored it
bronzekaiser · a year ago
Scroll to the bottom and look at the authors. It's immediately obvious.
karthikmurkonda · a year ago
I don't get it. Why is it so?
cheema33 · a year ago
> it's LLM sanitized/rewritten

LLM is the new spellchecker. Soon we'll wonder why some people don't use it to sanity-check blog posts or any other writing.

And let's be honest, some writing would greatly benefit from a sanity check.

lawrjone · a year ago
Yeah I found this really off putting: it’s not possible for you to have several goals that are all ‘paramount’, and the word ‘seamless’ adds nothing in every place it appears!

I wish it didn’t turn me off the content as much as it does but it’s very jarring.

whalesalad · a year ago
So satisfying to do a huge upgrade like this and then see the actual proof in the pudding with all the reduced latencies and query times.
hu3 · a year ago
Yeah some numbers caught my attention like ~94% reduction in overall database lock time.

And to think they never have to worry about VACUUM. Ahh the peace.

InsideOutSanta · a year ago
As somebody who has always used MySQL, but always been told that I should be using Postgres, I'd love to understand what the issues with VACUUM are, and what I should be aware of when potentially switching databases?
tomnipotent · a year ago
MySQL indexes can contain references to rows in the undo log, and MySQL has a periodic VACUUM-like process (purge) to remove those references, though it's nowhere near as impactful.
brightball · a year ago
There are always tradeoffs.
anonzzzies · a year ago
Yeah, until vacuum is gone, i'm not touching postgres. So many bad experiences with our use cases over the decades. I guess most people don't have our uses, but i'm thinking Uber does.
m4r1k · a year ago
Uber's collaboration with Percona is pretty neat. The fact that they've scaled their operations without relying on Oracle's support is a testament to the expertise and vision of their SRE and SWE teams. Respect!
tiffanyh · a year ago
Aren't they using Percona in lieu of Oracle?

So it's kind of the same difference, no?

paulryanrogers · a year ago
Word on the street is Oracle contracts are expensive and hard to cancel, like a deal with the devil. Not sure if their MySQL support is any different than Oracle DB itself
jauntywundrkind · a year ago
Having spent a couple of months doing a corporate-mandated password rotation on our services - a number of which weren't really designed for password rotation - I'm happy to see the dual password thing mentioned.

Being able to load in a new password while the current one is active is where it's at! Trying to coordinate a big bang where everyone flips over at the same time is misery, and I spent a bunch of time updating services to not have to do that! Great enhancement.

I wonder what other datastores have dual (or more) password capabilities?
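For MySQL specifically, this is the dual-password feature added in 8.0.14 (`ALTER USER ... RETAIN CURRENT PASSWORD`). A minimal sketch of the rotation order, with a hypothetical `app` account and placeholder password (statements shown as strings only; actually running them requires a live server and the appropriate privileges):

```python
# Dual-password rotation order for MySQL 8.0.14+.
# Account name and password below are placeholders, not real values.
user = "'app'@'%'"
new_password = "s3cr3t-new"

rotation_steps = [
    # 1. Add the new password; the old one keeps authenticating.
    f"ALTER USER {user} IDENTIFIED BY '{new_password}' RETAIN CURRENT PASSWORD;",
    # 2. (Roll the new password out to every service at its own pace;
    #     both passwords work during this window - no big bang needed.)
    # 3. Once every service has switched, drop the old password.
    f"ALTER USER {user} DISCARD OLD PASSWORD;",
]

for step in rotation_steps:
    print(step)
```

The point is exactly the one above: step 2 can take days without any coordinated cutover.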

johannes1234321 · a year ago
I can't answer with an overview of who has such a feature, but "every" system has a different way of doing this: rotate usernames as well, i.e. create a new user with the new password.

This isn't 100% equivalent, as ownership (and thus permissions with DEFINER) in stored procedures etc. needs some thought, but bad access using an outdated username is simpler to trace (usernames can be logged, contrary to passwords; MySQL also allows tracing via performance_schema logging, incl. user-defined connection attributes, which may ease finding the "bad" application).

sandGorgon · a year ago
So how does an architecture like "2100 clusters" work? Do the write APIs go to the database that contains their data?

How is this done? A user would have history, payments, etc. - are all of them colocated in one cluster (which would mean the sharding is based on user ID)?

Is there then a database router service that routes the query to the correct database?

ericbarrett · a year ago
A query for a given item goes to a router*, as you said, that directs it to a given shard which holds the data. I don't know Uber's schema, but usually the data is "denormalized" and you are not doing too many JOINs etc. Probably a caching layer in front as well.

If you think this sounds more like a job for a K/V store than a relational database, well, you'd be right; this is why e.g. Facebook moved to MyRocks. But MySQL/InnoDB does a decent job and gives you features like write guarantees, transactions, and solid replication, with low write latency and no RAFT or similar nondeterministic/geographically limited protocols.

* You can also structure your data so that the shard is encoded in the lookup key so the "routing" is handled locally. Depends on your setup
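A toy sketch of the lookup-based routing described above (my own illustration, not Uber's actual scheme; `NUM_SHARDS` and the hash choice are made up):

```python
# Hash the user id to pick a shard, mimicking a router that directs
# each query to the shard holding that user's data.
import hashlib

NUM_SHARDS = 16  # made up; a real fleet would have thousands of clusters

def shard_for(user_id: str) -> int:
    # Stable hash so the same user always maps to the same shard.
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def route(user_id: str, query: str) -> str:
    # A real router would return a connection to that shard's primary
    # or a read replica; here we just name the target.
    return f"shard-{shard_for(user_id)}: {query}"

print(route("user-42", "SELECT * FROM trips WHERE user_id = 'user-42'"))
```

The footnoted variant bakes `shard_for(user_id)` into the key at creation time, so clients can route locally without consulting a router at all.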

bob1029 · a year ago
I imagine it works just like any multi-tenant SaaS product wherein you have a database per customer (region/city) with a unified web portal. The primary difference being that this is B2C and the ratio of customers per database is much greater than 1.
candiddevmike · a year ago
Does Uber still use Docstore? I'd imagine having built an effectively custom DB on top of MySQL made this upgrade somewhat inconsequential for most apps.
geitir · a year ago
Yes

anitil · a year ago
I wonder why they did a large version jump in one shot (v5.7->v8) and didn't do incremental upgrades (v5.7 -> 6.x etc)?

I wonder because the promotion of the secondary v8 node to primary is a breaking change on this path, whereas in an incremental upgrade it might not have been. But I also understand that at this sort of scale things might not be as easy as that.