Upgrading Uber's MySQL Fleet

Impressive numbers at a glance but that boils down to ~140qps which is between one and two orders of magnitude below what you'd expect a normal MySQL node typically would serve. Obviously average execution time is mostly a function of the complexity of the query but based on Uber's business I can't really see what sort of non-normative queries they'd run at volume (e.g. for their customer facing apps). Uber's infra runs on Amazon AWS afaik and even taking some level of volume discount into account they're burning many millions of USD on some combination of overcapacity or suboptimal querying/caching strategies.

aseipp · a year ago

Dividing the fleet QPS by the number of nodes is completely meaningless because it assumes that queries are distributed evenly across every part of the system and that every part of the system is uniform (e.g. it is unclear what the read/write patterns are, what proportion of these nodes are read replicas or hot standbys, or whether their sizing and configuration are the same). That isn't realistic at all. I would guess it is extremely likely that hot subsets of these clusters, depending on the use case, see anywhere from 1 to 4 orders of magnitude higher QPS than your guess, probably on a near constant basis.

Don't get me wrong, a lot of people have talked about Uber doing overengineering in weird ways, maybe they're even completely right. But being like "Well, obviously x/y = z, and z is rather small, therefore it's not impressive, isn't this obvious?" is the computer programming equivalent of the "econ 101 student says supply and demand explain everything" phenomenon. It's not an accurate characterization of the system at all and falls prey to the very thing you're alluding to ("this is obvious.")
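To make the averaging problem concrete, here is a minimal sketch with entirely made-up numbers (none of them are Uber's real figures). It assumes a hypothetical 100-node fleet serving 14,000 QPS in total, where 10 hot primaries each take 50x the traffic of the other 90 nodes. The function name `per_node_qps` and the weights are illustrative assumptions only.

```python
def per_node_qps(total_qps, node_weights):
    """Split total_qps across nodes in proportion to their weights."""
    total_weight = sum(node_weights)
    return [total_qps * w / total_weight for w in node_weights]


# Hypothetical fleet (assumed numbers, not from the article):
# 10 hot primaries weighted 50x, 90 replicas/standbys weighted 1x.
weights = [50] * 10 + [1] * 90
loads = per_node_qps(14_000, weights)

mean_qps = sum(loads) / len(loads)  # fleet total divided by node count
hottest = max(loads)                # what a hot primary actually serves
coldest = min(loads)                # what a mostly idle standby serves

print(f"mean per node: {mean_qps:.0f} qps")
print(f"hottest node:  {hottest:.0f} qps")
print(f"coldest node:  {coldest:.0f} qps")
```

The mean comes out at 140 QPS by construction, yet in this toy setup no node actually serves that rate: the hot nodes sit well above it and the rest well below. A more skewed weighting, or time-of-day peaks, would widen the gap further.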

0cf8612b2e1e · a year ago

Simple enough just to think about localities and time of day. New York during Tuesday rush hour could be more load than all of North Dakota sees in a month. Even busy cities probably drop down to nothing on a weekday at 3am.

Twirrim · a year ago

They're not on AWS. They use on-prem and are migrating to Google and Oracle clouds.

https://www.forbes.com/sites/danielnewman/2023/02/21/uber-go...


Jgrubb · a year ago

See, the problem is that the people who care about cost performance and the people who care about UX performance are rarely the same people, and often neither side is empowered with the data or experience they need to bridge the gap.

bushbaba · a year ago

Hardware is cheap relative to salaries. It might take 1 engineer 1 quarter to optimize. Compare that to a few thousand per server.

nunez · a year ago

Didn't realize their entire MySQL data layer runs in AWS. Given that they went with basically a blue-green update strategy, this was essentially a "witness our cloud spend" kind of post.

pocket_cheese · a year ago

They're not. Almost all of their infra was on prem when I worked there 3 years ago.

It's sort of funny how you can immediately tell it's LLM sanitized/rewritten.

jdbdndj · a year ago

It reads like any of those tech blogs, using big words where not strictly necessary but also not wrong

Don't know about your LLM feeling

est31 · a year ago

It contains the word "delve", a word that got way more popular in use since the introduction of LLMs.

Also this paragraph sounds a lot like it has been written by LLMs, it's over-expressive:

    We systematically advanced through each tier, commencing from tier 5 and descending to tier 0. At every tier, we organized the clusters into manageable batches, ensuring a systematic and controlled transition process. Before embarking on each stage of the version upgrade, we actively involved the on-call teams responsible for each cluster, fostering collaboration and ensuring comprehensive oversight.

The paragraph uses "commencing from" together with "descending to". People would probably write something like "starting with". It shows how the LLM has no spatial understanding: tier 0 is not below or above tier 5, especially as the text has not introduced any such spatial ordering previously. And it gets worse: there is no prior mention of the word "tier" in the blog post. The earlier text speaks of stages, and lists 5 steps (without giving them any name, but the standard term is more like "step" instead of "tier").

There are more signs, like "embark", or that specific use of "fostering collaboration", which goes beyond corporate-speak and sounds a lot like what an LLM would say. Apparently "safeguard" is also a word LLMs write very often.

maeil · a year ago

This [1] is a good piece on it. Here's [2] another good one.

We don't just carry out a MySQL upgrade, oh no. We embark on a significant journey. We don't have reasons, but compelling factors. And then, we use compelling again soon after when describing how "MySQL v8.0 offered a compelling proposition with its promise of substantial performance enhancements", just as any human meatbag would.

[1] https://www.latimes.com/socal/daily-pilot/opinion/story/2024...

[2] https://english.elpais.com/science-tech/2024-04-25/excessive...

remon · a year ago

Nah this isn't a big word salad issue. The content is fine. It's just clearly a text written by humans and then rewritten by an LLM, potentially due to the original author(s) not being native speakers. If you feel it's natural English that's fine too ;)

exe34 · a year ago

I always thought 90% of what management wrote/said could be replaced by an RNN, and nowadays LLMs do even better!

1f60c · a year ago

I got that feeling as well. In addition, I suspect it was originally written for an internal audience and adapted for the 'blog because the references to SLOs and SLAs don't really make sense in the context of external Uber customers.

aprilthird2021 · a year ago

Let's delve into why you think that

fs0c13ty00 · a year ago

It's simple. Human writing is short and to the point (either because they're lazy or want to save the reader's time), yet still manages to capture your attention. AI writing tends to be too elaborate and lacks a sense of "self".

I feel like this article challenges my patience and attention too much; there is really no need to focus on the pros of upgrading here. We readers just want to know how they managed to upgrade at that large a scale, the challenges they faced, and how they solved them. Not to mention that any sane tech writer who values their time wouldn't write this much.

blackenedgem · a year ago

I'm enjoying the replies to this not getting that it's a joke

Starlevel004 · a year ago

every section is just a list in disguise, and gpts LOVE lists

l5870uoo9y · a year ago

AI has a preference for dividing everything into sections, especially "Introduction" and "Conclusion" sections.

anitil · a year ago

It was hard to read in places because of that, I need to work out a reverse prompt to make it clearer

msoad · a year ago

Yeah, I kinda stopped reading when I felt this. Not sure why? The substance is still interesting and worth learning from but knowing LLM wrote it made me feel icky a little bit

greenavocado · a year ago

Scroll to the bottom to see a list of those who claimed to have authored it

bronzekaiser · a year ago

Scroll to the bottom and look at the authors. It's immediately obvious

karthikmurkonda · a year ago

I don't get it. Why is it so?

cheema33 · a year ago

> it's LLM sanitized/rewritten

LLM is the new spellchecker. Soon we will wonder why some people don't use it to sanity check blog posts or any other writing.

And let's be honest, some writings would greatly benefit from a sanity check.

lawrjone · a year ago

Yeah I found this really off putting: it’s not possible for you to have several goals that are all ‘paramount’, and the word ‘seamless’ adds nothing in every place it appears!

I wish it didn’t turn me off the content as much as it does but it’s very jarring.
