Spending $5k to learn how database indexes work

As the author touches on, the main problem here isn't learning about indexes. It's about "infinity scaling" working too well for people who do not understand the consequences.

In no sane version of the world should "not adding a db index" lead to getting a 50x bill at the end of the month without knowing.

I am a strong believer that services that are based on "scale infinitly" really need hard budget controls, and slower-scaling (unless explicitly overidden/allowed, of course).

If I accidently push very non-performant code, I kind of expect my service to get less performant, quickly realize the problem, and fix it. I don't expect a service to seemingly-magically detect my poor code, increase my bill by a couple orders-of-magnitude, and only alert me hours (if not days) later.

ashleyn · 4 years ago

There's no free lunch. Cloud services trade performance woes for budget surprises. This may be preferable in some cases but the tradeoff should be recognised.

danielvaughn · 4 years ago

There's plenty of space in the middle though, no? Bank accounts cut you off if you hit a zero balance, or they can execute your transactions and charge you a fee. Why can't these services implement throttling or even halting if the charges hit a certain ceiling?

nrmitchi · 4 years ago

Disagree; cloud services trade reduced operation work for higher prices. There is nothing inherent to "cloud services" that requires budget surprises.

hodgesrm · 4 years ago

> Cloud services trade performance woes for budget surprises.

I'm not sure why you think this is a trade-off. In general cloud services automate operations. Whether they are faster is unrelated. Many are not--services that use object storage for backing storage can be orders of magnitude slower than equivalent software using nVME SSD.

richardw · 4 years ago

Our internal monitoring alerts for performance anomalies. Quite possible to scale and warn you.

oceanplexian · 4 years ago

> This may be preferable in some cases but the tradeoff should be recognised.

It's not a "tradeoff", it's a product feature.

perl4ever · 4 years ago

>I don't expect a service to seemingly-magically detect my poor code, increase my bill by a couple orders-of-magnitude

When you put it like that, it sounds like an awfully good business to be in.

anglinb · 4 years ago

Haha yep, I was like wait I'm used to getting feedback from the system telling me I messed up and this I barley noticed. PlanetScale has Query Statistics that are really useful for spotting slow queries but don't expose the "rows read" so you can't really tie this view back to billing. I think they're aware of this though and might expose that information.

coding123 · 4 years ago

It can't be like that. I have discussions with vendors sometimes and the first question I ask - if something lapses and we weren't paying attention - you won't cut our service right?

I think too, in most cases, people would rather run over than cut service.

Also how would such a system work? Let's say you sign up for some API and what, set your billing limit to 500 requests per day. Let's say you're now hitting fabulous numbers / signups - but suddenly you start hitting that 500. If that shuts off your signups or what have you, you're typically going to be worse off than if you just pay the overage bill.

I know it sucks, but the first time you pay your overage is probably your last.

Traster · 4 years ago

It's important to think about this in an a la carte design, not one fixed solution for all use cases.

Step 1: You give people the ability to put in soft limits - "Warn me when I hit 500",

Step 2: You also give the ability to put in hard limits "Pull the plug at 10k" (caveat to both these things - you guarantee this at an eventual consistency level, like "Well you hit 500 but by the time our stats updated you were at 600",

Step 3: You introduce rate limits - "We're expecting 500 in a month, warn us if we hit 50 in a day or 10 in an hour".

Step 4: You introduce predictive warnings "Our statistics show you'll hit your monthly limit on the 23rd of the month"

Step 5: You put in predictive limits to allow scaling - "The last 3 months we've seen the following use trend, warn us if we exceed double that trend, cut off if we see 50x that trend"

You might set some of these limits or none of these limits depending how predictable your use case is.

nrmitchi · 4 years ago

> Let's say you're now hitting fabulous numbers / signups - but suddenly you start hitting that 500.

Sure, you get alerted, confirm it's reasonable, and then change your limits. You're also describing how many APIs actually work.

I'll say there is also a difference between going from 500 requests/day to 1000 requests/day, where you might say "this is probably legitimate and I want to run over", and from 500 requests/day to 25k requests/day.

One is mildly inconvenient, and the other is potentially bankrupting.

thrashh · 4 years ago

If you’re expecting something near 500, then why would you set your limit to 500. Set it towards something like 20k at least.

Or obviously if you don’t think this will be a problem, you have control to set it to uncapped.

I don’t understand what argument you have.

heavyset_go · 4 years ago

A billing limit feature is something that's been wanted for years, yet the most that's offered is budget alerts.

cassonmars · 4 years ago

The question that follows would be: how do you know what was intended to be less performant versus optimized on-demand? The intentions can be easily inferred when the query at hand was a simple join, and to no surprise, many cloud database offerings _do_ provide optimization automation (Azure SQL will for example even automatically add obvious indexes if you let it). But what if the query did need to scan all the rows in a join, but was only a one-off, and you didn’t want to pay the continued perf and storage costs of maintaining an index? The cloud provider can’t know that, and even with proactive measures (“make it slower” can’t work because speed is part of the product design, and budget controls can only go so far before it impacts your own customers) there’s only so much that can be done. The choice of infinity scale tools comes with infinity scale costs, and so there’s a responsibility that engineers using these tools need to understand what they’re accepting with that choice.

nrmitchi · 4 years ago

> The question that follows would be: how do you know what was intended to be less performant versus optimized on-demand?

I'm saying that the cloud provider shouldn't try to make assumption either way, and I'm definitely not saying that it should try to manage indexes for you.

If you are typically using X ops/s, and begin using 50X ops/s, the default should not be "this customer probably wants to spend 50x their previous spend". It should maybe scale up some percentage of previous usage, but definitely not into a range that would be considered anomalous.

> The choice of infinity scale tools comes with infinity scale costs, and so there’s a responsibility that engineers using these tools need to understand what they’re accepting with that choice.

Sure, but I have never once seen one of these providers make clear that using them comes with the risk of being charged "infinity money".

ltbarcly3 · 4 years ago

> In no sane version of the world should "not adding a db index" lead to getting a 50x bill at the end of the month without knowing.

Computers do what you tell them to do. If you are totally clueless and don't bother to take even a few minutes to try to understand a system you are using, the results are going to be poor. Thinking any system can overcome total user ignorance is the thing here that isn't sane.

What the person in this article did is like opening all your windows and setting the thermostat to 74 degrees. It will use massive amounts of energy and just keep trying to heat the house 24/7. If someone turns around after doing this and claims there is actually a problem with thermostats not being smart enough because what if someone doesn't know leaving the window open lets cold air in, well, they shouldn't be allowed to touch the thermostat anymore.

jeroenhd · 4 years ago

> Computers do what you tell them to do. If you are totally clueless and don't bother to take even a few minutes to try to understand a system you are using, the results are going to be poor. Thinking any system can overcome total user ignorance is the thing here that isn't sane.

In theory I agree, but this website features something like "how I nearly bankrupted myself with an AWS bill" on the homepage every month or so. People are blissfully unaware about the extreme costs they're paying to the scaling cloud providers that they often don't even need in the first place.

While I don't think services should block extreme spend all together, a monthly/weekly/daily limit would go a long way to prevent these stories. Very few services that abstract away performance costs have a good way to limit expenses. I don't know if that's intentional or if these companies just don't care, but it's infuriating to me.

It's fine to expose the same tool to both someone who doesn't know the difference between indexes and foreign keys and someone who's been building cloud infra for many years, but as a company you should be prepared to respond to your customers' most likely mistakes. This specific case would probably be hard to detect automatically, but so many wasted CPU cycles, kilowatts and forgiven bills could be prevented if someone would just send an email saying "hey, you've been using more than 10x the normal capacity today, everything alright?"

nrmitchi · 4 years ago

This is a lot of victim-blaming in a such a small response.

> If you are totally clueless and don't bother to take even a few minutes to try to understand a system you are using, the results are going to be poor.

Having a hosted system which behaves different than the underlying technology it's modelled on is not immediately clear. The realm of "things you don't know that you don't know" expands drastically with managed services.

> Thinking any system can overcome total user ignorance is the thing here that isn't sane.

It's never been suggested that this is possible. There is a large range of options in between "solve all user error" and "don't hand everyone a loaded foot-gun".

lmilcin · 4 years ago

Why not write a simple service that tracks various stats (like number of users, requests, etc.) as well as billed costs over time?

You could then get various interesting stats in real time as well as some pretty useful alerting.

nrmitchi · 4 years ago

Even with "infinite scale", you should still be monitoring, and be doing some form of budget monitoring.

The difference is that application performance metrics are generally available in near-real time, whereas billing metrics are 1) very platform specific, and 2) generally not even close to real time.

It's hard to react quickly when your platform has effectively transformed near-real-time performance alerts to delayed/rolled-up billing alerts (which would also be much more difficult to use to pinpoint where the underlying issue is)

bufferoverflow · 4 years ago

If you create an inefficient process, you should be responsible for the consequences. Why would you expect some third party to take the responsibility?

If you create a horrible internal combustion engine, your gas station should not bear the costs.

zerd · 4 years ago

If you create an inefficient internal combustion engine, you'd know because you have to go to the gas station every 5 miles. In this case it would be like someone was filling up the the gas without you knowing, and then a few weeks later you get the bill, and then you realize that your engine is inefficient.

thih9 · 4 years ago

In theory yes; in practice it’s very easy to push inefficient code to production by accident, as shown in the article.

nuerow · 4 years ago

> I am a strong believer that services that are based on "scale infinitly" really need hard budget controls, and slower-scaling (unless explicitly overidden/allowed, of course).

+1 on the budget control, but I don't think there are good arguments in favor of slower scaling.

The ability to scale on demand is sold (and bought) based on the expectation that services just meet the workload that's thrown at them without any impact on availability or performance. That's one of the main selling points of managed services, if not the primary selling point.

Arguing in favor of slower scaling implies arguing in favor of downtime. A service that's too slow to scale is a service that requires a human managing it. A managed service that is unable to meet demand fluctuations is a managed service that can't justify the premium that is charged for it.

nrmitchi · 4 years ago

I may have not been as clear as I should have; I'm not necessarily arguging that typical, or expected scaling action should be slowed down. Ie, throttling scaling from X -> 1.5X doesn't really make sense.

A scaling change that would be considered anomalous, and introduces an order-of-magnitude change over historical usage could be scaled more slowly.

> Arguing in favor of slower scaling implies arguing in favor of downtime.

Sure, I guess that in a limited scope, that is what I am saying. I would much rather have a short-term "downtime that requires human intervention" problem, then a long term "Johnny deployed bad code and now the company is bankrupt" problem.

> The ability to scale on demand is sold (and bought) based on the expectation that services just meet the workload that's thrown at them without any impact on availability or performance. That's one of the main selling points of managed services, if not the primary selling point.

I tend to disagree with this. Managed services are often bought on the expectation that they do not require management, deep operational knowledge, and are reliable. There's also often the trade off of upfront costs (either human or capex costs).

Scalability of obviously part of the analysis, but "scability" and "the ability to scale from 1X -> 100X in a couple seconds" are not necessarily the same thing.

still_grokking · 4 years ago

> In no sane version of the world should "not adding a db index" lead to getting a 50x bill at the end of the month without knowing.

Oh, that would be actually quite useful for learning things if the bill would tell you that it got so high because you stupid dump-ass didn't use DB indices properly.

I'm every time shocked how many people using DBs don't know about indices! Those people should pay such a bill once. They would never ever again "forget" about DB indices I guess.

Of course I'm joking to some extend. But only to some extend…

I feel like indexes are a pretty fundamental type of DB knowledge. In fact I'd say it's table stakes knowledge you should have if you're working with them. Further more, knowing that ForeignKeys typically apply an index to that column is also in my head basic knowledge. I'm sorry you got burnt, and congrats on learning a lesson, but you could have gotten the same knowledge by ever googling MySql ForeignKeys and saved yourself a headache.

In fact it's like a big bullet point near the top of the docs page.

"MySQL requires indexes on foreign keys and referenced keys so that foreign key checks can be fast and not require a table scan. In the referencing table, there must be an index where the foreign key columns are listed as the first columns in the same order. Such an index is created on the referencing table automatically if it does not exist. This index might be silently dropped later if you create another index that can be used to enforce the foreign key constraint. index_name, if given, is used as described previously."

I'm not entirely sure why buzz around "developer learns basic knowledge" has this on the front page.

scottlamb · 4 years ago

Good for you. But I think you're being uncharitable by failing to distinguish between "concept I didn't understand" and "thing I forgot to consider until I saw the problem it caused". The title also suggests the former, but I think the author is being a bit humble by underplaying his existing knowledge. Likely he actually did know what indexes are before; if you asked him to detail how MySQL foreign keys work he might have even remembered to say they add an implicit index. But it's super easy to miss that you're depending on a side effect like that until you see the slow query (or, in this case, high bill).

When you're programming, how many compiler errors do you see a day? (For me, easily dozens, likely hundreds.) Do you think each one indicates a serious gap in your knowledge?

Along these lines: imposter syndrome is a common problem in our industry. One way it can manifest is junior engineers can thinking they're bad programmers when they repeatedly see walls of compiler errors. I think it'd help a lot to show them a ballpark of how often senior engineers see the same thing. [1] I know that when I'm actively writing new code (especially in languages that deliberately produce errors at compile time rather than runtime), I see dozens and dozens of errors during a normal day. I don't think this is a sign I'm a bad programmer. I think it just means I'm moving fast and trusting the compiler to point out the problems it can find rather than wasting time and headspace on finding them myself. I pay more attention to potential errors that I know won't get caught automatically and particularly to ones that can have serious consequences.

I think the most important thing the author learned is that failing to add an index can cost this much money before you notice.

Ideally the author and/or the vendor will also brainstorm ways to make these errors obvious before the high bill. Load testing with realistic data is one way (though people talk about load testing a lot more than they actually do it). Another would be watching for abrupt changes in the operations the billing is based on.

[1] This is something I wish I'd done while at Google. They have the raw data for this with their cloud-based work trees (with FUSE) and cloud-based builds. I think the hardest part would be to classify when someone is actively developing new code, but it seems doable.

Grimm1 · 4 years ago

No you've missed my point, the author seemingly didn't know that ForeignKeys applied indexes by default in MySql. It's not "Concept I didn't understand", clearly they're capable of understanding because they did after they ran into the issue. It's about not having had basic knowledge to begin with.

But he didn't see compiler errors, he caused monetary cost to his employer.

When I deploy something that unintentionally causes a large monetary bill to my employer, then yes I do believe that indicates a gap in knowledge so I don't in anyway believe I'm being uncharitable. Or and this would be worse, a lack of caring. (Which is not what I think happened here though)

I won't respond to your imposter syndrome bit I don't really think it's relevant to my point.

Nextgrid · 4 years ago

> I'm not entirely sure why buzz around "developer learns basic knowledge" has this on the front page.

The problem is that in the old days, not knowing about indexes left you with an underperforming system or downtime. But in The Cloud™ it leaves you with an unreasonably huge bill and that somehow as an industry we're accepting this as normal.

Grimm1 · 4 years ago

Which really is a head scratcher. You'd figure especially as a startup seeing a 5k oopsie isn't really as acceptable. Mistakes do happen and I don't mean any shade to this particular person (they'll never make this mistake again) but as an industry the aggregate consequence of this is you have a lot of waste and stupid choices that then have to be cleaned up when more knowledgeable (read highly paid) people are introduced later on.

They'll have to clean up the mess which causes real business consequences that, and I've personally seen this, will directly impact bottom line and have no quick or easy solution to wiggle out of.

Maybe it's acceptable for products like this because the balance between good engineering and company health probably aren't as cut and clear but stuff like this always makes me sad because it's such low hanging fruit, it doesn't require any real effort, just basic curiosity around your job.

heisenbit · 4 years ago

Yes we really should not accept this. The ability to impose limits on spending is key to control an enterprise. Whole security certification guacamole is based on having established controls. But where the bit hits the fan control is absent.

aspenmayer · 4 years ago

Using money to solve business problems is good business sense, but only if that’s the best way to spend that money. I agree with you that the status quo is normal, but nonsensical.

williamdclt · 4 years ago

> But in The Cloud™ it leaves you with an unreasonably huge bill and that somehow as an industry we're accepting this as normal.

No. Nobody finds that "normal", that's just untrue. It's even the _whole_ subject of this blogpost: the bill was not normal.

I don't disagree that some people are overrelying on cloud services, but that didn't become normality, it's still a beginner's mistake

derekdahmer · 4 years ago

I've been using relational databases for web apps for my entire career and probably would have made this same mistake if using PlanetScale for the first time.

The author had two misunderstandings:

1) An index isn't created automatically

2) You get billed for the number of rows scanned, not the number of rows returned

Even if I noticed #1, I probably wouldn't have guessed at #2 for the same reason as the author.

watt · 4 years ago

You are absolutely missing the point. The point is not about indexes or full table scans, but it's a about cloud providers who will charge you for every row "inspected" and how a full table scan might cost you $0.15 and it would add up. It's not about slow performance which you can diagnose and fix, it's about getting an unexpected $5k bill, which you can't fix.

And in the end, if the cloud provider wants to charge you for rows "inspected", this can't be buried in small print. That's unacceptable!

The billing must come with up-front, red capital letters warning, and must come with alerts when your bill is unexpectedly little high (higher than expected, not just 10x or 100x higher). It must automatically shut down the process, requiring the customer confirm they want proceed, that you actually want to spend all that money. And it must be on the cloud provider to detect billing anomalies and fully own them in case it goes the wrong way. This is the cloud "bill of rights" we need.

AnotherGoodName · 4 years ago

You'd be surprised and frustrated. If you ever see someone say "We hired Oracle consultants and they are miracle workers" or "NoSQL is sooo much faster than SQL" you can be pretty sure they missed databases 101 and the requirement to add indexes.

ampersandy · 4 years ago

> I'm not entirely sure why buzz around "developer learns basic knowledge" has this on the front page.

Because it's a well written, humble account of learning from a mistake then using it as an opportunity to teach others to help them avoid the same mistake.

If anyone leads a team, I hope they might learn from this approach, rather than just bashing on people and implying they don't deserve any attention because they made a mistake a more experienced developer might have dodged.

Grimm1 · 4 years ago

Personally I find it trite but whatever floats your boat.

pjscott · 4 years ago

One of the best database habits I've ever developed is to run EXPLAIN on every query that I expect to run repeatedly, then sanity-check the output. It's very little effort, and has prevented so much hassle.

williamdclt · 4 years ago

If we weren't using underwhelming ORM DSLs, I'd love to use/write a github bot that automatically runs EXPLAIN ANALYZE on queries updated in a PR and post the query plan!

yashap · 4 years ago

Seriously. Like, every junior dev has to learn DB indexing basics sometime, and apparently the other of this blog post just did. But really can’t understand why this article is getting voted to the top of HN.

azeirah · 4 years ago

What I gained from the article wasn't that the dev was unaware of indices, it's that he didn't realise indices were missing due to how planetscale's database disallows foreign keys.

I never worked with a database that doesn't have foreign keys and it's not unthinkable to forget when you do for the first time, that foreign keys were what created indexes for you automatically.

A little bit of planning could have prevented that though :/

cerved · 4 years ago

they were using some kind of foreign keyless MySQLish whatever thing