version_five · 2 years ago

  Amazon Prime Day event resulted in an incremental 163 petabytes of EBS storage capacity allocated – generating a peak of 15.35 trillion requests and 764 petabytes of data transfer per day. 
The main thing that strikes me is how (seemingly) inefficient everything is. What could they possibly need this amount of data for just to sell stuff? Are they taking high-def video of every customer as they browse for something to buy? I get that it's a huge company and this is (I guess) their busiest time, but how can they need so much storage? Ditto for much of the other stuff.

luhn · 2 years ago
Yeah, those numbers struck me as well. At 375 million items sold, that's about 0.5GB storage and 2GB transfer per item.
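A quick back-of-envelope check of that ratio (figures taken from the article and the comment above; decimal petabytes assumed):

  items_sold = 375_000_000   # items sold over the event
  ebs_added_pb = 163         # incremental EBS capacity allocated (PB)
  transfer_pb = 764          # EBS data transfer per day (PB)

  pb = 10 ** 15              # decimal petabytes
  print(f"storage per item:  {ebs_added_pb * pb / items_sold / 1e9:.2f} GB")   # ~0.43 GB
  print(f"transfer per item: {transfer_pb * pb / items_sold / 1e9:.2f} GB")    # ~2.04 GB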
steveBK123 · 2 years ago
10+ years ago I worked on a trading system that was generating something like 1TB/day of messaging.

As we hit these levels we asked them: how many trades are we even doing on this system? The answer was something on the order of... 50. Granted, it was a bond system and the notionals are huge, but there's just no reason to store 20GB per trade.

These are the kinds of decisions that get made when one team is responsible for message generation and the other is responsible for the storage, lol.

We then had to work backwards with them to unwind a lot of the INFO level chatty messaging between what you'd now call "microservices" and reduce the volume by 90+%.
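A minimal sketch of what that kind of cleanup can look like, assuming Python-style loggers (the logger names here are hypothetical):

  import logging

  # Demote the chatty inter-service request/response logging to WARNING,
  # but keep the trade-lifecycle events that actually matter at INFO.
  for name in ("pricing.client", "marketdata.client", "booking.client"):
      logging.getLogger(name).setLevel(logging.WARNING)
  logging.getLogger("trade.lifecycle").setLevel(logging.INFO)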

fbdab103 · 2 years ago
I suppose you need to know how many requests did not result in a purchase. Is it 1000 views:purchase? I have not checked in on a Prime Day sale for several years, but is there any timeliness component (Flash Sales?) where people would be incentivized to mash the reload button?
tomwheeler · 2 years ago
Yes, but that's per item sold.

After looking at screen after screen of no-name garbage on Prime Day, I gave up. I suspect that there are tons of people like me. In other words, we only contributed to the numerator, not the denominator.

thenewarrakis · 2 years ago
I think the EBS numbers are "double counting". Most of the other services in the list are using EBS under the hood, so I wouldn't be surprised if this figure also covers the Aurora instances, CloudTrail events, SQS events, etc. that are counted elsewhere in the list.

Also, it specifically says "incremental capacity allocated", not necessarily used. Keep in mind that every EC2 instance launched also means new EBS storage is allocated. The article also estimates that 50 million EC2 instances were used for Prime Day. If you assume that half of these were newly created to support the surge of Prime Day, 25 million instances using up 160 PB of storage is only 6 gigabytes per instance, which definitely seems in the realm of possibility.

rqtwteye · 2 years ago
It seems to me that a lot of modern architectures store the same data in multiple places. The systems I see proposed at my company often need ten times more space than the actual data we have, because they copy and cache so much of it.
figassis · 2 years ago
Microservices require denormalizing data across tables and DBs. There's a cost to how many microservices you build.
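A toy illustration of that duplication, with hypothetical catalog and order services each keeping their own copy of product fields:

  from dataclasses import dataclass

  @dataclass
  class CatalogProduct:      # owned by the catalog service
      product_id: str
      title: str
      price_cents: int

  @dataclass
  class OrderLineItem:       # owned by the order service
      order_id: str
      product_id: str
      title: str             # copied from the catalog at order time
      price_cents: int       # copied so the order is stable if prices change
      quantity: int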
CamperBob2 · 2 years ago
Hot take: Amazon's search UX is so terrible that it not only wastes near-endless amounts of customer time and patience, but their own bandwidth as well.
greatpostman · 2 years ago
They’ve a/b tested it to death
thayne · 2 years ago
A lot of that was certainly just for the root volumes of all those ec2 instances (how much exactly is hard to know without more details). Which of course would have duplicate copies of the various base images for the VMs.

Although, that does bring up the question of why AWS doesn't have a way to share a single read-only volume across multiple ec2 instances in the same availability zone. In many workloads there isn't any need to write to disk.

schlarpc · 2 years ago
There kind of is, but it's not really made for that use case so there's a bunch of caveats (it's read/write, has a limited max number of attachments, io1/2 required, can't be the boot volume): https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volu...
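Roughly, with boto3 (a sketch, not tested; Multi-Attach needs an io1/io2 volume, the instances must be in the same AZ, and the IDs here are placeholders):

  import boto3

  ec2 = boto3.client("ec2", region_name="us-east-1")

  # Create a Provisioned IOPS volume with Multi-Attach enabled.
  vol = ec2.create_volume(
      AvailabilityZone="us-east-1a",
      Size=100,
      VolumeType="io2",
      Iops=1000,
      MultiAttachEnabled=True,
  )
  ec2.get_waiter("volume_available").wait(VolumeIds=[vol["VolumeId"]])

  # Attach the same volume to multiple instances (not as a boot volume).
  for instance_id in ("i-0123456789abcdef0", "i-0fedcba9876543210"):
      ec2.attach_volume(VolumeId=vol["VolumeId"], InstanceId=instance_id, Device="/dev/sdf")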
kamikaz1k · 2 years ago
Sometimes it’s just a bad decision that happens to "scale". Like the print video thing.[1]

1. https://youtu.be/J7ITgYBn_3k

twoodfin · 2 years ago
The EBS storage could easily be highly redundant (for good reason) local cache copies of store data.
pipingdog · 2 years ago
Logging, metrics, distributed action trace.
rurp · 2 years ago
> $102 million in infrastructure spend for an event that brought in over $12.7 billion in sales isn’t the worst return on investment that companies could make — by a landslide!

Well it's not amazing if your margins are tiny, as they are in many industries (such as retail). Plus this was almost certainly architected by some of the foremost AWS experts in the world. It's verrrry easy to spend vastly more than was strictly necessary in AWS.

I don't mean to be too negative though, it was a really interesting article. Pretty wild to think about spending $100m on infrastructure over two days and still making a bunch of profit.

madrox · 2 years ago
Important to remember that, before you could burst your infrastructure in the cloud, sites simply went offline in events like this. You actively lost revenue in those cases.
ndriscoll · 2 years ago
Or you could just design your architecture to not perform trillions of database requests for hundreds of millions of sales.

The listing data is almost static and should almost fit in RAM; the hot set probably does. Apparently Amazon has ~350M listings, so a 24TB RAM server could give ~68kB/listing, and probably only a small fraction is hot. Since you'll need multiple servers anyway, you could shard on products and definitely fit things in RAM. 375 million sales, even if condensed into one hour, would only be 104k/second, so a single DB server should be able to handle the cart/checkout. Assuming ~10M page views/second, a couple of racks of servers should be able to handle it.
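The arithmetic behind those figures (all assumptions from this comment, not measurements):

  listings = 350_000_000                  # approximate catalog size
  ram_bytes = 24 * 10 ** 12               # one 24TB-RAM server
  print(f"RAM per listing: {ram_bytes / listings / 1e3:.1f} kB")          # ~68.6 kB

  sales = 375_000_000
  print(f"checkout rate if squeezed into 1 hour: {sales / 3600:,.0f}/s")  # ~104k/s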

The ad/tracking infrastructure surely can't account for the 1000x disparity in resource usage.

Uvix · 2 years ago
Depending on the margins that could be preferable.
boulos · 2 years ago
Yes, at a 1% margin on those sales, that's more like $125M in profit. It's important to remember that things like Prime Day are basically marketing that results in revenue outside the event.
dylan604 · 2 years ago
>It's important to remember that things like Prime Day are basically marketing

Be it Prime Day or Black Friday/Cyber Monday, I've seen the prices before the sale starts, and once the sale starts it's the same price, just shown against a slashed-out, higher MSRP-style price. It's not any more of a sale during the sale than it was on any other day.

Retric · 2 years ago
Yea, actual profit was likely $100-400 million or so. As such, spending $102 million on a single line item would be a serious question for most companies.

Of course Amazon is paying itself that premium so they have little incentive to care.

mrbonner · 2 years ago
It's not a surprise for me to hear that Amazon is still a heavy user of RDBMS all these years, even after the so-called Rolling Stone project to get rid of Oracle DB in 2015. If Amazon can use an RDBMS at their scale, I'm just furious when folks jump up and down screaming at the top of their lungs, "Why do we use Postgres and not (insert some random NoSQL engine here)?" My response so far is to calmly ask another question, "Why not?", and let them try to find a justification that suits our scale requirements.
endisneigh · 2 years ago
It's fascinating that this is your conclusion from the article. Mine would be that, if you can make it work and believe these estimates, then DynamoDB is clearly more cost effective. And given that every project inevitably settles into its access patterns, and thus becomes a perfect fit for something like DynamoDB, why bother with an RDBMS on the hot path? Just use Dynamo and stream to a columnar database for analytics once your product is "finished".
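For what that can look like in practice, a tiny sketch of a single-table design keyed on one known access pattern ("orders for a customer"); the table and attribute names are made up:

  import boto3
  from boto3.dynamodb.conditions import Key

  table = boto3.resource("dynamodb").Table("orders")

  table.put_item(Item={
      "pk": "CUSTOMER#123",
      "sk": "ORDER#2023-07-11#A1",
      "total_cents": 4999,
  })

  orders = table.query(KeyConditionExpression=Key("pk").eq("CUSTOMER#123"))["Items"]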
bognition · 2 years ago
It all depends on your workload, access patterns, and data model.

You can absolutely spend an arm and a leg making a system work using a RDBMS that would be simpler and cheaper using a NoSQL store. The opposite is also true.

When picking a database you should always consider the trade offs of the different technologies and weigh those against your goals and budgets.

Sometimes it's okay to spend more for a system that is just simpler to manage and use. Sometimes it's not.

orochimaaru · 2 years ago
Your application use cases should dictate the database choice - eg consistency needed, access patterns, data normalization, reliability, etc.
benjaminwootton · 2 years ago
The real cost would come in the months after whilst trying to decipher the bill adequately to track down everything you used and get it turned off. (Half a joke.)

I imagine there would be a ton of Lambda and the like in there too.

jayzalowitz · 2 years ago
Corey is probably right, but I'd add an extra 10-20% of overprovisioning/undercounting to the actual bill here, and considering they OWN the fleet, they probably went out of their way to have disaster recovery ready to go in a bunch more contexts.
ckdarby · 2 years ago
Even if AWS treats Amazon like any other customer, the article is off by 30-60%.

RIs for their RDS instances. Savings Plans for their EC2s.

1 or 3 year commit, no upfront vs all upfront, etc.

A customer the size of Amazon would have a private pricing arrangement and an EDP.

simpsond · 2 years ago
You wouldn't commit to 3 years for the increased resources of a single day.
leetrout · 2 years ago
You could and then sell the extra on the spot market for the other ~1000 days.
ckdarby · 2 years ago
True, but all of the usage is not net new and they'll have a base commit.
jayzalowitz · 2 years ago
Honestly, their EDP is probably effectively at cost, set in stone to make sure that if the government breaks them up or something like that, both sides are fine.
jeffbee · 2 years ago
The amount of mail alone is bonkers. If we assume that half of this traffic went to the big operators, Google and Microsoft, each of them would have observed a noticeable traffic bump: tens of thousands of requests per second on average, all day. It is fun to think about how these systems are interconnected and how they affect each other.
infinitedata · 2 years ago
Funny how folks here and in the article are fixated on comparing the $102M vs. the $12.7B. They somehow forget there are product, advertising, warehouse, transportation, shipping, labor, operations, and other costs involved. You didn't spend $102 to earn $12,700…