"With range requests, the client can request to retrieve a part of a file, but not the entire file. ... Due to the way AWS calculates egress costs the transfer of the entire file is billed." WTF if true.
This must be a regression bug in AWS's internal system. At a past job (2020) we used S3 to store a large amount of genomic data, and a web application used range requests to visualize tiny segments of the genetic sequence in relevant genes - like 5kb out of 50GB. If AWS had billed the cost of an entire genome/exome every time we did that, we would have noticed. I monitored costs pretty closely; S3 was never a problem compared to EC2.
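For what it's worth, a read like that is just a plain HTTP Range request; the helper below is my own sketch (the function name is mine, not any AWS API) of building the header you'd pass to something like boto3's get_object:

```python
def range_header(offset, length):
    # HTTP byte ranges are inclusive on both ends (RFC 9110),
    # so the last byte index is offset + length - 1.
    return f"bytes={offset}-{offset + length - 1}"

# A 5 kB window somewhere inside a huge object:
print(range_header(1_000_000, 5_000))  # bytes=1000000-1004999
```

With boto3 this header string would go into the `Range` parameter of `get_object`, and S3 returns only those bytes with a 206 Partial Content response.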
It also seemed like the root cause was an interrupted range request (although I wasn't fully clear on that). Even so, that seems like a recent regression. It took me ages to get that stupid app working - I interrupted a lot of range requests :)
You are right, this is about canceling range requests and still getting billed, not about requesting ranges and getting billed for the complete file egress. Sorry; we'll make the post clearer.
That sounds egregious enough that I have trouble believing it can be correct. My understanding is that AWS bills egress for every service; parts of the file that aren't transferred generate no egress, so they can't be billed. There could certainly be S3-specific charges that affect cases like this, no idea. But if AWS bills the full traffic cost of a range request whose bytes were never transferred, I'd consider that essentially fraud.
Sorry, I think that part of our write-up is misleading (I was involved in analyzing the issue described here). To our best understanding, what happens is the following:
- A client sends range requests and cancels them quickly.
- The full requested range gets billed (NOT the whole file), even if it never gets transferred - so the post should read that the entire requested range is billed. The explanation we received is that this is due to some internal buffering S3 does, and they count that buffered data as egress.
In any case, if you send and cancel such requests quickly (which is easy enough - this was not even an adversarial situation, just a bug in some client API code), the billed egress is many times more than your bandwidth could physically have transferred (and about 80x higher than what the AWS documentation suggests, hence the blog post).
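Under the billing model described above (billed = full requested range, regardless of how much was actually sent before the cancel), the amplification is easy to model. This is a hypothetical back-of-the-envelope sketch, not AWS's actual metering logic:

```python
def billed_vs_transferred(range_bytes, transferred_bytes, requests):
    # Hypothetical model: each canceled range request is billed for the
    # full requested range, even though only `transferred_bytes` of it
    # went over the wire before the client aborted.
    billed = range_bytes * requests
    transferred = transferred_bytes * requests
    return billed, billed / transferred

# 100 canceled requests for a 1 GiB range, each aborted after 16 MiB:
billed, factor = billed_vs_transferred(2**30, 16 * 2**20, 100)
print(factor)  # 64.0 -> 64x more egress billed than transferred
```

The exact amplification factor depends only on how early the client cancels relative to the requested range, which is why a buggy retry loop can blow past your theoretical bandwidth so dramatically.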
The way billing is calculated should be clearly labeled along with the pricing. Azure does this too; it's super unclear what metric they use to determine what gets billed for a request, and we're having to find out via trial and error. If we request 0-2GB of a 6GB file but the client cancels after 400MB, are we paying for 2GB, 400MB, or 6GB?
Is there a billed difference between Range: 0-, no "Range" header, and Range: 0-1GB if the client downloads 400MB in each scenario?
Sorry for not making this clearer (we'll fix this part of the post): the gotcha is not that AWS doesn't honor range requests; it's that canceling them will still add the full range of bytes to your egress bill (and this can add up quickly) even though no bytes (or far fewer) were actually transferred.
On the other hand, you did ask for those bytes, so what does "canceling" really mean? Just playing devil's advocate: they likely did start fetching the data for you, and that takes resources. Otherwise they would be open to a DoS attack that initiates many requests and then cancels them.
Azure is probably the most egregious example of this. AWS and GCP can at least claim they have architectural barriers to implementing a hard spending cap, but Azure already has one and arbitrarily allows only certain subscription types to use it. If you have a student account, you get a certain amount of credit each month, and if you overspend it, most services are automatically suspended until the next month - unless you explicitly opt out of the spending limit and commit to paying the excess out of pocket. However, if you have a standard account, you're not allowed to set a spending limit for, uh... reasons.
That's insane as well. They already built the system, but you just can't use it, because apparently "we want the option for you to screw up and pad our billing." There are many projects I've worked on where a service being unavailable until the 1st of the next month would be nothing more than a minor annoyance, and I would much rather have that than an unexpected bill. It would also be a nice CYA tool when developing something in the cloud for the first time: it's easy to make an expensive mistake when learning cloud services, as TFA shows.
"Thank you to everyone who brought this article to our attention. We agree that customers should not have to pay for unauthorized requests that they did not initiate. We’ll have more to share on exactly how we’ll help prevent these charges shortly." — Jeff Barr, Chief Evangelist, Amazon Web Services
AWS APIs need a cleanup. I constantly run into issues not documented in the official docs, the boto3 docs, or even on StackOverflow. It's not even funny when a whole day goes by trying to figure out why the body of a 200 OK response is empty when I request data that I know is there in the bowels of AWS. Then it turns out that one parameter doesn't allow values below a certain number, even though the docs say otherwise.
Historically, they've been scared of versioning their APIs (not many services have done it; DynamoDB is one example that has).
It leads to a "bad customer experience", having to update lots of code, and also increases maintenance costs while you keep two separate code paths functional.
There's a lot about the S3 API that would be changed, including the response codes etc., if S3 engineers had freedom to change it! I remember many conversations on the topic when I worked alongside them in AWS.
It’s quite insane the levels of effort S3 engineers put in to maintain perfect API compatibility. Even tiny details such as whitespace or ordering have messed up project timelines and blocked important launches.
That could be the same root cause: you download data via a range request with no upper bound, and AWS bills you for far more than you actually downloaded.
I can assure you this was not AI-generated, apart from the 'symbolic image' (which should be fairly obvious :).
Maybe that's just our non-native English shining through. In any case, as a small European company in the healthcare space, we are quite used to having to explain "the cloud" (with all potential and pitfalls) to our customers. They are also (part of) the target audience for this post, hence the additional explanations.
(Not OP and not author of the article, but was involved in the write-up.)
https://learn.microsoft.com/en-us/azure/cost-management-bill...
The simple truth is it's good for their earnings report if they have unrestricted access to your wallet
https://twitter.com/jeffbarr/status/1785386554372042890
[0]: https://news.ycombinator.com/item?id=40203126
https://news.ycombinator.com/item?id=40203126#40205213
https://github.com/ZJONSSON/node-unzipper/issues/308
- Stop using S3 and other AWS (perhaps it stands for Amazon Web Scams?) things already and switch to Cloudflare R2...