Warning: most cloud providers (Google, Amazon, Microsoft) require you to accept unlimited liability to use their services.
If you're running a business and you have lawyers, then fair enough — just play the game. But for individuals, it seems crazy that so many of us accept this sort of thing. Good luck contesting the charge with your credit card company when you already agreed to a contract that said Google could bill you thousands of dollars, and then you used thousands of dollars' worth of their service.
Big cloud providers are not your friend. They do not care if they destroy the lives of you and your family, unless it's happening so often that it's making mainstream news.
My advice is to go and delete your cloud accounts, and only use services that offer hard spending caps, and ideally prepaid accounts.
Maybe this doesn't leave many options. Oh well. Maybe if you can't afford big lawyers then you also can't afford the risks of using big cloud.
This is just a single data point, but I had a surprise bill with Google. I talked to support and got it waived.
I used Amazon EC2 instances for years and I always felt in control. There were never any surprises. I knew even in the worst case situation I would be okay because I had faith in the Amazon support. With Google I felt insecure. I never played with any of Google cloud services since then.
Amazon's customer first policy is really true. They try their absolute best to make sure there are no surprises to a great extent. Even the UI is very intuitive.
Same here - incidentally, it was also one of the weirdest interactions with customer support I've ever had. I suspect the first point of contact was some sort of LLM/chatbot that desperately wanted to make sure I was feeling fine and that there was nothing to worry about. When I was forwarded to the billing support team the interaction went back to normal: a couple of messages back and forth and some homework to set the real budget limit (the quota is just for alarms), and they waived the charge.
Same here. GCP waived a surprise bill of $4,500 when I accidentally left a TPUv1 running for a month many years ago on a personal project. (I was just toying around with the new TPU for an hour or so in my own free time, and didn't realize that unlike a GPU, the TPU has to be shut off separately from the CPU/VM, or else it keeps charging by the hour.)
Amazon definitely also has its share of billing issues.
A personal example would be that we reserved an instance based on information given by our AWS account manager.
Said instance turned out to have issues linked to my original question to the account manager who answered incorrectly.
The reserved instance team then refused to refund us but also refused to tell how much they would prorate if we were to upgrade instead.
I simply don’t accept this argument, primarily because the way AWS handles NAT gateway fees is really only explainable as something that is designed to be predatory.
Yeah, I have spent much more than $14k to date and would have spent much more over time, losing my business isn't rational. I think it's just another "Google can't do customer support to literally save their life" example.
All of the cloud services I have are set up only with privacy.com cards. I have each individual card limited to just above the expected monthly spend. Even if there's a (reasonable) spike, I can see it, and I have to take manual action before the charge will go through.
That's not what privacy.com does or is for. They advertise it, but I've had transactions blow right through the façade. Specifically, with the New York Times: after my trial subscription ended, I watched the stupendously expensive charges bounce, but they kept trying, eventually tried a different way, and it went through.
I emailed support, and here's what I got back:
> Hi, $firstname. I've been reviewing your dispute and wanted to touch base with you to explain what happened.
> It appears that the disputed charge is a "force post" by the merchant. This happens when a merchant cannot collect funds for a transaction after repeated attempts and completes the transaction without an authorization — it's literally an unauthorized transaction that's against payment card network rules. It's a pretty sneaky move used by some merchants, and unfortunately, it's not something Privacy can block.
Doesn't stop them from trying to collect after the transaction is declined. It's not a prepaid service; you're agreeing to pay the charges _after_ you've used the service.
Will they pursue? Do they have enough info to pursue? Who knows, but they can if they want to.
This is very much not what privacy.com is for, and it won't protect you from $14k in BigQuery bills. There is no clause in the GCP contract (or any other contract, for that matter) which says "if your payment method is invalid when we go to collect what you owe us, we forfeit all right to be paid."
For small charges they might just give up because it's not worth it, but when dealing with a $14k bill you should assume that they will at the very least hand the debt off to a collections agency if you try to just ignore it.
You're still liable to Google/whoever for the full amount, so it is only a temporary reprieve. Which can be useful, but does not solve the main problem.
IANAL, but if this happened to me I would be gathering as many examples as I could of this having happened to other people. The angle being: Google knows this is a huge issue. Effectively, they know that they have (presumably accidentally) created a really dangerous trap for small players, and have chosen to do nothing about it.
In some jurisdictions I think that reduces the legitimacy of their claim that you actually owe them money.
EDIT: Even better, focus on the examples where Google "forgave" the debt; you could argue that those examples prove that Google knows it's at least partly their fault.
To be honest, even the official guide [1] for BQ doesn't include any information about query cost, budgets, or the service-limit mechanisms [2].
I think the HTTP Archive team could set something up in that regard.
PS: When I was an instructor for some cloud training in AWS, the first 2 hours were spent solely setting up billing and budgets to avoid any kind of situation like this. No one would start training without all those locks in place.
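That classroom setup can be sketched with boto3. This is a sketch under assumptions: the account ID, email, and $50 limit are placeholders, and the request shapes should be verified against the current AWS Budgets documentation.

```python
def monthly_cost_budget(name, limit_usd, alert_email, threshold_pct=80.0):
    """Build the request bodies for an AWS cost budget with an email alert."""
    budget = {
        "BudgetName": name,
        "BudgetLimit": {"Amount": str(limit_usd), "Unit": "USD"},
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    }
    notifications = [{
        "Notification": {
            "NotificationType": "ACTUAL",        # alert on real spend, not forecast
            "ComparisonOperator": "GREATER_THAN",
            "Threshold": threshold_pct,          # percent of the budget limit
            "ThresholdType": "PERCENTAGE",
        },
        "Subscribers": [{"SubscriptionType": "EMAIL", "Address": alert_email}],
    }]
    return budget, notifications

def create_budget(account_id, budget, notifications):
    import boto3  # deferred so the pure helper above stays testable offline
    boto3.client("budgets").create_budget(
        AccountId=account_id,
        Budget=budget,
        NotificationsWithSubscribers=notifications,
    )
```

Note this only alerts; AWS budgets don't hard-stop spend, which is exactly the complaint in this thread.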
Yeah, I'm basically just having to write this off, so it sucks for me (a lot - I'm bootstrapping a startup), but I'm more worried about other people (especially students) getting caught up in what feels like a scam, given that the language on the website doesn't, ya know, mention the risk of being charged $14k.
The getting started guide linked by the website states:
> Note: The size of the tables you query are important because BigQuery is billed based on the number of processed data. There is 1TB of processed data included in the free tier, so running a full scan query on one of the larger tables can easily eat up your quota. This is where it becomes important to design queries that process only the data you wish to explore
Could this be a bigger warning? Sure.
Is something a scam just because they don't explain the general implications of entering your payment information to a usage-billed product? Not really.
I understand the argument against hard circuit-breakers (yeah, seems like a good idea, until you get a good traffic spike and you're down). But it makes even me cautious about scenarios where I could just fat-finger something. There are some controls, but there are no guarantees in most cases.
This website makes it seem like this “public” dataset is for the community to use, but it is instead a for-profit money maker for Google Cloud and you can lose tens of thousands of dollars.
Last week I ran a script on BigQuery for historical HTTP Archive data and was billed $14,000 by Google Cloud with zero warning whatsoever, and they won’t remove the fee.
This official website should be updated to warn people Google is apparently now hosting this dataset to make money. I don’t think that was the original mission, but that’s what it is today, there’s basically zero customer support, and you can lose $14k in the blink of an eye.
Academics, especially grad students, need to be aware of this before they give a credit card number to Google. In fact, I’d caution against using this dataset whatsoever with this new business model attached.
The real issue here is that you didn't quite understand what BigQuery was when you pressed the button.
What it is, roughly, is a publicly-accessible data supercomputer. If you lost $14k in the blink of an eye, then I would think you consumed at least $4k of Google's actual resources -- maybe $7k. Maybe more. That thing can move some serious data, and you apparently moved around over 2PB.
Google bears some significant responsibility for not making the cost transparent to you, it's true. But on the other hand, don't they deserve some significant credit for making such awesome power available to a lowly peon with a credit card?
This happens because Google hides the query cost behind its abstracted "TBs scanned" (for their data format, not even open-source so it's hard to estimate in advance) or even worse "slots" mechanism. Only a fraction of people try to understand how much these slots cost and most of them are the people who got an unexpected bill after using BigQuery and became more aware of how the product works.
If GCP returned the query cost in the API and showed it directly in the console when you run a query, it would be much easier for their users, but unfortunately it's not in Google's interest, for obvious reasons.
If the latter... I'm not sure that it's explicitly against the rules, but coopting a name of something as your handle just to complain about it is in poor taste and probably should be.
BigQuery provides various methods to estimate cost:
- Use the query dry run option to estimate costs before running a query using the on-demand pricing model.
- Calculate the number of bytes processed by various types of query.
- Get the monthly cost based on projected usage by using the Google Cloud Pricing Calculator.
When I use the BQ interface, it estimates the bytes for each query in real time before I run it, does that turn off if the query is too big? I guess that isn't directly a cost estimate, but if I saw hundreds of TB I'd think twice before hitting "Run"...
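The dry-run flow mentioned above looks roughly like this. A sketch: the $5/TB figure is the long-standing on-demand list price and may have changed, and the client call needs real credentials to run, so only the pure cost helper is exercised here.

```python
USD_PER_TB = 5.00  # assumed on-demand list price; check current BigQuery pricing

def estimated_cost_usd(bytes_processed, usd_per_tb=USD_PER_TB):
    """BigQuery on-demand pricing is per TB scanned (1 TB = 10**12 bytes)."""
    return bytes_processed / 10**12 * usd_per_tb

def dry_run_bytes(sql):
    from google.cloud import bigquery  # deferred import; needs credentials
    client = bigquery.Client()
    cfg = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=cfg)  # no data is read, nothing is billed
    return job.total_bytes_processed
```

At that rate, a 2,800 TB scan works out to $14,000, which lines up with the bill in question.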
If you're going to make a throwaway account to criticize a website, you shouldn't use that website name as your username. That makes you look like a troll even if you have legitimate complaints.
I frequently see these kinds of surprise billing anecdotes across many cloud providers. Why don't they provide a way to set a hard budget limit applied to the entire account? I tried to see what can be done for GCP, and it seems pretty daunting.
The reasons are probably quite complicated, because some of them are bound by hard technical limits on how quickly a system can react and thus make a hard limit actually a hard limit. But realistically that's largely solvable by making it a softer hard limit (e.g. you set a limit of $1,000 and the terms say you pay that plus whatever is used before the limit kicks in: more than $1,000, but way less than $14,000).
All of those technical reasons aside though, the commercial reason is obvious - people's mistakes and overages are a great source of revenue and profit. Companies refund the times where it'd be enough to lose the customer, or when it hits HN, but they make more money every time someone pays up. They have no incentive to fix it. It's part of the business model.
There is also the fact that if a company has critical systems go down because GCP hit some hard budget limit, it will be reported in the press as "Netflix down globally due to issue with Google Cloud".
Google doesn't want the bad press. Most real companies would prefer to have a big bill when their product surges in popularity than have unexpected downtime at the worst time.
Because then we'd see articles about how the next startup missed its opportunity when its site unexpectedly got discussed on the latest Rogan episode and was subsequently taken offline by the limits being tripped.
There's no "right" answer. In one case, it's "I checked the wrong box and got a $14K bill." In the other, it's "I checked the wrong box and my startup missed its one window." There are in-between levels of alerting etc. for both populations, but they're probably unsatisfactory for the extreme conditions.
To be clear: I'd be very much in favor of the major cloud providers having a "DO NOT! DO NOT! use this for production; your content could be deleted at any time if you screw up" mode. But I suspect most people wouldn't use that.
I don't see the problem. Don't set a budget limit if you don't want your app to go offline. Lots of people wouldn't mind if their app went offline for a bit; they'd prefer not to suddenly get a $10,000 bill.
Google App Engine used to have that but — presumably in the interest of additional profit — they removed it. Now I have to make do with an alert that warns me long after I could hypothetically be bankrupted, which can happen in a matter of seconds.
The OP is probably a good person with strong interest in data science and building projects.
If it were "oh, here's your $500 charge, upgrade your quota for more," then fair enough, I made a mistake. But $14k is not OK without an explicit quota upgrade.
tbh, I have worked with AWS for at least 10 years, and recently their field support has been quite proactive about helping avoid those scenarios (e.g. they helped save hundreds of thousands on a single-digit-millions account).
This was one of the main selling points for all portfolio companies of the group to adopt AWS in their digital transformation projects.
My cynical self sees it as how cloud providers aim to make the most money: by making billing opaque and waiting for buzzword-happy project leads to mandate stuff be put on their service without understanding what the end bill will be.
I can't say that's for certain what it is. I just know that a hallmark of any business with otherwise-incomprehensible recurring charges is that they can hit you with the charge after the fact, and you have little recourse to avoid paying it without a ton of work for yourself or your team.
Notice that their "solution" is to tell you how, if you want, you can spin up effectively your own custom service to watch spend and, if it goes over some threshold, delete the entire project[0] after some delay. This is the malicious-compliance version of letting you add a limit.
[0] At least, that's how I interpret "This example removes Cloud Billing from your project, shutting down all resources. Resources might not shut down gracefully, and might be irretrievably deleted. There is no graceful recovery if you disable Cloud Billing.
You can re-enable Cloud Billing, but there is no guarantee of service recovery and manual configuration is required."
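The docs' example amounts to a Pub/Sub-triggered function along the lines of the sketch below. The budget-notification field names and the empty-string trick for detaching the billing account follow my reading of Google's sample; treat the exact API shapes as assumptions and check them against the linked page.

```python
import base64
import json

def over_budget(pubsub_data):
    """Decode a Cloud Billing budget notification and decide whether to pull the plug."""
    msg = json.loads(base64.b64decode(pubsub_data).decode("utf-8"))
    return msg["costAmount"] > msg["budgetAmount"]

def handle_budget_event(event, context=None):
    """Entry point for a Pub/Sub-triggered Cloud Function on the budget topic."""
    if not over_budget(event["data"]):
        return
    from google.cloud import billing_v1  # deferred import; needs credentials
    client = billing_v1.CloudBillingClient()
    client.update_project_billing_info(
        name="projects/YOUR_PROJECT_ID",  # placeholder project ID
        project_billing_info=billing_v1.ProjectBillingInfo(billing_account_name=""),
    )  # detaching the billing account shuts down (and may delete!) all resources
```

Which underlines the point: the only "hard cap" on offer is self-destruction of the project.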
Fits in well with everyone's natural first query (SELECT * FROM everything), so people can see the type of data it's returning in order to narrow it down.
Not specifically because of BigQuery, but I have taken to adding " LIMIT 10" to that for my default query because of accidentally locking up 10TB databases too many times.
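Worth noting for BigQuery specifically: because storage is columnar and billing is per byte scanned, `LIMIT 10` alone does not reduce what you're charged; selecting fewer columns or sampling does. A hypothetical helper (table and column names are placeholders; `TABLESAMPLE SYSTEM` is real BigQuery syntax):

```python
def preview_query(table, columns, percent=1):
    """Build a column-pruned, sampled preview query. Both tricks cut bytes
    scanned, whereas LIMIT only trims the result set after the full scan."""
    cols = ", ".join(columns)
    return (f"SELECT {cols} FROM `{table}` "
            f"TABLESAMPLE SYSTEM ({percent} PERCENT) LIMIT 10")
```

So a defensive default of `LIMIT 10` protects you on row-store databases, but on BigQuery you need column pruning or sampling to protect your wallet.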
So now you say on your webpage that your pricing is $1/TB or something. Great. But there is a caveat: the amount you pay depends on complex factors such as the size of the table or the duration of your code. If the factors were so simple that no more than grade-school arithmetic is required to calculate my costs, that would be fine. But what if it gets a little more complex than that, such as “table size is 1PB and cost per 1TB is $1”? Did you know 1PB = 1000TB rather than 0.001TB? What about “you need another $10 query to figure out the size of the table”? Or “the cost depends on the number of function calls your code makes, and if you accidentally recursed too many times you can’t limit it”? Or “the server is $5/mo but the IP is $1/h and outbound traffic is $10/GB, and if someone downloads 1TB from your server you will pay $10,000 within 2 hours”?
At some point the factors related to billing become non-trivial, and every sentence in a long 10-page document could have 100x’d your costs. What makes this service different from a scam? You could have allowed me to set a billing cap so I won’t have to pay anything beyond $10, so that “$10” is everything I have to care about, couldn’t you?
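The unit traps in the comment above are just arithmetic, but worth making explicit. The rates here are the hypothetical ones from the comment, not real prices:

```python
TB_PER_PB = 1000  # decimal units: 1 PB = 1000 TB
GB_PER_TB = 1000  # and 1 TB = 1000 GB

# "table size is 1PB and cost per 1TB is $1"
table_pb, usd_per_tb = 1, 1
scan_cost = table_pb * TB_PER_PB * usd_per_tb     # $1,000, not $0.001

# "outbound traffic is $10/GB" and someone downloads 1 TB
egress_tb, usd_per_gb = 1, 10
egress_cost = egress_tb * GB_PER_TB * usd_per_gb  # the $10,000 scenario above
```

Each conversion is trivial on its own; the trap is that getting any one of them backwards is a factor-of-a-million error.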
> At some point the factors related to billing become non-trivial
First response on the OG link covers this with a screenshot: the size of the query is previewed beforehand, and you have to check a checkbox to acknowledge it. (I dare say listing it in PB instead of $$$ is still a scummy move, etc. - but they do resolve about half of your concerns right there)
Not the same thing, but: some pre-Web Usenet programs would have warnings before "expensive" operations:
> Version 4.3 patch 30 of rn’s Pnews.SH (September 5, 1986, published to support the new top-level groups) introduced the “thousands of machines” message:
> > This program posts news to thousands of machines throughout the entire civilized world. You message will cost the net hundreds if not thousands of dollars to send everywhere. Please be sure you know what you are doing.
The dataset IS free to download, but running a query against it on Google Cloud is what costs $$$. BigQuery is basically renting servers to scan through the data; that's the fee.
The complaint says there should be a warning that processing fees can be high. Go to the front page and check out the links: nothing really about cost. Someone follows that path and $14k is gone without a word about it. That's the path people are sent down from the website, and it explicitly talks about using BQ for analysis.
A simple "running queries over the whole dataset can cause significant costs due to the size of the dataset" should be enough. And I think that's a valid and fair point.
The whole part of accusing Google should just be ignored.
BQ charges you based on the volume of data being scanned. I think this is a situation which involves scanning the whole dataset again and again without fully understanding how it works. I’ve worked with much larger datasets on BQ (petabyte scale) and managed to not spend more than $1000 in an hour. Also, BQ tells you how much data will be processed BEFORE you run the query, which makes it easier to understand the cost implications.
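BigQuery also has a per-query circuit breaker that would have prevented this outright: `maximum_bytes_billed` on the job config makes the query fail instead of running if it would scan more than the cap. A sketch; the dollar-to-byte conversion assumes the $5/TB on-demand rate, and the client call needs credentials.

```python
def byte_cap_for_budget(usd, usd_per_tb=5.0):
    """Largest number of bytes scannable without exceeding a dollar budget."""
    return int(usd / usd_per_tb * 10**12)

def run_capped(sql, usd_budget):
    from google.cloud import bigquery  # deferred import; needs credentials
    client = bigquery.Client()
    cfg = bigquery.QueryJobConfig(
        maximum_bytes_billed=byte_cap_for_budget(usd_budget)
    )
    # The query errors out up front if it would exceed the cap,
    # rather than running and billing the full amount.
    return client.query(sql, job_config=cfg).result()
```

The catch, of course, is that it's opt-in and per-query, so the people who most need it don't know it exists.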
Again, you could fit the whole dataset in memory in an EC2 instance and do your thing.
Which part of customer first drove their egress fee policies?
Basically a protection racket.
Cannot recommend privacy.com enough.
[1] - https://github.com/HTTPArchive/httparchive.org/blob/main/doc... [2] - https://cloud.google.com/bigquery/docs/best-practices-costs
Public datasets are hosted for free by Google (Amazon has a similar program) to take the burden off public projects.
You didn't pay for the data, you paid for the query you ran against it.
https://medium.com/@steffenjanbrouwer/how-to-set-a-hard-paym...
Sure, this guy fat-fingering $10k sounds amazing.
But GCP deals with businesses paying for years of service. Multi million dollar deals are common.
Google and AWS and the like couldn't give a flying fuck about anything under $100k.
Would you like Amazon to delete all your files, disks, and backups once you hit your limit?
Also static IPs, load balancers, DNS zones?
https://cloud.google.com/billing/docs/how-to/notify#cap_disa...
To learn how to use it, you'll have to try it. Learning by doing. Trial and error.
And no, you may not use blanks while learning to use the footgun. No training wheels. No precautions. It has to be live: full send, unlimited risk.
Actually, here are some safety squints (billing alerts), just to give you some illusion of control.
Good luck! Yours truly, Big Cloud
-- https://retrocomputing.stackexchange.com/questions/14763/wha...
I don't know. Google could trivially solve this problem by imposing an opt-out warning on potentially expensive queries.
"It looks like your query might cost $14k. Are you sure?"
But money.
AWS charges $27/hour for a server with 3TB of memory. Enough to run the queries in memory.
A regex query on response_bodies would churn through 2.5TB of data every time it's run.
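To put a number on that (again assuming the $5/TB on-demand list price, which may have changed):

```python
scan_tb, usd_per_tb = 2.5, 5.0
per_run = scan_tb * usd_per_tb   # cost of one full regex pass over response_bodies
runs_to_14k = 14000 / per_run    # how many such runs reach the bill in question
```

Cheap per run, but an iterating script or a scheduled job gets you to four figures quickly.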