Warning: most cloud providers (Google, Amazon, Microsoft) require you to accept unlimited liability to use their services.
If you're running a business and you have lawyers, then fair enough — just play the game. But for individuals, it seems crazy that so many of us accept this sort of thing. Good luck contesting the charge with your credit card company when you already agreed to a contract that said Google could bill you thousands of dollars, and then you used thousands of dollars' worth of their service.
Big cloud providers are not your friend. They do not care if they destroy the lives of you and your family, unless it's happening so often that it's making mainstream news.
My advice is to go and delete your cloud accounts, and only use services that offer hard spending caps, and ideally prepaid accounts.
Maybe this doesn't leave many options. Oh well. Maybe if you can't afford big lawyers then you also can't afford the risks of using big cloud.
This is just a single data point, but I had a surprise bill with Google. I talked to support and got it waived.
I used Amazon EC2 instances for years and I always felt in control. There were never any surprises. I knew even in the worst case situation I would be okay because I had faith in the Amazon support. With Google I felt insecure. I never played with any of Google cloud services since then.
Amazon's customer first policy is really true. They try their absolute best to make sure there are no surprises to a great extent. Even the UI is very intuitive.
Same here - incidentally, it was also one of the weirdest interactions with customer support I've ever had. I suspect the first point of contact was some sort of LLM/chatbot that desperately wanted to make sure I was feeling fine and that there was nothing to worry about. When I was forwarded to the billing support team the interaction went back to normal: a couple of messages back and forth and some homework to set the real budget limit (the quota is just for alarms), and they waived the charge.
Same here. GCP waived a surprise bill of $4,500 when I accidentally left a TPUv1 running for a month many years ago on a personal project. (I was just toying around with the new TPU for an hour or so in my own free time, and didn't realize that unlike a GPU, the TPU has to be shut off separately from the CPU/VM, or else it keeps charging by the hour.)
Amazon definitely also has its share of billing issues.
A personal example would be that we reserved an instance based on information given by our AWS account manager.
Said instance turned out to have issues linked to my original question to the account manager who answered incorrectly.
The reserved instance team then refused to refund us but also refused to tell how much they would prorate if we were to upgrade instead.
I simply don’t accept this argument, primarily because the way AWS handles NAT gateway fees is really only explainable as something that is designed to be predatory.
Yeah, I have spent much more than $14k to date and would have spent much more over time, losing my business isn't rational. I think it's just another "Google can't do customer support to literally save their life" example.
All of the cloud services I have are set up only with privacy.com cards. I have each individual card limited to just above the expected monthly spend. Even if there's a (reasonable) spike, I can see it, and I have to take manual action before the charge will go through.
That's not what privacy.com does or is for. They advertise it, but I've had transactions blow right through the façade. Specifically, with the New York Times: after my trial subscription ended, I watched the stupendously expensive charges bounce, but they kept trying, eventually tried a different way, and it went through.
I emailed support, and here's what I got back:
> Hi, $firstname. I've been reviewing your dispute and wanted to touch base with you to explain what happened.
> It appears that the disputed charge is a "force post" by the merchant. This happens when a merchant cannot collect funds for a transaction after repeated attempts and completes the transaction without an authorization — it's literally an unauthorized transaction that's against payment card network rules. It's a pretty sneaky move used by some merchants, and unfortunately, it's not something Privacy can block.
Doesn't stop them from trying to collect after the transaction is declined. It's not a prepaid service; you're agreeing to pay the charges _after_ you've used the service.
Will they pursue? Do they have enough info to pursue? Who knows, but they can if they want to.
This is very much not what privacy.com is for, and it won't protect you from $14k in BigQuery bills. There is no clause in the GCP contract (or any other contract, for that matter) which says "if your payment method is invalid when we go to collect what you owe us, we forfeit all right to be paid."
For small charges they might just give up because it's not worth it, but when dealing with a $14k bill you should assume that they will at the very least hand the debt off to a collections agency if you try to just ignore it.
You're still liable to Google/whoever for the full amount, so it is only a temporary reprieve. Which can be useful, but does not solve the main problem.
IANAL, but if this happened to me I would be gathering as many examples as I could of this having happened to other people. The angle being: Google knows this is a huge issue. Effectively, they know that they have (presumably accidentally) created a really dangerous trap for small players, and have chosen to do nothing about it.
In some jurisdictions I think that reduces the legitimacy of their claim that you actually owe them money.
EDIT: Even better, focus on the examples where Google "forgave" the debt; you could argue that those examples prove that Google knows it's at least partly their fault.
To be honest, even the official guide [1] for BQ doesn't include any information about query cost, budgets, or the service-limit mechanisms [2].
I think the HTTP Archive team could set something up in that regard.
PS: When I was an instructor for some cloud training in AWS, the first 2 hours were spent solely setting up billing and budgets to avoid any kind of situation like this. No one would start training without all those locks in place.
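That classroom setup can be sketched with boto3. This is a sketch under assumptions: the account ID, email, and $50 limit are placeholders, and the request shapes should be verified against the current AWS Budgets documentation.

```python
def monthly_cost_budget(name, limit_usd, alert_email, threshold_pct=80.0):
    """Build the request bodies for an AWS cost budget with an email alert."""
    budget = {
        "BudgetName": name,
        "BudgetLimit": {"Amount": str(limit_usd), "Unit": "USD"},
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    }
    notifications = [{
        "Notification": {
            "NotificationType": "ACTUAL",        # alert on real spend, not forecast
            "ComparisonOperator": "GREATER_THAN",
            "Threshold": threshold_pct,          # percent of the budget limit
            "ThresholdType": "PERCENTAGE",
        },
        "Subscribers": [{"SubscriptionType": "EMAIL", "Address": alert_email}],
    }]
    return budget, notifications

def create_budget(account_id, budget, notifications):
    import boto3  # deferred so the pure helper above stays testable offline
    boto3.client("budgets").create_budget(
        AccountId=account_id,
        Budget=budget,
        NotificationsWithSubscribers=notifications,
    )
```

Note this only alerts; AWS budgets don't hard-stop spend, which is exactly the complaint in this thread.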
Yeah, I'm basically just having to write this off, so it sucks for me (a lot - I'm bootstrapping a startup), but I'm more worried about other people (especially students) getting caught up in what feels like a scam, given that the language on the website doesn't, ya know, mention the risk of being charged $14k.
The getting started guide linked by the website states:
> Note: The size of the tables you query are important because BigQuery is billed based on the number of processed data. There is 1TB of processed data included in the free tier, so running a full scan query on one of the larger tables can easily eat up your quota. This is where it becomes important to design queries that process only the data you wish to explore
Could this be a bigger warning? Sure.
Is something a scam just because they don't explain the general implications of entering your payment information to a usage-billed product? Not really.
I understand the argument against hard circuit-breakers (yeah, seems like a good idea, until you get a good traffic spike and you're down). But it makes even me cautious about scenarios where I could just fat-finger something. There are some controls, but there are no guarantees in most cases.
This website makes it seem like this “public” dataset is for the community to use, but it is instead a for-profit money maker for Google Cloud and you can lose tens of thousands of dollars.
Last week I ran a script on BigQuery for historical HTTP Archive data and was billed $14,000 by Google Cloud with zero warning whatsoever, and they won’t remove the fee.
This official website should be updated to warn people Google is apparently now hosting this dataset to make money. I don’t think that was the original mission, but that’s what it is today, there’s basically zero customer support, and you can lose $14k in the blink of an eye.
Academics, especially grad students, need to be aware of this before they give a credit card number to Google. In fact, I’d caution against using this dataset whatsoever with this new business model attached.
The real issue here is that you didn't quite understand what BigQuery was when you pressed the button.
What it is, roughly, is a publicly-accessible data supercomputer. If you lost $14k in the blink of an eye, then I would think you consumed at least $4k of Google's actual resources -- maybe $7k. Maybe more. That thing can move some serious data, and you apparently moved around over 2PB.
Google bears some significant responsibility for not making the cost transparent to you, it's true. But on the other hand, don't they deserve some significant credit for making such awesome power available to a lowly peon with a credit card?
This happens because Google hides the query cost behind its abstracted "TBs scanned" (for their data format, not even open-source so it's hard to estimate in advance) or even worse "slots" mechanism. Only a fraction of people try to understand how much these slots cost and most of them are the people who got an unexpected bill after using BigQuery and became more aware of how the product works.
If GCP returned the query cost in the API and showed it directly in the console when you run a query, it would be much easier for their users, but unfortunately it's not in Google's interest, for obvious reasons.
If the latter... I'm not sure that it's explicitly against the rules, but coopting a name of something as your handle just to complain about it is in poor taste and probably should be.
BigQuery provides various methods to estimate cost:
- Use the query dry run option to estimate costs before running a query using the on-demand pricing model.
- Calculate the number of bytes processed by various types of query.
- Get the monthly cost based on projected usage by using the Google Cloud Pricing Calculator.
When I use the BQ interface, it estimates the bytes for each query in real time before I run it, does that turn off if the query is too big? I guess that isn't directly a cost estimate, but if I saw hundreds of TB I'd think twice before hitting "Run"...
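The dry-run flow mentioned above looks roughly like this. A sketch: the $5/TB figure is the long-standing on-demand list price and may have changed, and the client call needs real credentials to run, so only the pure cost helper is exercised here.

```python
USD_PER_TB = 5.00  # assumed on-demand list price; check current BigQuery pricing

def estimated_cost_usd(bytes_processed, usd_per_tb=USD_PER_TB):
    """BigQuery on-demand pricing is per TB scanned (1 TB = 10**12 bytes)."""
    return bytes_processed / 10**12 * usd_per_tb

def dry_run_bytes(sql):
    from google.cloud import bigquery  # deferred import; needs credentials
    client = bigquery.Client()
    cfg = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=cfg)  # no data is read, nothing is billed
    return job.total_bytes_processed
```

At that rate, a 2,800 TB scan works out to $14,000, which lines up with the bill in question.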
If you're going to make a throwaway account to criticize a website, you shouldn't use that website name as your username. That makes you look like a troll even if you have legitimate complaints.
I frequently see these kinds of surprise billing anecdotes across many cloud providers. Why don't they provide a way to set a hard budget limit applied to the entire account? I tried to see what can be done for GCP, and it seems pretty daunting.
The reasons are probably quite complicated, because some of them are bound by hard technical limits on how quickly a system can react and thus make a hard limit actually a hard limit. But realistically that's largely solvable by making it a softer hard limit (e.g. you set a limit of $1,000 and the terms say you pay that plus whatever is used before the limit kicks in: more than $1,000, but way less than $14,000).
All of those technical reasons aside though, the commercial reason is obvious - people's mistakes and overages are a great source of revenue and profit. Companies refund the times where it'd be enough to lose the customer, or when it hits HN, but they make more money every time someone pays up. They have no incentive to fix it. It's part of the business model.
There is also the fact that if a company has critical systems go down because GCP hit some hard budget limit, it will be reported in the press as "Netflix down globally due to issue with Google Cloud".
Google doesn't want the bad press. Most real companies would prefer to have a big bill when their product surges in popularity than have unexpected downtime at the worst time.
Because then we'd see articles about how the next startup missed its opportunity when its site unexpectedly got discussed on the latest Rogan episode and was subsequently taken offline by the limits being tripped.
There's no "right" answer. In one case, it's "I checked the wrong box and got a $14K bill." In the other, it's "I checked the wrong box and my startup missed its one window." There are in-between levels of alerting etc. for both populations, but they're probably unsatisfactory for the extreme conditions.
To be clear: I'd be very much in favor of the major cloud providers having a "DO NOT! DO NOT! use this for production; your content could be deleted at any time if you screw up" mode. But I suspect most people wouldn't use that.
I don't see the problem. Don't set a budget limit if you don't want your app to go offline. Lots of people wouldn't mind if their app went offline for a bit; they'd prefer not to suddenly get a $10,000 bill.
Google App Engine used to have that but — presumably in the interest of additional profit — they removed it. Now I have to make do with an alert that warns me long after I could hypothetically be bankrupted, which can happen in a matter of seconds.
The OP is probably a good person with strong interest in data science and building projects.
If it were "oh, here's your $500 charge, upgrade your quota for more," then fair enough, I made a mistake. But $14k is not OK without an explicit quota upgrade.
tbh, I have worked with AWS for at least 10 years, and recently their field support has been quite proactive about helping avoid those scenarios (e.g. they helped save hundreds of thousands on a single-digit-millions account).
This was one of the main selling points for all portfolio companies of the group to adopt AWS in their digital transformation projects.
My cynical self sees it as how cloud providers aim to make the most money: by making billing opaque and waiting for buzzword-happy project leads to mandate stuff be put on their service without understanding what the end bill will be.
I can't say that's for certain what it is. I just know that a hallmark of any business with otherwise-incomprehensible recurring charges is that they can hit you with the charge after the fact, and you have little recourse to avoid paying it without a ton of work for yourself or your team.
Notice that their "solution" is to tell you how, if you want, you can spin up effectively your own custom service to watch spend and, if it goes over some threshold, delete the entire project[0] after some delay. This is the malicious-compliance version of letting you add a limit.
[0] At least, that's how I interpret "This example removes Cloud Billing from your project, shutting down all resources. Resources might not shut down gracefully, and might be irretrievably deleted. There is no graceful recovery if you disable Cloud Billing.
You can re-enable Cloud Billing, but there is no guarantee of service recovery and manual configuration is required."
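The docs' example amounts to a Pub/Sub-triggered function along the lines of the sketch below. The budget-notification field names and the empty-string trick for detaching the billing account follow my reading of Google's sample; treat the exact API shapes as assumptions and check them against the linked page.

```python
import base64
import json

def over_budget(pubsub_data):
    """Decode a Cloud Billing budget notification and decide whether to pull the plug."""
    msg = json.loads(base64.b64decode(pubsub_data).decode("utf-8"))
    return msg["costAmount"] > msg["budgetAmount"]

def handle_budget_event(event, context=None):
    """Entry point for a Pub/Sub-triggered Cloud Function on the budget topic."""
    if not over_budget(event["data"]):
        return
    from google.cloud import billing_v1  # deferred import; needs credentials
    client = billing_v1.CloudBillingClient()
    client.update_project_billing_info(
        name="projects/YOUR_PROJECT_ID",  # placeholder project ID
        project_billing_info=billing_v1.ProjectBillingInfo(billing_account_name=""),
    )  # detaching the billing account shuts down (and may delete!) all resources
```

Which underlines the point: the only "hard cap" on offer is self-destruction of the project.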
Fits in well with everyone's natural first query (SELECT * FROM everything), so people can see the type of data it's returning in order to narrow it down.
Not specifically because of BigQuery, but I have taken to adding " LIMIT 10" to that for my default query because of accidentally locking up 10TB databases too many times.
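Worth noting for BigQuery specifically: because storage is columnar and billing is per byte scanned, `LIMIT 10` alone does not reduce what you're charged; selecting fewer columns or sampling does. A hypothetical helper (table and column names are placeholders; `TABLESAMPLE SYSTEM` is real BigQuery syntax):

```python
def preview_query(table, columns, percent=1):
    """Build a column-pruned, sampled preview query. Both tricks cut bytes
    scanned, whereas LIMIT only trims the result set after the full scan."""
    cols = ", ".join(columns)
    return (f"SELECT {cols} FROM `{table}` "
            f"TABLESAMPLE SYSTEM ({percent} PERCENT) LIMIT 10")
```

So a defensive default of `LIMIT 10` protects you on row-store databases, but on BigQuery you need column pruning or sampling to protect your wallet.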
So now you say on your webpage that your pricing is $1/TB or something. Great. But there is a caveat: the amount you pay depends on complex factors such as the size of the table or the duration of your code. If the factors were so simple that no more than grade-school arithmetic is required to calculate my costs, that would be fine. But what if it gets a little more complex than that, such as “table size is 1PB and cost per 1TB is $1”? Did you know 1PB = 1000TB rather than 0.001TB? What about “you need another $10 query to figure out the size of the table”? Or “the cost depends on the number of function calls your code makes, and if you accidentally recursed too many times you can’t limit it”? Or “the server is $5/mo but the IP is $1/h and outbound traffic is $10/GB, and if someone downloads 1TB from your server you will pay $10,000 within 2 hours”?
At some point the factors related to billing become non-trivial, and every sentence in a long 10-page document could have 100x’d your costs. What makes this service different from a scam? You could have allowed me to set a billing cap so I won’t have to pay anything beyond $10, so that “$10” is everything I have to care about, couldn’t you?
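The unit traps in the comment above are just arithmetic, but worth making explicit. The rates here are the hypothetical ones from the comment, not real prices:

```python
TB_PER_PB = 1000  # decimal units: 1 PB = 1000 TB
GB_PER_TB = 1000  # and 1 TB = 1000 GB

# "table size is 1PB and cost per 1TB is $1"
table_pb, usd_per_tb = 1, 1
scan_cost = table_pb * TB_PER_PB * usd_per_tb     # $1,000, not $0.001

# "outbound traffic is $10/GB" and someone downloads 1 TB
egress_tb, usd_per_gb = 1, 10
egress_cost = egress_tb * GB_PER_TB * usd_per_gb  # the $10,000 scenario above
```

Each conversion is trivial on its own; the trap is that getting any one of them backwards is a factor-of-a-million error.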
> At some point the factors related to billing become non-trivial
First response on the OG link covers this with a screenshot: the size of the query is previewed beforehand, and you have to check a checkbox to acknowledge it. (I dare say listing it in PB instead of $$$ is still a scummy move, etc. - but they do resolve about half of your concerns right there)
Not the same thing, but: some pre-Web Usenet programs would have warnings before "expensive" operations:
> Version 4.3 patch 30 of rn’s Pnews.SH (September 5, 1986, published to support the new top-level groups) introduced the “thousands of machines” message:
> > This program posts news to thousands of machines throughout the entire civilized world. You message will cost the net hundreds if not thousands of dollars to send everywhere. Please be sure you know what you are doing.
The dataset IS free to download, but running a query against it on Google Cloud is what costs $$$. BigQuery is basically renting servers to scan through the data; that's the fee.
The complaint says there should be a warning that processing fees can be high. Go to the front page and check out the links: nothing really about cost. Someone follows that path and $14k is gone without a word about it. That's the path people are sent down from the website, and it explicitly talks about using BQ for analysis.
A simple "running queries over the whole dataset can cause significant costs due to the size of the dataset" should be enough. And I think that's a valid and fair point.
The whole part of accusing Google should just be ignored.
BQ charges you based on the volume of data being scanned. I think this is a situation which involves scanning the whole dataset again and again without fully understanding how it works. I’ve worked with much larger datasets on BQ (petabyte scale) and managed to not spend more than $1000 in an hour. Also, BQ tells you how much data will be processed BEFORE you run the query, which makes it easier to understand the cost implications.
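BigQuery also has a per-query circuit breaker that would have prevented this outright: `maximum_bytes_billed` on the job config makes the query fail instead of running if it would scan more than the cap. A sketch; the dollar-to-byte conversion assumes the $5/TB on-demand rate, and the client call needs credentials.

```python
def byte_cap_for_budget(usd, usd_per_tb=5.0):
    """Largest number of bytes scannable without exceeding a dollar budget."""
    return int(usd / usd_per_tb * 10**12)

def run_capped(sql, usd_budget):
    from google.cloud import bigquery  # deferred import; needs credentials
    client = bigquery.Client()
    cfg = bigquery.QueryJobConfig(
        maximum_bytes_billed=byte_cap_for_budget(usd_budget)
    )
    # The query errors out up front if it would exceed the cap,
    # rather than running and billing the full amount.
    return client.query(sql, job_config=cfg).result()
```

The catch, of course, is that it's opt-in and per-query, so the people who most need it don't know it exists.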
Again, you could fit the whole dataset in memory in an EC2 instance and do your thing.
Which part of customer first drove their egress fee policies?
Basically a protection racket.
Cannot recommend privacy.com enough.
[1] - https://github.com/HTTPArchive/httparchive.org/blob/main/doc... [2] - https://cloud.google.com/bigquery/docs/best-practices-costs
Public datasets are hosted for free by Google (Amazon has a similar program) to take the burden off public projects.
You didn't pay for the data, you paid for the query you ran against it.
https://medium.com/@steffenjanbrouwer/how-to-set-a-hard-paym...
Sure, this guy fat-fingering $10k sounds amazing.
But GCP deals with businesses paying for years of service. Multi million dollar deals are common.
Google and AWS and the like couldn't give a flying fuck about anything under $100k.
Would you like Amazon to delete all your files, disks, and backups once you hit your limit?
Also static IPs, load balancers, DNS zones?
https://cloud.google.com/billing/docs/how-to/notify#cap_disa...
To learn how to use it, you'll have to try it. Learning by doing. Trial and error.
And no, you may not use blanks while learning to use the footgun. No training wheels. No precautions. It has to be live: full send, unlimited risk.
Actually, here are some safety squints (billing alerts), just to give you some illusion of control.
Good luck! Yours truly, Big Cloud
-- https://retrocomputing.stackexchange.com/questions/14763/wha...
I don't know. Google could trivially solve this problem by imposing an opt-out warning on potentially expensive queries.
"It looks like your query might cost $14k. Are you sure?"
But money.
AWS charges $27/hour for a server with 3TB of memory. Enough to run the queries in memory.
A regex query on response_bodies would churn through 2.5TB of data every time it's run.
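To put a number on that (again assuming the $5/TB on-demand list price, which may have changed):

```python
scan_tb, usd_per_tb = 2.5, 5.0
per_run = scan_tb * usd_per_tb   # cost of one full regex pass over response_bodies
runs_to_14k = 14000 / per_run    # how many such runs reach the bill in question
```

Cheap per run, but an iterating script or a scheduled job gets you to four figures quickly.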