https://status.cloud.google.com
https://status.cloud.google.com/incident/compute/19003
The status page reports all green; however, the outage is affecting YouTube, Snapchat, and thousands of other users.
We're having what appears to be a serious networking outage. It's disrupting everything, including unfortunately the tooling we usually use to communicate across the company about outages.
There are backup plans, of course, but I wanted to at least come here to say: you're not crazy, nothing is lost (to those concerns downthread), but there is serious packet loss at the least. You'll have to wait for someone actually involved in the incident to say more.
There's some irony in that.
I’m not in SRE so I don’t bother with all the backup modes (direct IRC channel, phone lines, “pagers” with backup numbers). I don’t think the networking SRE folks are as impacted in their direct communication, but they are (obviously) not able to get the word out as easily.
Still, it seems reasonable to me to use tooling for most outages that relies on “the network is fine overall”, to optimize for the common case.
Note: the status dashboard now correctly highlights (Edit: with a banner at the top) that multiple things are impacted because Networking. The Networking outage is the root cause.
Not long after that incident, they migrated it to something that couldn't be affected by any outage. I imagine Google will probably do the same thing after this :)
So memegen is down?
except time
Can confirm with Gmail in Europe. Everything works but it's sluggish (i.e. no immediate reaction on button clicks).
Sounds like Google and Amazon are hiring way too many optimists. I kinda blame the war on QA for part of this, but damn that’s some Pollyanna bullshit.
Shouldn't that outage system be aware when service heartbeats stop?
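A minimal sketch of that idea, assuming each service reports a heartbeat periodically and the status page flags anything that has been quiet for longer than a timeout. The names (`HeartbeatMonitor`, `beat`, `stale_services`) and the 30-second timeout are hypothetical; this is not a description of how Google's dashboard actually works.

```python
import time
from typing import Dict, List, Optional


class HeartbeatMonitor:
    """Tracks the last heartbeat seen per service (hypothetical helper)."""

    def __init__(self, timeout_s: float = 30.0) -> None:
        # Assumed threshold: a service is "stale" after this many silent seconds.
        self.timeout_s = timeout_s
        self._last_seen: Dict[str, float] = {}

    def beat(self, service: str, now: Optional[float] = None) -> None:
        """Record a heartbeat; `now` can be injected for testing."""
        self._last_seen[service] = time.monotonic() if now is None else now

    def stale_services(self, now: Optional[float] = None) -> List[str]:
        """Return services whose last heartbeat is older than the timeout."""
        current = time.monotonic() if now is None else now
        return sorted(
            name
            for name, seen in self._last_seen.items()
            if current - seen > self.timeout_s
        )
```

The catch this thread illustrates: the monitor has to run somewhere that does not share the failing network, otherwise it goes quiet along with the services it is watching.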
Could this be a solar flare?
Cloud services live and die by their reputation, so I'd be shocked if Google ever tried to get out of following an SLA contract based on a technicality like that. It would be business suicide, so it doesn't seem like something to be too worried about?
https://www.zdnet.com/article/some-internet-outages-predicte... 768k Day
According to https://twitter.com/bgp4_table, we have just exceeded 768k Border Gateway Protocol routing entries, which may be causing some routers to malfunction.
I was actually surprised, as they tend to have excellent networking. Now I'm not nearly as distrusting as I was initially, knowing it was likely their ISP getting screwed by routing table overflow.
From that linked page:
"Customer Must Request Financial Credit
In order to receive any of the Financial Credits described above, Customer must notify Google technical support within thirty days from the time Customer becomes eligible to receive a Financial Credit. Customer must also provide Google with server log files showing loss of external connectivity errors and the date and time those errors occurred. If Customer does not comply with these requirements, Customer will forfeit its right to receive a Financial Credit. If a dispute arises with respect to this SLA, Google will make a determination in good faith based on its system logs, monitoring reports, configuration records, and other available information, which Google will make available for auditing by Customer at Customer’s request."
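The notification window in that clause is a simple date calculation. A rough sketch, assuming "thirty days" means 30 calendar days counted from the eligibility date with the final day included; the function names are made up for illustration, and the real contract language governs how the window is counted.

```python
from datetime import date, timedelta

CLAIM_WINDOW_DAYS = 30  # "within thirty days", per the quoted SLA text


def claim_deadline(eligible_on: date) -> date:
    """Last day to notify support (assumes calendar days, inclusive)."""
    return eligible_on + timedelta(days=CLAIM_WINDOW_DAYS)


def can_still_claim(eligible_on: date, today: date) -> bool:
    """True while `today` falls inside the assumed claim window."""
    return eligible_on <= today <= claim_deadline(eligible_on)
```

Note the clause also requires server logs with the date and time of the connectivity errors, so the timestamps need to be kept somewhere that survives the outage.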
Might be a good month to rebuild all your models ;)
I would pay a premium for a cloud provider happy to give 100 percent discount for the month for 10 minutes downtime, and 100 percent discount for the year for an hour's downtime.
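To put numbers on that proposal, here is a sketch of the two tiers as a credit function. It assumes a flat monthly bill, that "the year" means twelve times that bill, and that the thresholds mean "at least" 10 and 60 minutes; `proposed_credit` is a hypothetical name, not any provider's actual SLA.

```python
def proposed_credit(downtime_minutes: float, monthly_bill: float) -> float:
    """Credit owed under the hypothetical SLA proposed in the comment above."""
    if downtime_minutes >= 60:
        # An hour of downtime: 100 percent discount for the year (assumed 12 months).
        return 12 * monthly_bill
    if downtime_minutes >= 10:
        # Ten minutes of downtime: 100 percent discount for the month.
        return monthly_bill
    return 0.0
```

Under these assumptions a single one-hour outage costs the provider a full year of revenue from every affected customer, which is the exposure any premium would have to cover.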
Besides, a provider credit is the least of most companies' concerns after an extended outage. It's a small fraction of their remediation costs and loss of customer goodwill.
> I would pay a premium for a cloud provider happy to give 100 percent discount for the month for 10 minutes downtime, and 100 percent discount for the year for an hour's downtime.
It takes a lot of effort (exponentially more) to reliably build something (i.e., designed to keep working when parts fail) that is guaranteed to have this level of uptime at these penalties.
So I'm sure that I can build something that works like this, but would you pay me $100 per GB of storage per month? $100 per wall-time hour of CPU usage? $100 per GB of RAM used per hour? Because these are the premium prices for your specs.
AWS refunded me in the first reply on the same day!
The GCP sales rep just copy-pasted a link to a self-support survey that essentially told me, after a series of yes-or-no questions, that they can't refund me.
So why not just tell your customers like it is? Google Cloud is super strict when it comes to billing. I have called my bank to do a chargeback and put a hold on all future billing with GCP.
I'm now back on AWS and still on the Free Tier. Apparently the $300 trial with Google Cloud did not include some critical products. The AWS Free Tier makes it super clear, and even so I sometimes leave something running and discover it in my invoice....
I've yet to receive a reply from Google and it's been a week now.
I do appreciate other products such as Firebase but honestly for infrastructure and for future integration with enterprise customers I feel AWS is more appropriate and mature.
I really wanted to try out their new AutoML, but I was paranoid about entering my credit card and getting banned from Google.
I think it's weird to say you get credit in dollars and then not be able to spend it on everything. That's not how money works. But that's the way hosting providers work and afaik it's quite well known. Especially with a large sum of "free money", even if it's not well known, it was on you to check any small print.
I didn't read it that way. I thought they were complaining about poor customer service that made it difficult to understand the bill or respond to it appropriately.
>I have called my bank to do a chargeback
You're issuing a chargeback because you made a mistake and spent someone else's resources? And you're admitting to this on HN? I'm not a lawyer, but that sounds like fraud and / or theft to me.
It’s pretty convenient for companies like Comcast and Google that have poor customer service.
OP sounds like they're just defending themselves from ambiguous, draconian billing robots.
The infinite money spout that is Google Ads has created a situation in which devs are at Google just to have fun - there really is no incentive to maintain anything because the money will flow regardless of quality.
Source: I interned at Google.
So no matter where you go for your cloud services, you're guaranteed a useless status page. Yippee.
Having an Excel file where people enter statuses is not very useful to me as a customer. That’s more like a blog.
https://www.whoishostingthis.com/#search=status.cloud.google...
It's always interesting to see these outages at large cloud providers spider out across the rest of the internet. A lot of the world depends on Google to stay up.
When the mainframe is down terminals are useless.
Yup, I'm trying to check the Associated Press News right now and it's having trouble connecting to "storage.googleapis.com".
I don’t miss being on pager duty one bit. I see it looming in my headlights, sadly.
... but not for everybody now.
Nothing you or I or the pager can do will speed that up.
I am aware some bosses won't believe that and I am not trying to make light of it. But there really isn't much else to do except wait.