ko_pivot · 10 months ago
> On-call attempts to re-enable the R2 Gateway service using our internal admin tooling, however this tooling was unavailable because it relies on R2.

It's comforting to see this happen to a big tech co!

belter · 10 months ago
My LLM weights say the words "incident" and "Cloudflare" are never more than two clicks away... :-)

https://hn.algolia.com/?q=cloudflare+incident

Permik · 10 months ago
I'd guess this actually happens at every company operating at Cloudflare's scale, but Cloudflare is the only one that's actually transparent about it.
philipwhiuk · 10 months ago
I was surprised not to see a follow-up on this item.
d1sxeyes · 10 months ago
While that sounds sensible on the face of it, it seems the on-call was able to escalate to the team with very little delay.

As far as I can tell from the timeline, it only took 11 minutes from the moment the on-call first attempted the action until the ops team began responding.

Given that this issue was caused by someone unintentionally using access they legitimately had to do something they did not intend, and that broader access would only have reduced the impact marginally, deciding not to grant higher levels of access to the on-call seems to me to be the right decision.

S0y · 10 months ago
That line also made me chuckle quite a bit more than it should have.
sfeng · 10 months ago
You can tell a company really builds on their own products when their abuse system can take them offline!
j45 · 10 months ago
Cloudflare has nice services that they make available to a lot of people for free.

At the same time, this reminds me that the cloud is just someone else's computer, and it has me looking for input and ideas on how to set up a failover with other services.

Does anyone know of a setup or design that can shim in a bit of redundancy for something like this? Using one cloud does kind of tie you to them a bit more.
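
The rough shape I have in mind is a thin wrapper that writes to a primary S3-compatible endpoint like R2 and falls back to a second provider if the primary errors out. This is purely a sketch in Python with boto3; the endpoints, bucket names and credentials are placeholders, not anything Cloudflare-specific:

    # Rough sketch only: try a primary S3-compatible endpoint (e.g. R2) and
    # fall back to a second provider if the write fails. Endpoints, bucket
    # names and credentials are placeholders.
    import boto3
    from botocore.exceptions import BotoCoreError, ClientError

    PROVIDERS = [
        {"name": "r2", "endpoint": "https://<account-id>.r2.cloudflarestorage.com", "bucket": "my-bucket"},
        {"name": "backup", "endpoint": "https://s3.us-east-1.amazonaws.com", "bucket": "my-bucket-backup"},
    ]

    def put_with_failover(key, body):
        last_err = None
        for p in PROVIDERS:
            client = boto3.client("s3", endpoint_url=p["endpoint"])
            try:
                client.put_object(Bucket=p["bucket"], Key=key, Body=body)
                return p["name"]  # which provider actually took the write
            except (BotoCoreError, ClientError) as err:
                last_err = err  # provider unreachable or erroring, try the next one
        raise RuntimeError("all providers failed") from last_err

Writes are the easy part, though; the read path and keeping the two buckets in sync are where it gets hairy, which is mostly what I'm asking about.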

mthmcalixto · 10 months ago
Surely it was the new intern.
AstroJetson · 10 months ago
Who just got a million dollars' worth of training.
archon810 · 10 months ago
The training was done by AI. In fact, the intern is also AI.
tianice · 10 months ago
Appears to have been caused by a circular service dependency.
cjbprime · 10 months ago
Not really -- the circular dependency issue happened 28 minutes after the customer impact started. It's why the outage lasted roughly twice as long as it could have, but it wasn't the original cause of the outage.