>Now, here's the fun part. @Cloudflare runs a free DNS resolver, 1.1.1.1, and lots of people use it. So Facebook etc. are down... guess what happens? People keep retrying. Software keeps retrying. We get hit by a massive flood of DNS traffic asking for http://facebook.com
Believe it or not, there are places in the world where FB products (WhatsApp specifically) are used as the primary communication platform for most people.
Second comment was saying there is no point using Signal if they are down for 2 days. Only a few hours for FB so far, but curiously nobody is saying the same :)
I wonder if any big DNS servers will artificially cache a long TTL NXDOMAIN response for FB to reduce their load. Done wrong, it would extend the FB outage longer.
Clients weren't getting NXDOMAIN, they were getting SERVFAIL because the nameservers were unreachable. These responses cannot be cached for more than 5 minutes [1].
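To make the five-minute limit concrete, here is a minimal sketch of a failure cache that clamps whatever TTL it is asked for. The `FailureCache` class and its method names are made up for illustration and are not taken from any real resolver. It keys on (name, type) only, which is a simplification: RFC 2308 section 7.1 keys the cached failure on the query class and server address as well.

```python
import time

SERVFAIL_MAX_TTL = 300  # seconds; the RFC 2308 section 7.1 upper bound


class FailureCache:
    """Toy cache for server-failure results (hypothetical, for illustration)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so the behaviour can be tested
        self._expiry = {}     # (name, qtype) -> absolute expiry time

    def remember_failure(self, name, qtype, requested_ttl):
        # Clamp the requested TTL into the range [0, 300] seconds.
        ttl = max(0, min(requested_ttl, SERVFAIL_MAX_TTL))
        self._expiry[(name.lower(), qtype)] = self._clock() + ttl
        return ttl

    def is_known_failure(self, name, qtype):
        key = (name.lower(), qtype)
        expires_at = self._expiry.get(key)
        if expires_at is None:
            return False
        if self._clock() >= expires_at:
            # Entry has aged out; the next query goes upstream again.
            del self._expiry[key]
            return False
        return True
```

So even an operator who asks for a one-day failure TTL ends up re-querying after at most five minutes, which is why "cache a long negative answer" doesn't really apply to this outage.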
It will have been cached closer to the edge, but once the TTL expires, so does the cache. That means all the DNS requests that would have been served via local caches end up hitting the upstream DNS servers. For a site like Facebook that will be creating an absolute deluge of requests.
Anecdotal, but the whole of the internet feels sluggish atm.
No, since the positive response will normally be cached for "some time" dependent on a number of factors. The negative response on the other hand often won't get cached, again, dependent on settings.
It's disappointingly common for cloud-backed apps and device firmware to go into a hot retry loop on any kind of network failure. A lot of engineers just haven't heard of exponential backoff, to say nothing of being able to implement and test it properly for a scenario that almost never happens.
Even if you assume Facebook's own apps have reasonable failure logic, there's all kinds of third-party apps and devices integrating with their API that probably get it wrong. Surprise botnet!
Yes. It's basically turned every device, especially mobile devices with the app running in the background, into botnet clients which are continually hitting their DNS servers.
I don't know what facebook's DNS cache expiration interval was, but assume it's 1 day. Now multiply the load on the DNS that those facebook users put by whatever polling interval the apps use.
And then remember what percentage of internet traffic (requests, not bandwidth) facebook, whatsapp, and instagram make up.
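A back-of-the-envelope version of that multiplication, as a rough sketch. The function name and every number here are hypothetical (nobody in the thread knows the real TTL or retry intervals), and it deliberately ignores any short-lived failure caching at the resolver.

```python
SECONDS_PER_DAY = 86400


def upstream_queries_per_day(clients, retry_interval_s, cached, ttl_s=SECONDS_PER_DAY):
    """Estimate queries one shared resolver sends upstream for one name.

    cached=True  -> the record resolves, so the resolver refreshes it
                    once per TTL no matter how many clients ask.
    cached=False -> nothing can be cached, so every client retry is
                    forwarded upstream.
    """
    if cached:
        return SECONDS_PER_DAY / ttl_s
    return clients * SECONDS_PER_DAY / retry_interval_s


# Hypothetical: 1000 clients behind one resolver, each retrying every 30 s.
healthy = upstream_queries_per_day(1000, 30, cached=True)    # 1.0
broken = upstream_queries_per_day(1000, 30, cached=False)    # 2880000.0
```

With those made-up inputs the same resolver goes from one upstream query a day to about 2.9 million, which is the shape of the problem even if the real numbers differ.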
Sort of, yeah. Typically a DDoS attack is done on purpose, this is a side effect of so many clients utilizing retry strategies for failed requests. But in both cases, a lot of requests are being made, which is how a DDoS attack works.
> Software keeps retrying. We get hit by a massive flood of DNS traffic asking for http://facebook.com
If you aren’t using exponential backoff algorithms for your reconnect scheme - you should be!
I have a device in the field, only a few thousand total, but we saw issues when our shared cloud would go down and everyone hammered it to get back up.
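For anyone who hasn't implemented it, a minimal sketch of exponential backoff with "full jitter" (a random delay between zero and a doubling ceiling). The function names and the defaults (1 s base, 300 s cap, 8 attempts) are assumptions picked for illustration, not a recommendation for any particular product.

```python
import random
import time


def backoff_delay(attempt, base=1.0, cap=300.0, rng=random.random):
    """Delay before the next retry: uniform in [0, min(cap, base * 2**attempt))."""
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling


def call_with_backoff(operation, attempts=8, base=1.0, cap=300.0,
                      sleep=time.sleep, rng=random.random):
    """Call operation() until it succeeds or attempts are exhausted."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except OSError:
            # Only network-style errors are retried; re-raise on the last try.
            if attempt == attempts - 1:
                raise
            sleep(backoff_delay(attempt, base, cap, rng))
```

The jitter matters as much as the doubling: without it, a few thousand devices that all lost the server at the same moment will all come back at the same moment too.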
>Our small non profit also sees a huge spike in DNS traffic. It’s really insane.
It's not crazy; people are panicking over Facebook, Instagram and WhatsApp being down and they keep trying to connect to those services. I mean, I would panic too if I were a social media junkie.
It’s not just "social media junkies", a very pretentious phrase to use considering you’re writing it in a comment on a social network. Hundreds of thousands of apps use Facebook APIs, often in the background too (including FB's own apps).
Strongly disagree. The outage has millions of people entering "Facebook" into their search engines. Most engines will conveniently put related news at the top of the search results page. The most recent and widespread Facebook-related news story is about the whistleblower.
Plus everyone has a lot of spare time to read the article now that Facebook and Instagram are down.
The outage didn't bury the story. It amplified it. Any suggestions that Facebook did this on purpose don't even make sense.
> recent and widespread Facebook-related news story is about the whistleblower
With respect I am pretty sure that the most recent and widespread Facebook-related news story is this one.
Holistically I agree that this isn't the kind of distraction Facebook wants, although it tickles me to imagine Mark in the datacenter going Rambo with a pair of wire cutters.
> Strongly disagree. The outage has millions of people entering "Facebook" into their search engines. Most engines will conveniently put related news at the top of the search results page. The most recent and widespread Facebook-related news story is about the whistleblower.
I am seeing 0 news about the whistleblower when I google Facebook. Only outage news.
Yeah, but reading about it and being able to communicate about it on the largest network (the one in question, too) are two separate things. No one can go on there right now and say "I'm deleting my account, who's with me?"
Not at all. I just tried searching for "Facebook" on Google. The whistleblower story is not on the first page of search results. The outage is mentioned half a dozen times on that same page.
I assume this outage is costing millions per hour. And it's not exactly great advertising for Facebook, either. I doubt very much they would do something like this on purpose.
Right, I know that, and I usually try to avoid conspiratorial thinking, but man, Zuck doesn't make it easy.
I'm just trying to process that FB is having its historic, all-networks global outage today of all days. And I bet FB would have paid double of whatever this will eventually cost them to make that story go away.
bingo. I don't care whether it's in the realm of tinfoil hat or not, this is the very real effect that this outage has had. By the time Facebook is back up, people on Facebook will be talking about the outage, not about the whistle blower report. Intentional or not, it will certainly be in Facebook's favor.
Facebook controls the algorithm; wouldn't they just be able to dial down how much that story is spread on its network? (Rather than resort to this?)
I love a good tinfoil hat theory, but in this case I doubt it. I have FB blocked on my network via pihole, but I don't explicitly block Instagram. Until sometime late last week (I noticed on Saturday), blocking facebook.com also blocked Instagram. As of this weekend, Instagram works just fine even with those blocks in place.
I suspect Facebook was making some change to their DNS generally, and they made some kind of mistake in deployment that blew up this morning.
Counterpoint: I had not even heard about the whistle-blower until seeing stories about the outage. One of the largest web services in the world being out of commission for multiple hours is a big deal in 2021. It's a top story on most news sites and other social media (e.g. here at HN, reddit, twitter). If you want something to pass under the radar, it's probably best to not attract global attention.
Most people outside of the US don't even know what "60 Minutes" is. Even fewer have heard about that report. And even fewer care. But everyone has now heard about the outage. This would be the worst possible way of trying to stop the spread of the story.
The more likely scenario is that this was the final straw for some disgruntled employee who decided to pull the plug on the entire thing.
Agree. I just did a quick check and 60 Minutes averages around 10 million viewers. It's not like in 1977, when something like 20%+ of the US population was watching that show.
> they leaked information about 20% of the earth's population
This is straight up false. It was scrapers extracting data from public profiles. They already incorporate anti-scraping techniques, so there's not much they can do other than require every one to set their profile to private.
If we're in "tinhat" territory: it seems extremely odd to me that this whistleblower seems to be "blowing the whistle" on the fact that Facebook isn't doing enough to control what people are thinking and talking about.
Like...what? "Brave whistleblower comes out showing that Facebook isn't doing enough to control what you are thinking!" is sort of arguing past the question. Should Facebook be in charge of deciding what you think?
> this whistleblower seems to be "blowing the whistle" on the fact that Facebook isn't doing enough to control what people are thinking and talking about.
That is not at all what the whistleblower is alleging. Facebook already controls what content you are seeing through its news feed algorithm. That algorithm is not governed by a one-dimensional "how much control" parameter; instead it uses engagement metrics to decide what content to show. The whistleblower claims that the engagement optimization, according to Facebook's own research, prioritizes emotionally angry/hurtful/divisive content.
They are exercising that power already; they are just explicitly doing so in a way that tears down trust in society because it makes them money, rather than encouraging a less divisive and more fact-based conversation, because that doesn't make them as much money.
> The outage has pretty much buried that story, and perhaps more importantly, stopped its spread on FB networks.
Buried the news ... which is basically as noteworthy as the news that water is still wet. What exactly did she reveal that was not known before, or is it somehow newsworthy that Facebook also knew what everyone else knew? The real news ought to be how that managed to make it to the headlines.
As much as I'd love to imagine FB rage-quitting the internet because people don't seem to appreciate them enough, I'm pretty sure it's a coincidence. Probably has more to do with it being Monday (you don't put big stories on Friday and you sure don't deploy config changes on Friday!) than anything else.
Ah yes, the best way to bury a moral scandal of the kind that usually gets forgotten in a week is to undermine the trust of almost every single user worldwide. This is a very good conspiracy.
I see it as similar to Snowden, in the sense that everybody kind of knew (actually guessed) but now we actually know. It doesn't come as a shock, but it's important information to have since it can be now argued with authority.
The whistleblower revealed that Facebook knows it is bad for society. The documents also show Facebook actively optimizes its algorithms for "bad for society" content because that drives engagement which makes them more money. Furthermore Facebook doesn't do as much content moderation in regions/languages with low usage numbers because it costs more than those users make them. So calls for genocide in Myanmar basically go unchallenged and unmoderated because Facebook doesn't make much money in Myanmar. Sorry genocided minority, you should have been more valuable to Facebook.
> would actually agree to carry out something like this intentionally.
Well, they work for Facebook. In my opinion you would have to have no morals to join that corporation in the first place, so I can imagine such an ask would be just another dirty task to do. They seem to love it.
The story that a woman at Facebook doesn't think they're going far enough to control speech they hate and bad-thoughts?
I think Facebook is awful, but her primary complaint seemed to me that she lacked controls for what people like her, you know, the good people have access to prevent anyone else from seeing. That she was powerless to stop users from saying the wrong things. How was her motivation anything but a desire for more authoritarianism? She said she specifically took the job on the condition she could monitor and direct posts to prevent the wrong info from being online, that's the last type of person you want in that position, the one that wants it.
I expect that we're still pretending Facebook is "just a private business", despite it being unlike any in history and that the ties to government are completely benign.
I'm not saying she was wrong in any claim about internal discussions. But, if you can not imagine yourself being on the wrong side of someone like that, you have limited imagination.
Facebook is surprisingly tolerant of controversial subjects. YouTube has gone scorched earth on millions of channels and deleted years of work of many people. Facebook was far more lenient and you could talk about non-official covid information, for example, where YouTube deleted anything that wasn't official narrative with extreme prejudice. Given how much bad stuff all over the world is happening to sacrifice freedom to get everyone to toe the official line on Covid that is complete science fiction level totalitarianism, I am sure Facebook made some very powerful and determined enemies with its more lenient stance. I was downvoted earlier for saying this was an intentional takedown and deleted my comment, but now I think this could be a full blown William Gibson Neuromancer Cyberpunk level corporate takedown attempt in progress!
She said she wanted FB to do something to stop misinformation and hate speech but what we've seen from Reddit is that "are mRNA vaccines actually safe?" becomes misinformation and "we shouldn't perform elective life-altering surgery on pre-teen children" becomes hate speech. There's not much I applaud Facebook for, but not listening to this woman is one of the few I do.
It also looks like it's much deeper than just people not finding the site. Employees are all locked out, and there's another story on the front page of HN saying employees are locked out of the building as well.
If you wanted to scrub a lot of the data and nefarious evidence the whistle blower brought out, this would be a great way to do it, under the guise of a simple "employee screw up" cover story.
It's hard for me not to think something more nefarious is afoot, considering FB's track record with a myriad of other things. At this point, it seems more likely something sketchy is going on and not just some random employee who screwed up and brought down the entire network with a simple change. I would assume there are several layers of decision makers who oversee the BGP records. I have a hard time thinking one person had sole access to these and brought everything down with an innocent change.
FB has too many smart people to allow a single point of failure for their entire network such that if it goes down, it becomes "a simple error on the part of some random employee". This is not some junior dev who broke the build; it's far more serious than that.
"As a result, when one types Facebook.com into a web browser, the browser has no idea where to find Facebook.com, and so returns an error page."
Not quite.
Many DoH servers are working fine. DNS isn't a problem for the browser, but it seems to be a problem for Facebook's internal setup. It's like their proxy configuration is 100% reliant on DNS lookups in order to find backends.
The FB content servers are reachable. It is only the Facebook DNS servers that are unreachable.
Don't take my word for it, try for yourself
www.facebook.com 1 IN A 179.60.192.3 (content)
static.facebook.com 1 IN A 157.240.21.16 (content)
a.ns.facebook.com 1 IN A 129.134.30.12 (DNS)
ping -c1 157.240.21.16 |grep -A1 statistics
--- 157.240.21.16 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
ping -c1 179.60.192.3|grep -A1 statistics
--- 179.60.192.3 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
ping -c1 -W2 129.134.30.12 |grep -A1 statistics
--- 129.134.30.12 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
The browser, i.e., client, here, curl, has an idea where to find Facebook.com
links -dump index.htm
[IMG]
Sorry, something went wrong.
We're working on it and we'll get it fixed as soon as we can.
Go Back
Facebook (c) 2020 . Help Center
grep HTTP index.htm
HTTP/1.1 503 No server is available for the request
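If you'd rather script the same check than run ping by hand, here is a rough sketch. Note the swap: it uses a TCP connect (port 443 for the content hosts, port 53 for the nameserver) instead of ICMP ping, so it is a related test rather than the identical one. The IP addresses are the ones from the comment above; `tcp_reachable`, `summarize` and `TARGETS` are hypothetical names chosen for this example.

```python
import socket

# label -> (address, port); addresses copied from the ping output above
TARGETS = {
    "www (content)": ("179.60.192.3", 443),
    "static (content)": ("157.240.21.16", 443),
    "a.ns (DNS)": ("129.134.30.12", 53),
}


def tcp_reachable(host, port, timeout=2.0, connect=socket.create_connection):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError:  # covers timeouts, refusals and unreachable networks
        return False
    conn.close()
    return True


def summarize(targets, timeout=2.0, connect=socket.create_connection):
    """Probe every target and return {label: reachable}."""
    return {
        label: tcp_reachable(host, port, timeout=timeout, connect=connect)
        for label, (host, port) in targets.items()
    }
```

Calling `summarize(TARGETS)` does the real probing. The `connect` parameter is only there so the logic can be exercised without touching the network.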
Due to DNS being busted, all internal FB services/tooling that they'd use to push DNS config updates are probably completely inaccessible. Someone at FB will have to manually SSH into a production host (assuming they can even identify the right one), and issue some commands to repopulate the DNS records. They'll probably have to do this without any access to internal wikis, documentation, or code.
Keeping those poor network engineers in our thoughts.
Hmm... I'm always reminded of my professor telling me that it's never the fault of whoever pressed the button; responsibility lies with whoever decided to make them able to press a button that can cause such catastrophic issues.
Somebody from my engineering class had an internship at DuPont's main facility/production line. He was implementing something that managed to completely shut down production for an entire shift and cause a large fire; it ended up being something in the millions in damages from production loss and fire damage.
Intern wasn't even yelled at IIRC. He actually went on to do some very helpful things the rest of the internship. But man, did the person who let an intern be in the position to single handedly cause such a mess get absolutely fucked by his superiors.
So it appears that WhatsApp are in the process of restoring from backup? Why would they need to do that if it was just a DNS issue? And why would the server be accessible while backup restoration was still in progress? I feel like there is going to be a lot more to this story when it all shakes out.
Once the DNS is back up they need to basically reboot every service. Once server one can’t talk to server two, everything is out of sync and they need to resolve this somehow. They probably have mitigation plans for a few data centers going down, but when it’s all of them at once, that’s going to be a huge pain.
Who knows. I use PiHole, where all DNS records are cached. Maybe this is the reason why it happens to me. And as regards Twitter (obviously), I'm not the only one who is facing this weird behaviour.
Even before E2E - to my knowledge, whatsapp would only store messages until they could be delivered. They never really stored your chats once they made it to their destination - there shouldn't be any "restoring" of backups that brings back messages unless it's just a re-delivery at most. (And honestly, i'd doubt that gets backed up).
If they're restoring from backup that makes sense right? I assume backups are read-only, so deleting messages won't delete them from the backup also. It is sloppy though that you would see anything before the restore was totally done though (including re-deleting messages)
Before messages had unlimited expiry, FB would auto-expire them after a few weeks. When they announced messages would remain forever, I went back to check and kept scrolling up until my arm hurt and voila! There they were: messages that had expired YEARS ago were all of a sudden visible!
>In addition to stranding billions of users, the Facebook outage also has stranded its employees from communicating with one another using their internal Facebook tools. That’s because Facebook’s email and tools are all managed in house and via the same domains that are now stranded.
Thanos snapped his fingers and Zuckerberg vanished with the keys.
My (very large) employer had a worldwide outage a few years ago where a single bad DNS update stopped everything in its tracks (at the time many things were still in our own data centers, now more is in Amazon/etc). It took most of the day to restart everything. But it's not something most people would have noticed like FB. Thankfully I worked in mobile so not involved.
It is hard to balance dogfooding (good) with SPOF (bad), many big companies do get it wrong (AWS with S3, Slack in the recent past) all the time.
It is easy to get it wrong if your company provides internet services that every developer typically depends on in their workflows; you have to keep educating your own developers on how to use them and when not to use your own services.
Although, to be fair, that is kind of like praising the arsonist after he put out the fire he started (which had already smoke-damaged the whole neighborhood).
>Now, here's the fun part. @Cloudflare runs a free DNS resolver, 1.1.1.1, and lots of people use it. So Facebook etc. are down... guess what happens? People keep retrying. Software keeps retrying. We get hit by a massive flood of DNS traffic asking for http://facebook.com
https://twitter.com/jgrahamc/status/1445066136547217413
>Our small non profit also sees a huge spike in DNS traffic. It’s really insane.
https://twitter.com/awlnx/status/1445072441886265355
>This is frontend DNS stats from one of the smaller ISPs I operate. DNS traffic has almost doubled.
https://twitter.com/TheodoreBaschak/status/14450732299707637...
Two of our local mobile operators are experiencing issues with phone calls due to network overload.
https://twitter.com/claroelsalvador/status/14450819333319598...
can't use the phone network to place a call b/c of fb-errors clogging the pipe
https://news.ycombinator.com/item?id=25803010 Signal apps DDoS'ed their own server
Let's hope it's done wrong.
[1] https://datatracker.ietf.org/doc/html/rfc2308#section-7.1
But if the request does not resolve there’s no caching, the next request goes through the entire thing and hits the server again.
It's kind of beautiful.
(CF decided not to honour them some years ago)
The outage has pretty much buried that story, and perhaps more importantly, stopped its spread on FB networks.
That said, I can't see how FB managers and engineers would actually agree to carry out something like this intentionally.
Unless another disgruntled employee knew it would amplify the story.
But I haven't paid this much attention to Facebook in over a year.
Could a pang of morality have struck one of the employees with the keys to the kingdom?
Maybe this is just to cover the fact that they leaked information about 20% of the earth's population?
https://www.youtube.com/watch?v=_Lx5VmAdZSI
It's not like this is a new thing. We've been getting [facebook does awful thing] news stories pretty consistently for years now.
They can either agree to comply with the orders from up above or they face consequences? How is that hard to comprehend?
To me this seems like a million dollar mistake.
A lot. The resulting Wall Street Journal series directly led to the shut down of Instagram for Kids.
It hasn't on the BBC. They're airing both stories.
Just a reminder, though: this is not a failure of a single person but of the organization as a whole and the policies in place.
https://twitter.com/Pytlicek/status/1445072626729242637
SinglePointOfFailure.NoRedundancies.FB