Readit News
Posted by u/uyzstvqs 2 months ago
Tell HN: HN was down
- HN returned 502 Bad Gateway on all authenticated requests. It still responded to a limited number of unauthenticated requests, presumably with cached pages, which were not being updated. The last post on /newest claimed "0 minutes ago", but it was actually much older (1:32:57 PM GMT) and was not the newest post.

- This status page did identify the outage: https://hackernews.onlineornot.com/ (the status pages by Hund and Statuspal did not show it).

- The last post before the outage was https://news.ycombinator.com/item?id=46301823 (1:39:59 PM GMT). The last comment was https://news.ycombinator.com/item?id=46301848 (1:41:54 PM GMT).

- Just before the outage, comments were arriving every ~4 seconds on average. Adding one such interval to the last comment's timestamp (1:41:54 PM GMT) suggests HN likely went down at about 1:41:58 PM GMT.
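The estimate in the last bullet is simple arithmetic: the last comment's timestamp plus one average gap between comments. Here is a minimal bash sketch of it. It assumes comments kept arriving at a steady ~4 s pace right up to the failure, so it is only as good as that assumption, and it uses 24-hour GMT times. The function names to_seconds, from_seconds and estimate_outage_start are illustrative.

```shell
#!/usr/bin/env bash
# Sketch of the back-of-the-envelope estimate: assume the outage began about
# one average comment interval after the last comment landed.
set -euo pipefail

# to_seconds HH:MM:SS -> seconds since midnight (10# avoids octal parsing of "08")
to_seconds() {
  local h m s
  IFS=: read -r h m s <<< "$1"
  echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# from_seconds N -> HH:MM:SS, wrapping past midnight
from_seconds() {
  local t=$(( $1 % 86400 ))
  printf '%02d:%02d:%02d\n' $(( t / 3600 )) $(( t % 3600 / 60 )) $(( t % 60 ))
}

# estimate_outage_start LAST_COMMENT_HH:MM:SS AVG_GAP_SECONDS
estimate_outage_start() {
  from_seconds $(( $(to_seconds "$1") + $2 ))
}

# Last comment at 1:41:54 PM GMT, ~4 s between comments.
estimate_outage_start 13:41:54 4   # prints 13:41:58
```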

dang · 2 months ago
Yes, sorry! We're investigating, but my current theory is we got overloaded because I relaxed some of our anti-crawler protections a few days ago.

(The reason I did that is that the anti-crawler protections also unfortunately hit some legit users, and we don't want to block legit users. However, it seems that I turned the knobs down too far.)

In this case, though, we had a secondary failure: PagerDuty woke me up at 5:24am, I checked HN and it seemed fine, so I told PagerDuty the problem was resolved. But the problem wasn't resolved - at that point I was just sleeping through it.

I'll add more as we find out more, but it probably won't be till later this afternoon PST.

Edit: later than I expected, but for those still following, the main things I've learned are (1) pkill wasn't able to kill SBCL this time - we have a script that does that when HN stops responding, but it didn't work, so we'll revise the script; and (2) how to get PagerDuty not to let you go back to sleep if your site is actually still down.
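dang doesn't say how the HN restart script works or why pkill couldn't stop SBCL this time. A common pattern for this kind of watchdog, sketched here under that caveat and not as HN's actual fix, is to send SIGTERM (pkill's default signal), wait a grace period, and then escalate to SIGKILL, which the process cannot catch or ignore. stop_process is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch of a restart helper that does not trust SIGTERM alone.
# stop_process is a hypothetical name; HN's real script is not public.
set -euo pipefail

# stop_process PID [GRACE_SECONDS]
# Ask politely with SIGTERM, poll until the process exits, and escalate
# to SIGKILL (which cannot be caught or ignored) once the grace period ends.
stop_process() {
  local pid=$1 grace=${2:-10} waited=0
  kill -TERM "$pid" 2>/dev/null || return 0   # nothing to stop
  while kill -0 "$pid" 2>/dev/null; do
    if (( waited >= grace )); then
      echo "pid $pid still alive after ${grace}s; sending SIGKILL" >&2
      kill -KILL "$pid" 2>/dev/null || true
      return 0
    fi
    sleep 1
    waited=$(( waited + 1 ))
  done
}
```

A restart script might call it as stop_process "$(pgrep -o sbcl)" 15 before launching a fresh process; the process name and the 15-second grace period are placeholders.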

shlomo_z · 2 months ago
Crazy that Dang literally manages HN in his sleep!

We all knew that but I haven't seen any confirmation before this.

easterncalculus · 2 months ago
I like hacker news but I don't think this site is worth getting paged over lol
dang · 2 months ago
failing to manage HN in my sleep is more like it
qingcharles · 2 months ago
I was today years old when I found out Dan sleeps.
sizzle · 2 months ago
What if Dang made an AI agent of himself for when he sleeps?
utopcell · 2 months ago
By demonstration he didn't.
xandrius · 2 months ago
Hey dang, don't worry. It's just a site for reading articles and reacting to them.

Enjoy your deserved sleep and if for a couple of hours it's down, so be it.

Thanks for your continued service!

powvans · 2 months ago
100%

Though I will say, HN is a pretty great source of information about major outages like the recent AWS and Cloudflare issues. I had a moment this morning where I thought, oh, is there a larger issue and then, oh, HN is down, huh, the next option is so far down my list that it's going to take me a moment to think of it.

I hope that serves as a testament to how great this site and the community is. Thanks for all your hard work keeping it that way!

neilv · 2 months ago
Maybe it would be fine if ops alerts were silenced during normal US sleeping hours?

HN is important, but unlikely much harm could be done before morning.

(Source: Lost a lot of sleep at one place, enough to realize that sleep interruption and deficit has significant costs.)

Imustaskforhelp · 2 months ago
The first time Hacker News didn't work, I was honestly worried there was some major worldwide outage. I didn't expect Hacker News itself to go down; I figured something even more catastrophic than AWS going down must have happened (since HN is where we see the posts about major cloud outages).

https://downforeveryoneorjustme.com/hacker-news

This website had many reports; the last I saw was 52 reports in a short time frame, and the maximum number of reports on there seems to be 118.

> In this case, though, we had a secondary failure: PagerDuty woke me up at 5:24am, I checked HN and it seemed fine, so I told PagerDuty the problem was resolved. But the problem wasn't resolved - at that point I was just sleeping through it.

It's okay, I suppose. Have you figured out who is crawling Hacker News so much, though? Was it a DDoS attack or an AI company trying to get data? Doesn't Hacker News have an API? I'm sure there are datasets for it too, so it's interesting that they would crawl it, but we all know the reasons, as they have been discussed here.

Rooster61 · 2 months ago
No apology needed. We all needed to stop procrastinating anyways :)
tsoukase · 2 months ago
During the last week my IP was banned for an unknown reason. Glad to hear it might not be a problem on my side.
dang · 2 months ago
Yes, sorry! This is the problem - we don't want to block legit users, but if we loosen the bolts, we get flooded.

If you browse HN while logged in, that should immunize you against this happening. Also, if it does happen again, you can unban your IP as described at https://news.ycombinator.com/newsfaq.html. But you have to do that from a different IP address, of course.

If those things don't work, email hn@ycombinator.com and we'll get it sorted.

andy_ppp · 2 months ago
I’d love to know more about what running a site like HN involves, would be great to get a write up of what it’s like running something like this at this scale (and what kind of traffic you guys get)!
alwa · 2 months ago
I can’t put my finger on anything within the last decade, but I seem to recall it running in something close to its current form on a single core on a single server for a long time:

https://news.ycombinator.com/item?id=5229522

Re: traffic, dang said (2022):

https://news.ycombinator.com/item?id=33454140

I took it as a good reminder that the hard part is the human part: that high-overhead features and UI fripperies are nice but not necessary (or sufficient) to keep a community healthy and vibrant over the decades.

(And on the subject of the human side, if you didn’t catch Anna Wiener’s 2019 profile, it’s here:

https://www.newyorker.com/news/letter-from-silicon-valley/th... )

giancarlostoro · 2 months ago
The transparency is deeply appreciated by me and others. We don't pay to keep HN on, so we cannot complain. Thank you and the rest of the team for all you do to give us a corner of the internet that is quite 'different' from the rest of the wild west that is the web.
rldjbpin · 2 months ago
> The reason I did that is that the anti-crawler protections also unfortunately hit some legit users, and we don't want to block legit users.

it is a shame that it needs to be this way. as a lurker who neither stays logged in nor uses incognito mode, i have seen the "Sorry" page way too often, even when opening the "past" page from the homepage.

truly hope you find a solution that reduces friction for all. personally, it is back to the "Sorry" situation for now.

PS: for others facing a similar situation, it all disappears after logging in, which has been the most reliable solution thus far.

dang · 2 months ago
Yes, and I'm sorry. We do our best but it's both a hard problem and a moving target.
mmooss · 2 months ago
In a situation like this one, good crisis leadership is essential. dang, HN will help you with tips from vast collected experience (please chip in):

1. Blame: The first thing to do is to point the finger. That doesn't mean analysing the technical issue, which can delay this step and limit your options, but figuring out who is politically easiest to blame. Often, that's the new guy, but outside contractors and vendors without good connections are also a common solution. Even if you are technically responsible for hiring them, you can always push them under the bus with a little skill. This small sacrifice helps unify, focus, and motivate the rest of the team.

2. Emotion: Inject your emotion into the situation and make that the implicit, but indisputable priority. Particularly, outrage and anger - This is completely _____. These people are utterly _____ (I'd use all caps, but that's not allowed on HN). Make sure everyone's attention is over their shoulder, on your emotion, and infect the team with it. Threats are an effective tool here - this is a crisis, and anyone who is calm is not emotionally engaged. Otherwise, they won't care enough about this problem - without you driving them, they probably wouldn't care much at all. Anyway, you don't have time for niceties like empathy or even basic respect.

3. Speed: Responsiveness to stakeholders is very important. People need answers now. Give them answers they want to hear, outcomes they will be comfortable with. Don't worry if different groups hear different things. Your team will find a way to make it all work - that's their job.

4. Communication: Good communication is essential. Make sure you clearly tell your team what they should be doing; repeat it several times to prevent misunderstanding. Especially people with experience can have minds of their own; keep them on track. The situation is a crisis so you can't take any risks; stay on top of them and everything they do, and give input if you're not certain they are doing exactly what you would be doing.

5. Victimhood: Find a way to turn the tables: Make it about you, and how you're the victim here, and feed the fire with more outrage. With this and outrage, nobody will undermine the team by challenging your ideas or authority, which is the most essential component of a successful outcome. Remember, without you this all falls apart.

Have I missed anything?

yearolinuxdsktp · 2 months ago
Engagement: make sure that every member of the team is either on the incident bridge or has dropped what they are doing to watch you diagnose the problem. The more eyes on the problem, the more awareness of the pain will be absorbed by all. If members need to leave to get food or put children to bed, tell them to order delivery and to ask their spouse to do their job. Demonstrate human touch by allowing them to turn off camera while they are eating.

Comprehensiveness: propose extreme, sweeping solutions, such as a lights-out restart of all services, shutting down all incoming requests, and restoring everything from yesterday's backup. This demonstrates that you are ready to address the problem in a maximally comprehensive way. If someone suggests a config change rollback or a roll-forward patch, ask them why they are gambling company time on localized changes, and why they are willing to gamble company time on technical analysis.

Root Cause Analysis Meeting: spend the entire meeting rehashing the events, pointing fingers and assigning blame. Be sure to mention how the incident could've been over sooner if you had just restarted and rolled back every single thing. Be sure to demonstrate out-of-the-box thinking by discussing unrealistic, grandiose solutions. When the time is up, run the meeting over by 30 minutes and force everyone to stay while realistic solution ideas are finally discussed in overtime. This makes it clear to the team that nothing is more important than this incident's RCA; their time surely is not. If someone asks to tap out to pick up their kids after school, remind them that they make enough money to call them an Uber.

Alerting: be sure to identify anything remotely resembling leading indicators, and add Critical-level wake-you-up alerts with sensitive thresholds for those indicators. Database exceeding 50% CPU? Critical! Filesystem queue length exceeding 5? Critical! Heap usage over 50%? Critical! 100 errors in one minute on a 100,000-requests-per-minute service? Critical! A single log line indicating DNS resolution failure anywhere in the system? Critical! (What if AWS's DNS is down again?) Service request rate 10% higher than typical peak? Critical! If anyone objects to such critical alerts, ask them why they want to be responsible for not preventing the next incident.

maxloh · 2 months ago
Frankly, I don't understand why someone would even try to crawl Hacker News.

There is an official dump which doesn't even require parsing HTML at all: https://console.cloud.google.com/marketplace/details/y-combi...

dang · 2 months ago
These are not, er, experienced crawlers.

https://www.youtube.com/watch?v=Sbpl3ywNlpA#t=56s

showcaseearth · 2 months ago
Short-lived and driven by good intentions: all's good. Thanks again for keeping this thing going!
bicepjai · 2 months ago
Even after providing a Firebase endpoint, crawlers still come to the site?
Bender · 2 months ago
Most crawlers have no concept of what that is. They will follow links to this site and then follow links out of this site even after being told not to [1]. The majority of crawlers follow zero rules, RFCs, etc. The few platforms that do follow standards and rules are akin to a law-abiding citizen in Mos Eisley.

[1] - rel="nofollow"

dang · 2 months ago
Oh my god. It's the crawlopalypse.
busymom0 · 2 months ago
Unfortunately, the Firebase API is very bad, as they even acknowledge on their GitHub page.
8cvor6j844qw_d6 · 2 months ago
> anti-crawler protections

Sometimes I could not open the comment section when opening it from a new private window, receiving a blank page with "... We're sorry" or something along those lines. It works when opening it normally.

Logging in on the private window seems to resolve the issue. Can you take a look at this if possible?

dang · 2 months ago
Best to email your IP address to hn@ycombinator.com so we can see if it's blocked.
nottorp · 2 months ago
Can't speak for others, but I'm sure I'll be pretty fine if no one gets woken up when HN is down...

Of course, they'd better restore service after they wake up naturally, because I need my HN dose. But it's not worth losing sleep over it.

QuantumNomad_ · 2 months ago
> the anti-crawler protections also unfortunately hit some legit users, and we don't want to block legit users

Was the blocking returning “Sorry.” instead of any page content? A couple of days ago there were a few hours when I could load the HN main page as a non-logged-in user, but if I tried to log in, I would get “Sorry.” instead. I also got the sorry message when I tried to open other people's user profiles and a few other pages.

I am assuming that the reason I could see the front page itself and discussions on posts on the front page is that they were in a shared cache for non-logged in users, but that when I clicked on some pages like some random user pages those were not in cache and hit the origin server and it blocked those with “Sorry.” like it did for log-in attempts.

I also tried to go to the unblock IP page, but that one also returned “Sorry.”

For a while I was scratching my head wondering if I had gotten some malware on one of my computers that was aggressively making requests to HN, and that I had become IP banned because of that, since I think my actual request rate from browsing and commenting should be pretty average. I read HN a lot, but not that much :p

Later in the day, or the next day, things were back to normal and I could log in again. Presumably after those anti-crawler protections had been relaxed again.

dang · 2 months ago
> Was the blocking returning “Sorry.”

> Presumably after those anti-crawler protections had been relaxed again

Yup and yup. Apologies for the inconvenience! If it happens again you're welcome to email us at hn@ycombinator.com with your IP and we'll unblock it for you.

echelon · 2 months ago
I didn't realize you were carrying the pager too! Kudos!
malwrar · 2 months ago
I feel such a sense of kinship for anyone who carries a pager, almost 7 years at my current role doing it. Super cool that dang is among our number :)
walrus01 · 2 months ago
Just out of curiosity, if HN is still running on one physical system, what does a daily or weekly traffic chart look like for the switch port facing it?
irishcoffee · 2 months ago
> The reason I did that is that the anti-crawler protections also unfortunately hit some legit users

How does this happen?

Bender · 2 months ago
How does this happen?

Not the person you are asking. Bot operators have an incentive to make crawlers look as much like a human as possible so they do not get blocked. Some of them fail miserably and some nearly succeed. That makes it trivial to accidentally block a real person. I am personally fine with that given I do not pay for this site and have no SLA or contract with it.

ohhnoodont · 2 months ago
Last week, if you were using a VPN + a browser that limits fingerprinting, you were likely to see error messages when accessing HN.
pjc50 · 2 months ago
Every filter process has false positives and false negatives, especially when crawlers are trying to fake their status.
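Nobody in this thread says which anti-crawler protections HN actually uses. As an illustration of why any such filter produces false positives, here is a generic per-IP token bucket in bash (4+, for associative arrays). RATE, BURST and allow are made-up names and numbers, not HN's settings: each IP may send BURST requests back-to-back and then earns RATE more per second.

```shell
#!/usr/bin/env bash
# Sketch of a per-IP token-bucket rate limiter, one common anti-crawler
# building block. HN has not said what it uses; RATE and BURST are made up.
set -euo pipefail

RATE=1    # tokens added per second
BURST=5   # bucket capacity: how many requests may arrive back-to-back
declare -A tokens=() last_seen=()

# allow IP NOW_EPOCH_SECONDS -> exit status 0 if allowed, 1 if throttled
allow() {
  local ip=$1 now=$2
  local t=${tokens[$ip]:-$BURST}
  local last=${last_seen[$ip]:-$now}
  t=$(( t + (now - last) * RATE ))     # refill for the time elapsed
  if (( t > BURST )); then t=$BURST; fi
  last_seen[$ip]=$now
  if (( t >= 1 )); then
    tokens[$ip]=$(( t - 1 ))           # spend one token on this request
    return 0
  fi
  tokens[$ip]=$t
  return 1
}
```

A reader clicking around at about one request per second or less stays under this limit indefinitely, but a VPN exit address shared by many people looks like one very fast client and gets throttled, which is one way the VPN false positives described above can arise. Tightening BURST catches more crawlers and more humans at the same time.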
shmeeed · 2 months ago
Looking forward to the post mortem. :)
michelsedgh · 2 months ago
dang
_shadi · 2 months ago
> anti-crawler protections

what type of protections are used on HN? rate-limiting? ip range blacklist?

stmw · 2 months ago
dang - just to say, we've all done it...
altairprime · 2 months ago
Decades ago I had to write a Perl script to auth to the site for proper downtime checking. Some things never change :) Good luck with the triage.
racl101 · 2 months ago
dang!
Elfener · 2 months ago
I got stuck in an infinite loop.

Try opening HN -> it's down, better check HN to see everyone talking about a major website being down -> Try opening HN -> loop

neom · 2 months ago
Yeah me too. Wake up -> HN down -> That's weird, oh well it's usually only down for a few minutes -> I should check if HN is still down -> That's weird, oh well it's usually only down for a few minutes -> I should check if HN is still down -> loop.

That was a few hours ago. I'm glad this loop is broken.

squeefers · 2 months ago
sounds very much like an evil social media dopamine feedback loop. ironic, given everyone on HN is so anti-social-media... it's clearly only bad for kids though, i should add. silly of me to exclude such a detail
HPsquared · 2 months ago
Sometimes I'll catch myself absentmindedly reopening the browser and checking two or three front pages, seconds after having just checked them and closed the browser.
notachatbot123 · 2 months ago
That's a sign of addiction and I highly recommend changing your behaviour towards those pages!
Rendello · 2 months ago
Funny, I don't seem to need an outage to get stuck in a HN loop...
mustak_im · 2 months ago
I woke up and was wondering if I’ve just woken up in hell!
ErroneousBosh · 2 months ago
Not just me then?

"Shit, HN is down! Hm, I wonder if there's anything about it on HN?"

until stack overflow occurs.

Imustaskforhelp · 2 months ago
Yeah, I had assumed something very major must have happened for HN to go down, lmao. I even asked in a Linux Discord server whether there was a major outage, since Hacker News was down.
RJIb8RBYxzAMX9u · 2 months ago
HN is how I discover whether other sites are down or not, so it serves a critical function, so of course I check it frequently.

/s

manbitesdog · 2 months ago
TIL I have a "open Hacker News" hand reflex
ectospheno · 2 months ago
I learn more reading the comments here than anywhere else. Thanks everyone for my addiction.
AndrewKemendo · 2 months ago
It just reinforces for me that addiction is a human problem, not a problem with technology

I know dang basically works tirelessly to not change the format in order to not induce those addictive patterns

but yet here we all are

chistev · 2 months ago
It's a website with the smartest people in the world. The level of conversation here is unrivaled among internet communities.

It's understandable to be addicted. Lol.

I visit this place multiple times a day.

dzink · 2 months ago
This one is at least healthy-ish for the mind. I’d much rather read Hacker News than any other news. Social media is an emotional rage-bait cesspool these days. If it weren’t for Hacker News, those of us who abstain from the rest would be living in the dark.
PurpleRamen · 2 months ago
But, would the addiction become worse if HN changed, or would there be a point where they could cure it?
directmusic · 2 months ago
I'm glad I'm not the only one. If I type 'n' into any browser it autocompletes to HN.
embedding-shape · 2 months ago
Save typing hundreds of letters per day, and replace about:newtab with news.ycombinator.com, now you can just do CTRL+T :)
geocrasher · 2 months ago
I had the same thing for Slashdot.org for many, many years. Both the reflex and the browser autocomplete. I still miss the old /. It was like HN + Hackaday + Usenet.
1shooner · 2 months ago
If you're looking to put the brakes on that, I've used LeechBlock to add a 5-second timer to opening a new HN window (along with other block schedules). The timer even fails if it loses focus, so it really helps slow you down.

https://www.proginosko.com/leechblock/

embedding-shape · 2 months ago

    echo '127.0.0.1 news.ycombinator.com' | sudo tee -a /etc/hosts

Does the trick as well :) For bonus points (and so you can't work around it with your phone), do it on your router/switch instead.

You'll still open new tabs and go to HN, but you'll be reminded quickly, and every day can be downtime day \o/ (for you, personally)
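If you want the hosts-file block from the comment above to be easy to undo, one option is to tag the added line with a comment and remove only that line later. This is a minimal bash sketch; HOSTS_FILE, MARKER, hn_block and hn_unblock are hypothetical names, and editing /etc/hosts requires running the functions as root.

```shell
#!/usr/bin/env bash
# Sketch: reversible "downtime day" toggle for HN via the hosts file.
# HOSTS_FILE and MARKER are hypothetical names; writing /etc/hosts needs root.
set -euo pipefail

HOSTS_FILE=${HOSTS_FILE:-/etc/hosts}
MARKER='# hn-downtime-day'   # text after "#" is a comment in the hosts file

# hn_block: point news.ycombinator.com at localhost (idempotent)
hn_block() {
  if ! grep -qF -- "$MARKER" "$HOSTS_FILE"; then
    echo "127.0.0.1 news.ycombinator.com $MARKER" >> "$HOSTS_FILE"
  fi
}

# hn_unblock: drop only the line this script added
hn_unblock() {
  local tmp
  tmp=$(mktemp)
  grep -vF -- "$MARKER" "$HOSTS_FILE" > "$tmp" || true   # grep exits 1 if nothing is left
  cat "$tmp" > "$HOSTS_FILE"    # rewrite in place, keeping the file's owner and mode
  rm -f "$tmp"
}
```

Calling hn_block twice adds the entry only once, and hn_unblock leaves every other line of the file alone.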

mgarciaisaia · 2 months ago
I've made https://deja.de.hueve.ar/hn so it snapshots the front page once per day - that way, I know there won't be any new updates during the rest of the day, and the dopamine addiction goes down.
squeefers · 2 months ago
so youve got the willpower to do something about it but not enough to just stop doing it?
jstummbillig · 2 months ago
I say. Vibe coded 4 apps once I got past that, on my way to half a billion in ARR already.
wincy · 2 months ago
I’ve turned on no procrast mode and set it to ten minutes per hour. Helped me a lot!
HanClinto · 2 months ago
What are you using to control this?
thesurlydev · 2 months ago
Same! Right there with "every day must begin with coffee"
lysace · 2 months ago
⌘-T, N, <RET>

Did it like 5 times during that 1h-ish outage. :(

cbracketdash · 2 months ago
There is a noprocrast feature in your settings to specify how long you can stay on for a single session and the frequency at which you can view HN. Super helpful!
ChrisMarshallNY · 2 months ago
So do I, but it was such a shock that I just passed out, and when I woke up, it was back up.
ZuoCen_Liu · 2 months ago
Admirable~
locknitpicker · 2 months ago
> TIL I have a "open Hacker News" hand reflex

You mean it's not your homepage?

yigithan · 2 months ago
I stopped using google.com for Internet access check, I now use HN.
ErroneousBosh · 2 months ago
Do you log into things and reflexively type "ls", too?
kevin061 · 2 months ago
I did not know how addicted I was to HN until today lol
nottorp · 2 months ago
What? You mean you ... close the HN tab?
selectnull · 2 months ago
I already knew that. :)
numpad0 · 2 months ago
sosodev · 2 months ago
Yes, and I'm a little ashamed to admit my morning routine wasn't the same without it.
fedreg · 2 months ago
This was more impactful to my day than the last AWS and CloudFlare outages...
messe · 2 months ago
At least during those outages I could procrastinate on HN.
al_borland · 2 months ago
Is this still a valid account for HN status? It says it’s the official one, but with the changes at Twitter to no longer show chronological feeds (at least for users who aren’t logged in), it’s rather useless. The top 5 listed posts (for me) are seemingly random ones from 2014-2022.

https://x.com/HNStatus

Is there a better place to check, beyond a basic down detector that may provide more insight or signal that the outage is acknowledged?

alexfoo · 2 months ago
https://xcancel.com/HNStatus only uses chronological ordering (after any pinned tweets) and that has the last message 12 Dec 2023.

(Basically whenever you see an x.com link just change it to xcancel.com and avoid the nonsense.)

dang · 2 months ago
We post there when we know we're down and it may take more than a few minutes. But in this case we didn't know! https://news.ycombinator.com/item?id=46303196
FuriouslyAdrift · 2 months ago
The only way I have figured out to change the "Following" sort order back to chronological is from the mobile app: click the down arrow on the "Following" tab and change the sort from "popular" to "most recent."

Seems to reset it on the web view, too.

al_borland · 2 months ago
It sounds like this would only work for logged in users.
zipy124 · 2 months ago
https://hn.hund.io/ is a status page, no idea if official or not, but it didn't register the outage for some reason.

I didn't read the post text; it's identified there, haha, my bad! I wish the post text wasn't grey, I gloss over it too easily.

laCour · 2 months ago
This was monitoring the unauthenticated news page, which is why it didn't catch it. It now monitors authentication as well. It is not official, and was made by a co-founder years ago.
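The lesson generalizes: a monitor that only fetches the logged-out front page can stay green while the origin is failing, because a cache keeps answering. A minimal bash sketch of a probe that also makes a logged-in request, assuming curl is installed and that you supply the session cookie of a dedicated monitoring account (the cookie value and the function names probe and check_page are hypothetical):

```shell
#!/usr/bin/env bash
# Sketch: an uptime probe that also exercises a logged-in request, so a cache
# serving stale logged-out pages cannot hide a failing origin server.
set -euo pipefail

# probe URL [COOKIE] -> prints the HTTP status code ("000" if no response)
probe() {
  local url=$1 cookie=${2:-}
  local args=(--silent --output /dev/null --max-time 10 --write-out '%{http_code}')
  if [[ -n $cookie ]]; then
    args+=(--cookie "$cookie")   # send the monitoring account's session cookie
  fi
  curl "${args[@]}" "$url" || true   # keep the printed "000" on connection errors
}

# check_page URL [COOKIE] -> prints "UP <code>" or "DOWN <code>"; returns 1 when down
check_page() {
  local code
  code=$(probe "$1" "${2:-}")
  if [[ $code == 200 ]]; then
    echo "UP $code"
  else
    echo "DOWN $code"
    return 1
  fi
}
```

For example, run check_page https://news.ycombinator.com/news "$MONITOR_COOKIE" every minute from cron and alert on a nonzero exit status (MONITOR_COOKIE is a placeholder). An anonymous probe returning 200 next to a logged-in probe returning 502 is exactly the split the original post describes.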
lagniappe · 2 months ago
This site said HN was fine and green the entire time it was down.
liampulles · 2 months ago
Smart. Have to use that error budget before year end...
dylan604 · 2 months ago
I always hated the late use-it-or-lose-it spending at the end of the year, where you end up buying the things that were denied earlier in the year. You just cost me half a year of using the damn thing.
wiredpancake · 2 months ago
This problem is really bad at my work.

We commonly run into finance issues about halfway through the year. We get to the point where Finance declines a request for 10 HDMI cables, and we get reprimanded for not tracking where each HDMI or Ethernet cable goes. Near the end of the year, the budget refreshes and Finance (without consulting us) ends up buying a bunch of random stuff.

"Guys, we bought 11 iPad Minis that need to be set up"

Oh so we can also get the HDMI cables now?

"No sorry, we just spent all the remaining money, have you audited the cables recently?"

wavemode · 2 months ago
When it was down my thought was "damnit, I'll actually have a productive workday now."
tzs · 2 months ago
Next time you can avoid that fate by opening HN in a private browsing (or whatever your browser calls its equivalent) window. This outage, like the vast majority of HN outages, only affected logged in requests.

I suppose you could also just clear your HN cookies in a regular browsing window, but then when they fix it you'd have to log in again.

dpoloncsak · 2 months ago
Huh. Dunno why, but when it failed on Firefox I tried Chrome, and it worked. I wrote it off as a Mozilla issue, but this would better explain that I think
neom · 2 months ago
Is it my imagination or did they used to automatically serve you a logged out page when it was down?
ycombinator_acc · 2 months ago
I couldn't access it in private or regular.
dwa3592 · 2 months ago
hahaha
ortusdux · 2 months ago
rozenmd · 2 months ago
Interestingly it stayed up if you weren't logged in.
jedberg · 2 months ago
If you aren't logged in you get a cached version from the CDN/cache. Reddit works the same way.
Izkata · 2 months ago
Not completely, I'm not logged in on my work laptop and it was only working some of the time (and not like some pages were cached and some weren't, I was refreshing the same page and sometimes it worked and sometimes not).
bryanrasmussen · 2 months ago
also went down if you went to log in, and people's individual pages were also down. So as far as I saw, the front page was up as long as you were not logged in, though I'm not sure that wasn't just luck of the draw; I had one experience where it looked like the front page was sometimes down for non-logged-in users as well.

on edit: ok others pointed out it was cached pages I saw. explains it.

smallerize · 2 months ago
That only worked for a while, eventually I couldn't load comment pages even logged out.
davnicwil · 2 months ago
that'll be because it's served from cache when you're not logged in
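The cache explanation in these replies, and the uneven experiences people report, fit a common pattern: anonymous requests are served from a cache when a copy exists, while logged-in requests and anonymous cache misses go to the origin. A rough bash sketch (4+) of that routing decision, assuming a session cookie named user and a hand-filled cache, both illustrative rather than HN's actual setup:

```shell
#!/usr/bin/env bash
# Sketch of the "logged-out readers hit the cache, logged-in readers hit the
# origin" pattern. The cookie name "user" and the cache contents are
# illustrative assumptions, not HN's actual configuration.
set -euo pipefail

declare -A CACHE=( ["/news"]=1 ["/newest"]=1 )   # paths with a cached copy

# route COOKIE_HEADER PATH -> prints where the request would be served from
route() {
  local cookies=$1 path=$2
  if [[ $cookies == *"user="* ]]; then
    echo origin   # crude session check: personalized pages must be rendered fresh
  elif [[ -n ${CACHE[$path]:-} ]]; then
    echo cache    # anonymous and cached: survives an origin outage, but goes stale
  else
    echo origin   # anonymous cache miss: fails if the origin is down
  fi
}
```

Under this model, logging out only helps for pages that happen to be cached, and once cached copies expire even anonymous requests reach the failing origin, which would be consistent with logged-out access working intermittently and then degrading over time.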