Readit News
laCour commented on Hacker News Has Degraded Service   hn.hund.io/... · Posted by u/iloveplants
koakuma-chan · 4 days ago
"100% Uptime"
laCour · 4 days ago
I'm with Hund. It was only monitoring the cached, unauthenticated news page, which stayed up throughout the downtime. However, it now monitors the authenticated news landing page as well.
laCour commented on Tell HN: HN was down    · Posted by u/uyzstvqs
jonahx · 4 days ago
Is this a mistake by hund, or the configuration of hund by HN?
laCour · 4 days ago
Mistake on our part (Hund) for not monitoring authentication. This page is unofficial and was made by a co-founder several years ago.
laCour commented on Tell HN: HN was down    · Posted by u/uyzstvqs
zipy124 · 4 days ago
https://hn.hund.io/ is a status page. No idea if it's official or not, but the downtime didn't register there for some reason.

I didn't read the post text; it's identified there, haha, my bad! I wish the post text wasn't grey, I gloss over it too easily.

laCour · 4 days ago
This was monitoring the unauthenticated news page, which is why it didn't catch it. It now monitors authentication as well. It is not official, and was made by a co-founder years ago.
laCour commented on Tell HN: HN was down    · Posted by u/uyzstvqs
laCour · 4 days ago
I'm with Hund. Our hn.hund.io page did not catch this because it was requesting the cached, unauthenticated page. It now monitors authentication as well.
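A minimal sketch of the gap described above, assuming a status checker that runs a list of HTTP probes and reports the service as degraded if any probe fails. The `Probe` type, the `check_service` and `overall_status` helpers, the example.com URLs, and the cookie header are all hypothetical and are not Hund's actual implementation. The `fetch` function is injected so the logic can be exercised without a network.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical fetch signature: (url, headers) -> HTTP status code.
Fetch = Callable[[str, Dict[str, str]], int]


@dataclass
class Probe:
    """One monitored request (hypothetical type for illustration)."""
    name: str
    url: str
    headers: Dict[str, str] = field(default_factory=dict)


def check_service(probes: List[Probe], fetch: Fetch) -> Dict[str, bool]:
    """Run every probe; a probe passes only on a 2xx response."""
    results: Dict[str, bool] = {}
    for probe in probes:
        try:
            status = fetch(probe.url, probe.headers)
            results[probe.name] = 200 <= status < 300
        except Exception:
            # Timeouts and connection errors count as failures.
            results[probe.name] = False
    return results


def overall_status(results: Dict[str, bool]) -> str:
    """Report degraded if any single probe failed."""
    return "operational" if all(results.values()) else "degraded"


def fake_fetch(url: str, headers: Dict[str, str]) -> int:
    """Simulated outage: the cached page works, logged-in requests fail."""
    return 500 if "Cookie" in headers else 200


# Assumed endpoints; the real URLs and session handling will differ.
cached_only = [Probe("news (cached)", "https://example.com/news")]
with_auth = cached_only + [
    Probe("news (logged in)", "https://example.com/news",
          {"Cookie": "user=placeholder"}),
]

before = overall_status(check_service(cached_only, fake_fetch))
after = overall_status(check_service(with_auth, fake_fetch))
```

With only the cached probe, the simulated outage goes unnoticed. Adding the authenticated probe is what makes the checker report a problem.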
laCour commented on Root cause analysis: significantly elevated error rates on 2019‑07‑10   stripe.com/rcas/2019-07-1... · Posted by u/gr2020
kortilla · 6 years ago
The health check said it was ok. How would they know it needed to be recovered?

The fault was the bad health check. Not the process.

laCour · 6 years ago
They only just clarified that monitoring was in place and they were reporting as healthy. See the comments above.
laCour commented on Root cause analysis: significantly elevated error rates on 2019‑07‑10   stripe.com/rcas/2019-07-1... · Posted by u/gr2020
NikolaeVarius · 6 years ago
In many HA setups, you're not supposed to have to care if any single thing goes down, because it should auto-recover.

The article said that the node stalled in a way that was unforeseen, which may have caused standard recovery mechanisms to silently fail.

laCour · 6 years ago
Right, but they didn't recover speedily. Leaving the cluster in such a state for so long sounds like poor monitoring to me, because this is known to interfere with a later election.
laCour commented on Root cause analysis: significantly elevated error rates on 2019‑07‑10   stripe.com/rcas/2019-07-1... · Posted by u/gr2020
laCour · 6 years ago
"[Four days prior to the incident] Two nodes became stalled for yet-to-be-determined reasons."

How did they not catch this? It's super surprising to me that they wouldn't have monitors for this.

laCour commented on SiriusXM to Acquire Pandora   blog.pandora.com/us/siriu... · Posted by u/prostoalex
owaislone · 7 years ago
The only amazing music streaming product I've seen was already acquired and killed off: Rdio.

It was head and shoulders above the rest, especially when it came to international content and recommendations. Their recommendations were amazing.

laCour · 7 years ago
If I'm not mistaken, Rdio's engineering team helped build Pandora's on-demand service. I've switched to it from Spotify after using Rdio in the past. Its recommendations feel similar to Rdio's, but the interface is a bit lacking. There aren't general recommendations. Instead, you must build a playlist and then you can add a set of recommended songs to the playlist. Of course, there are also radio stations.

u/laCour

Karma: 586 · Cake day: July 4, 2011