Readit News
2gremlin181 commented on TP-Link Tapo C200: Hardcoded Keys, Buffer Overflows and Privacy   evilsocket.net/2025/12/18... · Posted by u/sibellavia
walterbell · 4 days ago
Which AI providers have access to real-time Twitter data?
2gremlin181 · 4 days ago
Genuinely curious, what are some use cases where you need live Twitter data in your LLM?
2gremlin181 commented on Graphite is joining Cursor   cursor.com/blog/graphite... · Posted by u/fosterfriends
2gremlin181 · 4 days ago
IMO this is a smart move. A lot of these next-gen dev tools are genuinely great, but the ecosystem is fragmented and the subscriptions add up quickly. If Cursor acquires a few more, like Warp or Linear, it could become a very compelling all-in-one dev platform.
2gremlin181 commented on I made a downdetector for downdetector's downdetector's downdetector   downdetectorsdowndetector... · Posted by u/halgir
2gremlin181 · a month ago
The next step: point downdetector at downdetector's downdetector's downdetector and create a cyclic dependency.
2gremlin181 commented on Ask HN: Why are most status pages delayed?    · Posted by u/2gremlin181
swiftcoder · 2 months ago
Because for most major sites, updating the status page requires (a significant number of) humans in the loop.

Back when I worked at a major cloud provider (which admittedly was >5 years ago), our alarms would go off after ~3-15 minutes of degraded functionality (depending on the sensitivity settings of that specific alarm). At that point the on call gets paged in to investigate and validates that the issue is real (and not trivially correctable). There was also automatic escalation if the on call doesn't acknowledge the issue after 15 minutes.

If so, a manager gets paged in to coordinate the response, and if the manager considers the outage to be serious (or to affect a key customer), a director or above gets paged in. The director/VP has the ultimate say about posting an outage, but in parallel they consult the PR/comms team on the wording/severity of the notification, any partnership managers for key affected clients, and legal regarding any contractual requirements the outage may be breaching...

So in a best-case scenario you'd have 3 minutes (for a fast alarm to raise), plus ~5 minutes for the on call to engage, plus ~10 minutes of initial investigation, plus ~20 minutes of escalations and discussions, roughly 40 minutes in all, before anyone with permission to edit the status page can go ahead and do so.

2gremlin181 · 2 months ago
Copying my response over from another comment:

I totally get that, but how hard would it be to actually make calls to your own API from the status page? If it fails, display a vague message saying there might be issues and that you are looking into it. Clearly these metrics and alerts exist internally too. I'm not asking for an instant RCA or confirmation of the scope of the outage. Just stop gaslighting me.

2gremlin181 commented on Ask HN: Why are most status pages delayed?    · Posted by u/2gremlin181
kachapopopow · 2 months ago
Because these systems are so big, the people who can validate problems might be asleep at the wheel or pretty far up the chain, and it takes time to reach them. Most of the spikes on Downdetector are unrelated to the service itself and come from a third-party failure instead.
2gremlin181 · 2 months ago
IMO if you have an endpoint or service on your status page, you most definitely have an oncall rotation for it. Regarding the second point, your service might be down due to an AWS outage. It's an upstream issue and I fully understand that, but I should not have to track things upstream by guessing which cloud provider you use. And where do we draw the line? What if it's not AWS but Hetzner or some other boutique provider?
2gremlin181 commented on Ask HN: Why are most status pages delayed?    · Posted by u/2gremlin181
knorker · 2 months ago
To add to the reasons others gave: It needs to be correct.

Engineers are working the problem. They have a pretty good understanding of the impact of the outage. Then an external comms person asks an engineer to proofread the external outage comms. Which triggers rounds of "no, this part is not technically correct" and "I know the internal system scope impact, but not how that maps to the external product names you want to communicate".

Sure, it'd be nice if the message "we are investigating an issue with… uh… some products" would come up faster.

2gremlin181 · 2 months ago
I totally get that, but how hard would it be to actually make calls to your own API from the status page? If it fails, display a vague message saying there might be issues and that you are looking into it. Clearly these metrics and alerts exist internally too.

I'm not asking for an instant RCA or confirmation of the scope of the outage. Just stop gaslighting me.
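As a rough sketch of what I mean, assuming a hypothetical public health endpoint and made-up thresholds (the URL, element id, and timings below are illustrative, not from any real status page):

```typescript
// Rough sketch: the status page itself probes a public health endpoint and
// shows a vague "we might be having issues" banner when the probe keeps failing.
// HEALTH_URL, the element id, and the thresholds are made up for illustration.

const HEALTH_URL = "https://api.example.com/health"; // hypothetical endpoint
const TIMEOUT_MS = 5_000;
const FAILURES_BEFORE_BANNER = 3; // require a few consecutive failures to avoid flapping

let consecutiveFailures = 0;

async function probe(): Promise<boolean> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), TIMEOUT_MS);
  try {
    const res = await fetch(HEALTH_URL, { signal: controller.signal });
    return res.ok;
  } catch {
    return false; // network error or timeout counts as a failure
  } finally {
    clearTimeout(timer);
  }
}

async function refreshBanner(): Promise<void> {
  const healthy = await probe();
  consecutiveFailures = healthy ? 0 : consecutiveFailures + 1;

  const banner = document.getElementById("degraded-banner");
  if (banner) {
    banner.textContent =
      consecutiveFailures >= FAILURES_BEFORE_BANNER
        ? "We might be experiencing issues and are looking into it."
        : "";
  }
}

// Re-check every 30 seconds.
setInterval(refreshBanner, 30_000);
void refreshBanner();
```

Even a coarse check like this, gated on a few consecutive failures to avoid flapping, would acknowledge a problem within a minute or two instead of after a 40-minute escalation chain.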

u/2gremlin181

Karma: 21 · Cake day: March 2, 2022
About
https://kavi.sh/