Readit News
jordanthoms · 11 years ago
Half the internet is down, Amazon.com is flapping, the AWS website doesn't have any assets, and status.aws has a Green checkmark with a little (i), 40 minutes after the problems start?!

I love AWS but they really need to improve their procedures for communicating during outages. If your parent company's billion dollar site can be affected in any way on the night before Black Friday, and even your own site is down, and you are not acknowledging that the service is FUBAR - you have a problem.

erichmond · 11 years ago
Exactly this. In our experience Amazon's status page doesn't reflect the actual outages we are having, Cloudwatch often sends false positives (systems reported as down when in fact they are not), and SNS messages get lost in the ether.

We've built our systems to recover from failure states once we know there is a problem. AWS's inability to do that reliably is forcing us to own the problem ourselves, and as a result, we will probably migrate away to use cheaper boxes in the cloud.

On the positive side, RDS has been solid.

felixgallo · 11 years ago
Imagine what kind of conditions would have to exist for Amazon to use a red indicator.
corobo · 11 years ago
Greentick [i]: Amazon AWS is closing its doors forever, we apologise for the inconvenience
cperciva · 11 years ago
Hey, if you're able to load status.aws.amazon.com then clearly it's working!
jordanthoms · 11 years ago
It's just as well they don't use CF for anything critical on the status site...
WestCoastJustin · 11 years ago
UPDATE: RESOLVED according to the AWS status page -- 6:24 PM PST Between 4:12 PM and 6:02 PM PST, users experienced elevated error rates when making DNS queries for CloudFront distributions. The service has recovered and is operating normally. [2]

--

Cloudfront DNS is hosed. Doing a DNS lookup on my cloudfront distribution fails. Many folks on Twitter are seeing the issue too [1]. Maybe a failed upgrade or something; their whois info was updated today @ 2014-11-26T16:24:49-0800.

  [~]$ host d1cg27r99kkbpq.cloudfront.net
  Host d1cg27r99kkbpq.cloudfront.net not found: 2(SERVFAIL)
[1] https://twitter.com/search?q=cloudfront

[2] http://status.aws.amazon.com/

dice · 11 years ago
The AWS status page not showing anything isn't unusual. They'll probably update in half an hour to describe it as a partial failure, then revise it to an "all OK" 10 minutes before it's actually fixed, then retroactively downgrade it to a minor quality of service disruption.

Not that I've noticed they tend to lie through their teeth on the status page or anything....

teraflop · 11 years ago
Apparently, "DNS resolution errors" are considered neither a "service disruption" nor a "performance issue", but instead are categorized as an "informational message."
anonymuse · 11 years ago
AWS Status shows Cloudfront DNS issues, which is reflected in our page's assets not loading. Kinda makes me wish we were using something like https://github.com/etsy/cdncontrol/ but that's a fight for another day!

  <title type="text">Informational message: DNS Resolution errors</title>
  <link>http://status.aws.amazon.com</link>
  <pubDate>Wed, 26 Nov 2014 17:00:39 PST</pubDate>
  <guid>http://status.aws.amazon.com/#cloudfront_1417050039</guid>
  <description>We are currently investigating increased error rates for DNS queries for CloudFront distributions.</description>

anonymuse · 11 years ago
Anecdotally, our site is now loading quickly for me: http://canary.is/.

However, digging the Cloudfront name servers times out intermittently:

  $ dig +short @ns-666.awsdns-19.net cloudfront.net
  ;; connection timed out; no servers could be reached

Essa · 11 years ago
pestaa · 11 years ago
I too like jokes, but saying nothing else leads to superficial discussions.
ejdyksen · 11 years ago
Amazon appears to be up for me, but a number of places are still affected:

- All of Vox Media's properties (The Verge, Polygon, Vox.com, SBNation, etc)

- All of Atlassian's services (Bitbucket, Jira OnDemand, etc)

- Flowdock

- Instagram

- aws.amazon.com has no assets

Edit: console.aws.amazon.com has no assets, either, so it's also currently worthless.

It's probably worth having a DNS failover strategy for Route53 (if that's what you're using) that doesn't involve the UI on console.aws.amazon.com.

stevekemp · 11 years ago
> It's probably worth having a DNS failover strategy for Route53 (if that's what you're using) that doesn't involve the UI on console.aws.amazon.com.

Which is one of the reasons I set up https://dns-api.com/ - a way of updating Route53 DNS via git hooks.

jbinto · 11 years ago
Amazon.com is up for me, but is entirely missing product images.
dice · 11 years ago
It's flapping in some locations. Seems to be hard down in others. From a service monitoring Cloudfront on various geographically distributed Nagios monitors (times are PST):

  Columbus:

  [11-26-2014 16:46:24] SERVICE ALERT: public-www;CDN - Logo;OK;SOFT;2;HTTP OK: HTTP/1.1 200 OK - 2960 bytes in 0.165 second response time
  [11-26-2014 16:45:34] SERVICE ALERT: public-www;CDN - Logo;CRITICAL;SOFT;1;Name or service not known
  [11-26-2014 16:39:24] SERVICE ALERT: public-www;CDN - Logo;OK;SOFT;3;HTTP OK: HTTP/1.1 200 OK - 2960 bytes in 0.030 second response time
  [11-26-2014 16:38:34] SERVICE ALERT: public-www;CDN - Logo;CRITICAL;SOFT;2;Name or service not known
  [11-26-2014 16:37:34] SERVICE ALERT: public-www;CDN - Logo;CRITICAL;SOFT;1;Name or service not known
  [11-26-2014 16:25:24] SERVICE ALERT: public-www;CDN - Logo;OK;SOFT;2;HTTP OK: HTTP/1.1 200 OK - 2960 bytes in 0.030 second response time
  [11-26-2014 16:24:34] SERVICE ALERT: public-www;CDN - Logo;CRITICAL;SOFT;1;Name or service not known
  [11-26-2014 16:21:24] SERVICE ALERT: public-www;CDN - Logo;OK;SOFT;2;HTTP OK: HTTP/1.1 200 OK - 2960 bytes in 0.066 second response time
  [11-26-2014 16:20:24] SERVICE ALERT: public-www;CDN - Logo;CRITICAL;SOFT;1;Name or service not known

  Portland:

  [11-26-2014 16:49:40] SERVICE ALERT: public-www;CDN - Logo;CRITICAL;SOFT;1;Name or service not known
  [11-26-2014 16:43:40] SERVICE ALERT: public-www;CDN - Logo;OK;SOFT;2;HTTP OK: HTTP/1.1 200 OK - 2960 bytes in 0.148 second response time
  [11-26-2014 16:42:40] SERVICE ALERT: public-www;CDN - Logo;CRITICAL;SOFT;1;Name or service not known
  [11-26-2014 16:39:40] SERVICE ALERT: public-www;CDN - Logo;OK;HARD;3;HTTP OK: HTTP/1.1 200 OK - 2960 bytes in 0.186 second response time
  [11-26-2014 16:21:40] SERVICE ALERT: public-www;CDN - Logo;CRITICAL;HARD;3;Name or service not known
  [11-26-2014 16:20:40] SERVICE ALERT: public-www;CDN - Logo;CRITICAL;SOFT;2;Name or service not known
  [11-26-2014 16:19:41] SERVICE ALERT: public-www;CDN - Logo;CRITICAL;SOFT;1;Name or service not known

  Santa Clara:

  [11-26-2014 16:24:26] SERVICE ALERT: public-www;CDN - Logo;CRITICAL;HARD;3;Name or service not known
  [11-26-2014 16:23:25] SERVICE ALERT: public-www;CDN - Logo;CRITICAL;SOFT;2;Name or service not known
  [11-26-2014 16:22:25] SERVICE ALERT: public-www;CDN - Logo;CRITICAL;SOFT;1;Name or service not known
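
For anyone wanting to tell "flapping" from "hard down" across many monitors without eyeballing the logs, here is a rough Ruby sketch that summarizes lines in the format above. The `summarize` and `verdict` names and the "two or more state changes means flapping" threshold are assumptions made for illustration; this is not Nagios's own flap detection.

```ruby
# Matches Nagios "SERVICE ALERT" lines in the format shown above:
# [timestamp] SERVICE ALERT: host;service;state;state_type;attempt;output
ALERT = /\A\[(?<ts>[^\]]+)\] SERVICE ALERT: (?<host>[^;]*);(?<service>[^;]*);(?<state>[^;]*);(?<type>[^;]*);(?<attempt>\d+);(?<output>.*)\z/

# Summarize one location's log lines (newest first, as pasted above).
def summarize(lines)
  states = lines.map { |l| ALERT.match(l.strip) }.compact.map { |m| m[:state] }
  states = states.reverse # oldest first
  changes = states.each_cons(2).count { |a, b| a != b }
  { checks: states.size, changes: changes, last: states.last,
    verdict: verdict(states, changes) }
end

# Assumed rule of thumb: two or more state changes counts as flapping.
def verdict(states, changes)
  return :unknown if states.empty?
  return :flapping if changes >= 2
  states.last == "OK" ? :ok : :down
end

santa_clara = [
  "[11-26-2014 16:24:26] SERVICE ALERT: public-www;CDN - Logo;CRITICAL;HARD;3;Name or service not known",
  "[11-26-2014 16:23:25] SERVICE ALERT: public-www;CDN - Logo;CRITICAL;SOFT;2;Name or service not known",
  "[11-26-2014 16:22:25] SERVICE ALERT: public-www;CDN - Logo;CRITICAL;SOFT;1;Name or service not known"
]
p summarize(santa_clara)
```

Lines that don't match the pattern (the location headers, blank lines) are skipped, so a whole pasted section can be passed in as-is.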

theophrastus · 11 years ago
One wonders if this is the same matter currently afflicting a certain infamous torrent site ("..downtime appears to be a routing issue as the site is still reachable in most parts of the world"): http://torrentfreak.com/the-pirate-bay-goes-down-locally-141...
cjreyes · 11 years ago
Is there a RoR Gem / configuration that will serve assets locally if an external asset host name doesn't resolve or times out?
Hengjie · 11 years ago
There isn't, but there's a frontend library that you can use to add fallbacks to <script> tags:

https://github.com/shinnn/script-fallback-from-urls

tbfrench · 11 years ago
Not a gem, but from jQuery days:

<script> window.jQuery || document.write("<script src='js/jquery-1.10.2.min.js'>\x3C/script>")</script>

jacobsenscott · 11 years ago
You can assign a proc to config.asset_host, so you could easily do it -

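  # Rails calls the proc with the asset source, so the lambda must accept it.
  config.asset_host = ->(source) {
    # cdn_up? is a placeholder for your own health check.
    cdn_up? ? "http://mycdn.com" : "http://mydomain.com"
  }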
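
The `cdn_up?` check is left undefined above. One possible shape for it, as a minimal sketch: resolve the CDN host name with a short timeout and cache the answer for a few seconds so that not every asset URL triggers a lookup. The `CdnHealth` class, its `ttl` and `timeout` defaults, and the injectable `resolver` and `clock` parameters are all assumptions made for illustration, not part of Rails.

```ruby
require "resolv"

# Hypothetical helper: caches a DNS health check for the CDN host name.
class CdnHealth
  def initialize(host, ttl: 30, timeout: 1, resolver: nil,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @host = host
    @ttl = ttl                                   # seconds to reuse a result
    @resolver = resolver || default_resolver(timeout)
    @clock = clock
    @checked_at = nil
    @up = false
  end

  # True if the host name resolved at the most recent check.
  def up?
    now = @clock.call
    if @checked_at.nil? || now - @checked_at >= @ttl
      @up = check
      @checked_at = now
    end
    @up
  end

  private

  # Any resolver error (not found, timeout) counts as "down".
  def check
    !!@resolver.call(@host)
  rescue StandardError
    false
  end

  # Uses the system resolver configuration with a short timeout.
  def default_resolver(timeout)
    lambda do |name|
      Resolv::DNS.open do |dns|
        dns.timeouts = timeout
        dns.getaddress(name)
      end
    end
  end
end

# Possible use in a Rails config (host names are the placeholders from above):
#   CDN = CdnHealth.new("mycdn.com")
#   config.asset_host = ->(source) { CDN.up? ? "http://mycdn.com" : "http://mydomain.com" }
```

This sketch is not thread-safe, and it only tells you the name resolves from the server, not from any particular client.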

lukeschlather · 11 years ago
It's a decent idea, but it would be better to do it client-side. For this sort of event, knowing that the name resolves on the server doesn't give you any confidence it will resolve for the client.

What would be really nice is if you could specify a fallback host in your DNS prefetch, and the browser would make it "just work."

thejosh · 11 years ago
Exactly, if a POP in Australia is down but the one in Germany works, it's better to handle this on the client side.
bbnnt · 11 years ago
that'd be genius