tj_591 · 4 months ago
Hi all, Tushar from Docker here. We’re sorry about the impact our current outage is having on many of you. Yes, this is related to the ongoing AWS incident and we’re working closely with AWS on getting our services restored. We’ll provide regular updates on dockerstatus.com.

We know how critical Docker Hub and our services are to millions of developers, and we’re sorry for the pain this is causing. Thank you for your patience as we work to resolve this incident. We’ll publish a post-mortem in the next few days once this incident is fully resolved and we have a remediation plan.

freedomben · 4 months ago
Part of me hopes that we find out that DynamoDB (which it sounds like was the root of the cascading failures) is shipped in a Docker image which is hosted on Docker Hub :-D
tj_591 · 3 months ago
We’ve published an incident report outlining what happened and the steps we’re taking to strengthen resilience in the face of upstream service interruptions. - https://www.docker.com/blog/docker-hub-incident-report-octob...
tonyabracadabra · 4 months ago
pls bring it back

atymic · 4 months ago
reader_1000 · 4 months ago
> We have identified the underlying issue with one of our cloud service providers.

Isn't everyone using multiple cloud providers nowadays? Why are they affected by a single cloud provider's outage?

lvncelot · 4 months ago
I think more often than not, companies are using a single cloud provider, and even when multiple are used, it's either different projects with different legacy decisions or a conscious migration.

True multi-cloud is not only very rare, it's an absolute pain to manage as soon as people start using any vendor-specific functionality.

jelder · 4 months ago
No, that's pretty rare, and generally means you can't count on any features more sophisticated than VMs and object storage.

On the other hand, it's pretty embarrassing at this point for something as fundamental as Docker to be in a single region. Most cloud providers make inter-region failover reasonably achievable.

roywiggins · 4 months ago
You can be multi-cloud in the sense that you aren't dependent on any single provider, or in the sense that you are dependent on all of them.
postexitus · 4 months ago
Not only are they not using multiple cloud providers, they aren't even using multiple cloud locations.
rcxdude · 4 months ago
Because it's hard enough to distribute a service across multiple machines in the same DC, let alone across multiple DCs and multiple providers.
pmontra · 4 months ago
Because even if service A is using multiple cloud providers, not all of the external services it uses are doing the same, especially the smallest or cheapest ones. At least one of them is on AWS us-east-1, fails, and degrades service A or takes it down.

Being multi-cloud does not come for free: it costs time, engineers, knowledge, and ultimately money.

DiggyJohnson · 4 months ago
Multi-cloud is nowhere near as trivial to implement for real-world, complex projects as is often implied. Things get challenging the second your application steps off the happy path.
wredcoll · 4 months ago
> Isn't everyone using multiple cloud providers nowadays? Why are they affected by a single cloud provider's outage?

No? I very much doubt anyone is doing that.

walkabout · 4 months ago
> Isn't everyone using multiple cloud providers nowadays?

Oh yes. All of them, in fact, especially if you count what key vendors host on.

> Why are they affected by a single cloud provider's outage?

Every workload is on only one cloud. N.b., this doesn't mean every workflow is on only one cloud; that's an important distinction, since the latter would be more stable.

madisp · 4 months ago
They are using multiple cloud providers, but judging by the Cloudflare R2 outage affecting them earlier this year, I guess all of them are on the critical path?
nobleach · 4 months ago
Looking at the landscape around me, no. Everyone is in crisis cost-cutting, "gotta show that same growth the C-suite saw during Covid" mode. So being multi-provider, and in some cases even multi-regional, is now off the table. It's sad because the product really suffers. But hey, "growth".

ic4l · 4 months ago
This broke our builds since we rely on several public Docker images, and by default, Docker uses docker.io.

Thankfully, AWS provides a docker.io mirror for those who can't wait:

  FROM public.ecr.aws/docker/library/{image_name}
In the error logs, the issue was mostly related to the authentication endpoint:

https://auth.docker.io → "No server is available to handle this request"

After switching to the AWS mirror, everything built successfully without any issues.
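
If editing every reference right away isn't practical, a stopgap sketch (the nginx tag is purely illustrative, and note that BuildKit may still try to contact the upstream registry during builds) is to pull from the mirror and retag locally, so docker run and compose files that use the short name keep working:

  # pull the official image via AWS's public mirror of Docker Hub
  docker pull public.ecr.aws/docker/library/nginx:1.27
  # retag it under the short docker.io-style name that existing references expect
  docker tag public.ecr.aws/docker/library/nginx:1.27 nginx:1.27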

CamouflagedKiwi · 4 months ago
Mild irony that Docker is down because of the AWS outage, but the AWS mirror repos are still running...
kerblang · 4 months ago
Also, docker.io is rate-limited, so if your organization experiences enough growth you will start seeing build failures on a regular basis.

Also, quay.io, another image host from Red Hat, has been read-only all day today.

If you're going to have Docker/container image dependencies, it's best to establish a solid hosting solution instead of riding whatever bus shows up.

pploug · 4 months ago
Rate limits are primarily applied to unauthenticated users; open source projects and business accounts have no limits or much higher thresholds.
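
For CI systems hitting the anonymous limits, authenticating the pull step is usually enough. A minimal sketch (the environment variable names are just placeholders for however your CI stores secrets):

  # log in with a Docker Hub access token stored as a CI secret
  echo "$DOCKERHUB_TOKEN" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin
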
suriya-ganesh · 4 months ago
Based on the solution, it seems like it's quite straightforward to switch over.
firloop · 4 months ago
I wasn't able to get this working, but I was able to use Google's mirror[0] just fine.

Just had to change

    FROM {image_name}
to

    FROM mirror.gcr.io/{image_name} 
Hope this helps!

[0]: https://cloud.google.com/artifact-registry/docs/pull-cached-...

ic4l · 4 months ago
We tried this initially

  FROM mirror.gcr.io/{image_name}
We received

  failed to resolve source metadata for mirror.gcr.io/
So it looks like these services may not be true mirrors, and are just functioning as a proxy with a cache.

If your image is not cached on one of these then you may be SOL.

geostyx · 4 months ago
public.ecr.aws was failing for me earlier with 5XX errors due to the AWS outage: https://news.ycombinator.com/item?id=45640754
anon7000 · 4 months ago
I manage a large build system and pulling from ECR has been flaking all day
KronisLV · 4 months ago
I guess people who are running their own registries like Nexus and building their own container images from a common base image are feeling at least a bit more secure in their choice right now.

Wonder how many builds or redeployments this will break. Personally, I have nothing against Docker or Docker Hub, of course; I find them to be useful.

yandie · 4 months ago
It's actually an important practice to have a Docker image cache in the middle. You never know if an upstream image is randomly purged from Docker Hub, and then your K8s node gets replaced and now can't pull the base image for your service.

Just engineering hygiene IMO.
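
One way to get that on Kubernetes nodes running containerd is a per-registry mirror config so docker.io pulls go through an internal cache first. A rough sketch, assuming containerd's registry config_path points at /etc/containerd/certs.d and using a hypothetical cache at mirror.internal:5000:

  # /etc/containerd/certs.d/docker.io/hosts.toml
  server = "https://registry-1.docker.io"
  [host."https://mirror.internal:5000"]
    capabilities = ["pull", "resolve"]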

koolba · 4 months ago
> You never know if an upstream image is randomly purged from Docker Hub, and then your K8s node gets replaced and now can't pull the base image for your service.

That doesn't make sense unless you have some oddball setup where k8s is building the images you're running on the fly. There's no such thing as a "base image" for tasks running in k8s. There is just the image itself and its layers, which may come from some other image.

But it's not built by k8s. It's built by whatever is building your images and storing them in your registries. That's where you need your true base image caching.

tom1337 · 4 months ago
We are using base images, but unfortunately some GitHub Actions pull Docker images in their prepare phase, so while my application would build, I cannot deploy it because the CI/CD depends on Docker Hub and you cannot change where these images are pulled from (so they cannot go through a pull-through cache)…
roryirvine · 4 months ago
My advice: document the issue, and use it to help justify spending time on removing those vestigial dependencies on Docker asap.

It's not just about reducing your exposure to third parties who you (presumably) don't have a contract with, it's also good mitigation against potential supply chain attacks - especially if you go as far as building the base images from scratch.

enigmo · 4 months ago
Mirrors can be configured in dockerd or BuildKit. If you can update the config (might need a self-hosted runner?) it's a quick fix; see https://cloud.google.com/artifact-registry/docs/pull-cached-... for an example. AWS and Azure are similar.
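
For the dockerd side, it's a one-line entry in /etc/docker/daemon.json followed by a daemon restart; a minimal sketch (mirror.gcr.io is just one public option, and registry-mirrors only affects Docker Hub pulls):

  {
    "registry-mirrors": ["https://mirror.gcr.io"]
  }
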
Sphax · 4 months ago
We run Harbor and mirror every base image using its Proxy Cache feature, it's quite nice. We've had this setup for years now and while it works fine, Harbor has some rough edges.
thephyber · 4 months ago
I came here to mention that any non-trivial company depending on Docker images should look into a local proxy cache. It's too much infra for a solo developer or tiny organization, but it is a good hedge against Docker Hub, GitHub, etc. downtime and can run faster (less ingress transfer) if located in the same region as the rest of your infra.
nusl · 4 months ago
Currently unable to do much of anything new in dev/prod environments without manual workarounds. I'd imagine the impact is pretty massive.

Aside: seems Signal is also having issues. Damn.

cebert · 4 months ago
I’m not sure that the impact will be that big. Most organizations have their own mirrors for artifacts.
ai-onehealth · 4 months ago
Yes I noticed Signal being down too
yread · 4 months ago
That is nothing compared to how good I feel about not using containers at all.
bombcar · 4 months ago
You don’t want a Rube Goldberg contraption doing everything?

So not agile!

jsmeaton · 4 months ago
Guess where we host Nexus...
frenkel · 4 months ago
Only if they get their base images from somewhere else...
bravetraveler · 4 months ago
Pull-through caches are still useful even when the upstream is down... assuming the image(s) were pulled recently. The HEAD to upstream will obviously fail [when checking currency], but the software is happy to serve what it has already pulled.

Depends on the implementation, of course: I'm speaking to 'distribution/distribution', the reference implementation. Harbor or whatever else may behave differently, I have no idea.
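
For reference, standing up distribution as a Docker Hub pull-through cache is close to a one-liner; a sketch (the port and container name are arbitrary, and you would still point your daemons at it via registry-mirrors):

  # run the reference registry as a pull-through cache of Docker Hub
  docker run -d -p 5000:5000 --name hub-cache \
    -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io \
    registry:2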

phillebaba · 4 months ago
Shameless plug but this might be a good time to install Spegel in your Kubernetes clusters if you have critical dependencies on Docker Hub.

https://spegel.dev/

osivertsson · 4 months ago
If it really is fully open-source please make that more visible on your landing page.

It is a huge deal if I can start investigating and deploying such a solution as a techie right away, compared to having to go through all the internal hoops for a software purchase.

CaptainOfCoit · 4 months ago
How hard is it to go to the GitHub repository and open the LICENSE file that is in almost every repository? It would have taken you less time than writing that comment, and would have shown you it's under MIT.
mocko · 4 months ago
storm1er · 4 months ago
What's the difference with kuik? Spegel seems too complicated for my homelab, but could be a nice upgrade for my company

Kuik: https://github.com/enix/kube-image-keeper?tab=readme-ov-file...

phillebaba · 4 months ago
It's been a while since I looked at kuik, but I would say the main difference is that Spegel doesn't do any of the pulling or storage of images. Instead it relies on Containerd to do it for you. This also means that Spegel does not have to manage garbage collection. The nice thing with this is that it doesn't change how images are initially pulled from upstream and is able to serve images that exist on the node before Spegel runs.

Also, it looks like kuik uses CRDs to store information about where images are cached, while Spegel uses its own p2p solution to do the routing of traffic between nodes.

If you are running k3s in your homelab you can enable Spegel with a flag as it is an embedded feature.
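
For reference, enabling it is roughly: the embedded mirror option in the server config plus a registries.yaml listing which registries to mirror; a rough sketch (paths are the k3s defaults):

  # /etc/rancher/k3s/config.yaml
  embedded-registry: true

  # /etc/rancher/k3s/registries.yaml
  mirrors:
    docker.io: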

CaptainOfCoit · 4 months ago
There are a couple of alternatives that mirror more than just Docker Hub too; most of them are pretty bloated and enterprisey, but they do what they say on the tin and have saved me more than once. Artifactory, Nexus Repository, Cloudsmith and ProGet are some of them.
phillebaba · 4 months ago
Spegel does not only mirror Docker Hub, and it works a lot differently from the alternatives you suggested. Instead of being yet another failure point closer to your production environment, it runs a distributed stateless registry inside of your Kubernetes cluster. By piggybacking off of containerd's image store, it will distribute already-pulled images inside of the cluster.
mike-cardwell · 4 months ago
This looks good, but we're using GKE and it looks like it only works there with some hacks. Is there a timeline to make it work with GKE properly?
phillebaba · 4 months ago
I am having some discussions about getting things working on GKE but I can't give an ETA as it really depends on how things align with deployment schedules. I am positive however that this will soon be resolved.
0xbadcafebee · 4 months ago
Google Cloud has its own cache of Docker Hub that you can use for free; AWS does as well.

theanonymousone · 4 months ago
It's quite funny/interesting that this is higher on the HN front page than the news of the AWS outage that caused it.
mcintyre1994 · 4 months ago
Not on the real secret front page! https://news.ycombinator.com/active :)
cakeday · 4 months ago
That's informative, I wasn't aware of that way to view HN, thanks.
pknopf · 4 months ago
What does the "active" page sort by?

helpfulmandrill · 4 months ago
I wonder if this is why I also can't log in to O'Reilly to do some "Docker is down, better find something to do" training...
p0w3n3d · 4 months ago
Just install a pull-through proxy that will store all the recently used images.
m463 · 4 months ago
This is by design.

Docker got requests to allow you to configure a different default registry, but they selfishly denied the ability to do that:

https://stackoverflow.com/questions/33054369/how-to-change-t...

Red Hat created the Docker-compatible Podman and lets you close that hole:

  /etc/config/docker:
    BLOCK_REGISTRY='--block-registry=all'
    ADD_REGISTRY='--add-registry=registry.access.redhat.com'
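
For what it's worth, on current Podman the equivalent knob lives in the containers registries config rather than daemon flags; a minimal sketch (reusing the same Red Hat registry as the default for unqualified image names):

  # /etc/containers/registries.conf
  unqualified-search-registries = ["registry.access.redhat.com"]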

compootr · 4 months ago
I still think this is an acceptable footgun (?) to have. The explicitness of image references that include the registry domain outweighs the potential miscommunication issues a configurable default would create.

For example, if you're on a team and you have documentation containing commands, but your Docker config is outdated, you could accidentally pull from Docker's global public registry.

A welcome change IMO would be removing the implicit global registry entirely, since that would make it easier to tell where your image is coming from (but I severely doubt Docker would ever consider this, since the default makes it fractionally easier to use their services).

scuff3d · 4 months ago
This is a huge stretch.

Even if you could configure a default registry to point at something besides docker.io, a lot of people, I'd say the vast majority, wouldn't have bothered. So they'd still be in the same spot.

And it's not hard to just tag images. I don't have a single image pulling from docker.io at work. Takes two seconds to slap <company-repo>/ at the front of the image name.
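
In practice that just means every reference is fully qualified; a sketch with a hypothetical internal registry:

  FROM registry.example.com/base-images/python:3.12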

anon7000 · 4 months ago
Sadly doesn't help if you were using ECR in us-east-1 as your private registry. :(