Hi all, Tushar from Docker here. We’re sorry about the impact our current outage is having on many of you. Yes, this is related to the ongoing AWS incident and we’re working closely with AWS on getting our services restored. We’ll provide regular updates on dockerstatus.com.
We know how critical Docker Hub and our services are to millions of developers, and we’re sorry for the pain this is causing. Thank you for your patience as we work to resolve this incident. We’ll publish a post-mortem in the next few days once this incident is fully resolved and we have a remediation plan.
Part of me hopes that we find out that DynamoDB (which sounds like it was the root of the cascading failures) is shipped in a Docker image which is hosted on Docker Hub :-D
I think more often than not, companies are using a single cloud provider, and even when multiple are used, it's either different projects with different legacy decisions or a conscious migration.
True multi-cloud is not only very rare, it's an absolute pain to manage as soon as people start using any vendor-specific functionality.
No, that's pretty rare, and generally means you can't count on any features more sophisticated than VMs and object storage.
On the other hand, it's pretty embarrassing at this point for something as fundamental as Docker to be in a single region. Most cloud providers make inter-region failover reasonably achievable.
Because even if service A is using multiple cloud providers, not all the external services it depends on are doing the same, especially the smallest or cheapest ones. At least one of them is on AWS us-east-1, fails, and degrades service A or takes it down.
Being multi-cloud does not come for free: time, engineers, knowledge and ultimately money.
Multi-cloud is nowhere near as trivial to implement for real-world, complex projects as is often implied. Things get challenging the second your application steps off the happy path.
They are using multiple cloud providers, but judging by the Cloudflare R2 outage affecting them earlier this year, I guess all of them are on the critical path?
Looking at the landscape around me, no. Everyone is in crisis cost-cutting, "gotta show that same growth the C-suite saw during Covid" mode. So being multi-provider, and even in some cases, being multi-regional, is now off the table. It's sad because the product really suffers. But hey, "growth".
I guess people who are running their own registries like Nexus and building their own container images from a common base image are feeling at least a bit more secure in their choice right now.
Wonder how many builds or redeployments this will break. Personally, nothing against Docker or Docker Hub of course, I find them to be useful.
It's actually an important practice to have a docker image cache in the middle. You never know if an upstream image is purged randomly from docker, and your K8s node gets replaced, and now can't pull the base image for your service.
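For the Docker daemon that's just a registry mirror entry; a minimal sketch, assuming you run a cache at a (hypothetical) internal hostname:

  /etc/docker/daemon.json:

    {
      "registry-mirrors": ["https://registry-cache.internal.example.com"]
    }

The daemon tries the mirror first and only falls back to Docker Hub when the mirror can't serve the image.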
> You never know if an upstream image is purged randomly from docker, and your K8s node gets replaced, and now can't pull the base image for your service.
That doesn’t make sense unless you have some oddball setup where k8s is building the images you’re running on the fly. There’s no such thing as a “base image” for tasks running in k8s; there is just the image itself and its layers, which may come from some other image.
But it’s not built by k8s. It’s built by whatever is building your images and stored in your registries. That’s where you need your true base image caching.
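Concretely, that just means the FROM line points at your cache rather than Docker Hub (registry host and project here are made up):

  # Dockerfile: pull the base image through the internal cache, not Docker Hub
  FROM registry.internal.example.com/dockerhub-cache/library/debian:bookworm-slim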
We are using base images, but unfortunately some GitHub Actions pull Docker images in their prepare phase - so while my application would build, I cannot deploy it because the CI/CD depends on Docker Hub and you cannot change where these images are pulled from (so they cannot go through a pull-through cache)…
My advice: document the issue, and use it to help justify spending time on removing those vestigial dependencies on Docker asap.
It's not just about reducing your exposure to third parties who you (presumably) don't have a contract with, it's also good mitigation against potential supply chain attacks - especially if you go as far as building the base images from scratch.
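In the extreme case the base image is nothing but your own rootfs tarball, with no upstream registry in the picture at all; a rough sketch (file names are made up):

  # Dockerfile: base image with no dependency on any upstream registry
  FROM scratch
  ADD rootfs.tar.gz /
  CMD ["/bin/sh"]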
We run Harbor and mirror every base image using its Proxy Cache feature, it's quite nice.
We've had this setup for years now and while it works fine, Harbor has some rough edges.
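For anyone who hasn't tried it: once the proxy cache project exists you just pull through it (Harbor hostname and project name are whatever you chose), e.g.:

  docker pull harbor.example.com/dockerhub-proxy/library/nginx:1.27

Harbor fetches from Docker Hub on the first pull and serves the cached copy afterwards.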
I came here to mention that any non-trivial company depending on Docker images should look into a local proxy cache. It’s too much infra for a solo developer / tiny organization, but is a good hedge against DockerHub, GitHub repo, etc downtime and can run faster (less ingress transfer) if located in the same region as the rest of your infra.
Pull-through caches are still useful even when the upstream is down... assuming the image(s) were pulled recently. The HEAD to upstream will obviously fail [when checking currency], but the software is happy to serve what it has already pulled.
Depends on the implementation, of course: I'm speaking to 'distribution/distribution', the reference. Harbor or whatever else may behave differently, I have no idea.
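For reference, in distribution/distribution the pull-through behaviour is just the proxy section of config.yml, roughly:

  # config.yml: run the registry as a Docker Hub pull-through cache
  proxy:
    remoteurl: https://registry-1.docker.io
    # username/password are optional, mainly to get authenticated rate limits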
If it really is fully open-source, please make that more visible on your landing page.
It is a huge deal if I can start investigating and deploying such a solution as a techie right away, compared to having to go through all the internal hoops for a software purchase.
How hard is it to go to the GitHub repository and open the LICENSE file that is in almost every repository? It would have taken you less time than writing that comment, and shown you it's under MIT.
It's been a while since I looked at kuik, but I would say the main difference is that Spegel doesn't do any of the pulling or storage of images. Instead it relies on Containerd to do it for you. This also means that Spegel does not have to manage garbage collection. The nice thing with this is that it doesn't change how images are initially pulled from upstream and is able to serve images that exist on the node before Spegel runs.
Also it looks like kuik uses CRDs to store information about where images are cached, while Spegel uses its own p2p solution to do the routing of traffic between nodes.
If you are running k3s in your homelab you can enable Spegel with a flag as it is an embedded feature.
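From memory (check the k3s docs), that's roughly a flag on the server plus a registries.yaml telling it what to mirror:

  k3s server --embedded-registry

  /etc/rancher/k3s/registries.yaml:

    mirrors:
      docker.io: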
There are a couple of alternatives that mirror more than just Docker Hub too, most of them pretty bloated and enterprisey, but they do what they say on the tin and have saved me more than once. Artifactory, Nexus Repository, Cloudsmith and ProGet are some of them.
Spegel does not only mirror Docker Hub, and works a lot differently than the alternatives you suggested. Instead of being yet another failure point in front of your production environment, it runs a distributed stateless registry inside of your Kubernetes cluster. By piggybacking off of containerd's image store it will distribute already-pulled images inside of the cluster.
I am having some discussions about getting things working on GKE but I can't give an ETA as it really depends on how things align with deployment schedules. I am positive however that this will soon be resolved.
I still think this is an acceptable footgun (?) to have. The expressiveness of downloading an image tag with a domain included outweighs potential miscommunication issues.
For example, if you're on a team and you have documentation containing commands, but your docker config is outdated, you can accidentally pull from docker's global public registry.
A welcome change IMO would be removing global registries entirely, since it just makes it easier to tell where your image is coming from (but I severely doubt docker would ever consider this since it makes it fractionally easier to use their services)
Even if you could configure a default registry to point at something besides docker.io a lot of people, I'd say the vast majority, wouldn't have bothered. So they'd still be in the same spot.
And it's not hard to just tag images. I don't have a single image pulling from docker.io at work. Takes two seconds to slap <company-repo>/ at the front of the image name.
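i.e. the difference between relying on the implicit default and being explicit is just (company registry hostname is hypothetical):

  docker pull redis:7                                  # implicitly docker.io/library/redis:7
  docker pull registry.example-co.com/library/redis:7  # explicit, goes through your own registry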
Isn't everyone using multiple cloud providers nowadays? Why are they affected by a single cloud provider outage?
No? I very much doubt anyone is doing that.
Oh yes. All of them, in fact, especially if you count what key vendors host on.
> Why are they affected by a single cloud provider outage?
Every workload is only on one cloud. N.b. this doesn't mean every workflow is on only one cloud. An important distinction, since workflows confined to a single cloud would be more stable.
Thankfully, AWS provides a docker.io mirror for those who can't wait:
In the error logs, the issue was mostly related to the authentication endpoint: https://auth.docker.io → "No server is available to handle this request"
After switching to the AWS mirror, everything built successfully without any issues.
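For anyone else scrambling, I assume they mean the Docker official images mirrored on ECR Public, e.g.:

  docker pull public.ecr.aws/docker/library/python:3.12-slim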
Also, quay.io - another image host, from Red Hat - has been read-only all day today.
If you're going to have docker/container image dependencies it's best to establish a solid hosting solution instead of riding whatever bus shows up
Just had to change the registry prefix to the cached one. Hope this helps! [0]: https://cloud.google.com/artifact-registry/docs/pull-cached-...
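If it helps anyone else, the cached pulls look something like this (going by the doc above; it only covers images Google already caches):

  docker pull mirror.gcr.io/library/nginx:latest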
If your image is not cached on one of these then you may be SOL.
Just engineering hygiene IMO.
Aside: seems Signal is also having issues. Damn.
So not agile!
https://spegel.dev/
Kuik: https://github.com/enix/kube-image-keeper?tab=readme-ov-file...
Docker got requests to allow you to configure a default private registry, but they selfishly denied the ability to do that:
https://stackoverflow.com/questions/33054369/how-to-change-t...
Red Hat created the docker-compatible Podman, which lets you close that hole:
/etc/config/docker:

  BLOCK_REGISTRY='--block-registry=all'
  ADD_REGISTRY='--add-registry=registry.access.redhat.com'
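With Podman itself the equivalent lives in /etc/containers/registries.conf, something along these lines:

  # resolve bare image names against the Red Hat registry instead of docker.io
  unqualified-search-registries = ["registry.access.redhat.com"]

  # optionally refuse to pull from docker.io at all
  [[registry]]
  location = "docker.io"
  blocked = true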