We self-host Harbor as well; it’s fairly painless. It has SSO out of the box, a Terraform provider that covers everything, and for the most part it just works.
The issues we’ve had so far:
- No programmatic way to retrieve the token required for ‘docker login’, so we had to create a robot account per user and pop their creds into our secrets store (a sketch of the API call is below).
- Migrating between sites by cloning the underlying S3 bucket and spinning up a new Harbor instance on top of it does not work; we hit weird issues with dropped pulls.
- RBAC goes down to project, not repository level, complicating some of our SDLC controls.
- CSRF errors every time you try to do anything in the UI
- A lenient API and a lack of docs mean that things like setting up tag immutability rules via Terraform were a bit of a PITA; figuring out the right syntax took some trial and error.
So some small issues, but definitely a great piece of software.
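As a rough illustration of the robot-account workaround: a minimal sketch against Harbor 2.x's /api/v2.0/robots endpoint, with placeholder hostnames, credentials, and project names (the exact permission payload can vary between Harbor versions).

```python
import requests

HARBOR = "https://harbor.example.com"        # placeholder Harbor URL
ADMIN_AUTH = ("admin", "admin-password")     # placeholder admin credentials

# Create a robot account scoped to a single project, with pull/push access.
payload = {
    "name": "ci-user-jane",                  # Harbor prefixes this with "robot$"
    "description": "per-user robot for docker login",
    "duration": -1,                          # never expires
    "disable": False,
    "level": "system",
    "permissions": [
        {
            "kind": "project",
            "namespace": "myproject",        # placeholder project name
            "access": [
                {"resource": "repository", "action": "pull"},
                {"resource": "repository", "action": "push"},
            ],
        }
    ],
}

resp = requests.post(f"{HARBOR}/api/v2.0/robots", json=payload, auth=ADMIN_AUTH)
resp.raise_for_status()
robot = resp.json()

# The secret is only returned once, at creation time; stash it in the secrets
# store, then use the pair for `docker login harbor.example.com`.
print(robot["name"], robot["secret"])
```

The returned name/secret pair is what ends up in the secrets store and later in `docker login`.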
What's the upgrade story like? Their official website makes it sound like a pain (stopping the software, backing up the database, changing the settings syntax, running some installer). I would expect something built for Kubernetes to just do the right thing on startup (such that upgrading is simply switching out the image).
I've upgraded Harbor before, and it was a pain. I think you're encouraged to use their official Helm chart, and then it's supposed to be fairly seamless https://goharbor.io/docs/2.13.0/administration/upgrade/helm-... but if your predecessor decided against that option, separately adjusting the configuration for all the moving pieces is fairly annoying. Also, I misconfigured something and ended up having to read the Harbor source code because the error messages weren't very helpful. Fortunately, I had the presence of mind to first practice on a secondary installation created from a backup. It's definitely not something where you can stop production, install the update, and expect it to come back up in working order.
The lack of OIDC support for Harbor has been the biggest annoyance for me. I'd love to be able to push from GitHub Actions to Harbor without needing robot users.
For what it's worth, I have an on-prem Nexus server with a Docker repository. It has 8 cores and 16GB RAM. It has 82,000 hits/day in the webserver log, though 99.9% of them transfer only a few kB, so I assume it's a metadata check and the client already has the correct version.
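That metadata check is typically just a HEAD request against the manifest endpoint of the registry's v2 HTTP API; the client compares the returned digest to the one it already has and skips the download if they match. A minimal sketch, with placeholder hostnames, repository names, and credentials:

```python
import requests

REGISTRY = "https://nexus.example.com"      # placeholder registry URL
IMAGE = "myteam/app"                        # placeholder repository
TAG = "latest"

# HEAD on the manifest returns the digest (and size) without any blobs.
resp = requests.head(
    f"{REGISTRY}/v2/{IMAGE}/manifests/{TAG}",
    headers={"Accept": "application/vnd.docker.distribution.manifest.v2+json"},
    auth=("user", "password"),              # or a bearer token, depending on setup
)
resp.raise_for_status()

# If this digest matches what the client has locally, nothing else is pulled.
print(resp.headers.get("Docker-Content-Digest"))
```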
The same Nexus is also hosting our Maven and NodeJS repositories. That has 1,800,000 hits per day, although all but 120,000 of them return HTTP 404. (I think one of our clients has misconfigured their build server to use our Maven repository as a mirror of Maven Central, but as it's just an AWS IP I don't know who it is.)
I'm sure it's overprovisioned, but the marginal cost of 2, 4 or 8 cores and 4, 8 or 16GB RAM isn't much when we buy our own hardware.
> the marginal cost of 2, 4 or 8 cores and 4, 8 or 16GB RAM isn't much when we buy our own hardware
This is the crux of it all. Even renting bare metal (e.g. from Hetzner) is roughly 10x cheaper than AWS EC2, so I can only imagine how much cheaper it is when you buy the hardware directly.
I recall ttl.sh being mentioned here; looking it up [0], it turns out to run, via Docker, a CNCF project called Distribution Registry [1], which implements the core container registry functions (and appears to have additional uses, like acting as a pull-through cache).
Even if it is a daily number, it's not very much. There are 86,400 seconds in a day; even limiting the window to an 8-hour business day, this is only around 1 pull per second.
I also run Harbor, using the official Helm chart. It's a little janky and doesn't support a couple of things we want: it only works with one of ArgoCD or ExternalSecretsOperator (not both), and it doesn't support Redis TLS.
Unlike the author of this post, we just run one instance (the "source of truth") and use caching proxies in other regions. Works fine for us.
1. It doesn't work with ExternalSecretsOperator and ArgoCD together, which I happen to use. This is because the author of the Harbor chart decided not to use k8s concepts like a secretRef in a podTemplate. Instead, at Helm template time, the chart looks up the secret data and writes it into another secret, which is then included as an envFrom. This interacts poorly with ExternalSecretsOperator in general, because it breaks the lifecycle control that ESO has, and it's completely broken with ArgoCD, because ArgoCD disables secret lookups by charts for pretty valid security reasons. No other chart I've come across does secret lookups at Helm template time; even the Helm docs tell you it's not correct.
2. Harbor requires Redis, but the Helm chart doesn't correctly pipe through the connection configuration: if Redis is behind TLS, the chart won't work.
I'm confused about why they decided to populate the cache by replicating the entirety of Docker Hub instead of using a pull-through cache that gets populated on the first pull.
> pulling and pushing our images over the internet dozens of times a day caused us to hit the contracted bandwidth limit with our datacenter provider Deft repeatedly
I wonder what they were doing that resulted in blowing out their Docker layer cache on every pull and push.
Normally only a layer diff would be sent over the wire, such as a code change that didn't change your dependencies.
I'd rather have the agents prune their Docker cache (or destroy and recreate the agents) every night, but it's not uncommon to see pipelines use the --no-cache option on every run to make sure they get the latest security updates.
ECR is kind of hard to beat if you're ok with being in the cloud.
The last time I used it, earlier this year for a company already on AWS, it was ~$3/month per region to store 8 private repos, and it was really painless to set up a flexible, automated lifecycle policy that deleted old image tags. It supports cross-region replication too. All of that comes without maintenance or compute costs, and if you're already familiar with Terraform, etc., you can automate all of it in a few hours of dev time.
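To give a feel for what that lifecycle automation looks like, here is a rough sketch of the same idea via boto3 rather than Terraform (the repository name, region, and retention numbers below are made up):

```python
import json
import boto3

ecr = boto3.client("ecr", region_name="us-east-1")   # placeholder region

# Expire untagged images quickly, and cap the number of images kept overall.
policy = {
    "rules": [
        {
            "rulePriority": 1,
            "description": "Expire untagged images after 14 days",
            "selection": {
                "tagStatus": "untagged",
                "countType": "sinceImagePushed",
                "countUnit": "days",
                "countNumber": 14,
            },
            "action": {"type": "expire"},
        },
        {
            "rulePriority": 2,
            "description": "Keep only the 50 most recent images",
            "selection": {
                "tagStatus": "any",
                "countType": "imageCountMoreThan",
                "countNumber": 50,
            },
            "action": {"type": "expire"},
        },
    ]
}

ecr.put_lifecycle_policy(
    repositoryName="my-service",                      # placeholder repository
    lifecyclePolicyText=json.dumps(policy),
)
```

The same JSON drops straight into Terraform's aws_ecr_lifecycle_policy resource if you'd rather keep it declarative.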
So what are the thoughts of folks who have used Nexus and moved to Harbor?
In my experience Nexus is a bit weird to administer, and sometimes the Docker cleanup policies straight up don't work (junk gets left over in the blob stores even if you try to clean everything), but it also supports all sorts of other formats, such as repositories for Maven and NuGet. Kind of resource-hungry, though.
Nexus can be flaky, but it's pretty universal as you say. Harbor is a hard sell for me, since generally in any organization you'll need non-OCI artifact storage at some point, and maintaining 2 tools is always a pain.
One glaring omission is the lack of support for proxying docker.io without the project name, i.e. pulling nginx:latest instead of /myproject/nginx/nginx:latest.
The workaround involves URL-rewrite magic in your reverse proxy of choice.
Hell, Git doesn't even need the Git protocol if you run `git update-server-info`.
https://github.com/jpetazzo/registrish
I haven’t tried running this yet, but it seems worth keeping in mind. It’s relatively simple software so the idea could probably be pretty easily adapted to other situations.
0. https://github.com/replicatedhq/ttl.sh/blob/main/registry/en...
1. https://distribution.github.io/distribution/
They never say that it needs these resources; they say that the current VM has this config. Probably overkill by 1000%.
It is pretty easy to just run the basic registry for this purpose.
We have a similar setup for NPM and PyPI on the same machine. It doesn't really need a lot of attention, just some upgrades every once in a while.