Uncloud[0] is a container orchestrator without a control plane. Think multi-machine Docker Compose with automatic WireGuard mesh, service discovery, and HTTPS via Caddy. Each machine just keeps a p2p-synced copy of cluster state (using Fly.io's Corrosion), so there's no quorum to maintain.
I’m building Uncloud after years of managing Kubernetes in small envs and at a unicorn. I keep seeing teams reach for K8s when they really just need to run a bunch of containers across a few machines with decent networking, rollouts, and HTTPS. The operational overhead of k8s is brutal for what they actually need.
A few things that make it unique:
- uses the familiar Docker Compose spec, no new DSL to learn
- builds and pushes your Docker images directly to your machines without an external registry (via my other project unregistry [1])
- imperative CLI (like Docker) rather than declarative reconciliation. Easier mental model and debugging
- works across cloud VMs, bare metal, even a Raspberry Pi at home behind NAT (all connected together)
- minimal resource footprint (<150MB RAM)

[0]: https://github.com/psviderski/uncloud
[1]: https://github.com/psviderski/unregistry
"I keep seeing teams reach for K8s when they really just need to run a bunch of containers across a few machines"
Since k8s is very effective at running a bunch of containers across a few machines, it would appear to be exactly the correct thing to reach for. At this point, running a small k8s operation, with k3s or similar, has become so easy that I can't find a rational reason to look elsewhere for container "orchestration".
I can only speak for myself, but I considered a few options, including "simple k8s" like [Skate](https://skateco.github.io/), and ultimately decided to build on uncloud.
It was as much personal "taste" as anything, and I would describe the choice as similar to preferring JSON over XML.
For whatever reason, kubernetes just irritates me. I find it unpleasant to use. And I don't think I'm unique in that regard.
100%. I’m really not sure why K8S has become the complexity boogeyman. I’ve seen CDK apps or docker compose files that are way more difficult to understand than the equivalent K8S manifests.
If you already know k8s, this is probably true. If you don't, it's hard to know which bits you need, and need to learn about, to get something simple set up.
k3s makes it easy to deploy, not to debug any problems with it. It's still essentially adding a few hundred thousand lines of code to your infrastructure, and if it's a small app you need to deploy, it's also wasting a bit of RAM.
Except it isn't just "a way to run a bunch of containers across a few machines".
It seems that way, but in reality "resource" is a generic concept in k8s. K8s is a management/collaboration platform for "resources" and everything is a resource. You can define your own resource types too. And who knows, maybe in the future these won't be containers or even Linux processes? Well, it would still work given this model.
But now, what if you really just want to run a bunch of containers across a few machines?
My point is, it's overcomplicated and abstracts too heavily. Too smart, even... I don't want my co-workers to define our own resource types; we're not a Google-scale company.
While I would love to test this tool, this is not something I would run on any machine :/
I wanted to try it out but was put off by this[0]. It’s just straight up curl | bash as root from raw.githubusercontent.com.
If this is the install process for a server (and not just for the CLI) I don’t want to think about security in general for the product.
Sorry, I really wanted to like this, but pass.
[0] https://github.com/psviderski/uncloud/blob/ebd4622592bcecedb...
Totally valid concern. That was a shortcut to iterate quickly in early development. It’s time to do it properly now. Appreciate the feedback. This is exactly the kind of thing I need to hear before more people try it.
There is a `--no-install` flag on both `uc machine init` and `uc machine add` that skips that `curl | bash` install step.
You need to prepare the machine some other way first then, but it's just installing docker and the uncloud service.
I use the `--no-install` option with my own cluster, as I have my own pre-provisioning process that includes some additional setup beyond the docker/uncloud elements.
Very cool! I think I'll have some opportunity soon to give it a shot, I have just the set of projects that have been needing a tool like this. One thing I think I'm missing after perusing the docs however is, how does one onboard other engineers to the cluster after it has been set up? And similarly, how does deployment from a CI/CD runner work? I don't see anything about how to connect to an existing cluster from a new machine, or at least not that I'm recognizing.
There isn't a cli function for adding a connection (independently of adding a new machine/node) yet, but they are in a simple config file (`~/.config/uncloud/config.yaml`) that you can copy or easily create manually for now. It looks like this:
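Purely as an illustration, and not the actual schema (every key name below is an assumption on my part), the general shape is a named cluster plus the SSH connections used to reach its machines:

```yaml
# ~/.config/uncloud/config.yaml -- hypothetical sketch only; the real key
# names may differ.
clusters:
  default:
    connections:
      - ssh://deploy@203.0.113.10   # any reachable machine in the cluster
      - ssh://deploy@203.0.113.11
```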
I took some inspiration from Kamal, e.g. the imperative model, but Kamal is more of a deployment tool.
In addition to deployments, uncloud handles clustering - connects machines and containers together. Service containers can discover other services via internal DNS and communicate directly over the secure overlay network without opening any ports on the hosts.
As far as I know kamal doesn’t provide an easy way for services to communicate across machines.
Services can also be scaled to multiple replicas across machines.
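A rough sketch of what that looks like in practice (service names and images are made up; the http://service-name form is taken from a later comment in this thread):

```yaml
# Hypothetical two-service deployment: "web" reaches "api" by service name
# over the encrypted overlay network, without publishing api's port on any host.
services:
  web:
    image: ghcr.io/example/web:latest
    environment:
      # Resolved by the cluster's internal DNS to the running api containers.
      API_URL: http://api:8000
  api:
    image: ghcr.io/example/api:latest
```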
I really like what is on offer here - thank you for building it. Re the private network it builds with Wireguard, how are services running within this private network supposed to access AWS services such as RDS securely? Tailscale has this: https://tailscale.com/kb/1141/aws-rds
Thanks! If you're running the Uncloud cluster in AWS, service containers should be able to access RDS the same way the underlying EC2 instances can (assuming RDS is in the same VPC or reachable via VPC peering).
The private container IPs will get NATed to the underlying EC2 IPs so requests to RDS will appear as coming from those instances. The appropriate Security Group(s) need to be configured as well. The limitation is that you can't segregate access at the service level, only at the EC2 instance level.
So it's a kind of better Docker Swarm? It's interesting, but honestly I'd rather have something declarative so I can use it with Pulumi. Would it be complicated to add a declarative engine on top of the tool, one that discovers which services are already up, diffs them against the new declaration, and handles the changes?
When you run the 'uc deploy' command:

- it reads the spec from your compose.yaml
- inspects the current state of the services in the cluster
- computes the diff and a deployment plan to reconcile it
- executes the plan after confirmation

Please see the docs and demo: https://uncloud.run/docs/guides/deployments/deploy-app

The main difference with Docker Swarm is that the reconciliation process runs on your local/CI machine as part of the 'uc deploy' CLI command execution, not on control plane nodes in the cluster.
And it's not running in a loop automatically. If the command fails, you get instant feedback with the errors, which you can address and then rerun the command.
It should be pretty straightforward to wrap the CLI logic in a Terraform or Pulumi provider. The design principles are very similar, and it's written in Go.
You have a graph that shows a multi-provider setup for a domain. Where would routing to either machine happen? As in, which IP would you use on the DNS side?
For a public cluster with multiple ingress (Caddy) nodes you'd need a load balancer in front of them to properly handle routing and the outage of any of them. You'd use the IP of the load balancer on the DNS side.
Note that a DNS A record with multiple IPs doesn't provide failover, only round robin. But you can use the Cloudflare DNS proxy feature as a poor man's LB. Just add 2+ proxied A records (orange cloud) pointing to different machines. If one goes down with a 52x error, Cloudflare automatically fails over to the healthy one.
We have similar backgrounds, and I totally agree with your k8s sentiment.
But I wonder what this solves?
Because I stopped abusing k8s and started using more container hosts with quadlets instead, using Ansible or Terraform depending on what the situation calls for.
It works just fine imho. The CI/CD pipeline triggers a podman auto-update command, and just like that all containers are running the latest version.
So what does uncloud add to this setup?
Great setup! Where Uncloud helps is when you need containers across multiple machines to talk to each other.
Your setup sounds like single-node or nodes that don't need to discover each other. If you ever need multi-node with service-to-service communication, that's where stitching together Ansible + Terraform + quadlets + some networking layer starts to get tedious. Uncloud tries to make that part simple out of the box.
You also get the reverse proxy (Caddy) that automatically reconfigures depending on what containers are running on machines. You just deploy containers and it auto-discovers them. If a container crashes, the configuration is auto-updated to remove the faulty container from the list of upstreams.
Plus a single CLI you run locally or on CI to manage everything, distribute images, stream logs. A lot of convenience that I'm putting together to make the user experience more enjoyable.
But if you don't need that, keep doing what works.
Thanks for both great tools. There's just one thing I didn't understand: the request flow. Imagine we have 10 servers: where do we choose that this request goes to server 1 and another goes to server 7, for example? And since it's zero downtime, how does it know that server 5 is updating, so that no requests go there until it's back up?
I think there are two different cases here. Not sure which one you’re talking about.
1. External requests, e.g. from the internet via the reverse proxy (Caddy) running in the cluster.
The rollout works on the container, not the server level. Each container registers itself in Caddy so it knows which containers to forward and distribute requests to.
When doing a rollout, a new version of the container is started first and registers in Caddy, then the old one is removed. This is repeated for each service container. This way, at any time there are running containers that serve requests.
It doesn’t tell any server that requests shouldn’t go there. It just updates the upstreams in the Caddy config to send requests only to containers that are up and healthy (see the healthcheck sketch after this reply).
2. Service to service requests within the cluster. In this case, a service DNS name is resolved to a list of IP addresses (running containers). And the client decides which one to send a request to or whether to distribute requests among them.
When the service is updated, the client needs to resolve the name again to get the up-to-date list of IPs.
Many http clients handle this automatically so using http://service-name as an endpoint typically just works. But zero downtime should still be handled by the client in this case.
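Regarding "up and healthy" in case 1: I'd assume a standard Compose healthcheck is what feeds that determination (an assumption on my part, not confirmed above). A typical one looks like this:

```yaml
services:
  api:
    image: ghcr.io/example/api:latest   # placeholder image
    healthcheck:
      # Assumes curl is available inside the image.
      test: ["CMD", "curl", "-fsS", "http://localhost:8000/health"]
      interval: 10s
      timeout: 3s
      retries: 3
```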
Awesome tool! Does it provide some of the basic features that you would get from running a control plane?
Like automatically rescheduling a container on another server if a server is down? Deploying to the least-filled server first if you have set limits on your containers?
There is no automatic rescheduling in uncloud by design. At least for now. We will see how far we can get without it.
If you want your service to tolerate a host going down, you should deploy multiple replicas for that service on multiple machines in advance. 'uc scale' command can be used to run more replicas for an already deployed service.
Longer term, I'm thinking we can have a concept of primary/standby replicas for services that can only have one running replica, e.g. databases. Something similar to how Fly.io does this: https://fly.io/docs/apps/app-availability/#standby-machines-...
Regarding deploying on the least-filled machine first: it's doable but not supported right now. By default, it picks the first machine randomly and tries to distribute replicas evenly among all available machines. You can also manually specify which target machine(s) each service should run on in your Compose file (see the sketch after this comment).
I want to avoid recreating the complexity of placement constraints, (anti-)affinity, etc. that makes K8s hard to reason about. There is a huge class of apps that need more or less static infra, manual placement, and a certain level of redundancy. That's what I'm targeting with Uncloud.
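A sketch of that manual placement plus replicas; `deploy.replicas` is standard Compose, but whether Uncloud honours it, and the `x-machines` placement key and machine names shown here, are assumptions on my part:

```yaml
services:
  api:
    image: ghcr.io/example/api:latest
    deploy:
      replicas: 3        # spread across the available machines
    # Assumed placement extension: pin this service to specific machines.
    x-machines:
      - machine-1
      - machine-2
```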
> - uses the familiar Docker Compose spec, no new DSL to learn
But this comes with the assumption that one already knows the Docker Compose spec. For the exact same reason, I'm in love with `podman kube play`: it lets me use k8s manifests to quickly test-run on a local machine, and not bother with some "legacy" compose.
(I never liked Docker Inc. so I never learned THEIR tooling, it's not needed to build/run containers)
Neat. As you include quite a few tools for services to be reachable together (not necessarily to the outside), do you also have tooling to make those services more interoperable?
So you built an insecure version of nomad/kubernetes and co?
If you do anything professional, you'd better choose proven software like kubernetes or managed kubernetes or whatever else all the hyperscalers provide.
And the complexity you are solving now, or will have to solve, k8s has already solved: IaC for example, cloud provider support for provisioning a LB out of the box, cert-manager, all the Helm charts for observability and logging, an ecosystem to fall back on (operators), ArgoCD <3, storage provisioning, proper high availability, kind for e2e testing in CI/CD, etc.
I'm also always lost as to why people think k8s is so hard to operate. Just take a managed k8s. There are so many options out there and they are all compatible with the whole k8s ecosystem.
Look, if you don't get kubernetes, its use cases, advantages, etc., fine, absolutely fine, but your solution is not an alternative to k8s. It's another container orchestrator like nomad and k8s and co., with its own advantages and disadvantages.
It's not a k8s replacement. It's for the small dev team with no k8s experience. For people that might not use Docker Swarm because they see it's a pretty dead project. For people who think "everyone uses k8s", so we should, too.
I need to run on-prem, so managed k8s is not an option. Experts tell me I should have 2 FTE to run k8s, which I don't have. k8s has so many components; how should I debug them in case of issues without k8s experience? k8s APIs change continuously; how should I manage that without k8s experience?
It's not a k8s replacement. But I do see a sweet spot for such a solution. We still run Docker Swarm on 5 servers, no hyperscalers, no API changes expected ;-)
Those are all sub-par cloud technologies which perform very badly and do not scale at all.
Some people would rather build their own solutions to do these things with fine-grained control and the ability to handle workloads more complex than a shopping cart website.
Having spent most of my career in kubernetes (usually managed by cloud), I always wonder when I see things like this, what is the use case or benefit of not having a control plane?
To me, the control plane is the primary feature of kubernetes and one I would not want to go without.
I know this describes operational overhead as a reason, but how it relates to the control plane is not clear to me. Even managing a few hundred nodes and maybe 10,000 containers (relatively small), I update once a year and the managed cluster updates machine images and versions automatically. Are people trying to self-host kubernetes for production cases, and that's where this pain comes from?
Sorry if it is a rude question.
> a few hundred nodes and maybe 10,000 containers, relatively small
That feels not small to me. For something I'm working on I'll probably have two nodes and around 10 containers. If it works out and I get some growth, maybe that will go up to, say, 5-7 nodes and 30 or so containers? I dunno. I'd like some orchestration there, but k8s feels way too heavy even for my "grown" case.
I feel like there are potentially a lot of small businesses at this sort of scale?
Not rude at all. The benefit is a much simpler model where you simply connect machines in a network where every machine is equal. You can add more, remove some. No need to worry about an HA 3-node centralised “cluster brain”. There isn’t one.
It’s a similar experience when a cloud provider manages the control plane for you. But you have to worry about the availability when you host everything yourself. Losing etcd quorum results in an unusable cluster.
Many people want to avoid this, especially when running at a smaller scale like a handful of machines.
The cluster network can even partition, and each partition continues to operate, allowing you to deploy/update apps individually.
That’s essentially what we all did in a pre-k8s era with chef and ansible but without the boilerplate and reinventing the wheel, and using the learnings from k8s and friends.
If you are a small operation, trying to self-host k3s or k8s or any number of out-of-the-box installations (which are probably at least as complex as docker compose swarms) for any non-trivial production case presents similar problems in monitoring and availability as the ones you'd get with off-the-shelf cloud-provider managed services, except the managed solutions come without the pain in the ass. Except you don't have a control plane.
I have managed custom server clusters in a self-hosted situation. The problems are hard, but if you're small, why would you reach for such a solution in the first place? You'd be better off paying for a managed service. What situation forces so many people to reach for self-hosted kubernetes?
Of course they are…? That’s half the point of k8s - if you want to self host, you can, but it’s just like backups: if you never try it, you should assume you can’t do it when you need to
For an SME with nonetheless critical workloads, 10,000 containers is not small. To me that's massive, in fact. I run fewer than 10, but I need those to be HA. Uncloud sounds great for my use case.
On cloud, in my experience, you are mostly paying for compute with managed kubernetes instances. The overhead and price is almost never kubernetes itself, but the compute and storage you are provisioning, which, thanks to the control plane, you have complete control over. What am I missing?
I wouldn't dare, with a small shop, try to self-host a production kubernetes solution unless I was under duress. But I just don't see what the control plane has to do with it. It's the feature that makes kubernetes worth it.
I'm working on a similar project (here's the v0 of its state management and the libraries its "local control plane" will use to implement a mesh https://github.com/accretional/collector) and worked on the data plane for Google Cloud Run/Functions:
IMO kubernetes is great if your job is to fiddle with Kubernetes. But damn, the overhead is insane. There is this broad swathe of middle-sized tech companies and non-tech Internet application providers (e.g. ecommerce, governments, logistics, etc.) that spend a lot of their employees' time operating Kubernetes clusters, and a lot of money on the compute for those clusters, which they probably overprovision and also overpay for through some kind of managed Kubernetes/hyperscaler platform, plus a bunch of SaaS for things like metrics and logging, container security products, and alerting. A lot of these guys are spending 10-40% of their budget on compute, payroll, and SaaS to host CRUD applications that could probably run on a small number of servers without a "platform" team behind it, just a couple of developers who know what they're doing.
Unless they're paying $$$, each of these deployments is running its own control plane and dealing with all the operational and cognitive overhead that entails. Most of those are running in a small number of datacenters alongside a bunch of other people running/managing/operating kubernetes clusters of their own. It's insanely wasteful, because if there were a proper multitenant service mesh implementation (what I'm working on) that was easy to use, everybody could share the same control plane ~per datacenter and literally just consume the Kubernetes APIs they actually need, the ones that let them run and orchestrate/provision their application, and forget about all the fucking configuration of their cluster. BTW, that is how Borg works, which Kubernetes was hastily cobbled together to mimic in order to capitalize on Containers Being So Hot Right Now.
The vast majority of these Kubernetes users just want to run their applications, their customers don't know or care that Kubernetes is in the picture at all, and the people writing the checks would LOVE to not be spending so much time and money on the same platform engineering problems as every other midsize company on the Internet.
> what is the use case or benefit of not having a control plane?
All that is to say, it's not having to pay for a bunch of control plane nodes and SaaS and a Kubernetes guy/platform team. At small and medium scales, it's running a bunch of container instances as long as possible without embarking on a 6-24mo, $100k-$10m+ expedition to Do Kubernetes. It's not having to secure some fricking VPC with a million internal components and plugins/SaaS, it's not letting some cloud provider own your soul, and not locking you in to something so expensive you have to hire an entire internal team of Kubernetes-guys to set it up.
All the value in the software industry comes from the actual applications people are paying for. So the better you can let people do that without infrastructure getting in the way, the better. Making developers deal with this bullshit (or deciding to have 10-30% of your developers deal with it fulltime) is what gets in the way: https://kubernetes.io/docs/concepts/overview/components/
The experience you are describing has overwhelmingly not been my own, nor anyone in my space I know.
I can only speak most recently for EKS, but the cost is spent almost entirely on compute. I’m a one man shop managing 10,000 containers. I basically only spend on the compute itself, which is not all that much, and certainly far, far less than hiring a sys admin. Self hosted anything would be a huge PITA for me and likely end up costing more.
Yes, you can avoid kubernetes and being a “slave” to cloud providers, but I personally believe you’re making infrastructure tradeoffs in a bad way, and likely spending as much in the long run anyway.
Maybe my disconnect here is that I mostly deal with full production-scale applications, not hobby projects I am hosting on my own network (nothing wrong with that, and I would agree k8s is overkill for something like that).
Eventually though, at scale, I strongly believe you will need or want a control plane of some type for your container fleets, and that typically ends up looking or acting like k8s.
Some questions I have based on my swarm usage:
- do you plan to support secrets?
- with swarm and traefik, I can define url rewrite rules as container labels. Is something equivalent available?
- if I deploy 2 compose 'stacks', do all containers have access to all other containers, even in the other stack?
>with swarm and traefik, I can define url rewrite rules as container labels. Is something equivalent available?
Yep, you define the mapping between the domain name and the internal container port as `x-ports: app.example.com:8000/https` in the compose file. Or you can specify a custom Caddy config for the service as `x-caddy: Caddyfile`, which allows you to customise it however you like. See https://uncloud.run/docs/concepts/ingress/publishing-service...
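For instance, a minimal sketch combining both options (images and domain are placeholders; only the `x-ports` and `x-caddy` keys come from the answer above):

```yaml
services:
  app:
    image: ghcr.io/example/app:latest
    # Publish container port 8000 at this domain; Caddy terminates HTTPS.
    x-ports: app.example.com:8000/https
  admin:
    image: ghcr.io/example/admin:latest
    # Or hand the service a custom Caddy config when the default isn't enough.
    x-caddy: Caddyfile
```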
>if I deploy 2 compose 'stacks', do all containers have access to all other containers, even in the other stack?
Yes, there is no network isolation between containers from different services/stacks at the moment. Here is an open discussion on stack/namespace/environment/project concepts and isolation: https://github.com/psviderski/uncloud/discussions/94.
What's your use case and how would you want this to behave?
I'm deploying Swarm and traefik as described here: https://dockerswarm.rocks/traefik/#create-the-docker-compose...
I like that I can put my containers to be exposed on the traefik-public network, and keep others like databases unreachable from traefik. This organisation of networks is very useful, allowing me to make containers reachable across stacks, but also to keep some containers in a stack reachable only from other containers on the same network in that same stack.
Secrets -- yes, it's being tracked here: https://github.com/psviderski/uncloud/issues/75. Compose configs are already supported and can be used to inject secrets as well, but there'll be no encryption at rest in that case, so it might not be ideal for everyone.
Regarding questions 2 and 3, the short answers are "not at the moment" and "yes, for now". Here's a relevant discussion that touches on both points: https://github.com/psviderski/uncloud/discussions/94
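For reference, the Compose configs mechanism mentioned above looks roughly like this in plain Compose (paths and names are placeholders):

```yaml
# Standard Compose: a top-level config sourced from a local file and
# mounted into the service at runtime (no encryption at rest, as noted above).
configs:
  app_env:
    file: ./app.env
services:
  app:
    image: ghcr.io/example/app:latest
    configs:
      - source: app_env
        target: /run/app.env   # path inside the container
```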
Speaking of Swarm and your experience with it: in your opinion, is there anything that Swarm lacks or makes difficult, that tools like Uncloud could conceptually "fix"?
Swarm is not far from my dream deploy solution, but here are some points that might be better, some of them being already better in uncloud I think:
- energy in the community is low, it's hard to find an active discussion channel of swarm users
- swarm does not support the complete compose file format. This is really annoying
- sometimes, deploys fail for unclear reasons (e.g. a network was not found, but why, when it's defined in the compose file?) and work on the next try. This has never led to problems, but it doesn't feel right
- working with authenticated/custom registries is somewhat cumbersome
- having to work with registries to have the same image deployed on all nodes is sometimes annoying. It would be cool to have images spread across nodes.
- there's no contact between devs and users. I've just discovered uncloud and I've had more contact with its devs here than in years of using swarm!
- the firewalling is not always clear/clean
- logs accessibility (service vs container) and containers identification: when a container fails to start, it's sometimes harder than needed to debug (esp when it is because the image is not available)
Nomad still has a tangible learning curve, which (in my very biased opinion) is almost non-existent with Uncloud assuming the user has already heard about Docker and Compose.
You can't really do anything with it except work for Hashicorp for free, or create a fork that nobody is allowed to use unless they self-host it.
I'm pretty happy with k3s, but I'm also happy to see some development happening in the space between docker compose and full-blown kubernetes. The wireguard integration in particular intrigues me.
I'm always looking for new alternatives there, I've recently tried Coolify but it didn't feel very polished and mostly clunky. I'm still happy with Dokku at this point but would love to have a better UI for managing databases etc.
- What databases do you want to work with?
- What functionality do you want from such a UI?
- What database size are we talking here?
Asking because I am tinkering with a similar idea.
I have struggled to get things like this stood up and hit many footguns along the way
I share the same concern as the top comments on security, but I'm going to check it out in more detail.
I wonder, if you integrated some decentralized identity layer with DIDs, whether this could be turned into some distributed compute platform?
Also, what is your thinking on high availability and failovers?
> I’m building Uncloud after years of managing Kubernetes
did you manage Kubernetes, or did you make the fateful mistake of managing microk8s?
What specifically do you mean by ipv6 support?
And that's just your CI jobs, right? ;)
K8s is a way to run arbitrary processes on a bunch of servers.
But what if your processes are known beforehand? Then you don't need a scheduler, nor an orchestrator.
If it's just your web app with two containers and nothing more?
A control plane makes controlling machines easier, that's the point of a control plane.
BTW just looking at other variations on the theme:
- https://dokploy.com/
- https://coolify.io/
- https://demo.kubero.dev/
Feel free to add more.
If you want something even simpler, something that doesn't run on your servers at all, you can look at Kamal: https://kamal-deploy.org
What I like about Kamal is that it's backed by a company that actually fully moved out of K8s and cloud, so every release is battle-tested first.