I'd be curious what a better alternative looks like.
I'm a huge fan of keeping things simple (vertically scaling 1 server with Docker Compose and scaling horizontally only when it's necessary) but having learned and used Kubernetes recently for a project I think it's pretty good.
I haven't come across too many other tools that were so well thought out while also guiding you into how to break down the components of "deploying".
The idea of a pod, deployment, service, ingress, job, etc. are super well thought out and are flexible enough to let you deploy many types of things but the abstractions are good enough that you can also abstract away a ton of complexity once you've learned the fundamentals.
For example you can write about 15 lines of straightforward YAML configuration to deploy any type of stateless web app once you set up a decently tricked out Helm chart. That's complete with running DB migrations in a sane way, updating public DNS records, SSL certs, CI / CD, having live-preview pull requests that get deployed to a sub-domain, zero downtime deployments and more.
> once you set up a decently tricked out Helm chart
I don't disagree but this condition is doing a hell of a lot of work.
To be fair, you don't need to do much to run a service on a toy k8s project. It just gets complicated when you layer on all the production-grade stuff like load balancers, service meshes, access control, CI pipelines, o11y, etc. etc.
> To be fair, you don't need to do much to run a service on a toy k8s project.
The previous reply is based on a multi-service, production-grade workload. Setting up a load balancer wasn't bad. Most cloud providers that offer managed Kubernetes make it pretty painless to get their load balancer set up and working with Kubernetes. On EKS with AWS that meant using the AWS Load Balancer Controller and adding a few annotations. That includes HTTP to HTTPS redirects, www to apex domain redirects, etc. On AWS it took a few hours to get it all working complete with ACM (SSL certificate manager) integration.
The cool thing is when I spin up a local cluster on my dev box, I can use the nginx ingress instead and everything works the same with no code changes. Just a few Helm YAML config values.
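To illustrate the kind of swap being described (the annotation names follow the AWS Load Balancer Controller's documented `alb.ingress.kubernetes.io/*` scheme, but the chart's values structure here is hypothetical), the production values might configure an ALB while the local values just point the same chart at the nginx ingress:

```yaml
# values.production.yaml -- ALB via the AWS Load Balancer Controller
ingress:
  className: alb
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'
    alb.ingress.kubernetes.io/ssl-redirect: "443"          # HTTP -> HTTPS
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:...  # ACM cert

# values.local.yaml -- same chart, nginx ingress on a dev cluster
ingress:
  className: nginx
  annotations: {}
```

The application templates never mention a specific controller; only the ingress class and annotations change per environment.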
Maybe I dodged a bullet by starting with Kubernetes so late. I imagine 2-3 years ago would have been a completely different world. That's also why I haven't bothered to look into using Kubernetes until recently.
> I don't disagree but this condition is doing a hell of a lot of work.
It was kind of a lot of work to get here, but it wasn't anything too crazy. It took ~160 hours to go from never using Kubernetes to getting most of the way there. This also includes writing a lot of ancillary documentation and wiki style posts to get some of the research and ideas out of my head and onto paper so others can reference it.
Thanks, that was actually a wildly misleading typo haha. I meant to write "sane" way and have updated my previous comment.
For saFeness it's still on us as developers to do the dance of making our migrations and code changes compatible with running both the old and new version of our app.
But for saNeness, Kubernetes has some neat constructs to help ensure your migrations only get run once even if you have 20 copies of your app performing a rolling restart. You can define your migration in a Kubernetes job and then have an initContainer trigger the job while also using kubectl to watch the job's status to see if it's complete. This translates to only 1 pod ever running the migration while other pods hang tight until it finishes.
I'm not a grizzled Kubernetes veteran here but the above pattern seems to work in practice in a pretty robust way. If anyone has any better solutions please reply here with how you're doing this.
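A rough sketch of that pattern (the `myapp-migrate` name and images are illustrative, not from the comment above): the migration runs as a Job, and each app pod's initContainer uses `kubectl wait` to block until the Job reports complete.

```yaml
# Job that runs the migration exactly once per release
apiVersion: batch/v1
kind: Job
metadata:
  name: myapp-migrate        # hypothetical name
spec:
  backoffLimit: 1
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: myapp:1.2.3                  # your app image
          command: ["./manage.sh", "db", "migrate"]
---
# In the Deployment's pod spec: every replica waits on the same Job,
# so only the Job's single pod ever runs the migration.
initContainers:
  - name: wait-for-migrations
    image: bitnami/kubectl:latest
    command:
      - kubectl
      - wait
      - --for=condition=complete
      - --timeout=300s
      - job/myapp-migrate
```

The pod's service account needs RBAC permission to `get`/`watch` Jobs for the `kubectl wait` to work.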
It's simpler than that for simple scenarios. `kubectl run` can set you up with a standard deployment + service (in recent kubectl versions, `kubectl run` creates a bare Pod, and `kubectl create deployment` plus `kubectl expose` play that role instead). Then you can describe the resulting objects, save the YAML, and adapt/reuse as you need.
> For example you can write about 15 lines of straightforward YAML configuration to deploy any type of stateless web app once you set up a decently tricked out Helm chart.
I understand you might outsource the Helm chart creation but this sounds like oversimplifying a lot, to me. But maybe I'm spoiled by running infra/software in a tricky production context and I'm too cynical.
It's not too oversimplified. I have a library chart that's optimized for running a web app. Then each web app uses that library chart. Each chart has reasonable default values that likely won't have to change so you're left only having to change the options that change per app.
That's values like number of replicas, which Docker image to pull, resource limits and a couple of timeout related values (probes, database migration, etc.). Before you know it, you're at 15ish lines of really straight forward configuration like `replicaCount: 3`.
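A per-app values file in that kind of setup might plausibly look like this (the keys and values are illustrative, not from the comment above):

```yaml
replicaCount: 3
image:
  repository: registry.example.com/myapp   # hypothetical registry
  tag: "1.2.3"
resources:
  limits:
    cpu: 500m
    memory: 256Mi
probes:
  readinessPath: /healthz
  timeoutSeconds: 5
migrations:
  timeoutSeconds: 300
```

Everything else (Deployment, Service, Ingress templates, sane probe defaults) lives in the shared library chart.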
It's just not finished yet. With < 0.01% of the funding kube has, it has many times more design and elegance. Help us out. Have a look and tell me what you think. =D
My two cents is that docker compose is an order of magnitude simpler to troubleshoot or understand than Kubernetes but the problem that Kubernetes solves is not that much more difficult.
As a Kubernetes outsider, I get confused why so much new jargon had to be introduced. As well as so many little new projects coupled to Kubernetes with varying degrees of interoperability. It makes it hard to get a grip on what Kube really is for newcomers.
It also has all the hallmarks of a high-churn product where you need to piece together your solution from a variety of lower-quality information sources (tutorials, QA sites) rather than a single source of foolproof documentation.
> I get confused why so much new jargon had to be introduced.
Consider the source of the project for your answer (mainly, but not entirely, bored engineers who are too arrogant to think anybody has solved their problem before).
> It also has all the hallmarks of a high-churn product where you need to piece together your solution from a variety of lower-quality information sources (tutorials, QA sites) rather than a single source of foolproof documentation.
This describes 99% of open source libraries used. The documentation looks good because auto-doc tools produce a prolific amount of boilerplate documentation. In reality the result is documentation that's very shallow, and often just a re-statement of the APIs. The actual usage documentation of these projects is generally terrible, with few exceptions.
> Consider the source of the project for your answer (mainly, but not entirely, bored engineers who are too arrogant to think anybody has solved their problem before).
This seems both wrong and contrary to the article (which mentions that k8s is a descendant of Borg, and in fact if memory serves many of the k8s authors were borg maintainers). So they clearly were aware that people had solved their problem before, because they maintained the tool that had solved the problem for close to a decade.
I always find it surprising that I have yet to see or touch Kubernetes (and I work as an SRE with container workloads for several years now), and yet HN threads about it are full of people who apparently think it's the only possible solution and are flabbergasted that people don't pray to it nightly.
I think one part of this is the lack of accepted nomenclature in CS - naming conventions are typically not enforced, unlike if you had to produce an engineering drawing for it and have it conform to a standard.
For engineering, the common way is to use a couple of descriptive words plus a basic noun, so things do get boring quite quickly but are very easy to understand: say, something like Google 'Cloud Container Orchestrator' instead of Kubernetes.
The Kubernetes documentation site is the source of truth, and pretty well written, though obviously no set of docs is perfect.
The concepts and constructs do not usually change in breaking ways once they reach beta status. If you learned Kubernetes in 2016 as an end user, there are certainly more features but the core isn’t that different.
So the basic problem with *nix is its permission model. If we had truly separable security/privilege/resource domains then Linux wouldn't have needed containers and simple processes and threads could have sufficed in place of Borg/docker/Kubernetes.
There's a simpler and more powerful security model; capabilities. Capabilities fix 90% of the problems with *nix.
There's currently no simple resource model. Everything is an ad-hoc human-driven heuristic for allocating resources to processes and threads, and is a really difficult problem to solve formally because it has to go beyond algorithmic complexity and care about the constant factors as well.
The other *nix problem is "files". Files were a compromise between usability and precision but very few things are merely files. Devices and sockets sure aren't. There's a reason the 'file' utility exists; nothing is really just a file. Text files are actually text files + a context-free grammar (hopefully) and parser somewhere, or they're human-readable text (but probably with markup, so again a parser somewhere).
Plenty of object models have come and gone; they haven't simplified computers (much less distributed computers), so we'll need some theory more powerful than anything we've had in the past to express relationships between computation, storage, networks, and identities.
Most of the time the chroot functionality of Docker is a hindrance, not a feature. We need chroots because we still haven't figured out packaging properly.
(Maybe Nix will eventually solve this problem properly; some sort of docker-compose equivalent for managing systemd services is lacking at the moment.)
I mean, containers can provide isolation. Linux has had a hard time getting that to be reliable because it started with the wrong model: building containers subtractively rather than additively. Though even starting with the right model, until you have isolation for every last bit of shared context that the OS provides (harder to identify than it may seem at first blush!) you won't have a complete solution. And yes, software-based containers will tend to have some leakage. Even sharing hardware with hardware isolation features might not be enough (hello row hammer).
It would be good to have containers aim to provide the maximum possible isolation.
> Containers never solved the permission model. They solved the packaging and idempotency problem
Disagree. Containers are primarily about separation and decoupling. Multiple services on one server often have plenty of ways to interact and see each other, and are interdependent in non-trivial ways (e.g. if you want to upgrade the OS, you upgrade it for all services together). Running each service in its own container provides separation by default.
OTOH, containers as a technology have nothing to do with packaging, reproducibility and deployment. These changes just arrived together (e.g. with Docker), so they are often associated, but you can have e.g. LXC containers that are managed in the same way as traditional servers (by ssh'ing into a container).
> I really dislike when people assume containers give them security, it’s the wrong thing to think about.
To be fair, there is lots of published text around suggesting that this _is_ the case. Many junior to semi-experienced engineers I've known have at some point thought it's plausible to "ssh into" a container. They're seen as lightweight VMs, not as what they are: processes.
> Containers allowed us to deploy reproducibly, that’s powerful.
and it was done in the most "to bake an apple pie from scratch, you must first create the universe" approach.
> There's a simpler and more powerful security model; capabilities. Capabilities fix 90% of the problems with *nix.
What do you think about using file descriptors as capabilities? Capsicum (for FreeBSD, I think) extends this notion quite a bit. Personally I feel it is not quite "right", but I haven't sat down and thought hard about what is missing.
> we'll need some theory more powerful than anything we've had in the past to express relationships between computation, storage, networks, and identities.
Do you have any particular things in mind which points in this direction? I would like to understand what the status quo is.
I haven't looked at Capsicum specifically, but from the simple overview I read it sounds more similar to dropping root privileges when daemonizing than to the basis for a whole-OS security model. E.g. there isn't (in my limited reading) a way to grant a new file descriptor to a process after it calls cap_enter. Consider a web browser that wants to download or upload a file; there should be a way for the operator to grant that permission to the browser from another process (the OS UI or similar) after it starts running.
To be effective capabilities also need a way to be persistent so that a server daemon doesn't have to call cap_enter but can pick up its granted capabilities at startup. Capsicum looks like a useful way to build more secure daemons within Unix using a lot of capability features.
I also think file descriptors are not the fundamental unit of capability. Capabilities should also cover processes, threads, and the objects managed by various other syscalls.
> Do you have any particular things in mind which points in this direction? I would like to understand what the status quo is.
Unfortunately I don't have great suggestions. The most secure model right now is seL4, and its capability model covers threads, message-passing endpoints, and memory allocation (subdivision) and retyping as kernel memory to create new capabilities and objects. The kernel is formally verified but afaik the application/user level is not fleshed out as a convenient development environment nor as a distributed computing environment.
For distributed computing a capability model would have to satisfy and solve distributed trust issues which probably means capabilities based on cryptographic primitives, which for practical implementations would have to extend full trust between kernels in different machines for speed. But for universality it should be possible to work with capabilities at an abstraction level that allows both deep-trust distributed computers and more traditional single-machine trust domains without having to know or care which type of capabilities to choose when writing the software, only when running it.
I think a foundation for universal capabilities needs support for different trust domains and a way to interoperate between them.
1. Identifying the controller for a particular capability, which trust domain it is in, and how to access it.
2. Converting capabilities between trust domains as the objects to which they refer move.
3. Managing any identity/cryptographic tokens necessary to cross trust domains.
4. Controlling the ability to grant or use capabilities across trust domains.
A simple example: a caller invokes a capability on a utility process that produces an output, and the caller wants to receive a capability to read that output.
The processes may not live on the same machine.
The processes may not be in the same trust domain.
The resulting object may be on a third machine or trust domain.
The caller may have inherited privacy enforcement on all owned capabilities that necessitates e.g. translating the binary code of the second process into a fully homomorphically encrypted circuit which can run on a different trust domain while preserving privacy and provisioning the necessary keys for this in the local trust domain so that the capability to the new object can actually read it.
The process may migrate to a remote machine in a different trust domain in the middle of processing, in which case the OS needs to either fail the call (making for an unfortunately complicated distributed computer) or transparently snapshot or rollback the state of the process for migration, transmit it and any (potentially newly encrypted) data, and update the capabilities to reflect the new location and trust domain.
Basically if the capability model isn't capable of solving these issues for what would be very simple local computing then it's never going to satisfy the OP's desire for a more simple distributed computation model.
I think it's also clear why *nix is woefully short of being able to accomplish this. *nix is inherently local, has a single trust domain, and forces userland code to handle interaction with other trust domains except in the very limited model of network file systems (and in the case of NFS, essentially an enforced single trust domain with synchronized user/group IDs).
Windows has capabilities. It's the combination of handles (file, process, etc.) and access tokens.
But you'll note no one is really deploying windows workloads to the cloud. Why? Well, because you'd still have to build a framework for managing all those permissions, and it hasn't been done. Also, you might end up with SVCHOST problem, where you host many different services/apps/whatever in one very threaded process because you can.
Capabilities aren't necessarily simpler. Especially if you can delegate them without controls -- now you have no idea what the actual running permissions are, only the cold start baseline.
No, I think the permissions thing is a red herring. Very much on the contrary, I think workload division into coarse-grained containers is great for permissions, because fine-grained access control is hard to manage. Of course, you can't destroy complexity, only move it around, so if you end up with many coarse-grained access control units then you'll still have a fine-grained access control system in the end.
Files aren't really a problem either. You can add metadata to files on Linux using xattrs (I've built a custom HTTP server that takes some response headers for static resources, like Content-Type, from xattrs). The problem you're alluding to is duck-typing as opposed to static typing. Yes, it's a problem -- people are lazy, so they don't type-tag everything in highly lazy typing systems. So what? Windows also has this problem, just a bit less so than Unix. Python and JS are all the rage, and their type systems are lazy and obnoxious. It's not a problem with Unix. It's a problem with humans. Lack of discipline. Honestly, there are very few people who could use Haskell as a shell!
> Plenty of object models have come and gone;
Yeah, mostly because they suck. The right model is Haskell's (and related languages').
> so we'll need some theory more powerful than anything we've had in the past ...
I think that's Haskell (which is still evolving) and its ecosystem (ditto).
But at the end of the day, you'll still have very complex metadata to manage.
What I don't understand is how all your points tie into Kubernetes being today's Multics.
Kubernetes isn't motivated by Unix permissions sucking. We had fancy ACLs in ZFS in Solaris and still also ended up having Zones (containers). You can totally build an application-layer cryptographic capability system, running each app as its own isolated user/container, and to some degree this is happening with OAuth and such things, but that isn't what everyone is doing, all the time.
Kubernetes is most definitely not motivated by Unix files being un-typed either.
I hope readers end up floating the other, more on-topic top-level comments in this thread back to the top.
The alternatives to Kubernetes are even more complex. Kubernetes takes a few weeks to learn. To learn alternatives, it takes years, and applications built on alternatives will be tied to one cloud.
You'd have to learn AWS Auto Scaling groups (proprietary to AWS), Elastic Load Balancer (proprietary to AWS) or HAProxy, blue-green deployment or phased rollout, Consul, systemd, Pingdom, CloudWatch, etc. etc.
Kubernetes uses all those underlying AWS technologies anyway (or at least an equivalently complex thing). You still have to be prepared to diagnose issues with them to effectively administrate Kubernetes.
At least with building to k8s you can shift to another cloud provider if those problems end up too difficult to diagnose or fix. Moving providers with a k8s system can be a weeks long project rather than a years long project which can easily make the difference between surviving and closing the doors. It's not a panacea but it at least doesn't make your system dependent on a single provider.
That hasn't been my experience. I use Kubernetes on Google cloud (because they have the best implementation of K8s), and I have never had to learn any Google-proprietary things.
Cloud agnosticism is, in my experience, a red herring. It does not matter, and the effort required to move from one cloud to another is still non-trivial.
I like using the primitives the cloud provides, while also having a path to run my software on bare metal if needed. This means: VMs, decoupling the logging and monitoring from the cloud services (use a good library that can send to CloudWatch, for example; prefer open source solutions when possible), doing proper capacity planning (and having the option to automatically scale up if the flood ever comes), etc.
> The alternatives to Kubernetes are even more complex. Kubernetes takes a few weeks to learn.
Learning Heroku and starting to use it takes maybe an hour. It's more expensive and you won't have as much control as with Kubernetes, but we used it in production for years for a fairly big microservice-based project without problems.
This feels like a post ranting against systemd, written by someone who likes init.
I understand that K8 does many things but its also how you look at the problem. K8 does one thing well, manage complex distributed systems such as knowing when to scale up and down if you so choose and when to start up new pods when they fail.
Arguably, this is one problem that is made up of smaller problems that are solved by smaller services just like SystemD works.
Sometimes I wonder if the Perlis-Thompson Principle and the Unix Philosophy have become a way to force a legalistic view of software development or are just out-dated.
I don't find the comparison to systemd to be convincing here.
The end result of systemd for the average administrator is that you no longer need to write finicky init scripts tens or hundreds of lines long. They're reduced to unit files that are often just 10-15 lines. systemd is designed to replace old stuff.
The result of Kubernetes for the average administrator is a massively complex system with its own unique concepts. It needs to be well understood if you want to be able to administrate it effectively. Updates come fast and loose, and updates are going to impact an entire cluster. Kubernetes, unlike systemd, is designed to be built _on top of_ existing technologies you'd be using anyway (cloud provider autoscaling, load balancing, storage). So rather than being like systemd, which adds some complexity and also takes some away, Kubernetes only adds.
> So rather than being like systemd, which adds some complexity and also takes some away, Kubernetes only adds.
Here are some bits of complexity that managed Kubernetes takes away:
* SSH configuration
* Key management
* Certificate management (via cert-manager)
* DNS management (via external-dns)
* Auto-scaling
* Process management
* Logging
* Host monitoring
* Infra as code
* Instance profiles
* Reverse proxy
* TLS
* HTTP -> HTTPS redirection
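For instance, several of those items (TLS certificates, DNS records, HTTPS redirection) reduce to a handful of annotations on one Ingress once cert-manager and external-dns are installed in the cluster. The hostname, issuer name, and app name below are placeholders:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod            # cert-manager issues the cert
    external-dns.alpha.kubernetes.io/hostname: app.example.com  # external-dns creates the record
    nginx.ingress.kubernetes.io/ssl-redirect: "true"            # HTTP -> HTTPS
spec:
  tls:
    - hosts: [app.example.com]
      secretName: myapp-tls   # cert-manager populates this Secret
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp
                port:
                  number: 80
```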
So maybe your point was "the VMs still exist" which is true, but I generally don't care because the work required of me goes away. Alternatively, you have to have most/all of these things anyway, so if you're not using Kubernetes you're cobbling together solutions for these things which has the following implications:
1. You will not be able to find candidates who know your bespoke solution, whereas you can find people who know Kubernetes.
2. Training people on your bespoke solution will be harder. You will have to write a lot more documentation whereas there is an abundance of high quality documentation and training material available for Kubernetes.
3. When something inevitably breaks with your bespoke solution, you're unlikely to get much help Googling around, whereas it's very likely that you'll find what you need to diagnose / fix / work around your Kubernetes problem.
4. Kubernetes improves at a rapid pace, and you can get those improvements for nearly free. To improve your bespoke solution, you have to take the time to do it all yourself.
5. You're probably not going to have the financial backing to build your bespoke solution to the same quality caliber that the Kubernetes folks are able to devote (yes, Kubernetes has its problems, but unless you're at a FAANG then your homegrown solution is almost certainly going to be poorer quality if only because management won't give you the resources you need to build it properly).
Right, I really dislike systemd in many ways ... but I love what it enables people to do and accept that for all my grumpyness about it, it is overall a net win in many scenarios.
k8s ... I think is often overkill in a way that simply doesn't apply to systemd.
Kubernetes removes the complexity of keeping a process (service) available.
There’s a lot to unpack in that sentence, which is to say there’s a lot of complexity it removes.
Agree it does add as well.
I’m not convinced k8s is a net increase in complexity after everything is accounted for. Authentication, authorization, availability, monitoring, logging, deployment tooling, auto scaling, abstracting the underlying infrastructure, etc…
> K8 does one thing well, manage complex distributed systems such as knowing when to scale up and down if you so choose and when to start up new pods when they fail.
K8S does the very simple stateless case well, but for anything more complicated you are on your own. Stateful services are still a major pain, especially those with leader elections. There is no feedback to K8S about the application state of the cluster, so it can't know which instances are least disruptive to shut down or which shard needs more capacity.
> I understand that K8 does many things but its also how you look at the problem. K8 does one thing well, manage complex distributed systems such as knowing when to scale up and down if you so choose and when to start up new pods when they fail.
Also, in the sense of "many small components that each do one thing well", k8s is even more Unix-like than Unix in that almost everything in k8s is just a controller for a specific resource type.
I'm not sure that "fewer concepts" is a win. "Everything is a file" went too far with Linux, where you get status from the kernel by reading what appears to be various text files. But that runs into all the complexities of maintaining the file illusion. What if you read it in small blocks? Does it change while being read? If not, what if you read some of it and then just hold the file handle. Are you tying up kernel memory? Holding important locks? Or what?
Orchestration has a political and business problem, too. How does Amazon feel about something that runs most jobs on your own bare metal servers and rents extra resources from AWS only during overload situations? This appears to be the financially optimal strategy for compute-bound work such as game servers. Renting bare iron 24/7 at AWS prices is not cost effective.
Having had a play with a few variants on this theme, I think kernel based abstractions are the mistake here. It's too low level and too constrained by the low-level details of the API, as you've said yourself.
If you look at something like PowerShell, it has a variant of this abstraction that is implemented in user mode. Within the PowerShell process, there are provider plugins (DLLs) that implement various logical filesystems like "environment variables", "certificates", "IIS sites", etc...
These don't all implement the full filesystem API! Instead they implement various subsets. E.g., some providers only implement atomic reads and writes, which is what you want for something like kernel parameters, but not for generic data files.
I feel like we've already seen some alternatives and the industry, thus far, is still orienting towards k8s.
Hashicorp's stack, using Nomad as an orchestrator, is much simpler and more composable.
I've long been a fan of Mesos' architecture, which I also think is more composable than the k8s stack.
I just find it surprising an article that is calling for an evolution of the cluster management architecture fails to investigate the existing alternatives and why they haven't caught on.
Setting up the right parameters/eval criteria to exercise inside of a few week timebox (I'm assuming this wasn't a many month task) is extremely difficult to do for a complex system like this. At least, to me it is--maybe more ops focused folks can do it quicker.
Getting _something_ up and running quickly isn't necessarily a good indicator of how well a set of tools will work for you over time, in production work loads.
Several years ago -- so pre-K8s too -- I was tasked with setting up a Nomad cluster and failed miserably. Nomad and Consul are designed to work together but also designed distinctly enough that it was a bloody nightmare trying to figure out what order of priority things needed to be spun up in and how they all interacted with each other. The documentation was more like a man page where you'd get a list of options but very little guidance on how to set things up, unlike K8s, whose documentation has a lot of walk-through material.
Things might have improved massively for Nomad since but I honestly have no desire to learn. Having used other Hashicorp tools since, I see them make the same mistakes time and time again.
Now I'm not the biggest fan of K8s either. I completely agree that they're hugely overblown for most purposes despite being sold as a silver bullet for any deployment. But if there's one thing K8s does really well it's describing the different layers in a deployment and then wrapping that up in a unified block. There's less of the "this thing is working but is this other thing" when spinning up a K8s cluster.
For me when exploring K8s vs Nomad, Nomad looked like a clear choice. That was until I had to get Nomad + Consul running. I found it all really difficult to get running in a satisfactory manner. I never even touched the whole Vault part of the setup because it was all overwhelming.
On the other side, K8s was a steep learning curve with lots of options and 'terms' to learn, but there was never a point in the whole exploration where I was stuck. The docs are great, the community is great, and the number of examples available allows us to mix and match lots of different approaches.
There is a trap in distributed system design: seeking to scale up from a single-host perspective. An example: we have Apache and want to scale it up, so we put it in a container and generate its configuration so we can run several instances in parallel.
This leads to unnecessarily heavy systems - you do not need a container to host a server socket.
Industry puts algorithms and Big O on a pedestal. Most software projects start as someone building algorithms, with deployment and interactions only getting late attention. This is a bit like building the kitchen and bathroom before laying the foundations.
Algorithm centric design creates mathematically elegant algorithms that move gigabytes of io across the network for every minor transaction. Teams wrap commodity resource schedulers around carefully tuned worker nodes, and discover their performance is awful because the scheduler can’t deal in the domain language of the big picture problem.
I think it is interesting that the culture of Big O interviews and k8s both came out of Google.
The cool thing is when I spin up a local cluster on my dev box, I can use the nginx ingress instead and everything works the same with no code changes. Just a few Helm YAML config values.
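A hedged sketch of what those environment-specific Helm values might look like - the chart structure and annotation values here are illustrative assumptions, not the poster's actual chart, though the ALB annotations shown are real ones from the AWS Load Balancer Controller:

```yaml
# values-local.yaml - hypothetical overrides for a local dev cluster
ingress:
  className: nginx        # ingress-nginx on the dev box
  annotations: {}

# values-production.yaml - same chart, EKS + AWS Load Balancer Controller
ingress:
  className: alb
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/ssl-redirect: "443"
```

The app's templates stay identical; only the values file chosen at deploy time changes.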
Maybe I dodged a bullet by starting with Kubernetes so late. I imagine 2-3 years ago would have been a completely different world. That's also why I haven't bothered to look into using Kubernetes until recently.
> I don't disagree but this condition is doing a hell of a lot of work.
It was kind of a lot of work to get here, but it wasn't anything too crazy. It took ~160 hours to go from never using Kubernetes to getting most of the way there. This also includes writing a lot of ancillary documentation and wiki style posts to get some of the research and ideas out of my head and onto paper so others can reference it.
How?! Or is that more a "you provide the safe way, k8s just runs it for you" kind of thing, than a freebie?
For saFeness it's still on us as developers to do the dance of making our migrations and code changes compatible with running both the old and new version of our app.
But for saNeness, Kubernetes has some neat constructs to help ensure your migrations only get run once even if you have 20 copies of your app performing a rolling restart. You can define your migration in a Kubernetes job and then have an initContainer trigger the job while also using kubectl to watch the job's status to see if it's complete. This translates to only 1 pod ever running the migration while other pods hang tight until it finishes.
I'm not a grizzled Kubernetes veteran here but the above pattern seems to work in practice in a pretty robust way. If anyone has any better solutions please reply here with how you're doing this.
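A rough sketch of that Job + initContainer pattern - all names and images are hypothetical, and the init container's service account would need RBAC permission to read Job status:

```yaml
# Job that runs migrations exactly once per release
apiVersion: batch/v1
kind: Job
metadata:
  name: myapp-migrate
spec:
  backoffLimit: 3
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: registry.example.com/myapp:1.2.3
          command: ["./manage.py", "migrate"]
---
# Fragment of the Deployment's pod spec: every replica blocks here,
# but only the Job ever actually runs the migration.
initContainers:
  - name: wait-for-migrations
    image: bitnami/kubectl:latest
    command:
      - kubectl
      - wait
      - --for=condition=complete
      - --timeout=300s
      - job/myapp-migrate
```

Helm's pre-upgrade hooks are another common way to create the Job per release.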
I understand you might outsource the Helm chart creation but this sounds like oversimplifying a lot, to me. But maybe I'm spoiled by running infra/software in a tricky production context and I'm too cynical.
That's values like number of replicas, which Docker image to pull, resource limits and a couple of timeout related values (probes, database migration, etc.). Before you know it, you're at 15ish lines of really straight forward configuration like `replicaCount: 3`.
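As a sketch of what "15ish lines" could look like (field names are assumptions about a hypothetical shared chart, not a standard):

```yaml
# Hypothetical per-app values file for a shared web-app chart
replicaCount: 3
image:
  repository: registry.example.com/myapp
  tag: "1.2.3"
resources:
  requests: {cpu: 100m, memory: 128Mi}
  limits: {memory: 256Mi}
probes:
  readinessPath: /healthz
  initialDelaySeconds: 5
migrations:
  timeoutSeconds: 300
ingress:
  host: myapp.example.com
```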
https://github.com/purpleidea/mgmt/
It's just not finished yet. With < 0.01% of the funding Kubernetes has, it has many times more design and elegance. Help us out - have a look and tell me what you think. =D
It also has all the hallmarks of a high-churn product where you need to piece together your solution from a variety of lower-quality information sources (tutorials, QA sites) rather than a single source of foolproof documentation.
Consider the source of the project for your answer (mainly, but not entirely, bored engineers who are too arrogant to think anybody has solved their problem before).
> It also has all the hallmarks of a high-churn product where you need to piece together your solution from a variety of lower-quality information sources (tutorials, QA sites) rather than a single source of foolproof documentation.
This describes 99% of the open source libraries in use. The documentation looks good because auto-doc tools produce a prolific amount of boilerplate documentation. In reality the result is documentation that's very shallow, often just a re-statement of the APIs. The actual usage documentation of these projects is generally terrible, with few exceptions.
This seems both wrong and contrary to the article (which mentions that k8s is a descendant of Borg, and in fact if memory serves many of the k8s authors were borg maintainers). So they clearly were aware that people had solved their problem before, because they maintained the tool that had solved the problem for close to a decade.
- containers focus on what you can do; they're easy to understand and you can start in 5 minutes
- kubernetes is the opposite: verbose tutorials lose time explaining how it works rather than what I can do with it
https://news.ycombinator.com/item?id=27910185
https://news.ycombinator.com/item?id=27910481 - weird comparison to systemd
https://news.ycombinator.com/item?id=27910553 - another systemd comparison
https://news.ycombinator.com/item?id=27913239 - comparing it to git
For engineering, the common way is to use a couple of descriptive words + a basic noun, so things do get boring quite quickly but are very easy to understand - say, something like Google 'Cloud Container Orchestrator' instead of Kubernetes.
The concepts and constructs do not usually change in breaking ways once they reach beta status. If you learned Kubernetes in 2016 as an end user, there are certainly more features but the core isn’t that different.
There's a simpler and more powerful security model: capabilities. Capabilities fix 90% of the problems with *nix.
There's currently no simple resource model. Everything is an ad-hoc human-driven heuristic for allocating resources to processes and threads, and is a really difficult problem to solve formally because it has to go beyond algorithmic complexity and care about the constant factors as well.
The other *nix problem is "files". Files were a compromise between usability and precision but very few things are merely files. Devices and sockets sure aren't. There's a reason the 'file' utility exists; nothing is really just a file. Text files are actually text files + a context-free grammar (hopefully) and parser somewhere, or they're human-readable text (but probably with markup, so again a parser somewhere).
Plenty of object models have come and gone; they haven't simplified computers (much less distributed computers), so we'll need some theory more powerful than anything we've had in the past to express relationships between computation, storage, networks, and identities.
I really dislike when people assume containers give them security, it’s the wrong thing to think about.
Containers allowed us to deploy reproducibly, that’s powerful.
Docker replaced .tar.gz and .rpm, not chroots.
Most of the time the chroot functionality of Docker is a hindrance, not a feature. We need chroots because we still haven't figured out packaging properly.
(Maybe Nix will eventually solve this problem properly; some sort of docker-compose equivalent for managing systemd services is lacking at the moment.)
It would be good to have containers aim to provide the maximum possible isolation.
Disagree. Containers are primarily about separation and decoupling. Multiple services on one server often have plenty of ways to interact and see each other and are interdependent in non-trivial ways (e.g. if you want to upgrade the OS, you upgrade it for all services together). Services running each in its own container provides separation by default.
OTOH, containers as a technology have nothing to do with packaging, reproducibility, and deployment. It's just that these changes arrived together (e.g. with Docker), so they are often associated - but you can have e.g. LXC containers that are managed the same way as traditional servers (by ssh-ing into a container).
To be fair, there is lots of published text around suggesting that this _is_ the case. Many junior to semi-experienced engineers I've known have at some point thought it's plausible to "ssh into" a container. They're seen as light-weight VMs, not as what they are - processes.
> Containers allowed us to deploy reproducibly, that’s powerful.
and it was done in the most "to bake an apple pie from scratch, you must first create the universe" approach.
What do you think about using file descriptors as capabilities? Capsicum (for FreeBSD, I think) extends this notion quite a bit. Personally I feel it is not quite "right", but I haven't sat down and thought hard about what is missing.
> we'll need some theory more powerful than anything we've had in the past to express relationships between computation, storage, networks, and identities.
Do you have any particular things in mind which points in this direction? I would like to understand what the status quo is.
To be effective capabilities also need a way to be persistent so that a server daemon doesn't have to call cap_enter but can pick up its granted capabilities at startup. Capsicum looks like a useful way to build more secure daemons within Unix using a lot of capability features.
I also think file descriptors are not the fundamental unit of capability. Capabilities should also cover processes, threads, and the objects managed by various other syscalls.
> Do you have any particular things in mind which points in this direction? I would like to understand what the status quo is.
Unfortunately I don't have great suggestions. The most secure model right now is seL4, and its capability model covers threads, message-passing endpoints, and memory allocation (subdivision) and retyping as kernel memory to create new capabilities and objects. The kernel is formally verified, but afaik the application/user level is not fleshed out as a convenient development environment nor as a distributed computing environment.
For distributed computing a capability model would have to satisfy and solve distributed trust issues which probably means capabilities based on cryptographic primitives, which for practical implementations would have to extend full trust between kernels in different machines for speed. But for universality it should be possible to work with capabilities at an abstraction level that allows both deep-trust distributed computers and more traditional single-machine trust domains without having to know or care which type of capabilities to choose when writing the software, only when running it.
I think a foundation for universal capabilities needs support for different trust domains and a way to interoperate between them.
A simple example: a caller wants to invoke a capability on a utility process which produces an output, and the caller wants to receive a capability to read that output. I think it's also clear why *nix falls woefully short of being able to accomplish this. *nix is inherently local, has a single trust domain, and forces userland code to handle interaction with other trust domains, except in the very limited model of network file systems (and in the case of NFS, essentially an enforced single trust domain with synchronized user/group IDs).

But you'll note no one is really deploying Windows workloads to the cloud. Why? Well, because you'd still have to build a framework for managing all those permissions, and it hasn't been done. Also, you might end up with the SVCHOST problem, where you host many different services/apps/whatever in one very threaded process, because you can.
Capabilities aren't necessarily simpler. Especially if you can delegate them without controls -- now you have no idea what the actual running permissions are, only the cold start baseline.
No, I think the permissions thing is a red herring. Very much on the contrary, I think workload division into coarse-grained containers is great for permissions, because fine-grained access control is hard to manage. Of course, you can't destroy complexity, only move it around - so if you end up with many coarse-grained access control units, then you'll still have a fine-grained access control system in the end.
Files aren't really a problem either. You can add metadata to files on Linux using xattrs (I've built a custom HTTP server that takes some response headers for static resources, like Content-Type, from xattrs). The problem you're alluding to is duck-typing as opposed to static typing. Yes, it's a problem -- people are lazy, so they don't type-tag everything in highly lazy typing systems. So what? Windows also has this problem, just a bit less so than Unix. Python and JS are all the rage, and their type systems are lazy and obnoxious. It's not a problem with Unix. It's a problem with humans. Lack of discipline. Honestly, there are very few people who could use Haskell as a shell!
> Plenty of object models have come and gone;
Yeah, mostly because they suck. The right model is Haskell's (and related languages').
> so we'll need some theory more powerful than anything we've had in the past ...
I think that's Haskell (which is still evolving) and its ecosystem (ditto).
But at the end of the day, you'll still have very complex metadata to manage.
What I don't understand is how all your points tie into Kubernetes being today's Multics.
Kubernetes isn't motivated by Unix permissions sucking. We had fancy ACLs in ZFS in Solaris and still also ended up having Zones (containers). You can totally build an application-layer cryptographic capability system, running each app as its own isolated user/container, and to some degree this is happening with OAuth and such things, but that isn't what everyone is doing, all the time.
Kubernetes is most definitely not motivated by Unix files being un-typed either.
I hope readers end up floating the other, more on-topic top-level comments in this thread back to the top.
See prior discussion here: https://news.ycombinator.com/item?id=23463467
You'd have to learn AWS Auto Scaling groups (proprietary to AWS), Elastic Load Balancing (proprietary to AWS) or HAProxy, blue-green deployment or phased rollouts, Consul, systemd, Pingdom, CloudWatch, etc. etc.
Oh it's Wednesday, ALB controller has shat itself again!
I like using the primitives the cloud provides, while also having a path to - if needed - run my software on bare metal. This means: VMs, decoupling logging and monitoring from the cloud services (use a good library that can send to e.g. CloudWatch; prefer open source solutions when possible), doing proper capacity planning (and having the option to automatically scale up if the flood ever comes), etc.
Learning Heroku and starting to use it takes maybe an hour. It's more expensive and you won't have as much control as with Kubernetes, but we used it in production for years for a fairly big microservice-based project without problems.
I understand that K8s does many things, but it's also about how you look at the problem. K8s does one thing well: managing complex distributed systems, such as knowing when to scale up and down (if you so choose) and when to start up new pods when they fail.
Arguably, this is one problem made up of smaller problems that are solved by smaller services, just like how systemd works.
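The scale-up/scale-down part of that can be expressed in a single small object - a minimal HorizontalPodAutoscaler sketch, with the deployment name hypothetical:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

Restarting failed pods is handled separately by the Deployment's replica management plus liveness probes; each smaller problem gets its own controller.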
Sometimes I wonder if the Perlis-Thompson Principle and the Unix Philosophy have become a way to force a legalistic view of software development or are just out-dated.
The end-result of systemd for the average administrator is that you no longer need to write finicky, tens or hundreds of line init scripts. They're reduced to unit files which are often just 10-15 lines. systemd is designed to replace old stuff.
The result of Kubernetes for the average administrator is a massively complex system with its own unique concepts. It needs to be well understood if you want to be able to administrate it effectively. Updates come fast and loose, and updates are going to impact an entire cluster. Kubernetes, unlike systemd, is designed to be built _on top of_ existing technologies you'd be using anyway (cloud provider autoscaling, load balancing, storage). So rather than being like systemd, which adds some complexity and also takes some away, Kubernetes only adds.
Here are some bits of complexity that managed Kubernetes takes away:
* SSH configuration
* Key management
* Certificate management (via cert-manager)
* DNS management (via external-dns)
* Auto-scaling
* Process management
* Logging
* Host monitoring
* Infra as code
* Instance profiles
* Reverse proxy
* TLS
* HTTP -> HTTPS redirection
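Several of those bullets collapse into a single Ingress once cert-manager and external-dns are installed. A sketch - the hostname and issuer name are hypothetical, but the annotations are the real ones those controllers watch for:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod          # cert-manager issues the cert
    external-dns.alpha.kubernetes.io/hostname: myapp.example.com  # external-dns creates the record
spec:
  tls:
    - hosts: [myapp.example.com]
      secretName: myapp-tls
  rules:
    - host: myapp.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp
                port:
                  number: 80
```

DNS, certificates, TLS termination, and reverse proxying all fall out of this one declarative object.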
So maybe your point was "the VMs still exist" which is true, but I generally don't care because the work required of me goes away. Alternatively, you have to have most/all of these things anyway, so if you're not using Kubernetes you're cobbling together solutions for these things which has the following implications:
1. You will not be able to find candidates who know your bespoke solution, whereas you can find people who know Kubernetes.
2. Training people on your bespoke solution will be harder. You will have to write a lot more documentation whereas there is an abundance of high quality documentation and training material available for Kubernetes.
3. When something inevitably breaks with your bespoke solution, you're unlikely to get much help Googling around, whereas it's very likely that you'll find what you need to diagnose / fix / work around your Kubernetes problem.
4. Kubernetes improves at a rapid pace, and you can get those improvements for nearly free. To improve your bespoke solution, you have to take the time to do it all yourself.
5. You're probably not going to have the financial backing to build your bespoke solution to the same quality caliber that the Kubernetes folks are able to devote (yes, Kubernetes has its problems, but unless you're at a FAANG then your homegrown solution is almost certainly going to be poorer quality if only because management won't give you the resources you need to build it properly).
k8s ... I think is often overkill in a way that simply doesn't apply to systemd.
Wouldn't the hundreds of lines of finicky, bespoke Ansible/Chef/Puppet configs required to manage non-k8s infra be the equivalent to this?
There’s a lot to unpack in that sentence, which is to say there’s a lot of complexity it removes.
Agree it does add as well.
I’m not convinced k8s is a net increase in complexity after everything is accounted for. Authentication, authorization, availability, monitoring, logging, deployment tooling, auto scaling, abstracting the underlying infrastructure, etc…
K8s does the very simple stateless case well, but for anything more complicated you are on your own. Stateful services are still a major pain, especially those with leader election. There is no feedback to K8s about the application state of the cluster, so it can't know which instances are less disruptive to shut down or which shard needs more capacity.
Also, in the sense of "many small components that each do one thing well", k8s is even more Unix-like than Unix in that almost everything in k8s is just a controller for a specific resource type.
Orchestration has a political and business problem, too. How does Amazon feel about something that runs most jobs on your own bare metal servers and rents extra resources from AWS only during overload situations? This appears to be the financially optimal strategy for compute-bound work such as game servers. Renting bare iron 24/7 at AWS prices is not cost effective.
Having had a play with a few variants on this theme, I think kernel based abstractions are the mistake here. It's too low level and too constrained by the low-level details of the API, as you've said yourself.
If you look at something like PowerShell, it has a variant of this abstraction that is implemented in user mode. Within the PowerShell process, there are provider plugins (DLLs) that implement various logical filesystems like "environment variables", "certificates", "IIS sites", etc...
These don't all implement the full filesystem API! Instead they implement various subsets. E.g.: some providers only implement atomic reads and writes, which is what you want for something like kernel parameters, but not for generic data files.
Hashicorp's stack, using Nomad as an orchestrator, is much simpler and more composable.
I've long been a fan of Mesos' architecture, which I also think is more composable than the k8s stack.
I just find it surprising an article that is calling for an evolution of the cluster management architecture fails to investigate the existing alternatives and why they haven't caught on.
Getting _something_ up and running quickly isn't necessarily a good indicator of how well a set of tools will work for you over time, in production workloads.
Things might have improved massively for Nomad since but I honestly have no desire to learn. Having used other Hashicorp tools since, I see them make the same mistakes time and time again.