While this is a very welcome improvement in terms of functionality, I can't help but feel that the re-use of "restartPolicy" to mean something similar, but different, in a different context is a very poor decision.
Kubernetes already has an issue with a (perceived) high barrier to entry, and I'm not sure that "restartPolicy on a container means this, unless it's used in this other list of containers, in which case it means that" helps.
I would have preferred to see a separate attribute (such as `sidecar: true`), rather than overloading (and in my opinion, abusing) the existing `restartPolicy`.
The challenge with a separate attribute is that it is not forward compatible with new features we might add to pods around ordering and lifecycle. If we used a simple boolean, eventually we’d have to have it interact with other fields and deal with conflicting behaviors between what “sidecar” means and more flexibility.
The only difference today between init containers and regular containers is:
a) init containers have an implicit default restart policy of OnFailure, and regular containers inherit the pod's restartPolicy
b) init containers are serial, regular containers are parallel
We are leaving room for the possibility that init containers can fail the pod, and be parallelized, as well as regular containers having unique restartPolicies. Both of those would allow more control for workflow / job engines to break apart monolith containers and get better isolation.
The key design point was that “sidecars aren’t special containers” - because we want to leave room for future growth.
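For reference, the new shape looks roughly like this (a sketch - the image and container names are placeholders, not from the KEP):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-sidecar
spec:
  initContainers:
    # restartPolicy: Always marks this init container as a sidecar:
    # it starts before the main containers, keeps running for the
    # pod's lifetime, and does not block pod termination.
    - name: log-shipper
      image: example/log-shipper:latest
      restartPolicy: Always
  containers:
    - name: app
      image: example/app:latest
```

Without the restartPolicy field, the same entry behaves as a classic init container: it must run to completion before `app` starts.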
It's par for the course. Most of K8s's design has been shoving whatever crap they feel like in, regardless of confusion, difficulty, complexity, etc for the end user.
At some level it seems deliberate so that administration of the complexity can be sold to you for a price once you realise that you can't hack it on your own, but are now too invested to back out.
I've been brushing up on my Kubernetes knowledge recently and came across so much gross stuff like this. "If field X is set to Y, then value Z for key V is invalid." Jesus christ. I wish they put more effort into approachability.
That is very annoying. I remember having spent some time with this same issue in Google App Engine as well, which also runs Cloud SQL Proxy as a sidecar container.
Just FYI for people who don't know about it yet: with cloudsql-proxy v2 there's a new parameter called "--quitquitquit" that starts up an HTTP endpoint to be used for graceful shutdowns. Basically your main container makes a POST to this endpoint, and sidecar exits.
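For example, in a Job-style pod the main container can tell the proxy to exit once its own work is done. A sketch - the image tag, instance name, and the proxy's admin port (9091 is the documented default for v2) should be checked against your setup:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: migrate-job
spec:
  restartPolicy: Never
  containers:
    - name: cloud-sql-proxy
      image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.8.0
      args:
        - "--quitquitquit"              # expose the graceful-shutdown endpoint
        - "--port=5432"
        - "my-project:my-region:my-instance"  # placeholder instance name
    - name: app
      image: example/app:latest
      command:
        - /bin/sh
        - -c
        # Run the real workload, then ask the proxy to exit so the
        # whole pod can terminate instead of hanging on the sidecar.
        - ./run-migrations && curl -s -X POST localhost:9091/quitquitquit
```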
On the other hand, one of the main criticisms of Kubernetes is that it has no composition or orchestration capabilities. It's great at defining pieces of state, but managing blocks of state & multiple things at once is left almost entirely to external tools.
The ability to compose & sequence multiple containers feels like a very specific example of a much broader general capability. There's bedevilling, near-infinite complexity in trying to figure out a fully expressive state-management system - I get why refining a couple of specialized existing capabilities is the way to go - but it does make me a little sad to see a lack of appetite for the broader cross-cutting system problem at the root here.
Yeah I work on the team that builds Amazon Elastic Container Service so I can't help but compare this implementation with how we solved this same problem in ECS.
Inside of an ECS task you can add multiple containers and on each container you can specify two fields: `dependsOn` and `essential`. ECS automatically manages container startup order to respect the dependencies you have specified, and on shutdown it tears things down in reverse order. Instead of having multiple container types with different hardcoded behaviors there is one container type with flexible, configurable behavior. If you want to chain together 4 or 5 containers to start up one by one in a series you can do that. If you want to run two things in parallel and then once both of them have become healthy start a third you can do that. If you want a container to run to completion and then start a second container only if the first container had a zero exit code you can do that. The dependency tree can be as complex or as simple as you want it to be: "init containers" and "sidecar containers" are just nodes on the tree like any other container.
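In CloudFormation terms, a task definition fragment using those two fields might look like this (a sketch with placeholder names and images):

```yaml
ContainerDefinitions:
  - Name: init-db
    Image: example/migrate:latest
    Essential: false          # allowed to exit without stopping the task
  - Name: proxy
    Image: example/proxy:latest
    Essential: true
    DependsOn:
      - ContainerName: init-db
        Condition: SUCCESS    # only start after init-db exits with code 0
  - Name: app
    Image: example/app:latest
    Essential: true
    DependsOn:
      - ContainerName: proxy
        Condition: HEALTHY    # wait until the proxy's health check passes
```

Here `init-db` plays the role of an init container and `proxy` the role of a sidecar, but both are ordinary containers distinguished only by their dependency edges.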
In some places I love the Kubernetes design philosophy of more resource types, but in other aspects I prefer having fewer resource types that are just more configurable on a resource by resource basis.
Your approach sounds a lot like systemd's, with explicit dependencies in units coupling them to each other.
It's pretty cool how one can have a .device or whatnot that then wants a service - plug in a device and its service starts. The arbitrary composability enables lots of neat system behaviors.
In general, the intent here is to leave open room for just that.
dependsOn was proposed during the KEP review but deferred. But because init containers and regular containers share the same behavior and shape, and differ only on container restart policy, we are taking a step towards “a tree of container nodes” without breaking forward or backward compatibility.
Given the success of mapping workloads to k8s, the original design goal was not to take on that complexity, and it’s good to see others making the case for bringing that flexibility back in.
I have a question that I've been wondering about for a while: why does ECS impose a 10-container limit on a task? It proves very limiting in some cases, and I've had to find hacky workarounds like dividing a task into two when it should all have lived and died together.
I like it this way to be honest. We needed to create a custom controller for Dask clusters consisting of a single scheduler, an auto-scaling set of nodes, an ingress and a myriad of secrets, configmaps and other resources.
It wasn’t simple, but with meta controller[1] it was relatively easy to orchestrate the complex state transitions this single logical resource needed and to treat the whole thing as a single unit.
I’m not saying Kubernetes can’t make simple patterns easier, but baking it into core leads to the classic “tragedy of the standard library” problem where it becomes hard to change that implementation. And the k8s ecosystem is definitely all about change.
This is all true, and if you read the KEPs they were thinking about this. One camp was advocating for solving the problem of specifying the full dependency graph spec (of which sidecars are one case), another advocating for just solving the most needed case with a sidecar-specific solution to get a solution shipped. The latter was complicated by a desire to at least leave the door open for the former.
Absolutely, no shortage of things atop. Helm is probably the most well used composition tool.
It seems unideal to me to forever bunt on this topic, leaving it out of core entirely. Especially when we are slowly adding in very specialized composition orchestration tools in core.
Compositions of blocks of state may not end up producing more reliable software. Each block of state is managed by independent processes that may interact with each other (example: horizontal pod autoscalers are not directly aware of cluster-autoscaler). The whole system is more like an ecology or a complex adaptive system than something you can reason about directly with abstractions.
In the Cynefin framework (https://en.wikipedia.org/wiki/Cynefin_framework), you can reason through "complicated" domains the way you are suggesting, but it will not work in the "complex" domain. And I think what Kubernetes helps manage is in the "complex", not the "complicated", domain.
Orchestration of k8s wouldn't be necessary if they had made K8s' operation immutable. As it stands now you just throw some random YAML at it and hope for the best. When that stops working, you can't just revert back to the old working version, you have to start throwing more crap at it and running various operations to "fix" the state. So you end up with all these tools that are effectively configuration management tools to continuously "fix" the cluster back to where you want it.
I hope the irony is lost on no one that this is an orchestration tool for an immutable technology, and the orchestrator isn't immutable.
Worth noting that this is hitting Alpha in Kubernetes 1.28, so won't be available by default at this stage.
If you've got self-managed clusters, it'd be possible to enable with a feature gate on the API server, but it's unlikely to be available on managed Kubernetes until it gets to GA.
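On a kubeadm-managed cluster, for instance, the gate can be switched on via the API server's extra args - a sketch, assuming the v1beta3 kubeadm config schema:

```yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
apiServer:
  extraArgs:
    # Alpha feature gates are off by default and must be enabled explicitly.
    feature-gates: "SidecarContainers=true"
```

Note the same gate generally needs to be enabled on the kubelets as well for the new restart behavior to actually take effect on nodes.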
In case anyone else was looking for a clear, concise summary of the new feature:
"The new feature gate "SidecarContainers" is now available. This feature introduces sidecar containers, a new type of init container that starts before other containers but remains running for the full duration of the pod's lifecycle and will not block pod termination."
It's a shame it took so long. If the main container's shutdown takes a while (i.e. connection draining, processing in-flight queue items) and your service mesh sidecar dies first (nice Go binary), the main container can no longer communicate with the internet.
But I'm not sure about initContainers being used for this. The init keyword implies it runs and exits in order for the others to continue. Using restartPolicy with init instead of a dedicated sideCars field feels weird.
We did that to leave open more complex ordering of both init containers and sidecars (regular containers do not have a restart order). For instance, you might have a service mesh that needs a vault secret - those both might be sidecars, and you may need to ensure the vault sidecar starts first if both go down. Eventually we may want to add parallelism to that start order, and a separate field would prevent simple ordering from working now.
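The vault-then-mesh ordering falls out of ordinary init-container list order - a sketch, with placeholder names and images:

```yaml
spec:
  initContainers:
    # Sidecars start in list order: vault-agent is started (and passes
    # its startup checks) before mesh-proxy, which in turn starts
    # before the regular containers below.
    - name: vault-agent
      image: example/vault-agent:latest
      restartPolicy: Always
    - name: mesh-proxy
      image: example/mesh-proxy:latest
      restartPolicy: Always
  containers:
    - name: app
      image: example/app:latest
```

A standalone `sidecars:` list would have needed its own ordering rules to express the same thing.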
Also, these are mostly init containers that run longer; you want a sidecar that fails to start to be able to block the pod's regular containers; and adding a new container type (like ephemeral containers) is extremely disruptive to other parts of the system (security, observability, and UI), so we looked to minimize that disruption.
Without a restart policy, a failing init container is retried forever. With a pod restartPolicy of Never, the entire pod is marked as failed. Either way, the init containers still have to run and succeed before the main containers start.
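A minimal sketch of the second case, with placeholder images:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: strict-init
spec:
  # With restartPolicy: Never, a failing init container fails the whole
  # pod instead of being retried; with the default policy it would be
  # retried until it succeeds. Either way it must succeed before the
  # main containers run.
  restartPolicy: Never
  initContainers:
    - name: setup
      image: example/setup:latest
  containers:
    - name: app
      image: example/app:latest
```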
So, until now, a sidecar container was just the idea of running containers in your Kubernetes pod, alongside your main service, that were 'helpers' for something: connecting to databases or VPNs, mesh networking, pulling secrets or config, debugging... But they had no special status; they were just regular containers in your pod.
This sometimes posed problems because they weren't available for the full life cycle of the pod, notably during the init phase. So if your init containers needed secrets, connections, or networking that was being provided via a sidecar container, you were going to have a hard time.
With this change, among other things, sidecar containers are going to be available for the whole life cycle of the pod.
There are other implications, probably, but I still haven't finished reading the KEP [0]. Check it out, and there you'll find its motivation and several interesting examples.
The KEP (Kubernetes Enhancement Proposal) is linked to in the PR [1]. From the summary:
> Sidecar containers are a new type of containers that start among the Init containers, run through the lifecycle of the Pod and don’t block pod termination. Kubelet makes a best effort to keep them alive and running while other containers are running.
TL;DR: introduce a restartPolicy field on init containers and use it to indicate that an init container is a sidecar. Kubelet will start init containers with restartPolicy=Always in order with the other init containers, but instead of waiting for the container's completion, it will only wait for its startup to complete.
https://cloud.google.com/sql/docs/postgres/connect-kubernete...
https://github.com/GoogleCloudPlatform/cloudsql-proxy/issues...
1. https://metacontroller.github.io/metacontroller/intro.html
Pragmatism won out, thankfully IMO.
Edit to add: see this better description from one of the senior k8s maintainers: https://news.ycombinator.com/item?id=36666359
[1] https://github.com/kubernetes/enhancements/tree/master/keps/...