Enhancements: Sidecar Containers

Created on 29 Jan 2019  ·  154 Comments  ·  Source: kubernetes/enhancements

Enhancement Description

  • One-line enhancement description: Containers can now be marked as sidecars so that they start up before normal containers and shut down after all other containers have terminated.
  • Primary contact (assignee): @Joseph-Irving
  • Responsible SIGs: sig-apps, sig-node
  • Design proposal link: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/0753-sidecarcontainers.md
  • Link to e2e and/or unit tests:
  • Reviewer(s): @sjenning, @SergeyKanzhelev
  • Approver: @kow3ns, @derekwaynecarr, @dchen1107
  • Enhancement target (which target equals which milestone):

    • Alpha release target (tbd)

    • Beta release target (tbd)

    • Stable release target (tbd)

/kind feature
/sig apps
/sig node

kind/api-change kind/feature sig/apps sig/node stage/alpha tracked/no

All 154 comments

@enisoc @dchen1107 @fejta @thockin @kow3ns @derekwaynecarr, opened this tracking issue so that we can discuss.

/assign

@derekwaynecarr I've done some scoping out of the kubelet changes required for next week's sig-node meeting. I _believe_ that changes are only needed in the kuberuntime package, specifically kuberuntime_manager.go and kuberuntime_container.go.

In kuberuntime_manager.go you could modify computePodActions to implement the shutdown triggering (kill sidecars when all non-sidecars have permanently exited) and to start the sidecars first.

In kuberuntime_container.go you could modify killContainersWithSyncResult to terminate the sidecars last and send them the preStop hooks (the preStop-hook part was somewhat debatable; it wasn't settled whether that should be done or not. @thockin had a good point about why you might not want to encourage that behaviour, see comment).
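
For illustration only, here is a minimal sketch of that triggering logic, using simplified stand-in types (the real kubelet code operates on v1.Pod and the runtime's pod status; the sidecar field below is the proposed marker, not an existing API):

package main

import "fmt"

// Simplified stand-ins for the kubelet's view of a pod; the real
// computePodActions works on much richer structures.
type container struct {
	name    string
	sidecar bool // hypothetical marker from the proposed API
	exited  bool // has the container permanently exited?
}

// sidecarsShouldBeKilled is the proposed shutdown trigger: sidecars are
// killed once every non-sidecar container has permanently exited.
func sidecarsShouldBeKilled(containers []container) bool {
	for _, c := range containers {
		if !c.sidecar && !c.exited {
			return false // a normal container is still running
		}
	}
	return true
}

// startOrder returns sidecars first, then normal containers,
// reflecting the proposed startup sequencing.
func startOrder(containers []container) []string {
	var sidecars, normal []string
	for _, c := range containers {
		if c.sidecar {
			sidecars = append(sidecars, c.name)
		} else {
			normal = append(normal, c.name)
		}
	}
	return append(sidecars, normal...)
}

func main() {
	pod := []container{
		{name: "app", exited: true},
		{name: "proxy", sidecar: true},
	}
	fmt.Println(startOrder(pod))             // [proxy app]
	fmt.Println(sidecarsShouldBeKilled(pod)) // true: only the sidecar is left
}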

Let me know if you want me to investigate any further.

@kow3ns The discussion would make more sense to me if we could define a full description of the container sequence in the Pod spec (sig-apps), and how to handle that sequence in the kubelet for start, restart, and cascading considerations (sig-node). Let's use the Feb 5 sig-node meeting to gather more input.

cc @Joseph-Irving

The proposal says that sidecars only run after the init containers run. But what if the use case requires the sidecar to run while/before the init containers run? For example, if you'd like to route the pod's traffic through a proxy running as a sidecar (as in Istio), you probably want that proxy to be in place while the init containers run, in case an init container itself makes network calls.

@luksa I think there's the possibility of looking at having sidecars that run in the init phase at some point, but currently the proposal does not cover that use case. There is currently no way to have concurrent containers running in the init phase, so that would potentially be a much larger/messier change than what is being suggested here.

Update on this KEP:
I've spoken to both @derekwaynecarr and @dchen1107 from sig-node about this and they did not express any major concerns about the proposal. I will raise a PR to the KEP adding some initial notes around implementation details and clarifying a few points that came up during the discussion.

We still need to agree on the API. It seems there is consensus that a simple way of marking containers as sidecars is preferred over more in-depth ordering flags. Having a bool is somewhat limiting though, so perhaps something more along the lines of containerLifecycle: Sidecar would be preferable, so that we have the option of expanding in the future.

@Joseph-Irving Actually, neither the boolean nor the containerLifecycle: Sidecar are appropriate for proper future extensibility. Instead, containerLifecycle should be an object, just like deployment.spec.strategy, with type: Sidecar. This would allow us to then introduce additional fields. For the "sidecar for the whole lifetime of the pod" solution, it would be expressed along these lines:

containerLifecycle: 
  type: Sidecar
  sidecar:
    scope: CompletePodLifetime

as opposed to

containerLifecycle: 
  type: Sidecar
  sidecar:
    scope: AfterInit

Please forgive my bad naming - I hope the names convey the idea.

But there is one problem with the approach where we introduce containerLifecycle to pod.spec.containers. Namely, it's wrong to have sidecars that run in parallel with init containers specified under pod.spec.containers. So if you really want to be able to extend this to init containers eventually, you should find an alternative solution - one that would allow you to mark containers as sidecars at a higher level - i.e. not under pod.spec.containers or pod.spec.initContainers, but something like pod.spec.sidecarContainers, which I believe you already discussed, but dismissed. The init containers problem definitely calls for a solution along these lines.

@luksa You could also solve the init problem by just allowing an init container to be marked as a sidecar and have that run alongside the init containers. As I understand it, the problem is that init containers sometimes need sidecars, which is different from needing a container that runs for the entire lifetime of the pod.

The problem with pod.spec.sidecarContainers is that it's a far more complex change: tooling would need to be updated, and the kubelet would require a lot of modification to support another set of containers. The current proposal is far more modest; it's only building on what's already there.

@Joseph-Irving We could work with that yes. It's not ideal for the sidecar to shut down after the init containers run and then have the same sidecar start up again, but it's better than not having that option. The bigger problem is that older Kubelets wouldn't handle init-sidecar containers properly (as is the case with main-sidecar containers).

I'd just like you to keep init-sidecars in mind when finalizing the proposal. In essence, you're introducing the concept of "sidecar" into k8s (previously, we basically only had a set of containers that were all equal). Now you're introducing actual sidecars, so IMHO, you really should think this out thoroughly and not dismiss a very important sidecar use-case.

I'd be happy to help with implementing this. Without it, Istio can't provide its features to init containers (actually, in a properly secured Kubernetes cluster running Istio, init containers completely lose the ability to talk to _any_ service).

In relation to the implementation discussion in https://github.com/kubernetes/enhancements/pull/841, I've opened a WIP PR containing a basic PoC for this proposal https://github.com/kubernetes/kubernetes/pull/75099. It's just a first draft and obviously not perfect but the basic functionality works and gives you an idea of the amount of change required.

cc @enisoc

I put together a short video just showing how the PoC currently behaves https://youtu.be/4hC8t6_8bTs. Seeing it in action can be better than reading about it.
Disclaimer: I'm not a pro youtuber.

I've opened two new PRs:

Any thoughts or suggestions will be much appreciated.

@Joseph-Irving Sorry if I'm commenting late in the design process, but I have a potential use case for sidecar containers which may not be supported in the current design proposal. I just wanted to raise it for consideration. The gist is that I have a scenario where on pod termination, 1 sidecar should be terminated before the main container, while another sidecar should be terminated after the main container.

A concrete example might be a pod with a Django app container, a consul sidecar for service registration, and a pgbouncer sidecar for managing connections to the database. When the pod is terminated, I'd like the consul sidecar to be stopped first (so no more traffic is routed to the pod), then the app container (ideally after a short grace period), and then the pgbouncer sidecar. The current proposal looks great for handling the app <-> pgbouncer container dependency, but doesn't seem expressive enough to capture the case where I'd like to tear down a sidecar _before_ the primary container.

@currankaushik, in the scenario you described you could potentially use a preStop hook to tell the consul container to prepare for shutdown and stop routing requests to you (assuming it can support something like that). PreStop hooks will be sent to sidecars first, before the termination of containers begins.

The motivation for this was so that proxy sidecars like Istio could enter a state where they're not routing traffic to you but are still allowing traffic out while your application finishes up and shuts down.

Sounds good, thanks @Joseph-Irving. So just to confirm my understanding at a high level: pre-stop hooks will be sent to sidecars first, followed by pre-stop hooks to the non-sidecars, SIGTERM to non-sidecars, and then (after all non-sidecars have exited) SIGTERM to sidecars? The design proposal (https://github.com/kubernetes/enhancements/blob/master/keps/sig-apps/sidecarcontainers.md) seems to imply this but also says:

PreStop Hooks will be sent to sidecars and containers at the same time.

@currankaushik yeah what you described is the intended behaviour.

That line you quoted needs rewording. I had some misconceptions about how the preStop hooks were sent to the containers when I wrote that. Thanks for pointing it out.
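
To make the confirmed ordering concrete, here is a rough sketch (hypothetical helper, not the actual kubelet code; real termination also involves grace periods and waiting on container exits):

package main

import "fmt"

// terminatePod sketches the intended shutdown sequence described above.
func terminatePod(sidecars, nonSidecars []string) {
	// 1. preStop hooks go to the sidecars first...
	for _, c := range sidecars {
		fmt.Println("preStop ->", c)
	}
	// 2. ...then preStop hooks and SIGTERM to the non-sidecars.
	for _, c := range nonSidecars {
		fmt.Println("preStop ->", c)
		fmt.Println("SIGTERM ->", c)
	}
	// (the kubelet would wait here for all non-sidecars to exit)
	// 3. Only after every non-sidecar has exited, SIGTERM the sidecars.
	for _, c := range sidecars {
		fmt.Println("SIGTERM ->", c)
	}
}

func main() {
	terminatePod([]string{"istio-proxy"}, []string{"app"})
}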

@Joseph-Irving is this feature targeting alpha inclusion for 1.15?

@kacole2 yeah that is the plan, assuming we can get the KEP to implementable in time for enhancement freeze (April 30th). Once the API has been finalised (https://github.com/kubernetes/enhancements/pull/919) and the test plan agreed (https://github.com/kubernetes/enhancements/pull/951), I think we should be all set.

/milestone v1.15
/stage alpha

@Joseph-Irving Kubernetes 1.15 Enhancement Freeze is 4/30/2019. To be included in the Kubernetes 1.15 milestone, KEPs are required to be in an "Implementable" state with proper test plans and graduation criteria. Please submit any PRs needed to make this KEP adhere to inclusion criteria. If this will slip from the 1.15 milestone, please let us know so we can make appropriate tracking changes.

@mrbobbytables unfortunately the PRs opened to get this to an implementable state have not had much movement on them so I think we will need to delay this until 1.16.

No worries. Thanks for being so quick to respond and letting us know!
/milestone clear

Please keep in mind, this KEP is very important for Istio!

It's a showstopper for all projects using service frameworks with coordinated bootstrap/shutdown (Akka Cluster, Lagom, etc.) together with the Istio service mesh.

cc @jroper

@Joseph-Irving sorry about the late comment, but I don't see the following in the design doc, and I was wondering what the intended behavior of these is:

If we see a sidecar failure, do we always restart it while the main container is not finished (disregarding the pod's restartPolicy)? This would be useful since sidecars often serve proxy, load-balancing, or housekeeping roles, and it doesn't matter if one fails a couple of times as long as the main container can continue to work.

Also, when computing pod phase, if all main containers succeeded but a sidecar failed (which is very common, since if the sidecar does not catch SIGTERM its exit code will be something like 143), is the pod phase still "Succeeded"?

@zhan849 currently sidecar containers obey the pod restart policy and are counted when computing the pod phase, such as Succeeded.

We did debate this quite a bit earlier in the process but the general feeling was that we should diverge from a normal container as little as possible, only doing so if it enables the described use cases.

In regards to pod phase: I would argue that all applications running in Kubernetes should handle SIGTERM (especially sidecars), but also sometimes you want to know if your sidecars exited in a bad way, and that should be reflected in the pod phase; hiding that info could lead to confusion.

For restart policy, it only seems like that would be an issue if the restart policy is Never and your sidecar is prone to crashing. I'm not sure the complication of diverging from the pod restart policy is worth it, especially as some people may want their sidecars to obey the pod restart policy.

Both of these things are just in line with what a normal container does and what currently happens.
Changing them didn't seem to be required to achieve the goals listed in the KEP.

If you have some widespread use cases showing why changing them is needed to achieve those goals, that would be useful, as it makes it easier to justify a more complicated change to the code base.

@Joseph-Irving we have a simpler sidecar implementation that has been running internally for some immediate needs (we did not contribute it since this work was already in progress in the community), and here is what we learned.

Regarding pod phase:

  1. Container exit status is already reflected in pod.status.containerStatuses, so we don't lose the information. Also, since a big use case for sidecars is in Jobs (or whatever run-to-finish pods, such as those in Kubeflow), meaningful workloads run only in the main container, and if the pod phase is marked Failed due to a sidecar failure, this will result in unnecessary retries and lead to other misleading consequences such as Job failure, etc.

  2. Although it is ideal for sidecars to handle SIGTERM, in production there can be plenty of sidecars that are simply built on open-source software and do not handle SIGTERM nicely, including kube-proxy, postfix, rsyslogd, and many others (and even if SIGTERM is handled, the exit code for an uncatchable SIGKILL will certainly not be 0).

Regarding restart policy (it is arguable, but having sidecars strictly obey restartPolicy is not realistic in production):

  1. Forcing sidecars to restart while main containers are still running by setting restartPolicy: OnFailure is not an option, as this would also restart failed main containers, which is confusing alongside the Job-level retry limit.

  2. Usually, main containers have plenty of retry logic for sidecar unavailability, written before the community had sidecar support with an explicit container start order. Such historical error handling is not easy to change given its scope. Not restarting a sidecar will cause main containers to hang and retry.

  3. Propagating failures to higher-level controllers triggers chains of reconciliation and a lot of API calls, so unnecessary escalation of errors can make Kubernetes less scalable.
    A more specific example: if a Job's main containers are still running and a sidecar fails, restarting the sidecar costs just one PATCH of the pod status and at most one event-related API call. But failing the pod entirely results in reconciliation of the Job, and of higher-level controllers such as CronJob and other CRDs, which can mean many more API calls.

I'd also like to see whether other people have run into similar issues (/cc @kow3ns).

Would this change incorporate the behavior desired in https://github.com/kubernetes/community/pull/2342, such that there'd be a way to configure the entire pod (or just the non-sidecar container) to restart if a sidecar fails?

@JacobHenner there are currently no plans to implement that kind of mechanism in this KEP. We did discuss incorporating it, but it doesn't really have much dependency on this KEP and could be developed independently, so it seems better suited to having its own KEP.

@Joseph-Irving just to share, for your reference, our implementation that addresses the above-mentioned pitfalls (https://github.com/zhan849/kubernetes/commits/kubelet-sidecar). Since our goal is to wait for official support, we tried to keep the change as local as possible in this commit.

So for a Job with restartPolicy == Never, one main container, one bad sidecar that constantly crashes, and one good sidecar that keeps running, the pod status will look like this after the main container quits with the above implementation:

containerStatuses:
  - containerID: xxxxx
    image: xxxxx
    imageID: xxxxx
    lastState: {}
    name: main
    ready: false
    restartCount: 0
    state:
      terminated:
        containerID: xxxxx
        exitCode: 0
        finishedAt: "2019-05-24T17:59:53Z"
        reason: Completed
        startedAt: "2019-05-24T17:59:43Z"
  - containerID: xxxxx
    image: xxxxxx
    imageID: xxxxx
    lastState: {}
    name: sidecar-bad
    ready: false
    restartCount: 1
    state:
      terminated:
        containerID: xxxxx
        exitCode: 1
        finishedAt: "2019-05-24T17:59:46Z"
        reason: Error
        startedAt: "2019-05-24T17:59:45Z"
  - containerID: xxxxx
    image: xxxxxxx
    imageID: xxxxx
    lastState: {}
    name: sidecar-healthy
    ready: false
    restartCount: 0
    state:
      terminated:
        containerID: xxxxx
        exitCode: 137
        finishedAt: "2019-05-24T18:00:24Z"
        reason: Error
        startedAt: "2019-05-24T17:59:44Z"
hostIP: 10.3.23.230
phase: Succeeded
podIP: 192.168.1.85
qosClass: BestEffort
startTime: "2019-05-24T17:59:41Z"

I in general agree that a sidecar KEP needs to take into account pod phase and restart policy before it can go to an implementable state. I don't care whether it's this KEP or not, but I agree in general with @zhan849's arguments and it needs to be dealt with here.

thanks @smarterclayton !
@Joseph-Irving let us know if there is anything else you'd like us to share with sidecar in practice.

@smarterclayton @zhan849, I don't particularly disagree with the points you're making; I'm just trying to give some counterpoints. It was a conscious choice not to change pod phases/restart policy, as that would further increase the scope of this proposal, and nobody felt very strongly about it.

I will take this feedback back to sig-apps/sig-node and see what they think. sig-node in particular were keen on keeping the sidecars as close to normal containers as possible, if @derekwaynecarr or @dchen1107 want to chime in that would be appreciated.

The test plan https://github.com/kubernetes/enhancements/pull/951 and API design https://github.com/kubernetes/enhancements/pull/919 PRs have now been merged.

I've opened https://github.com/kubernetes/enhancements/pull/1109 to get the KEP marked as implementable, once everyone is happy with that we should be able to start development for this as alpha in 1.16 🤞

This KEP has been marked implementable, so I will be raising PRs to get this into 1.16 starting next week!

I've raised https://github.com/kubernetes/kubernetes/pull/79649 to implement the API, I will have a separate PR for the Kubelet changes.

Hi @Joseph-Irving, I'm the 1.16 Enhancement Lead. Is this feature going to be graduating alpha/beta/stable stages in 1.16? Please let me know so it can be added to the 1.16 Tracking Spreadsheet. If it's not graduating, I will remove it from the milestone and change the tracked label.

Once coding begins or if it already has, please list all relevant k/k PRs in this issue so they can be tracked properly.

Milestone dates are Enhancement Freeze 7/30 and Code Freeze 8/29.

Thank you.

@Joseph-Irving If you want/need some extra people to implement this, I have a lot of interest in this landing, so I'm happy to lend a hand.

Hi @kacole2 this is targeting Alpha for 1.16, the KEP has been marked implementable.
The only PR for this currently is kubernetes/kubernetes#79649 for the API

@mhuxtable I will be raising the PR for the kubelet changes fairly soon; I'm just finishing off some things. I would greatly appreciate some help having a look at it. I will link it here when it's raised.

I've opened https://github.com/kubernetes/kubernetes/pull/80744 which implements the kubelet changes.

Please note that kubernetes/kubernetes#79649 (API) is still open, so this PR contains commits from it, making it seem large. I've broken it down into commits that each implement a different bit of functionality, so it should be easy to review it that way.

I've not quite finished doing all the test cases for this, but the first draft of the working implementation is done, so I'd like people to take a look.

cc @kacole2

@Joseph-Irving

I'm one of the v1.16 docs shadows.
Does this enhancement (or the work planned for v1.16) require any new docs (or modifications to existing docs)? If not, can you please update the 1.16 Enhancement Tracker Sheet (or let me know and I’ll do so)

If so, just a friendly reminder we're looking for a PR against k/website (branch dev-1.16) due by Friday, August 23rd, it can just be a placeholder PR at this time. Let me know if you have any questions!

Hi @daminisatya, yes this will need updates to Docs, I've raised https://github.com/kubernetes/website/pull/15693 as a placeholder PR.
I'd be interested to know if anyone has any opinions on where the docs should go; I've put something in content/en/docs/concepts/workloads/pods/pod-lifecycle.md for now.

With less than one week to go before Code Freeze, it's looking very unlikely that this will be able to make it into 1.16.
We've still got two relatively large open PRs kubernetes/kubernetes#80744 and kubernetes/kubernetes#79649 which have struggled to get any reviews.
Hopefully there will be more reviewer bandwidth next release cycle to look at these.

/assign

Could this allow writing a sidecar that can start the actual service on demand (and destroy it)?

Like scale-to-zero, but the only thing that is running while idle is the sidecar. When a request comes in, it spins up the actual service, and some time after the last response (e.g. 30s) it shuts it down. This could be a simple way to do scaling to nearly zero (with only the sidecars left running).

@Ciantic With Operator Framework you can do that and much more. Take a look

@janosroden I looked, but it seems pretty difficult to understand how I would elevate running services to zero-scalable.

The problem is not that there aren't available options, e.g. Osiris, KEDA, or Knative. I tried the last one, but it hogs 8 GB of memory; it's hard to call it 'serverless' at that point.

The problem is that most of those implementations need new resources, etc. It's much easier to think of it as injecting a sidecar that can control the whole lifecycle (including starting and restarting on demand), so that it can control the service beyond just sitting there.

Why would this be beneficial? It's really useful in low-utilisation and low-memory situations, e.g. k3s on a Raspberry Pi, or a Digital Ocean droplet for hobby projects. Many of us have lots of web services that need not be running all the time; just having a sidecar that can wake them up on demand would be enough.

Not sure this really works for your use case. I totally see the desire to do what you want to do on such resource-constrained systems. But to be really stable, you need to use resource requests to help schedule the workload. These would need to be specified up front, so regardless of whether the workload is running or not, the resources would be reserved.

To work around this, you pretty much need a pod of its own to receive the initial connection, make a new pod request to k8s, wait for it to spin up, and then send the traffic to it. Sidecar container enhancements aren't needed in this case, I think; you need something more like an xinetd for k8s.

Hey there @Joseph-Irving -- 1.17 Enhancements lead here. I wanted to check in and see if you think this Enhancement will be graduating to alpha in 1.17?

The current release schedule is:

  • Monday, September 23 - Release Cycle Begins
  • Tuesday, October 15, EOD PST - Enhancements Freeze
  • Thursday, November 14, EOD PST - Code Freeze
  • Tuesday, November 19 - Docs must be completed and reviewed
  • Monday, December 9 - Kubernetes 1.17.0 Released

If you do, once coding begins please list all relevant k/k PRs in this issue so they can be tracked properly. 👍

Thanks!

Hi @mrbobbytables, assuming we can get everything reviewed in time the plan is to graduate to alpha in 1.17.

The current open PRs are:
https://github.com/kubernetes/kubernetes/pull/79649 - API Changes
https://github.com/kubernetes/kubernetes/pull/80744 - Kubelet Changes

Let me know if you need anything else!

Great! Thanks @Joseph-Irving I'll add the info to the tracking sheet 👍

@Joseph-Irving

I'm one of the v1.17 docs shadows.
Does this enhancement (or the work planned for v1.17) require any new docs (or modifications to existing docs)? If not, can you please update the 1.17 Enhancement Tracker Sheet (or let me know and I’ll do so)

If so, just a friendly reminder we're looking for a PR against k/website (branch dev-1.17) due by Friday, Nov 8th, it can just be a placeholder PR at this time. Let me know if you have any questions!

Thanks!

Hi @VineethReddy02 yeah this will require documentation, placeholder PR is here https://github.com/kubernetes/website/pull/17190

I've raised a PR to update the KEP: https://github.com/kubernetes/enhancements/pull/1344. It's based on some discussion we were having in the implementation PR https://github.com/kubernetes/kubernetes/pull/80744.

I'd appreciate any comments

Hey @Joseph-Irving 1.17 Enhancement Shadow here! 👋 I am reaching out to check in with you to see how this enhancement is going.

The Enhancement team is currently tracking kubernetes/kubernetes#79649 and kubernetes/kubernetes#80744 in the tracking sheet. Are there any other k/k PRs that need to be tracked as well?

Also, another friendly reminder that we're quickly approaching code freeze (Nov. 14th).

Hi @annajung, yep those are the only PRs that need tracking.

Hi @Joseph-Irving, tomorrow is code freeze for the 1.17 release cycle. It looks like the k/k PRs have not been merged. We're flagging this enhancement as At Risk in the 1.17 tracking sheet.

Do you think all necessary PRs will be merged by the EoD of the 14th (Thursday)? After that, only release-blocking issues and PRs will be allowed in the milestone with an exception.

Hi @annajung, unfortunately it is looking very unlikely that they are going to be merged before code-freeze. We've made a lot of progress this release cycle so hopefully we can merge them early into 1.18.

Hey @Joseph-Irving Thank you for the update. I'll update the milestone accordingly and mark this enhancement as deferred to 1.18. Thank you!

/milestone v1.18

Hey @Joseph-Irving. 1.18 Enhancements Lead here 👋 .

The 1.18 Release started yesterday, so I'm reaching out to see if you plan on landing this in the 1.18 release? I think this missed 1.17 because of code freeze. How are things looking for 1.18? I see the PRs are still open.

Thanks!

Hi @jeremyrickard,

Yeah the plan is to get this in the 1.18 release.

The API PR (https://github.com/kubernetes/kubernetes/pull/79649) got a review from @thockin the other day; he had a few points, but once those are addressed we will close that PR and incorporate the commits into https://github.com/kubernetes/kubernetes/pull/80744, so we can merge the API and implementation together.

As for the kubelet PR (https://github.com/kubernetes/kubernetes/pull/80744), it just needs reviewing; I'm hoping we can get some sig-node bandwidth to review it this cycle.

Thanks for the update @Joseph-Irving

Added it into the tracking sheet!

Sorry for being late to the party. This is a significant improvement for common cases, but it does not seem to cover a more advanced case.

Consider the case of a log-exporting sidecar that itself depends on the Istio sidecar. If the Istio sidecar shuts down first, some sensitive logs might not get exported.

A more generic approach would be to define explicit dependencies across containers, regardless of whether they are sidecars or not.

What do you think of an API definition like this instead:

kind: Pod
spec:
  containers:
  - name: myapp
    dependsOn: ["exporter", "istio"]
  - name: exporter
    dependsOn: ["istio"]
  - name: istio

@rubenhak that gets messy really quickly. What needs to be satisfied for the dependency to be considered clear to proceed? There is often a gap between started and ready that I think dependsOn would really care about, and that this API does not address.

@kfox1111 How does the proposed design determine that the sidecar is started and ready in order to launch the main container? The only difference I propose is that instead of marking containers as "sidecar", we use a more generic way of defining the dependency.

I don't think dependsOn should specify criteria; that could be specified in the dependent container. Wouldn't readinessProbe and/or livenessProbe be sufficient? If not, there could be a startupProbe, whose success indicates that dependent containers can be started.

Hello @Joseph-Irving I'm one of the v1.18 docs shadows.
Does this enhancement (or the work planned for v1.18) require any new docs (or modifications to existing docs)? If not, can you please update the 1.18 Enhancement Tracker Sheet (or let me know and I'll do so)

If so, just a friendly reminder we're looking for a PR against k/website (branch dev-1.18) due by Friday, Feb 28th., it can just be a placeholder PR at this time. Let me know if you have any questions!

Hi @irvifa I've raised a placeholder PR here https://github.com/kubernetes/website/pull/19046

Hi @Joseph-Irving Thank you for your swift response!

@rubenhak - I agree with @kfox1111 that having a full graph of dependencies can get pretty messy pretty quickly. What about starting the containers in the order in the pod spec, and then tearing them down in the reverse order (like a stack)? This would be a lot simpler to implement, and covers most of the common ordering use-cases that I can think of.
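
A toy sketch of that stack-like ordering (illustrative only; the container names are taken from the dependsOn example above):

package main

import "fmt"

func main() {
	// Start in pod-spec order, tear down in reverse, like a stack.
	containers := []string{"istio", "exporter", "myapp"}
	for _, c := range containers {
		fmt.Println("start:", c)
	}
	for i := len(containers) - 1; i >= 0; i-- {
		fmt.Println("stop:", containers[i])
	}
}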

@rgulewich, could you elaborate a bit more on what exactly can get messy? Deriving the order from a graph is a trivial task, especially considering that no sane operator would run more than 15 sidecars (already a stretch).

The idea of ordering is OK, but since most sidecars are injected using admission controllers, it would be really hard to guarantee the correctness of the order. There is a need for indirection.

@rubenhak there could be a cycle in the dependency order of the containers; how would k8s/the kubelet break the cycle and decide what order to start/stop the containers in? Thinking out loud, maybe this could be API-side validation.

Hey @Joseph-Irving,

Just a friendly reminder that code freeze for 1.18 is March 05, 2020.

As we track toward code freeze, please list out/link to any PRs you are working on toward graduating this enhancement!

Hey @jeremyrickard,

The PR to track is https://github.com/kubernetes/kubernetes/pull/80744.
https://github.com/kubernetes/kubernetes/pull/79649 contains the API changes, but its commits will be merged into the PR above once review has finished.

@rubenhak there could be a cycle in the dependency order of the containers; how would k8s/the kubelet break the cycle and decide what order to start/stop the containers in? Thinking out loud, maybe this could be API-side validation.

@bjhaid, the API side can do the validation. Loop detection is a trivial algorithm with linear time complexity (it's just a DFS traversal).

There might also be a need to rerun validation after sidecar injection.
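
For reference, a self-contained sketch of that linear-time check over a dependsOn map (illustrative only; any real validation would live in the API machinery):

package main

import "fmt"

// hasCycle runs a three-colour DFS over a dependsOn graph. It is linear
// in the number of containers plus edges, so it is cheap enough to run
// as API-side validation and again after sidecar injection.
func hasCycle(deps map[string][]string) bool {
	const (
		white = iota // unvisited
		gray         // on the current DFS path
		black        // fully explored
	)
	colour := map[string]int{}
	var visit func(string) bool
	visit = func(n string) bool {
		colour[n] = gray
		for _, d := range deps[n] {
			if colour[d] == gray {
				return true // back edge: cycle found
			}
			if colour[d] == white && visit(d) {
				return true
			}
		}
		colour[n] = black
		return false
	}
	for n := range deps {
		if colour[n] == white && visit(n) {
			return true
		}
	}
	return false
}

func main() {
	ok := map[string][]string{"myapp": {"exporter", "istio"}, "exporter": {"istio"}, "istio": nil}
	bad := map[string][]string{"a": {"b"}, "b": {"a"}}
	fmt.Println(hasCycle(ok), hasCycle(bad)) // false true
}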

I've been thinking about this for a while... I think most of the issues with dependencies really boil down to service meshes. (Maybe someone can think of another example though.)

The service mesh proxy is a sidecar that needs to start and become ready before anything else, and needs to exit after everything else. Proxies are long-running, so they are more of a sidecar than an init container.

But ideally, initContainers should all be able to use the service mesh too.

But initContainers may need to init other sidecar containers.

While we could design some kind of intricate dependency system involving init containers, sidecar containers, and regular containers, maybe we should just have two classes of sidecars: regular sidecars and network sidecars?

  1. Network sidecars must all become ready at the very beginning; service mesh proxies go here.
  2. Init containers run next, in order.
  3. Sidecars all start and become ready. This can include things like auth proxies, loggers, etc.
  4. Regular containers start and become ready.

Teardown is in reverse.

Would this eliminate the dependency issue while still solving all the issues service meshes seem to have with container ordering? I'm thinking so?

@kfox1111, Vault now does secret injection using sidecars. Which class should it fit into? Also, depending on the case, vault could depend on service mesh, or the other way around.

All I'm saying is that such a design would eventually explode into 10 sidecar classes. Such an approach implies an even stronger opinion on how things should run. People would start hacking with classes just to achieve the order required to launch the application.

If the only purpose of those classes is to define the order, why not do that explicitly?

To answer your question: while such a design would cover some use cases, it doesn't help with other cases like Vault sidecars, logging sidecars, etc. This is already a proposal for a redesign of the original feature. Since this is a second attempt, it's worth making it right this time.

I don't see how dependencies are intricate. Could you elaborate more on this? Dependencies make YAML definitions more obvious; there is no hidden logic. An approach with hardcoded classes would require hidden logic and a lot more documentation explaining why networking sidecars should run after other sidecars, etc.

What if we introduce a field into Container?

    // +optional
    Priority int

This field is effective among containers of the same type (sidecar, normal).
For sidecar containers, the sidecar container with higher priority would be instantiated first / torn down last.

@tedyu, a dependency carries much more metadata and value than a "priority". It takes 30 lines of C++ code to produce a priority order given the dependencies: https://www.geeksforgeeks.org/find-the-ordering-of-tasks-from-given-dependencies/. The other way around is not possible.

Another benefit is that, given a dependency graph, certain containers can be started at the same time.
In the following example, "A -> B, B -> C, D -> C", containers B and D can be started at the same time once C is initialized. I'm not saying the implementation has to support that, but it can be very valuable if the API at least allows such a definition.

Integer priority won't work nicely – people will use all kinds of different, non-standardized numbers, as they do with the CSS z-index property (like -9999).

@rubenhak What you're suggesting at this point is basically an entirely different feature from the one being described in this KEP. It's not a small tweak, it's a total rewrite, and it would require getting everyone who had previously agreed to this feature (it took a year to get it approved by all parties) to re-evaluate it.
If you are passionate about such a feature, I would suggest that you create your own KEP and take it to the various SIGs to get their feedback on it.

The idea of a dependency graph was discussed at length when this proposal started back in 2018; the conclusion was unanimous: although it would enable some more use cases, they were not strong enough, and the added complexity was not worth it.

I think you are somewhat underestimating the degree of change required to implement what you're describing. But if it is as simple as you seem to think, you could create your own proof of concept of this working in Kubernetes and present that to the SIGs to help strengthen your case.

This KEP is not yet a GA feature; if your KEP gets approved and implemented, then we can remove this one. They are not yet mutually exclusive.

This change may not be the perfect solution for every single use case, but it dramatically improves the experience for most, and I think we would be much better off getting this merged than debating the implementation for another year.

if your KEP gets approved and implemented, then we can remove this one. They are not yet mutually exclusive.

Would they _ever_ be mutually exclusive?

I'm asking myself if this feature has value _even if_ more explicit container startup/shutdown ordering (which I think would be great) is enabled through another enhancement in the future... and I am thinking yes.

Setting aside any startup/shutdown order implied by the classification of containers as init, sidecar, or "regular", these classifications also express _other_ useful, and arguably unrelated, aspects of a container's desired behavior, do they not?

For example, it's useful in a pod with restartPolicy != Always (a pod that implements a job, perhaps) that containers designated as sidecars have no bearing on the pod entering a completed state.

@kfox1111, Vault now does secret injection using sidecars. Which class should it fit into? Also, depending on the case, vault could depend on service mesh, or the other way around.

We worked through CSI ephemeral drivers so that things like Vault would not need sidecars/init containers. I believe there is a Vault driver in the works.

Though a regular sidecar with an emptyDir seems like it would fit for a sidecar that needs to use the network sidecars?

@Joseph-Irving, I by no means was trying to block this KEP from going in. I realize that you started it almost 2 years back and there are quite a lot of folks waiting for this to be released.

Do you have a link to prior discussion related to dependency graph?

Hey @Joseph-Irving,

Friendly reminder that we are closing in pretty quickly on code freeze, 05 March 2020. It does not look like your PRs have merged yet; are you still feeling like you're on track for code freeze for this enhancement?

Hey @jeremyrickard, The API review (https://github.com/kubernetes/kubernetes/pull/79649) is basically done. We will be closing that PR and moving entirely into the implementation PR so it can all (API and Implementation) be merged in one PR.

The implementation PR (https://github.com/kubernetes/kubernetes/pull/80744) has been thoroughly reviewed, so I'm trying to get a sig-node approver to take a look for final approval.

Whether this happens in time for code freeze is somewhat hard for me to say, it's very dependent on whether I manage to get the attention of some approvers in time.

Would love to see this get in by code freeze. It would make Istio's solution to https://github.com/istio/istio/issues/7136 both simpler and better.

Any movement on getting this into 1.18? It seems necessary for Istio sidecars to work as expected with fast-running jobs.

I've tried reaching out to the sig-node approvers in a variety of ways but I've not had any responses. So I'm not very optimistic that this will make it into 1.18.

@Joseph-Irving the #pr-reviews Slack channel was created for these cases. Have you tried that? It's a way to get an escalation on PR reviews. (I didn't see it in there.)

Hey @Joseph-Irving ,

We're only a few days out from code freeze now. Do you want to defer this to 1.19 based on the reviewer bandwidth? Or try and make a push?

Hey @jeremyrickard ,

No one has responded to me regarding getting these PRs merged in 1.18 so I highly doubt it will happen.

We can defer to 1.19 but I'm starting to wonder if there's any point in doing so.
This KEP has been in flight for almost two years (the original alpha target was 1.15), the PRs in question have been open for almost a year, and there's never any "reviewer bandwidth" for them.
I have politely emailed, slacked, gone to sig meetings, and even found people in person to get reviews, yet we have made very little progress.
Any reviews I have managed to get have only ever suggested minor changes; it's not like large rewrites have been requested. The PRs are all basically the same as they were a year ago, just a bit more polished.
I don't know if I'm meant to be aggressively pinging people every day until they respond to me, but that's really not something I'm comfortable doing.

I think the problem is more that nobody actually cares about this feature. I am the only one driving it forward, and nobody in the SIGs seems interested in seeing it through. If it takes two years to get to alpha, how long will it take to get to beta/GA? (As in, when can most people actually use this?)

Frustratingly, there does seem to be interest from the wider community and end users in getting this feature in. The reason I did this in the first place is that I saw it was an issue, asked the SIGs if they were going to fix it, and was told "we don't have the capacity but we'd be happy for you to do it".
So I did everything they asked: I wrote the KEP, got it approved by all parties, wrote all the code and all the tests, and constantly kept it up to date as each release passed, and yet here we are.

Every time we delay this, I feel like I'm letting a load of people down. Is this just all my fault? Is the code so terrible nobody will even comment? Am I just not aggressive enough in trying to get attention?
I just feel that I can't get this done on my own, and I'm getting a bit tired of beating this dead horse.

I'm sorry for the long rant (not directed at you personally, Jeremy, or at anyone else for that matter), but this has been slowly eating away at my soul for a long time now.

Frustratingly, there does seem to be interest from the wider community and end users in getting this feature in. The reason I did this in the first place is that I saw it was an issue, asked the SIGs if they were going to fix it, and was told "we don't have the capacity but we'd be happy for you to do it".

@Joseph-Irving On this: as an active user, I'm watching this thread because I'm interested (and so are two of my colleagues). Activity on the issue, pull requests, Slack channels, or SIGs might not be the best indicator of interest in this feature.

@dims Maybe you can shed some light?

@thockin I listened to you being interviewed on the Kubernetes Podcast about a year ago, where you talked about contributing to Kubernetes. Maybe it was you, or somebody else in another podcast episode, who felt really bad that this sidecar KEP didn't make it into 1.16. Well, here we are again.

This issue seems to be a prime example of how difficult it may be to contribute if you're not an employee of e.g. Google, Red Hat, or another big player.

I asked in Slack as well for help getting it reviewed, but was just told there's an explicit hold by @thockin, so I'm not sure of the path forward either.

I spent a lot of time on the ephemeral CSI driver feature. Getting it through was similarly frustrating, and there were times when I wasn't sure it was going to make it after so many delays and redesigns. So, I feel your pain. It would be great if we could find a way to make it less painful. That being said, we did get it in eventually after missing a few major releases. So please don't give up/lose hope! The ship can be hard to turn, but it does turn eventually.

Anyone running any kind of topology that depends on a network sidecar is most likely hitting the container startup/shutdown ordering issues that this KEP would potentially solve. Ctrl-F for "Istio" on this ticket and you'll see a bunch of annoyances related to container ordering.

Are there any Istio maintainers on here? A lot are Googlers, and might have some more sway with the K8s folks internally.

As an Istio / K8s shop, we're absolutely rooting for you getting this landed, @Joseph-Irving! ✊❤️

Kudos to @Joseph-Irving for getting sidecars this far.

Even just for sidecar lifecycle management, any batch job requires this feature or Kubernetes simply does not work for it; that's why we also spent a lot of time helping out with code reviews and providing feedback!

We’ve been forking k8s for a while because of this and we’re really looking forward to see such important feature supported officially.

As an Istio + Kubernetes shop, we have also been waiting anxiously for this feature, and growing increasingly frustrated as it slips from release to release. We're not pleased to have to resort to hacks to kill the sidecars on Job workloads. This has been the single most important feature we've needed in Kubernetes for well over a year.

@thockin It's been reported above that you've put an explicit hold on this. Can you please explain why?

There are a lot of Linkerd users who are eagerly waiting for this as well. Hang in there @Joseph-Irving, we're rooting for ya.

Not sure if everyone else here saw this, but after doing some digging and watching a KubeCon video, I found that Lyft had done something similar. Here is the relevant commit from their fork of Kubernetes: https://github.com/lyft/kubernetes/commit/ba9e7975957d61a7b68adb75f007c410fc9c80cc

As an Istio + Kubernetes shop, we have also been waiting anxiously for this feature, and growing increasingly frustrated as it slips from release to release.

I'm a potential Istio user but have kept to the sidelines a bit, waiting for a feature like this. In the discussions above, though, I keep seeing things that make me think the sidecar feature alone, as discussed here, will not fix all the problems the Istio sidecar has with the workflow, though it may help. Which I think is part of the reason this has stalled.

How does running Istio as a sidecar work when using the Istio CNI driver? I believe init containers trying to reach the network will still fail to function properly, as documented in the Istio documentation.

Hence my question above about whether network sidecars should be their own thing.

This issue seems to be a prime example of how difficult it may be to contribute if you're not an employee of e.g. Google, Red Hat, or another big player.

Hah! What you don't know is that those people get stuck sometimes too!

Seriously, I am sorry. I have excuses, but that sucks so I won't bother.

For clarification:
I'm not implying that we shouldn't merge this as alpha to get some feedback on the approach. In general I think it is sound. I think there are a few holes in the use cases, such as service meshes, that it doesn't quite cover. But that is not a reason to block getting this in ASAP, so that we can find all the other use cases it doesn't cover and make the beta version of the feature work well for everyone. That is precisely what an alpha is for, IMO.

I'm just mentioning what I did specifically for the folks hoping this will be a silver bullet for the existing service mesh issue. I don't think the alpha as proposed will fully fix that particular issue, so don't get your hopes up too high just yet. But please, let's not block this feature just because it doesn't support everybody yet.

I've requested an exception; let's see if we can try to get this in:
https://groups.google.com/d/msg/kubernetes-sig-release/RHbkIvAmIGM/nNUthrQsCQAJ

Maybe it was you or somebody else in another [Kubernetes Podcast] episode who felt really bad that this sidecar KEP didn't make it to 1.16

Please see episodes 72, with Lachie Evenson, and 83, with Guinevere Saenger. I even called out this week that PR reviews are required to get this one issue over the line. We can do it!

Are there any Istio maintainers on here? A lot are Googlers, and might have some more sway with the K8s folks internally.

@duderino and @howardjohn have both commented on this thread already.

To be clear, the PRs we need merged are:
kubernetes/kubernetes#79649
kubernetes/kubernetes#80744

Are there any other PRs we should be tracking?

Thanks!

  • Enhancements Team

Big thanks to everyone who posted messages of support (publicly or privately); that was very much appreciated ❤️

There was a valiant effort by members of the community to try and get this into 1.18, including the release team who accepted an extension request, but alas, the decision has been made to defer this to 1.19. You can see the relevant conversation starting from this comment: https://github.com/kubernetes/kubernetes/pull/80744#issuecomment-595292034.

Despite it not getting into 1.18, this has had a lot more attention in the past few days than it has had in quite a while, so I'm hoping that this momentum will carry forward into 1.19.

cc @jeremyrickard, @kikisdeliveryservice

Great stuff @Joseph-Irving, sounds like some of your frustrations have been worthwhile and listened to. Thanks for persevering.

/milestone v1.19

Hi all. A group of us have been discussing this topic over the last week.

First, we apologize for what has happened here. We are not happy about it.

This PR and associated KEP have brought to light a number of things that the project can do better. We would like to separate the social, procedural, and technical concerns.

Socially, this feature fell victim to our desire to please each other. Derek approved the KEP, despite reservations expressed within the SIG, because Clayton and Tim were pushing for it. We all trust each other, but apparently we don’t always feel like we’re able to say “no”. We know this because we have all done the exact same thing. None of us want to be the blocker for the next great idea.

Trusting each other has to include trusting that we can say "no" and trusting that when someone says "no", they are doing so for good reasons. This technical area spans SIGs, and we should NOT pressure sig-node, who will ultimately be the ones to field problems, into accepting new features they are not yet comfortable supporting. This is not about Tim or Derek or Clayton in particular, but ALL of the high-level approvers and SIG leads and "senior" contributors.

This feature also fell victim to procedural uncertainty around KEPs. As a KEP reviewer, am I obligated to be a code reviewer? To delegate to a code reviewer? Or just to read the KEP? As KEPs span releases, how do we ensure a shepherd is available for the set of changes budgeted in a particular span of releases? If a KEP spans SIGs, how do we budget and allocate time across the SIGs? We need to clarify this. We’re going to work on some KEP change-proposals (KEP KEPs) to strengthen the definition of roles in the KEP process.

Technically, this feature fell victim to both time and attention. Reviewers didn’t make enough time to review it, or it simply was not high enough priority for them. Back-and-forth discussions take time. Circumstances and our understanding of the problem space change over time.

As more users adopt Kubernetes, we see an increasing number of weird edge-cases or flakes get reported to sig-node. Since the Pod lifecycle is core to Kubernetes, any change made to that subsystem MUST be undertaken carefully. Our ability to merge new features must be balanced with our desire to improve reliability. How we are thinking about the Pod lifecycle today is a bit different than how we thought of it when this feature was started. This does not diminish the use-cases leading up to this in any way, but it does suggest that long-running KEPs need to be periodically re-reviewed over time.

We think we need to do a bit of first-principles thinking around Pod lifecycle. What do we really want? We tried not to descend into too much complexity, but we fear we merely broke that complexity up into multiple phases, and the net result may be MORE complex than just tackling it head-on.

What does that mean for this PR and the associated KEP? We’re not 100% sure. It probably means we should NOT push this through yet, though.

Derek raised some concerns around the shutdown sequencing. The KEP called them out of scope for now, but there’s some hesitation. We already don’t respect graceful termination on node shutdown, and that has surprised many users. That’s not this KEP’s fault, but let’s call it “extenuating circumstances”. If anyone uses sidecars to “clean up” their pods (e.g. to drain cached logs into a logging service), they will expect (reasonably) some clear and useful semantics around shutdown, which this KEP doesn’t guarantee.

Tim has concerns that init-sidecars will need to become a thing, and that doesn’t feel right. He waived that concern in the past, but it still bothers him.

We need SIG Node to help define what the medium-term goal is for pod lifecyle, and what their appetite is for taking that on. If we can agree that this is an incremental step toward the goal, we can unblock it, but unless we know the goal, we’re probably over-driving our headlights.

Let us all be the first to say that this stinks. We have real problem statements, a passionate contributor, and a set of well-meaning maintainers, and we ended up ... here. Tim will volunteer his time to help brainstorm and design. Derek will push node-shutdown work for the current pod lifecycle to ensure we have a stable base to grow it further. We’ll need to spec very carefully what guarantees we can and cannot make in the face of unplanned machine failures.

Thanks,
Clayton, David, Dawn, Derek, John, Tim

To try to spur some forward movement: Derek or Dawn - is there anyone in sig-node who can make time to do some brainstorming about a more holistic pod and container lifecycle?

@thockin will add this to sig-node agenda.

@thockin @derekwaynecarr what's the TL;DR as to why this could not go in?

One-line enhancement description: Containers can now be marked as sidecars so that they start up before normal containers and shut down after all other containers have terminated.

Sounds like something that would make life easier in this new era of service mesh sidecars.

Furthermore, any recommendations for having sidecars start before main app containers and shut down after main app container termination today?

... what's the TL;DR as to why this could not go in?

@naseemkullah From https://github.com/kubernetes/enhancements/issues/753#issuecomment-597372056 ... 👇

What does that mean for this PR and the associated KEP? We’re not 100% sure. It probably means we should NOT push this through yet, though.

Derek raised some concerns around the shutdown sequencing. The KEP called them out of scope for now, but there’s some hesitation. We already don’t respect graceful termination on node shutdown, and that has surprised many users. That’s not this KEP’s fault, but let’s call it “extenuating circumstances”. If anyone uses sidecars to “clean up” their pods (e.g. to drain cached logs into a logging service), they will expect (reasonably) some clear and useful semantics around shutdown, which this KEP doesn’t guarantee.

[...]

We need SIG Node to help define what the medium-term goal is for pod lifecyle, and what their appetite is for taking that on. If we can agree that this _is_ an incremental step toward the goal, we can unblock it, but unless we know the goal, we’re probably over-driving our headlights.

Respectfully, I am curious as to whether any leads plan to prioritize sorting this out. @Joseph-Irving put an enormous amount of work into this and a staggering number of people who would have been happy with his solution are anxious to hear some superior solution from those who nixed this.

Minimally, even though there are qualms with a few aspects of it, I think it is still reasonable to get this in as an alpha in order to find out what issues will show up in practice. Can we get this merged? The issues can block it from going to beta, so I don't think it's critical to get it perfect before an initial alpha is made.

will add this to sig-node agenda.

@thockin @derekwaynecarr is there any update on the current state of this? I looked through the sig-node meeting notes and don't see anything about this.

There are a large number of devs on this thread who would be more than happy to contribute time to getting this implemented, as it's critical for many use cases (the KEP itself has 2.5x as many :+1: as any other KEP). What can we do to make this happen? Having a list of prerequisites to stability in this area, even if it spans many releases, that we could start actively working on would be a huge improvement over where we are today.

Hi @Joseph-Irving @thockin @khenidak @kow3ns -- 1.19 Enhancements Lead here. I wanted to check in to see if you think this enhancement will graduate in 1.19.

In order to have this part of the release:

  1. The KEP PR must be merged in an implementable state
  2. The KEP must have test plans
  3. The KEP must have graduation criteria.

The current release schedule is:

  • Monday, April 13: Week 1 - Release cycle begins
  • Tuesday, May 19: Week 6 - Enhancements Freeze
  • Thursday, June 25: Week 11 - Code Freeze
  • Thursday, July 9: Week 14 - Docs must be completed and reviewed
  • Tuesday, August 4: Week 17 - Kubernetes v1.19.0 released

@palnabarun, As per this comment https://github.com/kubernetes/enhancements/issues/753#issuecomment-597372056, this KEP has been put on indefinite hold, so no it won't be graduating in 1.19.

Thank you @Joseph-Irving for clarifying the situation. :+1:

Appreciate your efforts!

To everyone who is eager to get this in, and again to @Joseph-Irving - I am personally very sorry for this situation. I want this (or something like it), too, but the fact of the matter is that sig-node has more work to do than people to do it right now, and they are not ready to consider this.

It sucks. I get it. I really really do.

The best way people could help is to jump into sig-node and help make more capacity by taking on code-reviews and issue triage, by fixing bugs and tests, and by building toward a place where the sig-node experts have more capacity and confidence in making such a change.

sig-node has more work to do than people to do it right now

Understood. We've been promoting, with emphasis, sig-node's capacity needs internally. We are bringing on and mentoring sig-node OSS volunteers, some experienced, some new, all with a desire to work in this space (four so far). I'll be citing your comment @thockin, thank you!

/milestone clear

The best way people could help is to jump into sig-node and help make more capacity by taking on code-reviews and issue triage, by fixing bugs and tests, and by building toward a place where the sig-node experts have more capacity and confidence in making such a change.

@thockin Could you provide links to repositories, mailing lists, guides, etc.? That would help people get an idea of how to engage with sig-node effectively. This particular feature request is over 2 years old with no resolution in sight.

@tariq1890 The folks writing this KEP have done everything right. They have left no stone unturned. The issue here is exactly what @thockin said: there's tech debt we need to fix first, and hands are needed for that before we can consider this one. So the ask is for folks to help out with what needs to be done.

Please see the latest update here : https://github.com/kubernetes/enhancements/pull/1874

@dims I think I've been misunderstood. What I meant to say is that we need a list of actionable targets and goals. If there is tech debt to be dealt with, then we could maintain a GitHub milestone or provide a bulleted list of pending action items in the OP's comment, so that people visiting this issue know right away what needs to be addressed.

I am definitely willing to offer my help to sig/node with advancing this KEP, but I just don't know how.

@tariq1890 the specific ask is here : "prerequisite on the (not yet submitted KEP) kubelet node graceful shutdown" https://github.com/kubernetes/enhancements/pull/1874/files#diff-c6212b56619f2b462935ad5f631d772fR94

We need to get that started. Someone has to take point and get that going.

-- Dims

To summarise https://github.com/kubernetes/enhancements/pull/1874 for those following this issue: sig-node (and others) think it is unwise to introduce a new feature like this KEP, which adds more complex behaviour to pod termination, while the more generic problem of pod termination during node shutdown remains unsolved.
So it's been decided that this feature won't progress until a solution to node termination has been implemented.
There's currently a google doc here: https://docs.google.com/document/d/1mPBLcNyrGzsLDA6unBn00mMwYzlP2tSct0n8lWfuRGE
which contains a lot of the discussion around the issue, but the KEP for this is yet to be submitted.
There are still open questions so commenting on there could be helpful, I believe @bobbypage and @mrunalp are leading this effort so perhaps they can share any other ways people could assist with moving this forward.

@Joseph-Irving thanks a ton for summarizing. I am hoping all the positive energy on this enhancement translates into more participation from everyone in sig-node on a regular basis, and not just a one-off for features. There's plenty of work to do and very few hands.

Hi! One more comment regarding this KEP, though: I raised some edge cases about this KEP in past SIG-node meetings (June 23 if you want to watch the recordings), and we decided that the proper way to continue that discussion is to open PRs about those issues so we can decide how best to proceed.

I'm currently working on a PR to state those issues and some alternatives I can think of.

Also, the KEP state is now provisional (instead of implementable), so it can be reviewed and will only be set to implementable again when everyone feels comfortable moving forward with the KEP.

I think this was the only missing bit of information in this issue. Thanks!

@rata Did you open issues/PRs on the proper way to handle the issues?

@mattfarina This is the PR https://github.com/kubernetes/enhancements/pull/1913
It contains a number of proposed solutions to current problems/edge cases in the KEP
It also details a number of alternatives that were discussed and decided against, so that we have a better log of why certain decisions were made.

I would very much like to see the sidecar functionality also cover scaling:
Today, HPA scaling is based on a metric (such as CPU). If the pod contains more than one container, the average across all containers is used (as far as I know). For pods with a sidecar (app + nginx, etc.) this makes it very hard to make scaling function correctly. I was hoping that the sidecar implementation in Kubernetes would include marking one container in the pod as "authoritative" in terms of the metrics used for HPA scaling.

I would very much like to see the sidecar functionality also cover scaling:

I agree this would be useful, but it's not necessarily "sidecar" specific, and since the implementation is uncoupled from this it may make sense to make it a separate issue - this one is already very complex. I am also not convinced you want to just ignore the sidecar. We may want per-container HPA scaling instead, for example (a rough sketch of what that could look like follows). Not sure - would need exploring as its own issue I think.
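For reference only, and outside this KEP's scope: newer versions of the autoscaling API grew a ContainerResource metric type that addresses exactly this "authoritative container" request (behind the HPAContainerMetrics feature gate in early versions). A minimal sketch, assuming a hypothetical Deployment named `app` whose main container is also named `app`:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa              # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app                # hypothetical target
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: ContainerResource
      containerResource:
        name: cpu
        container: app       # scale on the app container only, ignoring the sidecar
        target:
          type: Utilization
          averageUtilization: 70
```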

Does anyone have any reference to, or could be so kind to share, the current workaround for this issue, specifically for the case of the Istio Envoy sidecar?

I recall a possible workaround involving:

  • a custom Envoy image which ignores SIGTERM.
  • invoking /quitquitquit on Envoy from within the application container on shutdown, similar to the Job completion workaround (a sketch follows)
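A minimal sketch of the second bullet in a Job context, assuming Envoy's admin interface is reachable on localhost:15000 (the image, command, and port below are illustrative, not a confirmed recipe):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: one-shot-job               # hypothetical
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: worker
          image: my-worker:latest  # hypothetical image
          command:
            - sh
            - -c
            # Run the actual work, then ask the Envoy sidecar to exit so
            # the pod can complete instead of hanging on the sidecar.
            - ./run-work && curl -sf -XPOST http://127.0.0.1:15000/quitquitquit
```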

Does anyone have any reference to, or could be so kind to share, the current workaround for this issue, specifically for the case of the Istio Envoy sidecar?

We use a custom daemon image, acting like a supervisor, to wrap the user's program. The daemon also listens on a particular port to convey the health status of the user's program (exited or not).

Here is the workaround:

  • Using the daemon image as an initContainer to copy the daemon binary to a shared volume.
  • Our CD pipeline hijacks the user's command so that the daemon starts first. The daemon then runs the user's program once Envoy is ready.
  • Also, we add a preStop hook to Envoy: a script that keeps checking the daemon's health status.

As a result, the user's process starts only once Envoy is ready, and Envoy stops only after the user's process has exited.

It's a complicated workaround, but it works fine in our production environment.
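The preStop piece of that could look roughly like the following, assuming the supervisor daemon reports whether the user's program has exited on a local port (the port and path here are made up for illustration):

```yaml
# On the Envoy container: block its shutdown until the supervisor daemon
# reports that the user's program has exited.
lifecycle:
  preStop:
    exec:
      command:
        - sh
        - -c
        - |
          until curl -sf http://127.0.0.1:8999/exited; do
            sleep 1
          done
```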

yeah it was moved in https://github.com/kubernetes/enhancements/pull/1913, I've updated the link

Does anyone have any reference to, or could be so kind to share, the current workaround for this issue, specifically for the case of the Istio Envoy sidecar?

@shaneqld for startup issues, the istio community came up with a quite clever workaround which basically injects envoy as the first container in the container list and adds a postStart hook that checks and waits for envoy to be ready. This is blocking, so the other containers are not started, making sure envoy is there and ready before the app container starts.

We had to port this to the version we're running, but it was quite straightforward and we're happy with the results so far.

For shutdown we are also 'solving' it with a preStop hook, adding an arbitrary sleep during which we hope the application will have gracefully shut down before SIGTERM continues.
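A sketch of the pod spec shape this produces, assuming the proxy exposes a readiness endpoint on localhost:15021 and using an arbitrary 15-second drain window (Istio itself later shipped this idea as the holdApplicationUntilProxyStarts option using `pilot-agent wait`; everything below is illustrative):

```yaml
spec:
  containers:
    # The proxy is injected first in the list, so the kubelet starts it first.
    - name: istio-proxy
      image: istio/proxyv2               # illustrative
      lifecycle:
        postStart:
          exec:
            # The kubelet blocks on postStart before starting the next
            # container, so the app only starts once Envoy is ready.
            command:
              - sh
              - -c
              - until curl -sf http://127.0.0.1:15021/healthz/ready; do sleep 1; done
        preStop:
          exec:
            # Arbitrary sleep: keep Envoy serving while the app (hopefully)
            # finishes shutting down gracefully.
            command: ["sleep", "15"]
    - name: app
      image: my-app:latest               # hypothetical
```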

Hi @Joseph-Irving @thockin and everyone else :smile:

Enhancements Lead here. I see that there is still a ton of ongoing conversation, but as a reminder, please keep us updated if any plans to include this in 1.20 are decided, so we can track the progress.

Thanks!
Kirsten

@kikisdeliveryservice will keep you posted, thanks!

Does anyone have any reference to, or could be so kind to share, the current workaround for this issue, specifically for the case of the Istio Envoy sidecar?

@shaneqld for startup issues, the istio community came up with a quite clever workaround which basically injects envoy as the first container in the container list and adds a postStart hook that checks and waits for envoy to be ready. This is blocking, so the other containers are not started, making sure envoy is there and ready before the app container starts.

We had to port this to the version we're running, but it was quite straightforward and we're happy with the results so far.

For shutdown we are also 'solving' it with a preStop hook, adding an arbitrary sleep during which we hope the application will have gracefully shut down before SIGTERM continues.

Could you share some details on how to do this? How do you add a preStop hook to the istio-proxy sidecar? It seems to need some custom configuration, or a custom sidecar. I face the same issue: when pods scale down, the main container tries to finish its jobs but loses its connection to the outside, probably because the istio-sidecar closes immediately after SIGTERM. Right now I just use the default sidecar injection. Thank you!

Ok this thread is getting hijacked. Let's stay on topic, please.

Just a gentle reminder that Enhancements Freeze is next week, Tuesday, October 6th. By that time the KEP would need to be updated to be marked implementable.

Also the KEP is using an older format, so updating would be great (once you finish hammering out the details): https://github.com/kubernetes/enhancements/tree/master/keps/NNNN-kep-template

@kikisdeliveryservice thanks for the reminder. Will do if it is decided to be included in 1.20. Thanks! :)

This won't be part of 1.20. Thanks a lot for pinging! :)

I have an interest in this issue, and wanted to thank both @Joseph-Irving and @howardjohn for their insights on this, which helped resolve some of my questions.

I don't want to hijack this proposal, but based on the conversations above, I wonder if this is maybe a slightly broader/larger issue than has been recognised so far.

I can imagine the following solutions to this issue:

  1. Define a new container entity "sidecar container" which starts after initContainers, before "main containers", and terminates after the "main containers" terminate (per @Joseph-Irving's original proposal).
  2. Define an additional field on (1) which sets whether the "sidecar container" starts before the initContainer(s) (per @luksa's suggestion).
  3. Go broader.

Personally, option (2) solves my immediate problem.

But I'm wondering if these questions don't speak to a more strategic issue in K8s around scheduling and how we define a pod. In my specific (Istio related) case, I suggested something like runlevels within pods.

Option (2) solves my problem too, but I can imagine even more complex dependency structures which might call for embedding a DAG of container dependencies within a pod/statefulSet/daemonSet/whatever - this is the option (3) I am thinking of.

Just wondering if this issue should really be re-focused on the pod definition itself, with a view to creating something more generic? I originally thought in terms of a runlevels analogy, but maybe an Airflow-like DAG structure would have the broadest applicability.

What about adding envoy as an init container as well? That way it would provide networking for the other init containers. When init finished, it would 'exit 0' as well, and then the regular envoy (not init) would take over.

@michalzxc If I'm not wrong, init containers are executed one by one, sequentially, so you currently can't have an envoy running next to another container as an init container (see the sketch below).
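To illustrate the sequencing that rules this out today: each init container must exit before the next one starts, so a long-running Envoy placed here would block the rest of the pod forever. A minimal sketch with hypothetical names:

```yaml
spec:
  initContainers:
    # Runs first and must exit 0 before `migrate` is started.
    - name: setup
      image: busybox             # illustrative
      command: ["sh", "-c", "echo setup done"]
    # Only starts after `setup` has exited; an Envoy here would never exit,
    # so `migrate` (and the main containers) would never run.
    - name: migrate
      image: busybox
      command: ["sh", "-c", "echo migrations done"]
  containers:
    - name: app
      image: my-app:latest       # hypothetical
```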

Hi!

The sidecar discussion continued in these places (I've updated the sig-node Slack, the GitHub PR that started this, and several mailing lists):
https://groups.google.com/g/kubernetes-sig-node/c/w019G3R5VsQ/m/bbRDZTv5CAAJ
https://groups.google.com/g/kubernetes-sig-node/c/7kUaX-jBN3M/m/dhI3E8LOAAAJ

As you can see, we are collecting use cases now; after we have some more use cases, different small groups can create pre-proposals addressing them. Feel free to add your use case (if it's not there yet) or join later for the pre-proposals part :-)

Please, let's keep this enhancement issue on topic (and probably closed). You are welcome to join the conversation in those places :)

This KEP will not be progressing. Sig-node and others feel that this is not an incremental step in the right direction, so they've gone back to the drawing board and will be coming up with new KEPs that can hopefully solve all the use cases stated in this KEP, as well as others.

Please see @rata's previous comment https://github.com/kubernetes/enhancements/issues/753#issuecomment-707014341 for places where you can contribute to the discussion.

It's unfortunate that all the work done on this KEP will not be used, but a wider group of people are now thinking about these issues, so hopefully the solution they come up with will be what's best for everyone.
I've already spent over two years trying to get this through, so I think this is a good time for me to move on; @rata and others will be leading these initiatives going forward.

/close

@Joseph-Irving: Closing this issue.

In response to this:

This KEP will not be progressing. Sig-node and others feel that this is not an incremental step in the right direction, so they've gone back to the drawing board and will be coming up with new KEPs that can hopefully solve all the use cases stated in this KEP, as well as others.

Please see @rata's previous comment https://github.com/kubernetes/enhancements/issues/753#issuecomment-707014341 for places where you can contribute to the discussion.

It's unfortunate that all the work done on this KEP will not be used, but a wider group of people are now thinking about these issues, so hopefully the solution they come up with will be what's best for everyone.
I've already spent over two years trying to get this through, so I think this is a good time for me to move on; @rata and others will be leading these initiatives going forward.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

