Enhancements: Generic port forwarding

Created on 4 Jun 2020 · 15 comments · Source: kubernetes/enhancements

Enhancement Description

  • One-line enhancement description (can be used as a release note): Generic Port Forwarding
  • Kubernetes Enhancement Proposal: (link to kubernetes/enhancements file, if none yet, link to PR)
  • Primary contact (assignee): @ianlewis
  • Responsible SIGs: sig-node
  • Enhancement target (which target equals to which milestone):

    • Alpha release target (x.y)

    • Beta release target (x.y)

    • Stable release target (x.y)

The current implementation of port forwarding in containerd and cri-o forwards a port from the Pod's network namespace directly. This works for normal Linux containers but fails for some runtimes (particularly sandbox runtimes) because they handle networking differently.

A more generic way of running port-forward is necessary to make this work for more runtimes.
Options discussed on https://github.com/containerd/cri/issues/1325 :

Option 1a: Support entirely in CRI implementation with socat

This option makes no changes to Kubernetes itself; instead, CRI implementations handle port forwarding entirely. containerd's shim implementation would run a hidden socat container in the pod.

Pros:

  • No changes needed to Kubernetes

Cons:

  • CRI implementations need to have a way to specify which container to run socat in. Pods can be fairly dynamic, so it's likely that CRI implementations would need to maintain their own socat image and execute their own container inside the Pod sandbox.

    • Lifecycle of the container would not be managed by the kubelet. Race conditions with deleting the Pod etc. could be possible and would need to be managed by the CRI runtime.

    • Whether the socat container is confined by resource limits may depend on the runtime. Sandbox runtimes such as gVisor or Kata Containers would need to run socat in a resource limited sandbox whereas normal Linux containers would allow running outside the Pod sandbox.
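As a sketch of what Option 1a's hidden container could run (the helper function and exec wiring here are assumptions, not containerd code): socat uses its stdio, which the CRI implementation would attach to the kubelet's port-forward stream, as one endpoint, and the forwarded port on the pod's loopback interface as the other.

```go
package main

import (
	"fmt"
	"os/exec"
)

// socatForwardCmd builds the command a hypothetical in-pod socat helper
// container would run: "-" makes stdio one endpoint, and the TCP4 address
// is the forwarded port on the pod's loopback interface.
func socatForwardCmd(port int32) *exec.Cmd {
	return exec.Command("socat", "-", fmt.Sprintf("TCP4:localhost:%d", port))
}

func main() {
	fmt.Println(socatForwardCmd(8080).Args)
}
```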

Option 1b: Support entirely in CRI implementation in shim

This option makes no changes to Kubernetes itself; instead, CRI implementations handle port forwarding entirely. containerd shim implementations would implement logic much like what is in cri-o or containerd's cri plugin today, with a runtime-specific implementation.

Pros:

  • No changes needed to Kubernetes

Cons:

  • CRI implementations need to maintain the runtime-specific code necessary for port-forwarding.
  • Possible inconsistency in how resources are constrained: port-forwarding logic for gVisor/Kata would be confined by the sandbox's resource limits, while forwarding for normal Linux containers would not be.

Option 2: Add socat to the pause container

This option adds socat to the k8s.gcr.io/pause image and executes socat in the pause container for port-forward.

Pros:

  • Generic for Linux based container runtimes (i.e. would work for gVisor, Kata Containers, etc.)
  • No changes needed to Kubernetes API. Only the pause container would need to change to include socat.

Cons:

  • Increased complexity of the pause container. The pause container is an implementation detail that could theoretically be removed in the future.
  • Resource limits would need to be placed on the pause container.
  • The pause container runs with a very low oom_score_adj, which socat would inherit without special processing. As a result, socat could use up memory on the host that the host would be unable to reclaim. And given the nature of sandbox runtimes like gVisor/Kata, simply setting /proc/PID/oom_score_adj for the socat process won't work.

Option 3: Add port forwarding to Ephemeral Containers

This option runs socat as an Ephemeral Container with new support for port forwarding. The Ephemeral Container API will be extended with a port forwarding endpoint to allow port forwarding into the container. kubectl port-forward creates an Ephemeral Container using a new socat image that the Kubernetes project maintains and runs port forwarding on it.

  • kubectl port-forward could include new options to specify the --image and --command used
  • Alternatively kubectl port-forward could be deprecated and kubectl debug ... --port-forward XXX:XXX ... could be introduced for port forwarding support instead (to enable providing a custom image etc.)

Pros:

  • Keeps with the spirit of Ephemeral Containers being used for debugging.
  • Resource and lifecycle management takes advantage of logic for Ephemeral Containers.
  • Generic. Allows for other use cases via the API other than using the generic socat image. Users could use their own generic debug image that includes socat and other tools.

Cons:

  • Requires changes to the Kubernetes API and kubectl and deprecation of old port-forward endpoint.
  • Requires maintaining a new image for socat.

References:

/sig node


All 15 comments

The current implementation of port forwarding to a Pod uses nsenter to run socat in the Pod's network namespace

just for the record, we were able to remove the socat dependency in containerd and CRIO recently:
https://github.com/containerd/cri/pull/1470
https://github.com/cri-o/cri-o/pull/3749
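For context, the removed socat path worked roughly like this (a sketch of the legacy approach, not the exact containerd/cri-o code): nsenter joins the pod's network namespace via a PID in that namespace, then socat relays stdio to the target port.

```go
package main

import (
	"fmt"
	"os/exec"
)

// nsenterSocatCmd sketches the legacy port-forward invocation: enter the
// network namespace of the given PID (-t PID -n), then run socat with
// stdio as one endpoint and the forwarded port as the other.
func nsenterSocatCmd(netnsPID int, port int32) *exec.Cmd {
	return exec.Command("nsenter",
		"-t", fmt.Sprint(netnsPID), "-n",
		"socat", "-", fmt.Sprintf("TCP4:localhost:%d", port))
}

func main() {
	fmt.Println(nsenterSocatCmd(1234, 8080).Args)
}
```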

Yeah. The unfortunate part of the implementation is that entering the netns won't work for sandbox runtimes so we'd likely have to add the socat dependency back. At least for those runtimes.

Yeah, it has been a long standing issue for Kata Containers as well. Thank you for bringing it up again!

All three options mentioned above can work for Kata Containers. It is also possible for Kata Containers to remove the socat dependency by implementing similar functionality to what containerd/CRI-O has done in kata-agent (which is a long-running daemon inside the sandbox). Is it possible for gVisor to do the same? If so, that could form a fourth option in the KEP.

@bergwolf Yes, I think it would be possible to add port-forwarding to the containerd shim API and have each shim implementation do the necessary work to set up the connection. gVisor could support that as well. That would make it effectively a containerd CRI implementation detail and wouldn't require any changes to Kubernetes (and as such wouldn't require a KEP?), but would make shim implementations a bit more complicated.

I added option 1b that describes what I mean. This would work for containerd, but cri-o would need to have its own mechanism for OCI-runtime-specific behavior.

(CRI-O maintainer here) Out of all the options, 1b or 3 sound the best to me.

3 seems like a good use case for ephemeral containers, and I think is my favorite, but is also the most difficult.

CRI-O already branches on the logic for different runtimes, so 1b should be fairly simple to plug in a new implementation for PortForward for the vm type runtime.

We're working on moving away from pause container (as mentioned) so if at all possible I'd like to not go for option 2

options may need some tweaking... socat was removed here: https://github.com/containerd/cri/pull/1470 and https://github.com/containerd/cri/pull/1477

@mikebrow Yeah, I'm aware. We would need to bring them back or put that logic somewhere in an agent for each OCI runtime as runtimes like kata and gVisor remove the IP address from the veth in the Pod's netns. The goal of options 1a, 2, and 3 was to think of options that would be OCI runtime independent and would work across runc, gVisor, Kata, etc. 1b was added later and is the opposite approach which moves the logic you pointed out into the runtime shims so that they can be OCI runtime specific.

So assuming I understand this, the underlying problem is that there are "high-level runtimes" (containerd, cri-o) which support multiple "low-level runtimes" (runc, gVisor, kata, etc), and some or all of the high-level runtimes are implementing the CRI PortForward API in a way that is not compatible with all of the low-level runtimes that they support?

In which case, 1a and 1b are basically "solve this in the runtimes, it's not Kubernetes/CRI's problem", and 3 is "solve this in Kubernetes (but in a way that fits in well with other Kubernetes features and may facilitate other use cases)", and 2 just seems clearly pretty wrong ("make changes in Kubernetes to simplify things for certain runtimes that want to implement a purely-runtime-internal feature in a specific way")

As with the removal of socat from the container runtimes, the container runtimes are also leaning toward removing the pause container (as mentioned). Perhaps we could use an NRI plugin (1c). We should do a call with @crosbymichael and @mrunalp to discuss 1b and 1c as possible routes.

Can anybody describe the networking problem for a non-runtime person?
I can't understand why socat is a requirement and why the current implementation can't handle other runtimes:
https://github.com/containerd/cri/pull/1470

@danwinship /cc @aojea Yes. "low level" runtimes that run the workload in a sandbox often remove the IP address from the veth because they handle the network stack themselves and only send raw packets. This means port-forwarding doesn't work for those runtimes because containerd or cri-o's implementations simply forward the port from inside the Pod's net namespace. No IP address, no port to forward :(. Your summary of the options and motivations is accurate.

@aojea The idea for using socat as a requirement was a thought experiment to make port-forwarding support just be running socat in a container in the pod and piping stdout. For sandbox runtimes, running it as a container in the Pod means it gets run in the sandbox and can forward the port properly, and it has the benefit that any (Linux) runtime should work and shims wouldn't need to implement it. For example, if done right, 1a could be implemented entirely in containerd/cri and would avoid needing to update the shim API, deal with backwards compatibility, etc.

@mikebrow /cc @crosbymichael @mrunalp NRI seems interesting though it looks to be in early development. Happy to set up a call if it's in the afternoon U.S. time. If there's a consensus on one of the options I can of course put together a more detailed proposal or POC.

@mikebrow @crosbymichael @mrunalp (and anyone else who wants to join) can we maybe meet on Mon Sept 26 at around 4pm to decide on a path forward? I think Meet or Zoom should work for me.

Sorry @ianlewis missed your ping here. So many messages of late. Yes I'd love to meet to discuss a path forward.

Thx for the discussion. SGTM to POC 1.b
