The current implementation of port forwarding in containerd and cri-o forwards a port from the Pod's network namespace directly. This works for normal Linux containers but fails for some runtimes (particularly sandbox runtimes) because networking is handled differently. A more generic way of running port-forward is needed to make this work for more runtimes.
Options discussed on https://github.com/containerd/cri/issues/1325 :
Option 1a: This option makes no changes to Kubernetes itself; instead, CRI implementations deal with the implementation entirely. containerd's shim implementation would run a hidden socat container in the pod.
Pros:
Cons: CRI implementations would need some way to get socat in. Pods can be fairly dynamic, so it's likely that CRI implementations would need to maintain their own socat image and execute their own container inside the Pod sandbox.
Option 1b: This option makes no changes to Kubernetes itself; instead, CRI implementations deal with the implementation entirely. containerd shim implementations implement logic much like what is in cri-o or containerd's cri plugin, with a runtime-specific implementation.
Pros:
Cons:
Option 2: This option adds socat to the k8s.gcr.io/pause image and executes socat in the pause container for port-forward.
Pros:
Cons:
Option 3: This option runs socat as an Ephemeral Container with new support for port forwarding. The Ephemeral Container API would be extended with a port-forwarding endpoint to allow port forwarding into the container. kubectl port-forward creates an Ephemeral Container using a new socat image that the Kubernetes project maintains and runs port forwarding on it. kubectl port-forward could be deprecated and kubectl debug ... --port-forward XXX:XXX ... could be introduced for port forwarding support instead (to enable providing a custom image, etc.).
Pros:
Cons:
References:
/sig node
The current implementation of port forwarding to a Pod uses nsenter to run socat in the Pod's network namespace.
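For context, that mechanism can be sketched roughly in Go as below. This is an illustrative reconstruction, not the exact code that was removed; the function name, PID plumbing, and flags are assumptions:

```go
package main

import (
	"fmt"
	"io"
	"os/exec"
	"strconv"
)

// socatCommand builds the kind of invocation the quote describes:
// nsenter joins the sandbox's network namespace (-n) by PID, then socat
// bridges its stdio to the target port. This only works when the Pod's
// netns actually owns the IP and port, which is exactly what breaks for
// sandbox runtimes.
func socatCommand(stream io.ReadWriter, sandboxPID, port int) *exec.Cmd {
	cmd := exec.Command("nsenter",
		"-t", strconv.Itoa(sandboxPID), "-n", "--",
		"socat", "-", fmt.Sprintf("TCP4:localhost:%d", port))
	cmd.Stdin = stream  // client -> socat -> container port
	cmd.Stdout = stream // container port -> socat -> client
	return cmd
}

func main() {
	// Build (but don't run) the command for a hypothetical sandbox PID.
	fmt.Println(socatCommand(nil, 4242, 8080).Args)
}
```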
just for the record, we were able to remove the socat dependency in containerd and CRIO recently:
https://github.com/containerd/cri/pull/1470
https://github.com/cri-o/cri-o/pull/3749
Yeah. The unfortunate part of the implementation is that entering the netns won't work for sandbox runtimes so we'd likely have to add the socat dependency back. At least for those runtimes.
Yeah, it has been a long standing issue for Kata Containers as well. Thank you for bringing it up again!
All three of the options mentioned above can work for Kata Containers. It is also possible for Kata Containers to remove the socat dependency, by implementing similar functionality to what containerd/CRI-O has done in kata-agent (which is a long-running daemon inside the sandbox). Is it possible for gVisor to do the same? If so, that could form a fourth option in the KEP.
@bergwolf Yes, I think it would be possible to add port-forwarding to the containerd shim API and have each shim implementation do the necessary work to set up the connection. gVisor could support that as well. That would make it effectively a containerd CRI implementation detail and wouldn't require any changes to Kubernetes (and as such wouldn't require a KEP?), but would make shim implementations a bit more complicated.
I added option 1b that describes what I mean. This would work for containerd, but cri-o would need to have its own mechanism for OCI-runtime-specific behavior.
(CRI-O maintainer here) Out of all the options, 1b or 3 sound the best to me.
3 seems like a good use case for ephemeral containers, and I think is my favorite, but is also the most difficult.
CRI-O already branches on the logic for different runtimes, so 1b should be fairly simple: plug in a new implementation of PortForward for the vm-type runtime.
We're working on moving away from the pause container (as mentioned), so if at all possible I'd like to not go with option 2.
The options may need some tweaking: socat was removed in https://github.com/containerd/cri/pull/1470 and https://github.com/containerd/cri/pull/1477
@mikebrow Yeah, I'm aware. We would need to bring them back or put that logic somewhere in an agent for each OCI runtime as runtimes like kata and gVisor remove the IP address from the veth in the Pod's netns. The goal of options 1a, 2, and 3 was to think of options that would be OCI runtime independent and would work across runc, gVisor, Kata, etc. 1b was added later and is the opposite approach which moves the logic you pointed out into the runtime shims so that they can be OCI runtime specific.
So assuming I understand this, the underlying problem is that there are "high-level runtimes" (containerd, cri-o) which support multiple "low-level runtimes" (runc, gVisor, kata, etc), and some or all of the high-level runtimes are implementing the CRI PortForward API in a way that is not compatible with all of the low-level runtimes that they support?
In which case, 1a and 1b are basically "solve this in the runtimes, it's not Kubernetes/CRI's problem", and 3 is "solve this in Kubernetes (but in a way that fits in well with other Kubernetes features and may facilitate other use cases)", and 2 just seems clearly pretty wrong ("make changes in Kubernetes to simplify things for certain runtimes that want to implement a purely-runtime-internal feature in a specific way")
As with the removal of socat from the container runtimes.. the container runtimes are also leaning to removing the pause container. Perhaps we could use an NRI plugin (1.c). Should do a call with @crosbymichael and @mrunalp to discuss 1.b and 1.c as possible routes.
Can anybody describe the networking problem for a non-runtime person? I can't understand why socat is a requirement and why the current implementation cannot handle other runtimes.
https://github.com/containerd/cri/pull/1470
@danwinship /cc @aojea Yes. "low level" runtimes that run the workload in a sandbox often remove the IP address from the veth because they handle the network stack themselves and only send raw packets. This means port-forwarding doesn't work for those runtimes because containerd or cri-o's implementations simply forward the port from inside the Pod's net namespace. No IP address, no port to forward :(. Your summary of the options and motivations is accurate.
@aojea The idea of using socat as a requirement was a thought experiment to make port-forwarding support just be running socat in a container in the pod and piping stdout. For sandbox runtimes, running it as a container in the Pod means it gets run in the sandbox and can forward the port properly, and it has the benefit that any (Linux) runtime should work and shims wouldn't need to implement it. For example, if done right, 1a could be implemented entirely in containerd/cri and would avoid needing to update the shim API, dealing with backwards compatibility, etc.
@mikebrow /cc @crosbymichael @mrunalp NRI seems interesting though it looks to be in early development. Happy to set up a call if it's in the afternoon U.S. time. If there's a consensus on one of the options I can of course put together a more detailed proposal or POC.
@mikebrow @crosbymichael @mrunalp (and anyone else who wants to join) can we maybe meet on Mon Sept 26 at around 4pm to decide on a path forward? I think Meet or Zoom should work for me.
Sorry @ianlewis missed your ping here. So many messages of late. Yes I'd love to meet to discuss a path forward.
Thx for the discussion. SGTM to POC 1.b