Kubectl: Service port forwarding recovery on restarted pods

Created on 23 Jul 2019 · 25 Comments · Source: kubernetes/kubectl

When I start kubectl port-forward svc/leeroy-app 50053:50051, it works the first time.
If I kill the pod behind the service, Kubernetes restarts the pod, and then port forwarding starts failing:

Handling connection for 50053
Handling connection for 50053
E0722 16:21:00.929687  155541 portforward.go:340] error creating error stream for port 50053 -> 50051: Timeout occured
E0722 16:21:00.969972  155541 portforward.go:362] error creating forwarding stream for port 50053 -> 50051: Timeout occured
E0722 16:21:02.989783  155541 portforward.go:362] error creating forwarding stream for port 50053 -> 50051: Timeout occured
E0722 16:21:03.998054  155541 portforward.go:362] error creating forwarding stream for port 50053 -> 50051: Timeout occured
E0722 16:21:04.598329  155541 portforward.go:340] error creating error stream for port 50053 -> 50051: Timeout occured
E0722 16:21:05.577799  155541 portforward.go:362] error creating forwarding stream for port 50053 -> 50051: Timeout occured
Handling connection for 50053
E0722 16:21:06.166770  155541 portforward.go:362] error creating forwarding stream for port 50053 -> 50051: Timeout occured
E0722 16:21:35.578937  155541 portforward.go:340] error creating error stream for port 50053 -> 50051: Timeout occured
Handling connection for 50053
Handling connection for 50053
E0722 16:21:40.688533  155541 portforward.go:400] an error occurred forwarding 50053 -> 50051: error forwarding port 50051 to pod 6b8250b5be8d3e65ed5d9c900cb87966bed006b57cc81617d27b6ba271742815, uid : Error: No such container: 6b8250b5be8d3e65ed5d9c900cb87966bed006b57cc81617d27b6ba271742815
E0722 16:22:10.606373  155541 portforward.go:340] error creating error stream for port 50053 -> 50051: Timeout occured
Handling connection for 50053
Handling connection for 50053
E0722 16:22:40.712581  155541 portforward.go:340] error creating error stream for port 50053 -> 50051: Timeout occured
E0722 16:22:40.712668  155541 portforward.go:340] error creating error stream for port 50053 -> 50051: Timeout occured

If I manually kill kubectl port-forward and restart it, it works again.

I would love to see it recover automatically instead of having to parse the output and restart it manually.

We are building port forwarding into our application through kubectl, and this would help a lot with the integration.
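
Until kubectl recovers on its own, the output parsing mentioned above could look roughly like the sketch below. It is a minimal illustration, not part of kubectl: the function names is_forward_error and scan_forward_output are made up for this example, and the matched substrings are copied from the log lines quoted in this report, so they may differ in other kubectl versions.

```shell
# is_forward_error LINE
# Succeeds (exit 0) when LINE contains one of the failure messages
# quoted in this report. The substrings are assumptions taken from
# that log, not a documented kubectl interface.
is_forward_error() {
  case "$1" in
    *"error creating error stream"*) return 0 ;;
    *"error creating forwarding stream"*) return 0 ;;
    *"an error occurred forwarding"*) return 0 ;;
    *) return 1 ;;
  esac
}

# scan_forward_output
# Reads port-forward output on stdin and echoes each line.
# Returns 1 as soon as a failure line is seen, 0 if stdin ends
# without one.
scan_forward_output() {
  while IFS= read -r line; do
    printf '%s\n' "$line"
    if is_forward_error "$line"; then
      return 1
    fi
  done
  return 0
}
```

A caller could pipe the combined output of kubectl port-forward svc/leeroy-app 50053:50051 into scan_forward_output and treat a nonzero status as the cue to restart. The scanner only reports the failure; the kubectl process itself still has to be stopped and restarted by the caller.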

kind/bug lifecycle/rotten priority/backlog

balopat

👍38

All 25 comments

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot on 21 Oct 2019

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

fejta-bot on 20 Nov 2019

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

fejta-bot on 20 Dec 2019

@fejta-bot: Closing this issue.

k8s-ci-robot on 20 Dec 2019

/reopen

jjfmarket on 5 Sep 2020

👍1

@jjfmarket: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

k8s-ci-robot on 5 Sep 2020

I also see this behavior: my port forwards start failing after I restart the pod that was being forwarded to.

jjfmarket on 5 Sep 2020

/reopen

jjfmarket on 5 Sep 2020

@jjfmarket: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

k8s-ci-robot on 5 Sep 2020

Is there any suggested approach for implementing this automatic recovery?

pvsousalima on 1 Oct 2020

👍3 👀1

/reopen

brianpursley on 2 Oct 2020

👍1

@brianpursley: Reopened this issue.

In response to this:

/reopen

k8s-ci-robot on 2 Oct 2020

I was taking a look at this a little bit today and I think this is a legitimate issue.

The problem seems to be that port forwarding enters some sort of unrecoverable state after it is no longer able to communicate with the pod it was connected to, and yet it does not fail with an exit code either.

Here are my steps to reproduce (use two terminals):

Terminal 1:

kubectl run sysinfo --image=brianpursley/system-info

Terminal 2:

kubectl port-forward sysinfo 8080:80

Use a browser or curl to make some requests to http://localhost:8080 and verify that port forwarding is working.

Terminal 1:

kubectl delete pod sysinfo

Use a browser or curl to make some more requests to http://localhost:8080 and verify that port forwarding is no longer working.

Terminal 2:

You will see some errors like these:

Forwarding from 127.0.0.1:8080 -> 80
Forwarding from [::1]:8080 -> 80
Handling connection for 8080
Handling connection for 8080
Handling connection for 8080
E1002 15:12:34.808176  125749 portforward.go:400] an error occurred forwarding 8080 -> 80: error forwarding port 80 to pod e2cb7d04631d95df43a87ad38952a027074a146da9ff85c43866c4e2b2806009, uid : exit status 1: 2020/10/02 15:12:34 socat[2905824] E connect(5, AF=2 127.0.0.1:80, 16): Connection refused
Handling connection for 8080
E1002 15:12:34.822191  125749 portforward.go:400] an error occurred forwarding 8080 -> 80: error forwarding port 80 to pod e2cb7d04631d95df43a87ad38952a027074a146da9ff85c43866c4e2b2806009, uid : exit status 1: 2020/10/02 15:12:34 socat[2905825] E connect(5, AF=2 127.0.0.1:80, 16): Connection refused
Handling connection for 8080
E1002 15:12:34.835750  125749 portforward.go:400] an error occurred forwarding 8080 -> 80: error forwarding port 80 to pod e2cb7d04631d95df43a87ad38952a027074a146da9ff85c43866c4e2b2806009, uid : exit status 1: 2020/10/02 15:12:34 socat[2905826] E connect(5, AF=2 127.0.0.1:80, 16): Connection refused

The problem is that kubectl port-forward never exits, and even if I recreate the pod with kubectl run sysinfo --image=brianpursley/system-info, it is not able to re-establish a connection, so it is stuck in an invalid state.

NOTE: My example above is for a single pod, but you can also port-forward to a service or deployment, in which case it will select a single pod within the service or deployment and forward to that pod only. You can follow similar steps to reproduce the issue with a deployment, but you have to find the pod it is connected to and delete that pod to see the effect.

Ideas on possible solutions

Detect connection errors and exit with a nonzero exit code
Detect connection errors and automatically attempt to re-establish a new port forwarding connection
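
The two ideas compose: if kubectl port-forward exited with a nonzero code once forwarding broke (the first idea), re-establishing the forward (the second idea) could be left to a small wrapper. The sketch below is a rough illustration under that assumption, which does not hold for kubectl as described in this thread. retry_forward and its arguments are hypothetical names, not kubectl features.

```shell
# retry_forward MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Reruns COMMAND until it exits 0 or MAX_ATTEMPTS runs have failed.
# Assumption: COMMAND exits nonzero when forwarding breaks, which
# kubectl port-forward does not do in the behavior reported here.
retry_forward() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $attempt failed; restarting" >&2
    attempt=$((attempt + 1))
    # Wait between attempts, but not after the last one.
    if [ "$attempt" -le "$max_attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}
```

With such an exit code in place, retry_forward 5 2 kubectl port-forward sysinfo 8080:80 would rerun the forward up to five times, two seconds apart. A fixed delay keeps the sketch short; a real wrapper would probably want a growing backoff.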

brianpursley on 2 Oct 2020

👍2

/remove-lifecycle rotten

brianpursley on 2 Oct 2020

Let me try to reproduce this report and work on it.

dougsland on 6 Oct 2020

/assign

dougsland on 6 Oct 2020

Hey @soltysh, I am wondering if we can discuss this one in the SIG meeting. Would os.Exit(1) be enough for this one? I just tested a local patch and it works.

dougsland on 11 Oct 2020

/priority backlog
/kind bug

eddiezane on 14 Oct 2020

Hey @soltysh, I am wondering if we can discuss this one in the SIG meeting. Would os.Exit(1) be enough for this one? I just tested a local patch and it works.

@dougsland just open a PR and pls ping me on slack with it, I'll review

soltysh on 15 Oct 2020

/priority backlog
/kind bug

soltysh on 15 Oct 2020

Hey @soltysh, I am wondering if we can discuss this one in the SIG meeting. Would os.Exit(1) be enough for this one? I just tested a local patch and it works.

@dougsland just open a PR and pls ping me on slack with it, I'll review

Spoke on Slack. We don't exit from library code. The library code in client-go starts a server to forward requests, so its behavior is like ListenAndServe. We don't expect it to exit on failures, the same way we don't expect an HTTP server to exit on errors. Anything that gets added would need coordination in a higher layer of logic in kubectl.

If someone actively pursues it, I think it will be important to write down the conditions for behavior changes in kubectl and then provide a way to expose the information from the port-forwarding server back to kubectl. I don't expect it to be a small fix, since there are many different reasons for failures.
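
To make "many different reasons for failures" concrete, the failure lines quoted earlier in this thread already fall into at least three groups. The sketch below sorts a log line into one of them. The function name forward_failure_reason and the reason labels are invented for illustration. kubectl exposes no such classification, and matching log text is only a stand-in for the structured information that would need to flow from the port-forwarding server back to kubectl.

```shell
# forward_failure_reason LINE
# Prints a coarse reason for a port-forward failure line.
# The substrings come from the logs quoted in this thread and are
# assumptions about message text, not a stable interface.
forward_failure_reason() {
  case "$1" in
    *"No such container"*) echo "container-gone" ;;
    *"Connection refused"*) echo "connection-refused" ;;
    # "occured" is spelled this way in the quoted kubectl output.
    *"Timeout occured"*) echo "stream-timeout" ;;
    *) echo "unknown" ;;
  esac
}
```

Writing down the conditions for behavior changes would then amount to deciding, per reason, whether kubectl should keep serving, retry against a new pod, or exit.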

deads2k on 22 Oct 2020

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot on 20 Jan 2021

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

fejta-bot on 19 Feb 2021

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

fejta-bot on 21 Mar 2021

@fejta-bot: Closing this issue.
