When trying to execute a remote command on one of the pods using 'oc exec', the command fails with the errors below:
oc exec -n pr-004 pr-004-dc-000-2-5sbn2 'ls'
Error from server: error dialing backend: dial tcp 10.145.194.6:10250: getsockopt: connection refused
Jan 19 07:42:12 master0 dockerd-current[103731]: E0119 07:42:12.303571 1 status.go:62] apiserver received an error that is not an metav1.Status: error dialing backend: dial tcp 10.145.194.6:10250: getsockopt: connection refused
Jan 19 07:42:12 master0 atomic-openshift-master-api[103923]: E0119 07:42:12.303571 1 status.go:62] apiserver received an error that is not an metav1.Status: error dialing backend: dial tcp 10.145.194.6:10250: getsockopt: connection refused
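The "connection refused" dial in the log can be reproduced outside of `oc` with a plain TCP connect from a master host. A minimal sketch (the IP and kubelet port below are taken from the error message; this is a generic reachability probe, not an OpenShift-specific tool):

```python
import socket

def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # ConnectionRefusedError, timeouts, unreachable hosts
        return False

# Run from a master, using the node IP and kubelet port from the error above:
# port_reachable("10.145.194.6", 10250)  -> should be True on a healthy node
```

If this returns False from the master while the kubelet is running on the node, the problem is on the network path (DNS, routing, or firewall) rather than in the apiserver itself.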
openshift v3.6.173.0.83
kubernetes v1.6.1+5115d708d7
etcd 3.2.1
Not sure. Probably multiple reboots of the OpenShift masters and nodes.
Output of oc adm diagnostics
https://gist.github.com/rathinikunj/eb4130d75f49375f871e3092efea44b2
Output of oc command with --loglevel=8
https://gist.github.com/rathinikunj/4ab02f9b3e05feec8e997488a278646e
What is the output of the commands below (as the system:admin user)?
oc get pods -n default
oc get route -n default
oc get svc -n default
Hi @aizuddin85,
Thanks for looking into the issue. Here are the requested outputs.
oc get pods -n default
NAME READY STATUS RESTARTS AGE
docker-registry-1-ztpnz 1/1 Running 4 30d
registry-console-1-jwvh8 1/1 Running 2 30d
router-1-4kzbw 1/1 Running 2 30d
router-1-jhkwx 1/1 Running 2 30d
router-1-qs9q8 1/1 Running 2 30d
router-1-vrs3n 1/1 Running 2 30d
oc get route -n default
NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
docker-registry docker-registry-default.router.default.svc.cluster.local docker-registry <all> passthrough None
registry-console registry-console-default.router.default.svc.cluster.local registry-console <all> passthrough None
oc get svc -n default
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
docker-registry 172.30.207.203 <none> 5000/TCP 30d
kubernetes 172.30.0.1 <none> 443/TCP,53/UDP,53/TCP 30d
registry-console 172.30.248.146 <none> 9000/TCP 30d
router 172.30.85.228 <none> 80/TCP,443/TCP,1936/TCP 30d
It seems likely that node hostnames cannot be resolved by the apiserver.
Are you able to perform nslookup on all of your nodes?
Related upstream issue:
https://github.com/kubernetes/kubernetes/issues/39026
cc @soltysh in case you have additional input
> error dialing backend: dial tcp 10.145.194.6:10250: getsockopt: connection refused

This clearly states that the apiserver cannot reach the node on which the pod is running. The other option would be to verify your firewall settings; see https://docs.openshift.com/container-platform/3.6/install_config/install/prerequisites.html#required-ports
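Both suggestions so far (check hostname resolution, check the required ports) can be combined in one script run from the apiserver host. A sketch, assuming only the kubelet port discussed in this thread; the node name in the usage comment is hypothetical and should come from `oc get nodes`:

```python
import socket

# Kubelet port from this thread; the prerequisites doc lists further ports.
REQUIRED_PORTS = [10250]

def check_node(hostname: str) -> dict:
    """Resolve a node hostname and probe the required TCP ports on it."""
    result = {"hostname": hostname, "resolved": None, "open_ports": []}
    try:
        result["resolved"] = socket.gethostbyname(hostname)
    except socket.gaierror:
        return result  # resolution failed; nothing to probe
    for port in REQUIRED_PORTS:
        try:
            with socket.create_connection((result["resolved"], port), timeout=3):
                result["open_ports"].append(port)
        except OSError:
            pass  # port closed, filtered, or host unreachable
    return result

# Substitute real node names from `oc get nodes`, e.g.:
# check_node("master0")
```

A node whose `resolved` field is None points at the DNS theory; a node that resolves but has an empty `open_ports` list points at the firewall theory.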
@juanvallejo hostnames are all resolvable from all the nodes.
@soltysh I do have an iptables entry that allows port 10250, but I do not see the counters for that rule increasing at all. It is in the 'OS_FIREWALL_ALLOW' chain.
iptables -nvL
Chain OS_FIREWALL_ALLOW (1 references)
pkts bytes target prot opt in out source destination
0 0 ACCEPT tcp -- * * 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:10250
1245 74700 ACCEPT tcp -- * * 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:80
1245 74700 ACCEPT tcp -- * * 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:443
0 0 ACCEPT udp -- * * 0.0.0.0/0 0.0.0.0/0 state NEW udp dpt:4789
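The zero packet counter is the telling detail here: the ACCEPT rule for 10250 never matched, so the traffic must have been diverted before reaching this chain. Flagging such never-hit rules can be automated with a small parser of `iptables -nvL` output; a sketch, using a trimmed copy of the chain pasted above as sample input:

```python
def zero_counter_rules(iptables_output: str) -> list:
    """Return data rows from `iptables -nvL` output whose pkts counter is zero."""
    flagged = []
    for line in iptables_output.splitlines():
        fields = line.split()
        # Data rows start with the numeric pkts counter; this skips the
        # "Chain ..." line and the "pkts bytes target ..." header row.
        if fields and fields[0].isdigit() and int(fields[0]) == 0:
            flagged.append(line.strip())
    return flagged

# Trimmed sample taken from the chain pasted above.
chain = """\
Chain OS_FIREWALL_ALLOW (1 references)
 pkts bytes target prot opt in out source destination
    0     0 ACCEPT tcp -- * * 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:10250
 1245 74700 ACCEPT tcp -- * * 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:80
"""
print(zero_counter_rules(chain))  # only the dpt:10250 rule is flagged
```

A zero counter alone is not proof of a problem (the rule may simply never have been exercised), but combined with a failing connection it narrows the search to rules evaluated earlier, e.g. in the nat table.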
> @soltysh I do have an iptables entry that allows port 10250, but i do not see the counters increasing for that rule at all. It is in 'OS_FIREWALL_ALLOW' chain.
It looks like a configuration issue on your end in that case. I'm not sure where exactly, since I'm not a networking expert, but I'd verify that you can reach those ports from the API server (or from all of the API servers, if you have more than one).
Hi @soltysh
Yep, you are right: I had a faulty iptables entry injected that was redirecting all traffic to port 10250 to that app instead of to the API endpoint.
Thanks a lot for suggesting that I verify the firewall settings.
We can close this issue now.