Origin: oc exec command gives an error

Created on 19 Jan 2018  ·  7 comments  ·  Source: openshift/origin

When trying to execute a remote command on one of the pods using 'oc exec', the command fails with the errors below:

Command Executed:
oc exec -n pr-004 pr-004-dc-000-2-5sbn2 'ls'
Console Error:
Error from server: error dialing backend: dial tcp 10.145.194.6:10250: getsockopt: connection refused
Journalctl Error:
Jan 19 07:42:12 master0 dockerd-current[103731]: E0119 07:42:12.303571       1 status.go:62] apiserver received an error that is not an metav1.Status: error dialing backend: dial tcp 10.145.194.6:10250: getsockopt: connection refused
Jan 19 07:42:12 master0 atomic-openshift-master-api[103923]: E0119 07:42:12.303571       1 status.go:62] apiserver received an error that is not an metav1.Status: error dialing backend: dial tcp 10.145.194.6:10250: getsockopt: connection refused
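The refused dial can be reproduced outside the apiserver with a quick TCP probe from a master. A minimal sketch (the IP and port are taken from the error message above; the helper name is illustrative):

```shell
# Probe whether a host accepts TCP connections on a given port.
# Uses bash's /dev/tcp pseudo-device; the inner command exits non-zero
# on connection refusal, and `timeout` kills it if the port is filtered.
probe_port() {
  local host=$1 port=$2
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "open"
  else
    echo "closed"
  fi
}

# The node IP and kubelet port from the error message above.
probe_port 10.145.194.6 10250
```

A "closed" result from the master mirrors the `getsockopt: connection refused` the apiserver reports.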
Version
openshift version
openshift v3.6.173.0.83
kubernetes v1.6.1+5115d708d7
etcd 3.2.1
Steps To Reproduce

Not sure. Probably multiple reboots of the OpenShift masters and nodes.

Additional Information

Output of oc adm diagnostics
https://gist.github.com/rathinikunj/eb4130d75f49375f871e3092efea44b2

Output of oc command with --loglevel=8
https://gist.github.com/rathinikunj/4ab02f9b3e05feec8e997488a278646e

component/cli kind/question priority/P2


All 7 comments

What is the output of the commands below (as the system:admin user)?

oc get pods -n default
oc get route -n default
oc get svc -n default

Hi @aizuddin85,

Thanks for looking into the issue. Here are the requested outputs.

oc get pods -n default

NAME                       READY     STATUS    RESTARTS   AGE
docker-registry-1-ztpnz    1/1       Running   4          30d
registry-console-1-jwvh8   1/1       Running   2          30d
router-1-4kzbw             1/1       Running   2          30d
router-1-jhkwx             1/1       Running   2          30d
router-1-qs9q8             1/1       Running   2          30d
router-1-vrs3n             1/1       Running   2          30d

oc get route -n default

NAME               HOST/PORT                                                   PATH      SERVICES           PORT      TERMINATION   WILDCARD
docker-registry    docker-registry-default.router.default.svc.cluster.local              docker-registry    <all>     passthrough   None
registry-console   registry-console-default.router.default.svc.cluster.local             registry-console   <all>     passthrough   None

oc get svc -n default

NAME               CLUSTER-IP       EXTERNAL-IP   PORT(S)                   AGE
docker-registry    172.30.207.203   <none>        5000/TCP                  30d
kubernetes         172.30.0.1       <none>        443/TCP,53/UDP,53/TCP     30d
registry-console   172.30.248.146   <none>        9000/TCP                  30d
router             172.30.85.228    <none>        80/TCP,443/TCP,1936/TCP   30d

It seems likely that node hostnames cannot be resolved by the apiserver.
Are you able to perform nslookup on all of your nodes?

Related upstream issue:
https://github.com/kubernetes/kubernetes/issues/39026

cc @soltysh in case you have additional input
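The resolution check above can be scripted in one loop. A sketch, assuming `oc` is logged in with cluster-admin rights; it uses `getent` for the lookup instead of `nslookup`, and the `resolves` helper name is made up for illustration:

```shell
# Return success if a hostname resolves via the system resolver.
resolves() {
  getent hosts "$1" >/dev/null
}

# Check every node name known to the cluster (requires a working oc login).
# `oc get nodes -o name` prints lines like "node/<hostname>".
for node in $(oc get nodes -o name | cut -d/ -f2); do
  if resolves "$node"; then
    echo "$node resolves"
  else
    echo "$node DOES NOT resolve"
  fi
done
```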

error dialing backend: dial tcp 10.145.194.6:10250: getsockopt: connection refused

This clearly states that the apiserver cannot reach the node on which the pod is running. The other option would be to verify your firewall settings; see https://docs.openshift.com/container-platform/3.6/install_config/install/prerequisites.html#required-ports

@juanvallejo hostnames are all resolvable from all the nodes.

@soltysh I do have an iptables entry that allows port 10250, but I do not see the counters increasing for that rule at all. It is in the 'OS_FIREWALL_ALLOW' chain.

iptables -nvL

Chain OS_FIREWALL_ALLOW (1 references)
 pkts bytes target     prot opt in     out     source               destination
    0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            state NEW tcp dpt:10250
 1245 74700 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            state NEW tcp dpt:80
 1245 74700 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            state NEW tcp dpt:443
    0     0 ACCEPT     udp  --  *      *       0.0.0.0/0            0.0.0.0/0            state NEW udp dpt:4789

> @soltysh I do have an iptables entry that allows port 10250, but I do not see the counters increasing for that rule at all. It is in the 'OS_FIREWALL_ALLOW' chain.

It looks like a configuration issue on your end, then. I'm not sure where exactly, since I'm not a networking expert, but I'd try to verify whether you can reach those ports from the api server (or from all of them, if you have more than one).

Hi @soltysh

Yep, you are right: I had a faulty iptables entry injected, which was making all the traffic to port 10250 go to that app instead of to the API endpoint.

Thanks a lot for suggesting to verify the firewall settings.

We can close this issue now.
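For readers hitting the same symptom: a rule that diverts port 10250 would typically live in the nat table, so it never shows up in the filter-table counters shown earlier. A sketch of how to spot such a rule (standard iptables chains; the `find_10250_nat` helper is illustrative and just narrows the listing):

```shell
# Print nat-table rules that mention the kubelet port.
find_10250_nat() {
  grep -E 'dpt:10250'
}

# DNAT/REDIRECT hits here would divert kubelet traffic before the
# OS_FIREWALL_ALLOW ACCEPT rule ever sees it (needs root; `|| true`
# keeps the script's exit status clean when nothing matches).
iptables -t nat -nvL PREROUTING --line-numbers 2>/dev/null | find_10250_nat || true
iptables -t nat -nvL OUTPUT --line-numbers 2>/dev/null | find_10250_nat || true
```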
