When trying to execute a remote command on one of the pods using 'oc exec', the command fails with the errors below:
oc exec -n pr-004 pr-004-dc-000-2-5sbn2 'ls'
Error from server: error dialing backend: dial tcp 10.145.194.6:10250: getsockopt: connection refused
Jan 19 07:42:12 master0 dockerd-current[103731]: E0119 07:42:12.303571 1 status.go:62] apiserver received an error that is not an metav1.Status: error dialing backend: dial tcp 10.145.194.6:10250: getsockopt: connection refused
Jan 19 07:42:12 master0 atomic-openshift-master-api[103923]: E0119 07:42:12.303571 1 status.go:62] apiserver received an error that is not an metav1.Status: error dialing backend: dial tcp 10.145.194.6:10250: getsockopt: connection refused
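The "connection refused" dial in the log can be reproduced outside of `oc` with a plain TCP connect from a master host. A minimal sketch (the IP and kubelet port below are taken from the error message; this is a generic reachability probe, not an OpenShift-specific tool):

```python
import socket

def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # ConnectionRefusedError, timeouts, unreachable hosts
        return False

# Run from a master, using the node IP and kubelet port from the error above:
# port_reachable("10.145.194.6", 10250)  -> should be True on a healthy node
```

If this returns False from the master while the kubelet is running on the node, the problem is on the network path (DNS, routing, or firewall) rather than in the apiserver itself.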
openshift v3.6.173.0.83
kubernetes v1.6.1+5115d708d7
etcd 3.2.1
Not sure. Probably multiple reboots of the OpenShift masters and nodes.
Output of oc adm diagnostics
https://gist.github.com/rathinikunj/eb4130d75f49375f871e3092efea44b2
Output of oc command with --loglevel=8
https://gist.github.com/rathinikunj/4ab02f9b3e05feec8e997488a278646e
What is the output of the commands below (as the system:admin user)?
oc get pods -n default
oc get route -n default
oc get svc -n default
Hi @aizuddin85,
Thanks for looking into the issue. Here are the requested outputs.
oc get pods -n default
NAME READY STATUS RESTARTS AGE
docker-registry-1-ztpnz 1/1 Running 4 30d
registry-console-1-jwvh8 1/1 Running 2 30d
router-1-4kzbw 1/1 Running 2 30d
router-1-jhkwx 1/1 Running 2 30d
router-1-qs9q8 1/1 Running 2 30d
router-1-vrs3n 1/1 Running 2 30d
oc get route -n default
NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
docker-registry docker-registry-default.router.default.svc.cluster.local docker-registry <all> passthrough None
registry-console registry-console-default.router.default.svc.cluster.local registry-console <all> passthrough None
oc get svc -n default
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
docker-registry 172.30.207.203 <none> 5000/TCP 30d
kubernetes 172.30.0.1 <none> 443/TCP,53/UDP,53/TCP 30d
registry-console 172.30.248.146 <none> 9000/TCP 30d
router 172.30.85.228 <none> 80/TCP,443/TCP,1936/TCP 30d
It seems likely that node hostnames cannot be resolved by the apiserver.
Are you able to perform nslookup on all of your nodes?
Related upstream issue:
https://github.com/kubernetes/kubernetes/issues/39026
cc @soltysh in case you have additional input
> error dialing backend: dial tcp 10.145.194.6:10250: getsockopt: connection refused

This clearly states that the apiserver cannot reach the node on which the pod is running. The other option would be to verify your firewall settings; see https://docs.openshift.com/container-platform/3.6/install_config/install/prerequisites.html#required-ports
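Both suggestions so far (check hostname resolution, check the required ports) can be combined in one script run from the apiserver host. A sketch, assuming only the kubelet port discussed in this thread; the node name in the usage comment is hypothetical and should come from `oc get nodes`:

```python
import socket

# Kubelet port from this thread; the prerequisites doc lists further ports.
REQUIRED_PORTS = [10250]

def check_node(hostname: str) -> dict:
    """Resolve a node hostname and probe the required TCP ports on it."""
    result = {"hostname": hostname, "resolved": None, "open_ports": []}
    try:
        result["resolved"] = socket.gethostbyname(hostname)
    except socket.gaierror:
        return result  # resolution failed; nothing to probe
    for port in REQUIRED_PORTS:
        try:
            with socket.create_connection((result["resolved"], port), timeout=3):
                result["open_ports"].append(port)
        except OSError:
            pass  # port closed, filtered, or host unreachable
    return result

# Substitute real node names from `oc get nodes`, e.g.:
# check_node("master0")
```

A node whose `resolved` field is None points at the DNS theory; a node that resolves but has an empty `open_ports` list points at the firewall theory.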
@juanvallejo hostnames are all resolvable from all the nodes.
@soltysh I do have an iptables entry that allows port 10250, but I do not see the counters for that rule increasing at all. It is in the 'OS_FIREWALL_ALLOW' chain.
iptables -nvL
Chain OS_FIREWALL_ALLOW (1 references)
pkts bytes target prot opt in out source destination
0 0 ACCEPT tcp -- * * 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:10250
1245 74700 ACCEPT tcp -- * * 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:80
1245 74700 ACCEPT tcp -- * * 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:443
0 0 ACCEPT udp -- * * 0.0.0.0/0 0.0.0.0/0 state NEW udp dpt:4789
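The zero packet counter is the telling detail here: the ACCEPT rule for 10250 never matched, so the traffic must have been diverted before reaching this chain. Flagging such never-hit rules can be automated with a small parser of `iptables -nvL` output; a sketch, using a trimmed copy of the chain pasted above as sample input:

```python
def zero_counter_rules(iptables_output: str) -> list:
    """Return data rows from `iptables -nvL` output whose pkts counter is zero."""
    flagged = []
    for line in iptables_output.splitlines():
        fields = line.split()
        # Data rows start with the numeric pkts counter; this skips the
        # "Chain ..." line and the "pkts bytes target ..." header row.
        if fields and fields[0].isdigit() and int(fields[0]) == 0:
            flagged.append(line.strip())
    return flagged

# Trimmed sample taken from the chain pasted above.
chain = """\
Chain OS_FIREWALL_ALLOW (1 references)
 pkts bytes target prot opt in out source destination
    0     0 ACCEPT tcp -- * * 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:10250
 1245 74700 ACCEPT tcp -- * * 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:80
"""
print(zero_counter_rules(chain))  # only the dpt:10250 rule is flagged
```

A zero counter alone is not proof of a problem (the rule may simply never have been exercised), but combined with a failing connection it narrows the search to rules evaluated earlier, e.g. in the nat table.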
> @soltysh I do have an iptables entry that allows port 10250, but i do not see the counters increasing for that rule at all. It is in 'OS_FIREWALL_ALLOW' chain.
It looks like a configuration issue on your end in that case. I'm not sure where exactly, since I'm not a networking expert, but I'd verify that you can reach those ports from the API server (or from all of the API servers, if you have more than one).
Hi @soltysh
Yep, you are right: I had a faulty iptables entry injected that was redirecting all traffic to port 10250 to that app instead of to the API endpoint.
Thanks a lot for suggesting that I verify the firewall settings.
We can close this issue now.