Openshift-ansible: Cannot login to registry from Master

Created on 14 Jun 2016 · 26 Comments · Source: openshift/openshift-ansible

Hello all, I was looking at https://github.com/openshift/openshift-ansible/issues/632 and I am having a similar issue. My setup is one master and one node. I would like to be able to run docker login from the master against the registry running on the node, but I keep getting this error:

Error response from daemon: invalid registry endpoint "http://172.30.58.204:5000/v0/". HTTPS attempt: unable to ping registry endpoint https://172.30.58.204:5000/v0/ v2 ping attempt failed with error: Get https://172.30.58.204:5000/v2/: dial tcp 172.30.58.204:5000: i/o timeout v1 ping attempt failed with error: Get https://172.30.58.204:5000/v1/_ping: dial tcp 172.30.58.204:5000: i/o timeout. HTTP attempt: unable to ping registry endpoint http://172.30.58.204:5000/v0/ v2 ping attempt failed with error: Get http://172.30.58.204:5000/v2/: dial tcp 172.30.58.204:5000: i/o timeout v1 ping attempt failed with error: Get http://172.30.58.204:5000/v1/_ping: dial tcp 172.30.58.204:5000: i/o timeout

I can log into the registry from the node, though, just not from the master.
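Every attempt in that error ends in "i/o timeout", which points at the network path rather than at credentials or TLS. To take Docker out of the picture, the same four probes can be repeated from the master with a short Python sketch. The function names are hypothetical, the URL order simply mirrors the error message, and certificate checks are switched off because the only question is whether anything answers.

```python
import ssl
import urllib.error
import urllib.request
from typing import List


def registry_ping_urls(host: str, port: int = 5000) -> List[str]:
    """Ping URLs in the order the daemon error lists them:
    HTTPS v2, HTTPS v1, then HTTP v2, HTTP v1."""
    urls = []
    for scheme in ("https", "http"):
        base = f"{scheme}://{host}:{port}"
        urls.append(f"{base}/v2/")
        urls.append(f"{base}/v1/_ping")
    return urls


def probe(url: str, timeout: float = 5.0) -> str:
    """Fetch url and summarize the outcome; diagnostic use only."""
    # Skip certificate checks: we only want to know whether anything answers.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
            return f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # An HTTP error status (e.g. 401 on /v2/) still means the registry was reached.
        return f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # Timeouts, refused connections and TLS failures all land here.
        return f"unreachable: {exc}"
```

Usage: for url in registry_ping_urls("172.30.58.204"): print(url, probe(url)). Any HTTP status at all would mean the registry is reachable and only authentication is left; "unreachable ... timed out" from the master but not from the node matches the symptom described here.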

kind/rfe priority/P2

Hey @irvingwa, is SDN traffic allowed between your master and node (4789 UDP)?

Yup.

From the master:

Chain OS_FIREWALL_ALLOW (1 references)
ACCEPT udp -- anywhere anywhere state NEW udp dpt:4789

From the node:

Chain OS_FIREWALL_ALLOW (1 references)
ACCEPT udp -- anywhere anywhere state NEW udp dpt:4789

@irvingwa Is there a firewall between these two systems? Are they in a cloud environment where a security group or network ACL could be interfering?

No firewall running. No security group or network ACL.

Is your master also configured as a node and set to unschedulable? I'm assuming it is, since 4789/udp has been added to iptables on the master.

If everything appears to be in order with networking you should walk through the SDN troubleshooting guide.

The master is set to READY. I also looked at the firewalld log and am getting some errors. Does something try to add iptables rules like '/sbin/iptables -w2 -t filter -I IN_public_allow 1 -m tcp -p tcp -m limit --limit 25/minute --limit-burst 100 -j ACCEPT'? If so, that is not where my iptables lives; mine is at /usr/sbin/iptables.

Is there any sort of dependency on iptables forwarding for OpenShift networking that you know of?

@abutcher, sorry, one last question. Following that troubleshooting doc, when I run ip route I don't see any 10.128.x.x lines. I think this might be the issue.
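One way to make that check explicit is a short Python sketch that picks out the routes falling inside the cluster's pod network. The CIDR is an assumption you must supply: the troubleshooting doc's 10.128.x.x reflects its own example cluster network, while addresses seen elsewhere in this thread (10.1.0.1, 10.1.0.2) suggest this cluster uses a 10.1.x.x range, so pass whatever your master is actually configured with. The helper name is hypothetical.

```python
import ipaddress
from typing import Iterable, List


def routes_in_network(route_lines: Iterable[str], cluster_cidr: str) -> List[str]:
    """Return destinations from `ip route` output that lie inside cluster_cidr."""
    cluster = ipaddress.ip_network(cluster_cidr)
    hits = []
    for line in route_lines:
        fields = line.split()
        if not fields:
            continue
        try:
            # The first field is the destination prefix, e.g. "10.1.0.0/16";
            # words such as "default" or "unreachable" are not prefixes and are skipped.
            dest = ipaddress.ip_network(fields[0], strict=False)
        except ValueError:
            continue
        if dest.version == cluster.version and dest.subnet_of(cluster):
            hits.append(str(dest))
    return hits
```

Feed it the lines of real output, for example subprocess.run(["ip", "route"], capture_output=True, text=True).stdout.splitlines(). An empty result on a host that should be on the SDN is the same "no pod-network route" symptom described above.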

Are there any relevant errors in your node logs?

There are a good number of errors in there, and I don't see "Output of setup script:" anywhere in the log.
Output.txt

Should the ovs-ofctl command be on my nodes?

@irvingwa Yes, ovs-ofctl will be available on nodes.

From the logs, it looks like there are some issues with DNS timeouts. Is port 53/udp open on your master? I saw one other timeout talking to etcd on port 2379/tcp.

Both are open. I ran the debug script on my master and when it ran on my node with the registry it printed:

Could not find port for 10.1.0.2!

Could you post the full output from the debug script?

https://www.dropbox.com/s/a1sij9rkk5iehz2/debug.tar?dl=0

Sorry, this setup is a little different: 1 master and 2 nodes. The registry is on node 2, and node 1 can't log in to it.

Edit: Sorry for changing the setup; it seemed that most people were doing it that way.

I am experiencing the same issue as @irvingwa . I am performing a containerized installation with the latest version of openshift-ansible. My environment is as follows:

  1. RHEL 7.2
  2. Docker 1.9.1-40
  3. One non-schedulable master + node
  4. One schedulable node
  5. (Non-SDN) networking between the two servers is established, and there is no firewall other than iptables.
  6. I have tried with both v1.1.6 of the origin images and the very latest, with the same results.

My registry is successfully deployed on the node and 'oc get service' shows it listening on the 172.30.0.0/16 network. On the node where the registry is deployed, I can telnet into something like 172.30.71.188:5000. I cannot do so from the master, though. traceroute seems to show the traffic just dying at the 10.1.0.1 interface on the master.

I have executed the debug.sh script as well, but like @irvingwa , I receive a "Could not find port for 10.1.1.3!" error. Regardless, the output is available at the link below:

https://www.dropbox.com/s/2aqcgsqysu8f39p/openshift-sdn-debug-2016-06-21.tgz?dl=0
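The telnet test described above can be made a little more informative with a short Python sketch that distinguishes "refused" (something actively rejected the connection) from "timeout" (no answer at all). Running it from both the node and the master against the same service IP, such as 172.30.71.188:5000, gives directly comparable results. The function name is hypothetical.

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Try a TCP handshake to host:port, roughly what `telnet host port` does."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Actively rejected: no listener on that port, or a REJECT rule on the path.
        return "refused"
    except socket.timeout:
        # No answer at all: packets are probably being dropped somewhere on the path.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. "No route to host".
        return f"error: {exc}"
```

A "timeout" from the master alongside "open" from the node would match the traceroute observation of traffic dying on the master.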

Should docker0 be listed when I do an ip route? It seems to be missing.

> I can log into the registry from the node, though, just not from the master.

The master doesn't automatically have access to the SDN; in order to be able to access pods, the master needs to also be made an (unschedulable) OpenShift node. The ansible install handles this automatically; how did you install this cluster?

Yes, I marked the node as unschedulable.

No, in the debug output you provided, the master host is not running atomic-openshift-node.

Yeah, sorry: when it was not working I switched my setup (1 master and 2 nodes), thinking it was something to do with the master. The registry is on node 2 and I can't log in from node 1.

I have discovered that I can only reproduce this problem using a specific RHEL VM template; one that has security lockdowns. This is likely what @irvingwa is experiencing as well (we are coworkers). I am in the process of trying to identify what the offending configuration is and I hope to post it here for posterity.

You thought I was going to disappear without ever posting a solution, didn't you? How could you ever accuse me of something so terrible? I've narrowed it down to one of the following sysctl params:

net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.default.send_redirects = 0
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.all.secure_redirects = 0
net.ipv4.conf.default.secure_redirects = 0

Setting these all to '1' fixed the problem... now I only have to narrow it down to one.
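The post does not say which of the six turned out to matter, so the Python sketch below only reports which of them are currently 0. It reads the same /proc/sys/net/ipv4/conf files that sysctl reports; the function names are hypothetical, and proc_root is a parameter only so the check can be pointed at a fake tree for testing.

```python
from pathlib import Path
from typing import Dict, List

# The six sysctls listed above, as (scope, setting) pairs.
REDIRECT_PARAMS = [
    (scope, name)
    for name in ("accept_redirects", "send_redirects", "secure_redirects")
    for scope in ("default", "all")
]


def read_redirect_sysctls(proc_root: str = "/proc/sys") -> Dict[str, int]:
    """Read each parameter from /proc/sys, keyed by its dotted sysctl name."""
    values = {}
    for scope, name in REDIRECT_PARAMS:
        path = Path(proc_root, "net", "ipv4", "conf", scope, name)
        values[f"net.ipv4.conf.{scope}.{name}"] = int(path.read_text().strip())
    return values


def disabled(values: Dict[str, int]) -> List[str]:
    """Names of the parameters currently set to 0."""
    return sorted(key for key, value in values.items() if value == 0)
```

On a host with the lockdown settings above, disabled(read_redirect_sysctls()) would list all six. To narrow it down, one could set a single parameter back to 1 with sysctl -w, retry docker login from the master, and rerun the check.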

When we get a pre-requisites playbook in place, we should test for the problematic sysctl(s).

So, this is no longer an issue for us and I wish I had a cleaner answer as to why, but I think this might remain a mystery of our wacky environment. We did the following and can no longer reproduce the issue (and no longer need to tinker with kernel params):

  1. Updated openshift-ansible to the latest (from a version that was from a few months ago)
  2. Installed Docker manually instead of via Puppet module on the master/node

Prior to doing the above, tinkering with kernel params got past the problem. My best guess is that something about how we were installing Docker + how we locked down our VMs was a problem.

If it somehow helps someone in the future, we were using the garethr/docker Puppet module with near-vanilla settings to install Docker plus the kernel parameters in my previous post (set to =1 instead of =0).

Thanks for your help, gents.

@finerm thanks for the follow up, I'll go ahead and close out this issue for now.
