Origin: no connectivity to outside from openshift containers

Created on 12 Jul 2016 · 20 Comments · Source: openshift/origin

openshift v1.3.0-alpha.2+e7d0a44
kubernetes v1.3.0+57fb9ac
etcd 2.3.0+git

Client:
Version: 1.10.3
API version: 1.22
Package version: docker-1.10.3-21.git19b5791.fc24.x86_64
Go version: go1.6.2
Git commit: 19b5791/1.10.3
Built:
OS/Arch: linux/amd64

Server:
Version: 1.10.3
API version: 1.22
Package version: docker-1.10.3-21.git19b5791.fc24.x86_64
Go version: go1.6.2
Git commit: 19b5791/1.10.3
Built:
OS/Arch: linux/amd64

Whenever I create a new build with new-app, the build fails because containers cannot reach the outside world.
I can't shake this issue with any of the solutions I've found. I tried various versions of Docker from 1.9 to 1.12.
I tried formatting my laptop, twice. I can reproduce it on a DigitalOcean droplet with a Fedora 23/24 image, and also on Ubuntu 16.04.

This is with oc cluster up:

[ipalade@openshift-lab ~]$ oc logs -f ruby-sample-build-1-build
Downloading "https://github.com/openshift/ruby-hello-world.git" ...
error: build error: fatal: unable to access 'https://github.com/openshift/ruby-hello-world.git/': Could not resolve host: github.com; Unknown error
[ipalade@openshift-lab ~]$ sudo iptables -F -t nat
[sudo] password for ipalade: 
[ipalade@openshift-lab ~]$ sudo iptables -F 
[ipalade@openshift-lab ~]$ oc start-build ruby-sample-build 
ruby-sample-build-2
[ipalade@openshift-lab ~]$ oc logs -f ruby-sample-build-2-build
Downloading "https://github.com/openshift/ruby-hello-world.git" ...
WARNING: timed out waiting for git server, will wait 1m4s
WARNING: timed out waiting for git server, will wait 4m16s

running native

[ipalade@openshift-lab ~]$ oc new-app -f openshift/origin/examples/sample-app/application-template-stibuild.json 
--> Deploying template ruby-helloworld-sample for "openshift/origin/examples/sample-app/application-template-stibuild.json"
     With parameters:
      ADMIN_USERNAME=adminJF5 # generated
      ADMIN_PASSWORD=vy5ecEf7 # generated
      MYSQL_USER=userHO2 # generated
      MYSQL_PASSWORD=AMUW5wCQ # generated
      MYSQL_DATABASE=root
--> Creating resources with label app=ruby-sample-build ...
    service "frontend" created
    route "route-edge" created
    imagestream "origin-ruby-sample" created
    imagestream "ruby-22-centos7" created
    buildconfig "ruby-sample-build" created
    deploymentconfig "frontend" created
    service "database" created
    deploymentconfig "database" created
--> Success
    Build scheduled, use 'oc logs -f bc/ruby-sample-build' to track its progress.
    Run 'oc status' to view your app.
[ipalade@openshift-lab ~]$ oc logs -f bc/ruby-sample-build
[ipalade@openshift-lab ~]$ oc logs -f bc/ruby-sample-build
Using locally available image "centos/ruby-22-centos7@sha256:221fa430a2f..."
Using locally available image "centos/ruby-22-centos7@sha256:221fa430a2f..."
Using locally available image "centos/ruby-22-centos7@sha256:221fa430a2f..."
Image sha256:34621bd05a7e000fb012da21c85e0e270ef7962bc3fe1b50e97bce2a80e52c51 contains io.openshift.s2i.scripts-url set to "image:///usr/libexec/s2i"
I0711 22:59:02.499259       1 sti.go:142] Preparing to build demo/ruby-sample-build-1:0a7c7e01
Downloading "https://github.com/openshift/ruby-hello-world.git" ...
WARNING: timed out waiting for git server, will wait 1m4s
WARNING: timed out waiting for git server, will wait 4m16s

Turning off firewalld doesn't solve the issue on native.
Vagrant also doesn't work.

On DigitalOcean it starts happening after I install firewalld, but stops after I remove it completely from the system. Tested with the latest release: https://github.com/openshift/origin/releases/tag/v1.3.0-alpha.2

I believe an iptables rule is causing this, but I can't figure out which one.

iptables -L -t nat dump here

and without nat here

dumps before i flushed the iptables rule: here
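
One way to narrow down a suspect iptables rule is to scan the rule listing for anything that drops or rejects traffic in the chains container traffic passes through. The following is a minimal Python sketch, assuming the rules are available in the `iptables -S` text format. The sample rules are hypothetical (they are not taken from the dumps linked above), and `blocking_rules` is a helper name made up for this illustration.

```python
# Hypothetical rule listing in `iptables -S` format, for illustration only.
SAMPLE_RULES = """\
-P INPUT ACCEPT
-P FORWARD DROP
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
-A INPUT -j REJECT --reject-with icmp-host-prohibited
"""


def blocking_rules(rules_text, chains=("FORWARD", "INPUT")):
    """Return DROP policies and DROP/REJECT rules found in the given chains."""
    hits = []
    for line in rules_text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[1] not in chains:
            continue
        if fields[0] == "-P" and fields[2] == "DROP":
            # Chain policy: applies to packets that no rule matched.
            hits.append(line.strip())
        elif fields[0] == "-A" and "-j" in fields[:-1]:
            # Appended rule: the jump target follows the -j flag.
            target = fields[fields.index("-j") + 1]
            if target in ("DROP", "REJECT"):
                hits.append(line.strip())
    return hits


if __name__ == "__main__":
    for rule in blocking_rules(SAMPLE_RULES):
        print(rule)
```

Rule order matters: a catch-all REJECT only affects packets that no earlier rule accepted, so a hit from this scan is a candidate to investigate rather than proof of the cause.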

component/networking kind/question priority/P2

PI-Victor

All 20 comments

Can you please run the debug.sh script referenced at:
https://docs.openshift.com/enterprise/3.2/admin_guide/sdn_troubleshooting.html#further-help
and send me the tarball it generates?

knobunc on 12 Jul 2016

These dumps are from a DigitalOcean Fedora 24 image running the release above. I hope they're accurate. I definitely think this is a firewalld issue.

https://dl.dropboxusercontent.com/u/50993775/openshift-sdn-debug-2016-07-12_no_firewalld.tgz - this is the dump with firewalld not present.

https://dl.dropboxusercontent.com/u/50993775/openshift-sdn-debug-2016-07-12_with_firewalld.tgz this is the one with firewalld installed where connection from the containers isn't working.

and this is a raw paste of me building successfully before installing firewalld https://paste.fedoraproject.org/390287/14683374/
and afterwards failing.

PI-Victor on 12 Jul 2016

gonna close this, since the issues here were largely caused by my firewall and i haven't encountered them since.

PI-Victor on 13 Oct 2016

I seem to be getting the same issue with OpenShift Origin on my Mac. The pods that run the build config can't reach GitHub, although the router pod can.

oc logs -f bc/emailsvc
Cloning "https://github.com/debianmaster/microservices-on-openshift.git" ...
WARNING: timed out waiting for git server, will wait 1m4s

What's the resolution for this? It seems to be a blocker for deploying apps into OpenShift Origin. Thanks for your help.

akuntamukkala on 6 May 2017

Cool. Resolved after reading instructions 👍
vagrant version
Installed Version: 1.8.4

vboxmanage -version
5.0.4r102546

oc version
oc v1.4.1+3f9807a
kubernetes v1.4.0+776c994
features: Basic-Auth

Server https://10.2.2.2:8443
openshift v1.3.0
kubernetes v1.3.0+52492b4

vagrant init openshift/origin-all-in-one; vagrant up --provider virtualbox

akuntamukkala on 6 May 2017

@akuntamukkala
How did you resolve the timeout issue? I get the same error.
My environment is the same as yours:
=>vagrant version
Installed Version: 1.8.4
Latest Version: 1.9.5
=> oc version
oc v1.5.0+031cbe4
kubernetes v1.5.2+43a9be4
features: Basic-Auth

Server https://10.2.2.2:8443
openshift v1.3.0
kubernetes v1.3.0+52492b4

anandintouch on 19 May 2017

I tried a previous version of OpenShift, as shown below, and it worked. I'm not getting any timeout error.
But I'm still wondering how to resolve the issue with v1.3.0. Help is appreciated.

Adding box 'thesteve0/openshift-origin' (v1.2.0) for provider: virtualbox
=> oc version
oc v1.5.0+031cbe4
kubernetes v1.5.2+43a9be4
features: Basic-Auth

Server https://10.2.2.2:8443
kubernetes v1.2.0-36-g4a3f9c5

anandintouch on 19 May 2017

works with
vagrant version
Installed Version: 1.8.4

vboxmanage -version
5.0.4r102546
Thanks!

akuntamukkala on 19 May 2017

I'm having this same issue with Origin. The results of https://raw.githubusercontent.com/openshift/openshift-sdn/master/hack/debug.sh are at http://ge.tt/9YefYyl2

davejlong on 28 Jul 2017

In my case, neither outbound requests nor name resolution work from inside the container:

========================

[root@~]# oc exec "PODname" -it bash
bash-4.2$
bash-4.2$
bash-4.2$ ping github.com
ping: github.com: Name or service not known
bash-4.2$

========================

My environment runs on AWS EC2. I checked all outbound security rules and nothing should block outbound traffic. I'm guessing it's the resolver in the OpenShift configs.
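
The reports in this thread show two different failure modes: name resolution failing ("Could not resolve host", "Name or service not known") and connections timing out after a name has resolved. A small check that separates the two can help decide where to look. This is a minimal Python sketch, assuming a Python interpreter is available wherever the check is run; `check_outbound` is a hypothetical helper name, not part of any OpenShift tooling.

```python
import socket


def check_outbound(host, port=443, timeout=5.0):
    """Report whether DNS lookup or the TCP connection is the failing step."""
    try:
        # Step 1: name resolution through the system resolver.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return f"dns-failure: {exc}"
    addr = infos[0][4]  # sockaddr of the first result: (ip, port, ...)
    try:
        # Step 2: plain TCP connect to the resolved address.
        with socket.create_connection(addr[:2], timeout=timeout):
            return f"ok: connected to {addr[0]}:{addr[1]}"
    except OSError as exc:
        return f"connect-failure: {addr[0]}: {exc}"


if __name__ == "__main__":
    print(check_outbound("github.com"))
```

A `dns-failure` result points at the resolver configuration, while a `connect-failure` result points at routing or firewall rules instead.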

melvz on 7 Sep 2017

I would encourage you to open a new issue, since the conversation here might not be seen by the right person.

PI-Victor on 7 Sep 2017

thanks @PI-Victor

Anyway, I resolved it by NOT using this parameter in my Ansible inv_file:

# Configure dnsIP in the node config
#openshift_dns_ip=172.30.0.1

By not forcing the AWS instance to use the prescribed dnsIP, the AWS environment falls back to its own recommended resolver.

melvz on 10 Oct 2017

Replying here because it's a top hit for search.

The issue I was having was that the domain name for my cluster was appended into /etc/resolv.conf on the 'search' line. This was causing dns to attempt to resolve external requests as github.com.mydomain.com.

Removing the domain from the node's resolv.conf and restarting the node service allowed new pods to resolve dns properly.
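
To illustrate the search-line behaviour described above, here is a minimal Python sketch of how a glibc-style stub resolver expands a name using the `search` and `ndots` settings from resolv.conf. The sample file contents, the cluster search domains, and the `ndots:5` value are assumptions for illustration and are not taken from the poster's node; `mydomain.com` is the placeholder used in the comment.

```python
# Hypothetical pod resolv.conf; domains and ndots value are assumptions.
SAMPLE_RESOLV_CONF = """\
search default.svc.cluster.local svc.cluster.local cluster.local mydomain.com
nameserver 172.30.0.1
options ndots:5
"""


def parse_resolv_conf(text):
    """Extract the search list and ndots option from resolv.conf text."""
    search, ndots = [], 1  # ndots defaults to 1 when not set
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith(("#", ";")):
            continue
        if fields[0] == "search":
            search = fields[1:]
        elif fields[0] == "options":
            for opt in fields[1:]:
                if opt.startswith("ndots:"):
                    ndots = int(opt.split(":", 1)[1])
    return search, ndots


def candidate_names(name, search, ndots=1):
    """Return the names the resolver would try, in order."""
    if name.endswith("."):
        return [name]  # trailing dot: already absolute, no search list
    as_is = name + "."
    expanded = [f"{name}.{domain.rstrip('.')}." for domain in search]
    if name.count(".") >= ndots:
        return [as_is] + expanded  # enough dots: try the name itself first
    return expanded + [as_is]  # too few dots: search list goes first


if __name__ == "__main__":
    search, ndots = parse_resolv_conf(SAMPLE_RESOLV_CONF)
    for candidate in candidate_names("github.com", search, ndots):
        print(candidate)
```

Under these assumed settings, `github.com` has fewer dots than `ndots`, so every search-list variant, including `github.com.mydomain.com.`, is tried before the plain name. If one of those variants returns an answer (for example through a wildcard DNS record, which is only a guess here), the lookup never reaches the real host.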

michaelgugino on 18 Feb 2018

Has this problem been solved?
I also encountered this problem.

DNS resolution on the physical machine itself works fine:

[root@master ~]# curl https://github.com/sclorg/cakephp-ex/|more
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0

But the build fails with this error:

Cloning "https://github.com/sclorg/cakephp-ex " ...
WARNING: timed out waiting for git server, will wait 1m4s
WARNING: timed out waiting for git server, will wait 4m16s
WARNING: timed out waiting for git server, will wait 17m4s
error: fatal: unable to access 'https://github.com/sclorg/cakephp-ex/': Failed connect to github.com:443; Operation now in progress

kasonbin on 7 Sep 2018

I experienced this same issue in Openshift 3.10 when I ran oc cluster up
and tried the wildfly example.

magick93 on 7 Sep 2018

I also ran into this issue with every sample app from the console, on Ubuntu.

javiramos1 on 15 Jan 2019

I'm getting this issue after a fresh install / oc cluster up and trying the wildfly example project.

I see lots of these in /var/log/messages, but I'm not sure whether they are related:

Feb 28 12:37:17 f2x-okd journal[6578]: I0228 17:37:17.457845       1 logs.go:49] skydns: failure to forward request "read udp 172.17.0.3:56729->192.168.100.1:53: i/o timeout"
Feb 28 12:37:17 f2x-okd journal[6578]: I0228 17:37:17.457946       1 logs.go:49] skydns: failure to forward request "read udp 172.17.0.3:34355->192.168.100.1:53: i/o timeout"
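
These lines read as the cluster DNS (skydns, at 172.17.0.3) forwarding queries to an upstream resolver at 192.168.100.1:53 and timing out while waiting for a reply, which would be consistent with the builds being unable to resolve external names. A minimal Python sketch for tallying which upstream servers time out, assuming log lines in the format shown above; `upstream_timeouts` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches the skydns forward-failure message format seen in the log above.
FORWARD_FAILURE = re.compile(
    r'skydns: failure to forward request "read udp '
    r'(?P<src>[\d.]+):(?P<sport>\d+)->(?P<dst>[\d.]+):(?P<dport>\d+): i/o timeout"'
)


def upstream_timeouts(lines):
    """Count forward timeouts per upstream (ip, port) pair."""
    counts = Counter()
    for line in lines:
        match = FORWARD_FAILURE.search(line)
        if match:
            counts[(match["dst"], int(match["dport"]))] += 1
    return counts


if __name__ == "__main__":
    with open("/var/log/messages", errors="replace") as log:
        for (ip, port), count in upstream_timeouts(log).most_common():
            print(f"{ip}:{port} timed out {count} times")
```

If a single upstream address accounts for all the timeouts, checking whether that address is reachable from inside a container is a reasonable next step.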