I now have a successful build that was pushed to the private registry. However, when deploying the frontend portion of the sample-app, I get the following error in the event logs:
Failed to pull image "172.30.13.89:5000/test/origin-ruby-sample@sha256:d7e1ed4818f45fc14a6ea98c622fddd81e2ab36caad8c94850952d8d150a2952": API error (500): v1 ping attempt failed with error: Get http://172.30.13.89:5000/v1/_ping: read tcp 172.30.13.89:5000: connection reset by peer
How is it that the build portion of the process can see the registry and push to it, but the frontend deployment piece is unable to pull the image that was just pushed?
@deanpeterson
Does the IP address of the registry in the push from the build logs match the IP address from the pull in the deployment?
Yes. 172.30.13.89:5000 is the IP and port of the registry in my default namespace. I can see in the build logs that the image was successfully pushed to 172.30.13.89:5000.
Here is the build log:
Using bundler (1.3.5)
I0722 20:35:37.777526 1 sti.go:388] Your bundle is complete!
I0722 20:35:37.777579 1 sti.go:388] It was installed into ./bundle
I0722 20:35:37.811400 1 sti.go:388] ---> Cleaning up unused ruby gems
I0722 20:35:42.736725 1 sti.go:131] Using provided push secret for pushing 172.30.13.89:5000/test/origin-ruby-sample image
I0722 20:35:42.736761 1 sti.go:134] Pushing 172.30.13.89:5000/test/origin-ruby-sample image ...
I0722 20:36:47.613637 1 sti.go:138] Successfully pushed 172.30.13.89:5000/test/origin-ruby-sample
And this is the deployment event error:
Failed to pull image "172.30.13.89:5000/test/origin-ruby-sample@sha256:d7e1ed4818f45fc14a6ea98c622fddd81e2ab36caad8c94850952d8d150a2952": API error (500): v1 ping attempt failed with error: Get http://172.30.13.89:5000/v1/_ping: read tcp 172.30.13.89:5000: connection reset by peer
And this is my current running registry in the default namespace:
[root@localhost openshift]# ./oc get -n default se/docker-registry
NAME LABELS SELECTOR IP(S) PORT(S)
docker-registry docker-registry=default docker-registry=default 172.30.13.89 5000/TCP
[root@localhost openshift]#
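A quick way to narrow that down (a minimal sketch, assuming the service IP and port shown above) is to hit the same registry endpoints the error mentions, once from the master and once from the node that will run the frontend pod:

curl -v http://172.30.13.89:5000/v1/_ping   # the v1 ping endpoint from the error message
curl -v http://172.30.13.89:5000/v2/        # the v2 endpoint a newer registry answers on

If the build host gets a response but the deploying node gets "connection reset by peer", the problem is between that node and the service IP rather than in the registry itself.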
@rajatchopra @ramr @pravisankar @knobunc can one of you take a look at this from a networking perspective?
Is the instance still available? Or does it reproduce reliably? I'd be interested if anything in the iptables was messing you up...
Yes, the instance is still available. It is a two physical node setup at work. I will be going back to work in a few minutes. What should I look for on the iptables? I received this on the openshift user's list last night so I will be running these commands to get more information when I get back in:
Hi,
Could you please show us the following command output?
curl -v `oc get services | grep registry | awk '{print $4":"$5}/v2/' | sed 's,/[^/]\+$,/v2/,'`
oc describe service docker-registry
oc describe pod `oc get pod | grep docker-registry | awk '{print $1}'`
oc status
_NOTE_ Please run them in the default project. You can switch with 'oc project default' or use 'oc -n default ...'.
Thanks,
Kenjiro
If you can get the output from: iptables -L
[root@localhost openshift]# iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
DOCKER all -- anywhere anywhere
ACCEPT all -- anywhere anywhere ctstate RELATED,ESTABLISHED
ACCEPT all -- anywhere anywhere
ACCEPT all -- anywhere anywhere
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
Chain DOCKER (1 references)
target prot opt source destination
Wow. Clean rules... that's clearly not the problem, unless that's a VM and
the host machine has something?
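Worth noting: iptables -L only lists the filter table, so it can look empty even when traffic is being redirected elsewhere. A minimal sketch of the extra dumps that would show the Docker and service-proxy NAT rules, if any are present:

iptables -t nat -L -n    # NAT table, numeric addresses
iptables-save            # every table in one dump; easy to diff between the two nodes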
No, that is directly on the physical host machine. I have the all-in-one binary running directly on the host, and I have a second node in the Ready state on another physical machine. The iptables rules are the same on the second physical node.
What was the outcome of the commands they suggested?
Are your private registry and the other machine directly connected?
Thanks for helping to debug this...
-ben
Here are the results from the commands Kenjiro wanted me to run:
I was not able to run two of the commands. Here are the results of the commands that ran successfully (notice in the ./oc status command I removed some of the information from the private repository url):
./oc get -n default se/docker-registry
NAME LABELS SELECTOR IP(S) PORT(S)
docker-registry docker-registry=default docker-registry=default 172.30.13.89 5000/TCP
[root@localhost openshift]#
./oc status
In project OpenShift 3 Sample (test)
service database (172.30.147.165:5434 -> 3306)
database deploys docker.io/openshift/mysql-55-centos7:latest
#1 deployed 41 hours ago - 1 pod
service frontend (172.30.245.209:5432 -> 8080)
frontend deploys origin-ruby-sample:latest <-
builds https://username:[email protected]/Code/......./ruby-hello-world.git with test/ruby-20-centos7:latest (I removed my login information and some of the private repo url)
#2 deployment failed 41 hours ago
#1 deployment failed 41 hours ago
Commands that did not work:
curl -v `./oc get services | grep registry | awk '{print $4":"$5}/v2/' | sed 's,/[^/]\+$,/v2/,'`
curl: no URL specified!
curl: try 'curl --help' or 'curl --manual' for more information
./oc describe pod `./oc get pod | grep docker-registry | awk '{print $1}'`
error: you must provide one or more resources by argument or filename
I bet the curl command was supposed to be:
curl -v `./oc get services | grep registry | awk '{print $4":"$5}/v2/' | sed 's,/[^/]\+$,/v2/,'`
Can you try that please?
Hmm, I must be doing something wrong:
[root@localhost openshift]# curl -v `./oc get services | grep registry | awk '{print $4":"$5}/v2/' | sed 's,/[^/]\+$,/v2/,'`
curl: no URL specified!
curl: try 'curl --help' or 'curl --manual' for more information
The beginning backtick does not show before ./oc get services, but it was there.
Check the namespace you are querying for services. You might need to add "-n default" to the "get services" bit
I think you lost the backticks around the oc command... or at least
they didn't get pasted back in. Can you just run:
./oc get services
Thanks
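As an aside, if the comment formatting keeps eating the backticks, the same command substitution can be written with $( ), which tends to survive pasting; a sketch of the intended pipeline, under that assumption:

curl -v $(./oc get services -n default | grep registry | awk '{print $4":"$5}/v2/' | sed 's,/[^/]\+$,/v2/,')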
adding -n default seemed to do the trick:
[root@localhost openshift]# curl -v `./oc get services -n default | grep registry | awk '{print $4":"$5}/v2/' | sed 's,/[^/]\+$,/v2/,'`
GET /v2/ HTTP/1.1
User-Agent: curl/7.29.0
Host: 172.30.13.89:5000
Accept: */*
* Recv failure: Connection reset by peer
* Closing connection 0
curl: (56) Recv failure: Connection reset by peer
[root@localhost openshift]#
and just ./oc get services:
[root@localhost openshift]# ./oc get services
NAME LABELS SELECTOR IP(S) PORT(S)
database template=application-template-stibuild name=database 172.30.147.165 5434/TCP
frontend template=application-template-stibuild name=frontend 172.30.245.209 5432/TCP
Can I get access to the machine, or would you rather I keep asking you to run commands?
The machines are at my work. The only way to access our internal network is with configured VPN software. Unfortunately that has to be set up by operations staff, and a special key fob is used to generate codes to get in.
Ah, ok... are you allowed to run tcpdump on that system? I want to sniff
what's going on between the two machines while that curl request happens.
Yes, I have complete control. I had to run an errand and heading back to office now. I will run the TCP dump right when I get in.
There is a lot of noise on the device. I think I used tcpdump on the correct device with the -i option. I ran the curl command while tcpdump was listening, but it is hard to tell which packets are related. Here is a link to the tcpdump:
https://drive.google.com/file/d/0B2jPVs9ymvNdWl85VU1DbExYbzg/view?usp=sharing
I noticed that in the terminal used to start OpenShift (on the master), this prints whenever I try to run that curl command:
14:04:59.736003 4709 proxysocket.go:92] Dial failed: dial tcp 172.17.0.1:5000: i/o timeout
E0724 14:05:01.736196 4709 proxysocket.go:92] Dial failed: dial tcp 172.17.0.1:5000: i/o timeout
E0724 14:05:01.742019 4709 proxysocket.go:92] Dial failed: dial tcp 172.17.0.1:5000: no route to host
E0724 14:05:04.748052 4709 proxysocket.go:92] Dial failed: dial tcp 172.17.0.1:5000: no route to host
E0724 14:05:04.748094 4709 proxysocket.go:126] Failed to connect to balancer: failed to connect to an endpoint.
I don't know if that means anything to you.
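Those proxysocket errors suggest the service proxy accepts the connection on 172.30.13.89:5000 and then fails to reach the backend it has registered at 172.17.0.1:5000. A minimal sketch of how to see which backend the service maps to and where the registry pod actually lives (the pod name is whatever oc get pods reports for the registry):

./oc get endpoints docker-registry -n default     # the backend IP:port the proxy dials
./oc get pods -n default | grep docker-registry   # find the registry pod name
./oc describe pod <registry-pod-name> -n default  # shows the node it is on and its pod IP

If the endpoint IP does not match the registry pod's current IP, or is not reachable from the other node, that would explain the resets.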
Oh, that could be interesting. Thanks for the dump, and for noticing that
output on the master.
Thanks for all your help.
Dean, can you re-run the tcpdump with: tcpdump -i enp8s0 -nn host 172.30.13.89
And, you are running kube-proxy on all the nodes, right?
Thanks...
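If the physical interface shows nothing, it may be because the service proxy terminates the connection locally before anything is forwarded. A sketch of a broader capture on all interfaces, assuming the same host and port:

tcpdump -i any -nn 'host 172.30.13.89 or port 5000'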
The enp8s0 tcpdump did not capture anything. However, I ran it again with lo instead of enp8s0 and received this information:
[root@localhost dpeterson]# tcpdump -i lo -nn host 172.30.13.89
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 65535 bytes
15:54:30.432541 IP 172.30.13.89.5000 > 172.19.17.143.35848: Flags [S.], seq 1814618202, ack 1088132762, win 43690, options [mss 65495,sackOK,TS val 1380022808 ecr 1380022808,nop,wscale 7], length 0
15:54:30.432644 IP 172.30.13.89.5000 > 172.19.17.143.35848: Flags [.], ack 85, win 342, options [nop,nop,TS val 1380022808 ecr 1380022808], length 0
15:54:36.444095 IP 172.30.13.89.5000 > 172.19.17.143.35848: Flags [R.], seq 1, ack 85, win 342, options [nop,nop,TS val 1380028820 ecr 1380022808], length 0
^C
3 packets captured
6 packets received by filter
0 packets dropped by kernel
Interesting. What does this return: route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 172.19.17.1 0.0.0.0 UG 100 0 0 enp8s0
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
172.19.17.0 0.0.0.0 255.255.255.0 U 0 0 0 enp8s0
172.19.17.0 0.0.0.0 255.255.255.0 U 100 0 0 enp8s0
172.19.41.226 172.19.17.1 255.255.255.255 UGH 100 0 0 enp8s0
What does this show: ip a
ip a
1: lo:
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: enp8s0:
link/ether 00:23:ae:6f:50:aa brd ff:ff:ff:ff:ff:ff
inet 172.19.17.143/24 brd 172.19.17.255 scope global dynamic enp8s0
valid_lft 518908sec preferred_lft 518908sec
inet6 fe80::223:aeff:fe6f:50aa/64 scope link
valid_lft forever preferred_lft forever
3: docker0:
link/ether 56:84:7a:fe:97:99 brd ff:ff:ff:ff:ff:ff
inet 172.17.42.1/16 scope global docker0
valid_lft forever preferred_lft forever
inet6 fe80::5484:7aff:fefe:9799/64 scope link
valid_lft forever preferred_lft forever
Are you running the kubelet process? I am stumped as to why that curl routed over the loopback interface. Also, that tcpdump doesn't show the start of the TCP handshake... did you run curl after you had started up the tcpdump? Or might it still have been starting and missed a bit?
Ok, I was able to run commands again by following the admin guide and running these:
$ export KUBECONFIG=`pwd`/openshift.local.config/master/admin.kubeconfig
$ export CURL_CA_BUNDLE=`pwd`/openshift.local.config/master/ca.crt
$ sudo chmod +r `pwd`/openshift.local.config/master/admin.kubeconfig
When I run ./oc get nodes I see this:
./oc get nodes
NAME LABELS STATUS
localhost.localdomain kubernetes.io/hostname=localhost.localdomain Ready
rhel.node.2 kubernetes.io/hostname=rhel.node.2 Ready
I ran tcpdump again and made sure it was running well in advance of running the curl command:
curl -v `./oc get services -n default | grep registry | awk '{print $4":"$5}/v2/' | sed 's,/[^/]\+$,/v2/,'`
Nothing was captured for tcpdump -i enp8s0 -nn host 172.30.13.89, so I reran the dump with lo:
tcpdump -i enp8s0 -nn host 172.30.13.89
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on enp8s0, link-type EN10MB (Ethernet), capture size 65535 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel
[root@localhost openshift]# tcpdump -i lo -nn host 172.30.13.89
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 65535 bytes
16:40:38.012627 IP 172.30.13.89.5000 > 172.19.17.143.39371: Flags [S.], seq 335054175, ack 3631347388, win 43690, options [mss 65495,sackOK,TS val 1382790388 ecr 1382790388,nop,wscale 7], length 0
16:40:38.012720 IP 172.30.13.89.5000 > 172.19.17.143.39371: Flags [.], ack 85, win 342, options [nop,nop,TS val 1382790388 ecr 1382790388], length 0
16:40:44.024092 IP 172.30.13.89.5000 > 172.19.17.143.39371: Flags [R.], seq 1, ack 85, win 342, options [nop,nop,TS val 1382796400 ecr 1382790388], length 0
^C
3 packets captured
6 packets received by filter
0 packets dropped by kernel
[root@localhost openshift]#
When I start the master I use ./openshift start. Doesn't that start the kubelet process?
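For what it's worth, ./openshift start with no subcommand runs the all-in-one master plus node, and the node component embeds the kubelet and the service proxy; the second machine would normally run just the node piece. A sketch of quick checks, assuming the default generated config layout (the node-<hostname> directory name is whatever openshift generated for you):

ps -ef | grep '[o]penshift start'   # what is actually running on each machine
./openshift start node --config=openshift.local.config/node-<hostname>/node-config.yaml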
I guess I'll pull down the latest version of openshift and give this one more try from scratch.
Ok, thanks for persevering. Please keep me in the loop...
Any luck?
I lost the battle at work. This was a bit too hard for me to get going. It looks like everyone wants to go with Amazon.
I'm sorry to hear that. Thanks for trying though. Of course, you can run OpenShift on Amazon compute resources (since you will still need to solve many of the problems that OpenShift addresses even when you have EC2 nodes).
BTW does it make sense to close the issue if you are not going to be able to pursue it further?
Thanks
That is what I tried to tell them. However, I am the one who rolls up his sleeves and works through the problems. There is another architect who believes Amazon is the answer to every problem (at least as far as running Docker containers is concerned). I am not as familiar with Amazon and neither is he. However, he has no problem promising upper management that Amazon can do everything and anything automatically with little to no legwork. Do you have a list of what Amazon does not do that OpenShift does? It seems Amazon is coming out with a new service every day. It's hard for me to win an argument when it looks like they have so much velocity and I can't prove it out without money to try Amazon, unlike Red Hat, where I can at least get a feel for things quite easily before asking management to write an RFP.
And when upper management is involved, politics always wins. One guy is promising them that Amazon will just run and the world will be rainbows and butterflies. I go into technical details based on experience and their eyes glaze over. Guess who wins.
He also just sent me this, trying to prove that Red Hat's business model is flawed:
http://www.cio.com/article/2944334/open-source-development/why-the-open-source-business-model-is-a-failure.html
How do I counter that?
@deanpeterson for every post like that, you could probably find one that questions how Amazon can sustain a business model where they make no profit. Obviously Red Hat continues to grow, gain market share, acquire new technologies, and build new innovative solutions like OpenShift. In fact, we in Red Hat use AWS quite a bit.
So why OpenShift? Being backed by open source, as we all know, removes vendor lock-in and leverages a community to deliver key capabilities. Red Hat is a strong contributor in many open source communities to deliver on the containerized application vision: Docker, Kubernetes, and OpenShift Origin. Red Hat distributes these technologies in tested and supported distributions and configurations so you can easily run the solution on premise or off. Perhaps it would be best to continue this discussion elsewhere; you can find my contact information on my GitHub page.
@deanpeterson, what do you want to do with this issue report? Keep it open, even though you aren't going to be able to help debug? I can't reproduce it on my set-up (but that doesn't mean it's not reproducible). I plan to look over the troubleshooting doc to see if I can extend it...
I am fine with closing this.
Thanks for all of the help investigating it. I'm sorry we couldn't work
out the problem. If you ever try again and hit this issue, please let me
know and we can pick up the debugging.
I was able to get everything working without adding the second physical node. I am not sure if that was the problem the first time around or not. I pulled down the latest version and was able to use the template file pointing to my private git repository on Kiln. I was able to create and deploy the ruby sample app successfully. You said I could run OpenShift Origin on Amazon. I see little to no documentation on this. Is there a good place to look for that?
Try this:
https://github.com/openshift/openshift-ansible/blob/master/README_AWS.md
Great thanks!
This has been outstanding for over a month with no other comments. AFAIK, others haven't been able to recreate it. Perhaps it's time we close the bug?
I could reproduce the issue. My environment is multi-node using vagrant and export OPENSHIFT_SDN=redhat/openshift-ovs-multitenant
openshift v1.0.5-115-gdace507
kubernetes v1.1.0-alpha.0-1605-g44c91b1
Interesting, can you reliably reproduce it? Can you give me precise steps? Thanks.
I got the same issue, and captured some logs from Docker (/var/log/messages on CentOS); hope they help a bit with the fix:
Nov 11 19:09:11 iZu15atz2x8Z origin-node: I1111 19:09:11.300061 27195 helpers.go:96] Unable to get network stats from pid 9149: couldn't read network stats: failure opening /proc/9149/net/dev: open /proc/9149/net/dev: no such file or directory
Nov 11 19:09:12 iZu15atz2x8Z origin-node: I1111 19:09:12.092821 27195 helpers.go:96] Unable to get network stats from pid 9076: couldn't read network stats: failure opening /proc/9076/net/dev: open /proc/9076/net/dev: no such file or directory
Nov 11 19:09:12 iZu15atz2x8Z docker: time="2015-11-11T19:09:12.970236339+08:00" level=info msg="GET /images/json"
Nov 11 19:09:13 iZu15atz2x8Z origin-node: I1111 19:09:13.092504 27195 helpers.go:96] Unable to get network stats from pid 9076: couldn't read network stats: failure opening /proc/9076/net/dev: open /proc/9076/net/dev: no such file or directory
Nov 11 19:09:13 iZu15atz2x8Z docker: time="2015-11-11T19:09:13.129131432+08:00" level=info msg="GET /containers/json?all=1"
Nov 11 19:09:13 iZu15atz2x8Z origin-node: I1111 19:09:13.299777 27195 helpers.go:96] Unable to get network stats from pid 9149: couldn't read network stats: failure opening /proc/9149/net/dev: open /proc/9149/net/dev: no such file or directory
Nov 11 19:09:15 iZu15atz2x8Z origin-node: I1111 19:09:15.092616 27195 helpers.go:96] Unable to get network stats from pid 9076: couldn't read network stats: failure opening /proc/9076/net/dev: open /proc/9076/net/dev: no such file or directory
Nov 11 19:09:15 iZu15atz2x8Z docker: time="2015-11-11T19:09:15.731629027+08:00" level=info msg="GET /version"
Nov 11 19:09:17 iZu15atz2x8Z docker: time="2015-11-11T19:09:17.224192076+08:00" level=info msg="GET /version"
Nov 11 19:09:17 iZu15atz2x8Z origin-node: I1111 19:09:17.300308 27195 helpers.go:96] Unable to get network stats from pid 9149: couldn't read network stats: failure opening /proc/9149/net/dev: open /proc/9149/net/dev: no such file or directory
Unfortunately, those error messages are not really indicative of anything. Can you try to use the SDN troubleshooting guide (https://docs.openshift.org/latest/admin_guide/sdn_troubleshooting.html). If you can't work it out using that, please make sure you follow the instructions in "Further Help" and point me at the generated file.
I didn't hit the problem during these several rounds of testing. I'll still keep an eye on this issue.
Same error:
févr. 09 05:03:23 ose3-node2.example.com origin-node[6053]: I0209 05:03:23.170817 6053 proxier.go:294] Adding new service "default/kubernetes:dns" at 172.30.0.1:53/UDP
févr. 09 05:03:23 ose3-node2.example.com origin-node[6053]: I0209 05:03:23.170869 6053 proxier.go:294] Adding new service "default/kubernetes:dns-tcp" at 172.30.0.1:53/TCP
févr. 09 05:35:34 ose3-node2.example.com origin-node[6053]: I0209 05:35:34.326273 6053 proxier.go:294] Adding new service "default/docker-registry:5000-tcp" at 172.30.49.223:5000/TCP
févr. 09 05:35:34 ose3-node2.example.com origin-node[6053]: I0209 05:35:34.585426 6053 kubelet.go:2169] SyncLoop (ADD, "api"): "docker-registry-1-deploy_default"
févr. 09 05:35:34 ose3-node2.example.com origin-node[6053]: I0209 05:35:34.629945 6053 manager.go:1720] Need to restart pod infra container for "docker-registry-1-deploy_default" because it is not found
févr. 09 05:35:34 ose3-node2.example.com origin-node[6053]: I0209 05:35:34.637097 6053 provider.go:91] Refreshing cache for provider: *credentialprovider.defaultDockerConfigProvider
févr. 09 05:35:34 ose3-node2.example.com origin-node[6053]: I0209 05:35:34.637405 6053 docker.go:159] Pulling image openshift/origin-pod:v1.1.1.1 without credentials
févr. 09 05:35:39 ose3-node2.example.com ovs-vsctl[9125]: ovs|00001|vsctl|INFO|Called as ovs-vsctl add-port br0 veth485ac71
févr. 09 05:35:39 ose3-node2.example.com origin-node[6053]: W0209 05:35:39.948578 6053 manager.go:1892] Hairpin setup failed for pod "docker-registry-1-deploy_default": open /sys/devices/virtual/net/veth485ac71/brport/hairpin_mode: no such file or directory
févr. 09 05:35:39 ose3-node2.example.com origin-node[6053]: I0209 05:35:39.949475 6053 docker.go:159] Pulling image openshift/origin-deployer:v1.1.1.1 without credentials
févr. 09 05:36:42 ose3-node2.example.com origin-node[6053]: I0209 05:36:42.071073 6053 helpers.go:96] Unable to get network stats from pid 9457: couldn't read network stats: failure opening /proc/9457/net/dev: open /proc/9457/net/dev: no such file or directory
févr. 09 05:36:42 ose3-node2.example.com ovs-vsctl[9503]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --if-exists del-port veth485ac71
févr. 09 05:36:42 ose3-node2.example.com origin-node[6053]: I0209 05:36:42.285123 6053 manager.go:1419] Killing container "e05e5eea3d310b9f92363feaff1864100829b4323946f2d50b7af19c5bfc374f default/docker-registry-1-deploy" with 30 second grace period
févr. 09 05:36:43 ose3-node2.example.com origin-node[6053]: I0209 05:36:43.071070 6053 helpers.go:96] Unable to get network stats from pid 9457: couldn't read network stats: failure opening /proc/9457/net/dev: open /proc/9457/net/dev: no such file or directory
févr. 09 05:36:43 ose3-node2.example.com origin-node[6053]: I0209 05:36:43.195859 6053 manager.go:1451] Container "e05e5eea3d310b9f92363feaff1864100829b4323946f2d50b7af19c5bfc374f default/docker-registry-1-deploy" exited after 910.697662ms
févr. 09 05:36:43 ose3-node2.example.com origin-node[6053]: I0209 05:36:43.370212 6053 helpers.go:96] Unable to get network stats from pid 9096: couldn't read network stats: failure opening /proc/9096/net/dev: open /proc/9096/net/dev: no such file or directory
févr. 09 05:36:43 ose3-node2.example.com origin-node[6053]: I0209 05:36:43.824606 6053 container.go:430] Failed to update stats for container "/system.slice/rhel-dmesg.service": failed to parse memory.usage_in_bytes - read /sys/fs/cgroup/memory/system.slice/rhel-dmesg.service/memory.usage_in_bytes: no such device, continuing to push stats
févr. 09 05:36:44 ose3-node2.example.com origin-node[6053]: I0209 05:36:44.367634 6053 helpers.go:96] Unable to get network stats from pid 9096: couldn't read network stats: failure opening /proc/9096/net/dev: open /proc/9096/net/dev: no such file or directory
févr. 09 05:36:45 ose3-node2.example.com origin-node[6053]: I0209 05:36:45.062070 6053 helpers.go:96] Unable to get network stats from pid 9457: couldn't read network stats: failure opening /proc/9457/net/dev: open /proc/9457/net/dev: no such file or directory
févr. 09 05:36:46 ose3-node2.example.com origin-node[6053]: I0209 05:36:46.219197 6053 helpers.go:96] Unable to get network stats from pid 9096: couldn't read network stats: failure opening /proc/9096/net/dev: open /proc/9096/net/dev: no such file or directory
févr. 09 05:36:49 ose3-node2.example.com origin-node[6053]: I0209 05:36:49.062221 6053 helpers.go:96] Unable to get network stats from pid 9457: couldn't read network stats: failure opening /proc/9457/net/dev: open /proc/9457/net/dev: no such file or directory
févr. 09 05:36:50 ose3-node2.example.com origin-node[6053]: I0209 05:36:50.219607 6053 helpers.go:96] Unable to get network stats from pid 9096: couldn't read network stats: failure opening /proc/9096/net/dev: open /proc/9096/net/dev: no such file or directory
févr. 09 05:36:57 ose3-node2.example.com origin-node[6053]: I0209 05:36:57.069812 6053 helpers.go:96] Unable to get network stats from pid 9457: couldn't read network stats: failure opening /proc/9457/net/dev: open /proc/9457/net/dev: no such file or directory
févr. 09 05:36:58 ose3-node2.example.com origin-node[6053]: I0209 05:36:58.299551 6053 helpers.go:96] Unable to get network stats from pid 9096: couldn't read network stats: failure opening /proc/9096/net/dev: open /proc/9096/net/dev: no such file or directory
févr. 09 05:37:12 ose3-node2.example.com origin-node[6053]: I0209 05:37:12.062444 6053 helpers.go:96] Unable to get network stats from pid 9457: couldn't read network stats: failure opening /proc/9457/net/dev: open /proc/9457/net/dev: no such file or directory
févr. 09 05:37:13 ose3-node2.example.com origin-node[6053]: I0209 05:37:13.062876 6053 helpers.go:96] Unable to get network stats from pid 9457: couldn't read network stats: failure opening /proc/9457/net/dev: open /proc/9457/net/dev: no such file or directory
févr. 09 05:37:13 ose3-node2.example.com origin-node[6053]: I0209 05:37:13.257192 6053 helpers.go:96] Unable to get network stats from pid 9096: couldn't read network stats: failure opening /proc/9096/net/dev: open /proc/9096/net/dev: no such file or directory
févr. 09 05:37:14 ose3-node2.example.com origin-node[6053]: I0209 05:37:14.507035 6053 helpers.go:96] Unable to get network stats from pid 9096: couldn't read network stats: failure opening /proc/9096/net/dev: open /proc/9096/net/dev: no such file or directory
févr. 09 05:37:15 ose3-node2.example.com origin-node[6053]: I0209 05:37:15.062669 6053 helpers.go:96] Unable to get network stats from pid 9457: couldn't read network stats: failure opening /proc/9457/net/dev: open /proc/9457/net/dev: no such file or directory
févr. 09 05:37:16 ose3-node2.example.com origin-node[6053]: I0209 05:37:16.609264 6053 helpers.go:96] Unable to get network stats from pid 9096: couldn't read network stats: failure opening /proc/9096/net/dev: open /proc/9096/net/dev: no such file or directory
févr. 09 05:37:19 ose3-node2.example.com origin-node[6053]: I0209 05:37:19.062320 6053 helpers.go:96] Unable to get network stats from pid 9457: couldn't read network stats: failure opening /proc/9457/net/dev: open /proc/9457/net/dev: no such file or directory
févr. 09 05:37:20 ose3-node2.example.com origin-node[6053]: I0209 05:37:20.605642 6053 helpers.go:96] Unable to get network stats from pid 9096: couldn't read network stats: failure opening /proc/9096/net/dev: open /proc/9096/net/dev: no such file or directory
+1.
These network stats issues are coming from the monitoring of the pods... they are not harmful, and there's not much we can do about them.