I now have a successful build that was pushed to the private registry. However, when deploying the frontend portion of the sample-app, I get the following error in the event logs:
Failed to pull image "172.30.13.89:5000/test/origin-ruby-sample@sha256:d7e1ed4818f45fc14a6ea98c622fddd81e2ab36caad8c94850952d8d150a2952": API error (500): v1 ping attempt failed with error: Get http://172.30.13.89:5000/v1/_ping: read tcp 172.30.13.89:5000: connection reset by peer
How is it that the build portion of the process can see the registry and push to it, but the frontend deployment piece is unable to pull the image that was just pushed?
@deanpeterson
Does the IP address of the registry in the push from the build logs match the IP address from the pull in the deployment?
Yes. 172.30.13.89:5000 is the IP and port of the registry in my default namespace. I can see in the build logs that the image was successfully pushed to 172.30.13.89:5000.
Here is the build log:
Using bundler (1.3.5)
I0722 20:35:37.777526 1 sti.go:388] Your bundle is complete!
I0722 20:35:37.777579 1 sti.go:388] It was installed into ./bundle
I0722 20:35:37.811400 1 sti.go:388] ---> Cleaning up unused ruby gems
I0722 20:35:42.736725 1 sti.go:131] Using provided push secret for pushing 172.30.13.89:5000/test/origin-ruby-sample image
I0722 20:35:42.736761 1 sti.go:134] Pushing 172.30.13.89:5000/test/origin-ruby-sample image ...
I0722 20:36:47.613637 1 sti.go:138] Successfully pushed 172.30.13.89:5000/test/origin-ruby-sample
And this is the deployment event error:
Failed to pull image "172.30.13.89:5000/test/origin-ruby-sample@sha256:d7e1ed4818f45fc14a6ea98c622fddd81e2ab36caad8c94850952d8d150a2952": API error (500): v1 ping attempt failed with error: Get http://172.30.13.89:5000/v1/_ping: read tcp 172.30.13.89:5000: connection reset by peer
And this is my current running registry in the default namespace:
[root@localhost openshift]# ./oc get -n default se/docker-registry
NAME LABELS SELECTOR IP(S) PORT(S)
docker-registry docker-registry=default docker-registry=default 172.30.13.89 5000/TCP
[root@localhost openshift]#
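A quick way to narrow that down (a minimal sketch, assuming the service IP and port shown above) is to hit the same registry endpoints the error mentions, once from the master and once from the node that will run the frontend pod:

curl -v http://172.30.13.89:5000/v1/_ping   # the v1 ping endpoint from the error message
curl -v http://172.30.13.89:5000/v2/        # the v2 endpoint a newer registry answers on

If the build host gets a response but the deploying node gets "connection reset by peer", the problem is between that node and the service IP rather than in the registry itself.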
@rajatchopra @ramr @pravisankar @knobunc can one of you take a look at this from a networking perspective?
Is the instance still available? Or does it reproduce reliably? I'd be interested if anything in the iptables was messing you up...
Yes, the instance is still available. It is a two physical node setup at work. I will be going back to work in a few minutes. What should I look for on the iptables? I received this on the openshift user's list last night so I will be running these commands to get more information when I get back in:
Hi,
Could you please show us the following command output?
curl -v `oc get services | grep registry | awk '{print $4":"$5}/v2/' | sed 's,/[^/]\+$,/v2/,'`
oc describe service docker-registry
oc describe pod `oc get pod | grep docker-registry | awk '{print $1}'`
oc status
_NOTE_ Please run them in the default project. You can switch with 'oc project default' or use 'oc -n default ...'.
Thanks,
Kenjiro
If you can get the output from: iptables -L
[root@localhost openshift]# iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
DOCKER all -- anywhere anywhere
ACCEPT all -- anywhere anywhere ctstate RELATED,ESTABLISHED
ACCEPT all -- anywhere anywhere
ACCEPT all -- anywhere anywhere
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
Chain DOCKER (1 references)
target prot opt source destination
Wow. Clean rules... that's clearly not the problem, unless that's a VM and
the host machine has something?
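Worth noting: iptables -L only lists the filter table, so it can look empty even when traffic is being redirected elsewhere. A minimal sketch of the extra dumps that would show the Docker and service-proxy NAT rules, if any are present:

iptables -t nat -L -n    # NAT table, numeric addresses
iptables-save            # every table in one dump; easy to diff between the two nodes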
No, that is directly on the physical host machine. I have the all-in-one binary running directly on the host, and I have a second node in the Ready state on another physical machine. The iptables rules are the same on the second physical node.
What was the outcome of the commands they suggested?
Are your private registry and the other machine directly connected?
Thanks for helping to debug this...
-ben
Here are the results from the commands Kenjiro wanted me to run:
I was not able to run two of the commands. Here are the results of the commands that ran successfully (notice in the ./oc status command I removed some of the information from the private repository url):
./oc get -n default se/docker-registry
NAME LABELS SELECTOR IP(S) PORT(S)
docker-registry docker-registry=default docker-registry=default 172.30.13.89 5000/TCP
[root@localhost openshift]#
./oc status
In project OpenShift 3 Sample (test)
service database (172.30.147.165:5434 -> 3306)
database deploys docker.io/openshift/mysql-55-centos7:latest
#1 deployed 41 hours ago - 1 pod
service frontend (172.30.245.209:5432 -> 8080)
frontend deploys origin-ruby-sample:latest <-
builds https://username:[email protected]/Code/......./ruby-hello-world.git with test/ruby-20-centos7:latest (I removed my login information and some of the private repo url)
#2 deployment failed 41 hours ago
#1 deployment failed 41 hours ago
Commands that did not work:
curl -v `./oc get services | grep registry | awk '{print $4":"$5}/v2/' | sed 's,/[^/]\+$,/v2/,'`
curl: no URL specified!
curl: try 'curl --help' or 'curl --manual' for more information
./oc describe pod `./oc get pod | grep docker-registry | awk '{print $1}'`
error: you must provide one or more resources by argument or filename
I bet the curl command was supposed to be:
curl -v `./oc get services | grep registry | awk '{print $4":"$5}/v2/' | sed 's,/[^/]\+$,/v2/,'`
Can you try that please?
Hmm, I must be doing something wrong:
[root@localhost openshift]# curl -v `./oc get services | grep registry | awk '{print $4":"$5}/v2/' | sed 's,/[^/]\+$,/v2/,'`
curl: no URL specified!
curl: try 'curl --help' or 'curl --manual' for more information
The beginning backtick does not show before ./oc get services, but it was there.
Check the namespace you are querying for services. You might need to add "-n default" to the "get services" bit
I think you lost the backticks around the oc command... or at least
they didn't get pasted back in. Can you just run:
./oc get services
Thanks
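As an aside, if the comment formatting keeps eating the backticks, the same command substitution can be written with $( ), which tends to survive pasting; a sketch of the intended pipeline, under that assumption:

curl -v $(./oc get services -n default | grep registry | awk '{print $4":"$5}/v2/' | sed 's,/[^/]\+$,/v2/,')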
adding -n default seemed to do the trick:
[root@localhost openshift]# curl -v `./oc get services -n default | grep registry | awk '{print $4":"$5}/v2/' | sed 's,/[^/]\+$,/v2/,'`
GET /v2/ HTTP/1.1
User-Agent: curl/7.29.0
Host: 172.30.13.89:5000
Accept: */*
* Recv failure: Connection reset by peer
* Closing connection 0
curl: (56) Recv failure: Connection reset by peer
[root@localhost openshift]#
and just ./oc get services:
[root@localhost openshift]# ./oc get services
NAME LABELS SELECTOR IP(S) PORT(S)
database template=application-template-stibuild name=database 172.30.147.165 5434/TCP
frontend template=application-template-stibuild name=frontend 172.30.245.209 5432/TCP
Can I get access to the machine, or would you rather I keep asking you to run commands?
The machines are at my work. The only way to access our internal network is with configured VPN software. Unfortunately that has to be set up by operations staff, and a special key fob is used to generate codes to get in.
Ah, ok... are you allowed to run tcpdump on that system? I want to sniff
what's going on between the two machines while that curl request happens.
Yes, I have complete control. I had to run an errand and heading back to office now. I will run the TCP dump right when I get in.
There is a lot of noise on the device. I think I used tcpdump on the correct device with the -i option. I ran the curl command while tcpdump was listening, but it is hard to tell which packets are related. Here is a link to the tcpdump:
https://drive.google.com/file/d/0B2jPVs9ymvNdWl85VU1DbExYbzg/view?usp=sharing
I noticed that in the terminal used to start OpenShift (on the master), this prints whenever I try to run that curl command:
14:04:59.736003 4709 proxysocket.go:92] Dial failed: dial tcp 172.17.0.1:5000: i/o timeout
E0724 14:05:01.736196 4709 proxysocket.go:92] Dial failed: dial tcp 172.17.0.1:5000: i/o timeout
E0724 14:05:01.742019 4709 proxysocket.go:92] Dial failed: dial tcp 172.17.0.1:5000: no route to host
E0724 14:05:04.748052 4709 proxysocket.go:92] Dial failed: dial tcp 172.17.0.1:5000: no route to host
E0724 14:05:04.748094 4709 proxysocket.go:126] Failed to connect to balancer: failed to connect to an endpoint.
I don't know if that means anything to you.
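Those proxysocket errors suggest the service proxy accepts the connection on 172.30.13.89:5000 and then fails to reach the backend it has registered at 172.17.0.1:5000. A minimal sketch of how to see which backend the service maps to and where the registry pod actually lives (the pod name is whatever oc get pods reports for the registry):

./oc get endpoints docker-registry -n default     # the backend IP:port the proxy dials
./oc get pods -n default | grep docker-registry   # find the registry pod name
./oc describe pod <registry-pod-name> -n default  # shows the node it is on and its pod IP

If the endpoint IP does not match the registry pod's current IP, or is not reachable from the other node, that would explain the resets.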
Oh, that could be interesting. Thanks for the dump, and for noticing that
output on the master.
Thanks for all your help.
Dean, can you re-run the tcpdump with: tcpdump -i enp8s0 -nn host 172.30.13.89
And, you are running kube-proxy on all the nodes, right?
Thanks...
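If the physical interface shows nothing, it may be because the service proxy terminates the connection locally before anything is forwarded. A sketch of a broader capture on all interfaces, assuming the same host and port:

tcpdump -i any -nn 'host 172.30.13.89 or port 5000'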
The enp8s0 tcpdump did not capture anything. However, I ran it again with lo instead of enp8s0 and received this information:
[root@localhost dpeterson]# tcpdump -i lo -nn host 172.30.13.89
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 65535 bytes
15:54:30.432541 IP 172.30.13.89.5000 > 172.19.17.143.35848: Flags [S.], seq 1814618202, ack 1088132762, win 43690, options [mss 65495,sackOK,TS val 1380022808 ecr 1380022808,nop,wscale 7], length 0
15:54:30.432644 IP 172.30.13.89.5000 > 172.19.17.143.35848: Flags [.], ack 85, win 342, options [nop,nop,TS val 1380022808 ecr 1380022808], length 0
15:54:36.444095 IP 172.30.13.89.5000 > 172.19.17.143.35848: Flags [R.], seq 1, ack 85, win 342, options [nop,nop,TS val 1380028820 ecr 1380022808], length 0
^C
3 packets captured
6 packets received by filter
0 packets dropped by kernel
Interesting. What does this return: route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 172.19.17.1 0.0.0.0 UG 100 0 0 enp8s0
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
172.19.17.0 0.0.0.0 255.255.255.0 U 0 0 0 enp8s0
172.19.17.0 0.0.0.0 255.255.255.0 U 100 0 0 enp8s0
172.19.41.226 172.19.17.1 255.255.255.255 UGH 100 0 0 enp8s0
What does this show: ip a
ip a
1: lo:
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: enp8s0:
link/ether 00:23:ae:6f:50:aa brd ff:ff:ff:ff:ff:ff
inet 172.19.17.143/24 brd 172.19.17.255 scope global dynamic enp8s0
valid_lft 518908sec preferred_lft 518908sec
inet6 fe80::223:aeff:fe6f:50aa/64 scope link
valid_lft forever preferred_lft forever
3: docker0:
link/ether 56:84:7a:fe:97:99 brd ff:ff:ff:ff:ff:ff
inet 172.17.42.1/16 scope global docker0
valid_lft forever preferred_lft forever
inet6 fe80::5484:7aff:fefe:9799/64 scope link
valid_lft forever preferred_lft forever
Are you running the kubelet process? I am stumped as to why that curl routed over the loopback interface. Also, that tcpdump doesn't show the start of the TCP handshake... did you run curl after you had started up the tcpdump? Or might it still have been starting and missed a bit?
Ok, I was able to run commands again by following the admin guide and running these:
$ export KUBECONFIG=`pwd`/openshift.local.config/master/admin.kubeconfig
$ export CURL_CA_BUNDLE=`pwd`/openshift.local.config/master/ca.crt
$ sudo chmod +r `pwd`/openshift.local.config/master/admin.kubeconfig
When I run ./oc get nodes I see this:
./oc get nodes
NAME LABELS STATUS
localhost.localdomain kubernetes.io/hostname=localhost.localdomain Ready
rhel.node.2 kubernetes.io/hostname=rhel.node.2 Ready
I ran tcpdump again and made sure it was running well in advance of running the curl command:
curl -v `./oc get services -n default | grep registry | awk '{print $4":"$5}/v2/' | sed 's,/[^/]\+$,/v2/,'`
Nothing was captured for tcpdump -i enp8s0 -nn host 172.30.13.89, so I reran the dump with lo:
tcpdump -i enp8s0 -nn host 172.30.13.89
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on enp8s0, link-type EN10MB (Ethernet), capture size 65535 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel
[root@localhost openshift]# tcpdump -i lo -nn host 172.30.13.89
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 65535 bytes
16:40:38.012627 IP 172.30.13.89.5000 > 172.19.17.143.39371: Flags [S.], seq 335054175, ack 3631347388, win 43690, options [mss 65495,sackOK,TS val 1382790388 ecr 1382790388,nop,wscale 7], length 0
16:40:38.012720 IP 172.30.13.89.5000 > 172.19.17.143.39371: Flags [.], ack 85, win 342, options [nop,nop,TS val 1382790388 ecr 1382790388], length 0
16:40:44.024092 IP 172.30.13.89.5000 > 172.19.17.143.39371: Flags [R.], seq 1, ack 85, win 342, options [nop,nop,TS val 1382796400 ecr 1382790388], length 0
^C
3 packets captured
6 packets received by filter
0 packets dropped by kernel
[root@localhost openshift]#
When I start the master I use ./openshift start. Doesn't that start the kubelet process?
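For what it's worth, ./openshift start with no subcommand runs the all-in-one master plus node, and the node component embeds the kubelet and the service proxy; the second machine would normally run just the node piece. A sketch of quick checks, assuming the default generated config layout (the node-<hostname> directory name is whatever openshift generated for you):

ps -ef | grep '[o]penshift start'   # what is actually running on each machine
./openshift start node --config=openshift.local.config/node-<hostname>/node-config.yaml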
I guess I'll pull down the latest version of openshift and give this one more try from scratch.
Ok, thanks for persevering. Please keep me in the loop...
Any luck?
I lost the battle at work. This was a bit too hard for me to get going. It looks like everyone wants to go with Amazon.
I'm sorry to hear that. Thanks for trying though. Of course, you can run OpenShift on Amazon compute resources (since you will still need to solve many of the problems that OpenShift addresses even when you have EC2 nodes).
BTW does it make sense to close the issue if you are not going to be able to pursue it further?
Thanks
That is what I tried to tell them. However, I am the one who rolls up his sleeves and works through the problems. There is another architect who believes Amazon is the answer to every problem (at least as far as running Docker containers is concerned). I am not as familiar with Amazon and neither is he. However, he has no problem promising upper management that Amazon can do everything and anything automatically with little to no legwork. Do you have a list of what Amazon does not do that OpenShift does? It seems Amazon is coming out with a new service every day. It's hard for me to win an argument when it looks like they have so much velocity and I can't prove it out without money to try Amazon, unlike Red Hat, where I can at least get a feel for things quite easily before asking management to write an RFP.
And when upper management is involved, politics always wins. One guy is promising them that Amazon will just run and the world will be rainbows and butterflies. I go into technical details based on experience and their eyes glaze over. Guess who wins.
He also just sent me this, trying to prove that Red Hat's business model is flawed:
http://www.cio.com/article/2944334/open-source-development/why-the-open-source-business-model-is-a-failure.html
How do I counter that?
@deanpeterson for every post like that, you could probably find one that questions how Amazon can sustain a business model where they make no profit. Obviously Red Hat continues to grow, gain market share, acquire new technologies, and build new innovative solutions like OpenShift. In fact, we in Red Hat use AWS quite a bit.
So why OpenShift? Being backed by open source, as we all know, removes vendor lock-in and leverages a community to deliver key capabilities. Red Hat is a strong contributor in many open source communities to deliver on the containerized application vision: Docker, Kubernetes, and OpenShift Origin. Red Hat distributes these technologies in tested and supported distributions and configurations so you can easily run the solution on premise or off. Perhaps it would be best to continue this discussion elsewhere; you can find my contact information on my GitHub page.
@deanpeterson, what do you want to do with this issue report? Keep it open, even though you aren't going to be able to help debug? I can't reproduce it on my set-up (but that doesn't mean it's not reproducible). I plan to look over the troubleshooting doc to see if I can extend it...
I am fine with closing this.
Thanks for all of the help investigating it. I'm sorry we couldn't work
out the problem. If you ever try again and hit this issue, please let me
know and we can pick up the debugging.
I was able to get everything working without adding the second physical node. I am not sure if that was the problem the first time around or not. I pulled down the latest version and was able to use the template file pointing to my private git repository on Kiln. I was able to create and deploy the ruby sample app successfully. You said I could run OpenShift Origin on Amazon. I see little to no documentation on this. Is there a good place to look for that?
Try this:
https://github.com/openshift/openshift-ansible/blob/master/README_AWS.md
Great thanks!
This has been outstanding for over a month with no other comments. AFAIK, others haven't been able to recreate it. Perhaps it's time we close the bug?
I could reproduce the issue. My environment is multi-node using vagrant and export OPENSHIFT_SDN=redhat/openshift-ovs-multitenant
openshift v1.0.5-115-gdace507
kubernetes v1.1.0-alpha.0-1605-g44c91b1
Interesting, can you reliably reproduce it? Can you give me precise steps? Thanks.
I got the same issue, and captured some logs from Docker (/var/log/messages on CentOS); hope they help a bit with the fix:
Nov 11 19:09:11 iZu15atz2x8Z origin-node: I1111 19:09:11.300061 27195 helpers.go:96] Unable to get network stats from pid 9149: couldn't read network stats: failure opening /proc/9149/net/dev: open /proc/9149/net/dev: no such file or directory
Nov 11 19:09:12 iZu15atz2x8Z origin-node: I1111 19:09:12.092821 27195 helpers.go:96] Unable to get network stats from pid 9076: couldn't read network stats: failure opening /proc/9076/net/dev: open /proc/9076/net/dev: no such file or directory
Nov 11 19:09:12 iZu15atz2x8Z docker: time="2015-11-11T19:09:12.970236339+08:00" level=info msg="GET /images/json"
Nov 11 19:09:13 iZu15atz2x8Z origin-node: I1111 19:09:13.092504 27195 helpers.go:96] Unable to get network stats from pid 9076: couldn't read network stats: failure opening /proc/9076/net/dev: open /proc/9076/net/dev: no such file or directory
Nov 11 19:09:13 iZu15atz2x8Z docker: time="2015-11-11T19:09:13.129131432+08:00" level=info msg="GET /containers/json?all=1"
Nov 11 19:09:13 iZu15atz2x8Z origin-node: I1111 19:09:13.299777 27195 helpers.go:96] Unable to get network stats from pid 9149: couldn't read network stats: failure opening /proc/9149/net/dev: open /proc/9149/net/dev: no such file or directory
Nov 11 19:09:15 iZu15atz2x8Z origin-node: I1111 19:09:15.092616 27195 helpers.go:96] Unable to get network stats from pid 9076: couldn't read network stats: failure opening /proc/9076/net/dev: open /proc/9076/net/dev: no such file or directory
Nov 11 19:09:15 iZu15atz2x8Z docker: time="2015-11-11T19:09:15.731629027+08:00" level=info msg="GET /version"
Nov 11 19:09:17 iZu15atz2x8Z docker: time="2015-11-11T19:09:17.224192076+08:00" level=info msg="GET /version"
Nov 11 19:09:17 iZu15atz2x8Z origin-node: I1111 19:09:17.300308 27195 helpers.go:96] Unable to get network stats from pid 9149: couldn't read network stats: failure opening /proc/9149/net/dev: open /proc/9149/net/dev: no such file or directory
Unfortunately, those error messages are not really indicative of anything. Can you try to use the SDN troubleshooting guide (https://docs.openshift.org/latest/admin_guide/sdn_troubleshooting.html). If you can't work it out using that, please make sure you follow the instructions in "Further Help" and point me at the generated file.
I didn't hit the problem during these several rounds of testing. I'll still keep an eye on this issue.
Same error:
févr. 09 05:03:23 ose3-node2.example.com origin-node[6053]: I0209 05:03:23.170817 6053 proxier.go:294] Adding new service "default/kubernetes:dns" at 172.30.0.1:53/UDP
févr. 09 05:03:23 ose3-node2.example.com origin-node[6053]: I0209 05:03:23.170869 6053 proxier.go:294] Adding new service "default/kubernetes:dns-tcp" at 172.30.0.1:53/TCP
févr. 09 05:35:34 ose3-node2.example.com origin-node[6053]: I0209 05:35:34.326273 6053 proxier.go:294] Adding new service "default/docker-registry:5000-tcp" at 172.30.49.223:5000/TCP
févr. 09 05:35:34 ose3-node2.example.com origin-node[6053]: I0209 05:35:34.585426 6053 kubelet.go:2169] SyncLoop (ADD, "api"): "docker-registry-1-deploy_default"
févr. 09 05:35:34 ose3-node2.example.com origin-node[6053]: I0209 05:35:34.629945 6053 manager.go:1720] Need to restart pod infra container for "docker-registry-1-deploy_default" because it is not found
févr. 09 05:35:34 ose3-node2.example.com origin-node[6053]: I0209 05:35:34.637097 6053 provider.go:91] Refreshing cache for provider: *credentialprovider.defaultDockerConfigProvider
févr. 09 05:35:34 ose3-node2.example.com origin-node[6053]: I0209 05:35:34.637405 6053 docker.go:159] Pulling image openshift/origin-pod:v1.1.1.1 without credentials
févr. 09 05:35:39 ose3-node2.example.com ovs-vsctl[9125]: ovs|00001|vsctl|INFO|Called as ovs-vsctl add-port br0 veth485ac71
févr. 09 05:35:39 ose3-node2.example.com origin-node[6053]: W0209 05:35:39.948578 6053 manager.go:1892] Hairpin setup failed for pod "docker-registry-1-deploy_default": open /sys/devices/virtual/net/veth485ac71/brport/hairpin_mode: no such file or directory
févr. 09 05:35:39 ose3-node2.example.com origin-node[6053]: I0209 05:35:39.949475 6053 docker.go:159] Pulling image openshift/origin-deployer:v1.1.1.1 without credentials
févr. 09 05:36:42 ose3-node2.example.com origin-node[6053]: I0209 05:36:42.071073 6053 helpers.go:96] Unable to get network stats from pid 9457: couldn't read network stats: failure opening /proc/9457/net/dev: open /proc/9457/net/dev: no such file or directory
févr. 09 05:36:42 ose3-node2.example.com ovs-vsctl[9503]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --if-exists del-port veth485ac71
févr. 09 05:36:42 ose3-node2.example.com origin-node[6053]: I0209 05:36:42.285123 6053 manager.go:1419] Killing container "e05e5eea3d310b9f92363feaff1864100829b4323946f2d50b7af19c5bfc374f default/docker-registry-1-deploy" with 30 second grace period
févr. 09 05:36:43 ose3-node2.example.com origin-node[6053]: I0209 05:36:43.071070 6053 helpers.go:96] Unable to get network stats from pid 9457: couldn't read network stats: failure opening /proc/9457/net/dev: open /proc/9457/net/dev: no such file or directory
févr. 09 05:36:43 ose3-node2.example.com origin-node[6053]: I0209 05:36:43.195859 6053 manager.go:1451] Container "e05e5eea3d310b9f92363feaff1864100829b4323946f2d50b7af19c5bfc374f default/docker-registry-1-deploy" exited after 910.697662ms
févr. 09 05:36:43 ose3-node2.example.com origin-node[6053]: I0209 05:36:43.370212 6053 helpers.go:96] Unable to get network stats from pid 9096: couldn't read network stats: failure opening /proc/9096/net/dev: open /proc/9096/net/dev: no such file or directory
févr. 09 05:36:43 ose3-node2.example.com origin-node[6053]: I0209 05:36:43.824606 6053 container.go:430] Failed to update stats for container "/system.slice/rhel-dmesg.service": failed to parse memory.usage_in_bytes - read /sys/fs/cgroup/memory/system.slice/rhel-dmesg.service/memory.usage_in_bytes: no such device, continuing to push stats
févr. 09 05:36:44 ose3-node2.example.com origin-node[6053]: I0209 05:36:44.367634 6053 helpers.go:96] Unable to get network stats from pid 9096: couldn't read network stats: failure opening /proc/9096/net/dev: open /proc/9096/net/dev: no such file or directory
févr. 09 05:36:45 ose3-node2.example.com origin-node[6053]: I0209 05:36:45.062070 6053 helpers.go:96] Unable to get network stats from pid 9457: couldn't read network stats: failure opening /proc/9457/net/dev: open /proc/9457/net/dev: no such file or directory
févr. 09 05:36:46 ose3-node2.example.com origin-node[6053]: I0209 05:36:46.219197 6053 helpers.go:96] Unable to get network stats from pid 9096: couldn't read network stats: failure opening /proc/9096/net/dev: open /proc/9096/net/dev: no such file or directory
févr. 09 05:36:49 ose3-node2.example.com origin-node[6053]: I0209 05:36:49.062221 6053 helpers.go:96] Unable to get network stats from pid 9457: couldn't read network stats: failure opening /proc/9457/net/dev: open /proc/9457/net/dev: no such file or directory
févr. 09 05:36:50 ose3-node2.example.com origin-node[6053]: I0209 05:36:50.219607 6053 helpers.go:96] Unable to get network stats from pid 9096: couldn't read network stats: failure opening /proc/9096/net/dev: open /proc/9096/net/dev: no such file or directory
févr. 09 05:36:57 ose3-node2.example.com origin-node[6053]: I0209 05:36:57.069812 6053 helpers.go:96] Unable to get network stats from pid 9457: couldn't read network stats: failure opening /proc/9457/net/dev: open /proc/9457/net/dev: no such file or directory
févr. 09 05:36:58 ose3-node2.example.com origin-node[6053]: I0209 05:36:58.299551 6053 helpers.go:96] Unable to get network stats from pid 9096: couldn't read network stats: failure opening /proc/9096/net/dev: open /proc/9096/net/dev: no such file or directory
févr. 09 05:37:12 ose3-node2.example.com origin-node[6053]: I0209 05:37:12.062444 6053 helpers.go:96] Unable to get network stats from pid 9457: couldn't read network stats: failure opening /proc/9457/net/dev: open /proc/9457/net/dev: no such file or directory
févr. 09 05:37:13 ose3-node2.example.com origin-node[6053]: I0209 05:37:13.062876 6053 helpers.go:96] Unable to get network stats from pid 9457: couldn't read network stats: failure opening /proc/9457/net/dev: open /proc/9457/net/dev: no such file or directory
févr. 09 05:37:13 ose3-node2.example.com origin-node[6053]: I0209 05:37:13.257192 6053 helpers.go:96] Unable to get network stats from pid 9096: couldn't read network stats: failure opening /proc/9096/net/dev: open /proc/9096/net/dev: no such file or directory
févr. 09 05:37:14 ose3-node2.example.com origin-node[6053]: I0209 05:37:14.507035 6053 helpers.go:96] Unable to get network stats from pid 9096: couldn't read network stats: failure opening /proc/9096/net/dev: open /proc/9096/net/dev: no such file or directory
févr. 09 05:37:15 ose3-node2.example.com origin-node[6053]: I0209 05:37:15.062669 6053 helpers.go:96] Unable to get network stats from pid 9457: couldn't read network stats: failure opening /proc/9457/net/dev: open /proc/9457/net/dev: no such file or directory
févr. 09 05:37:16 ose3-node2.example.com origin-node[6053]: I0209 05:37:16.609264 6053 helpers.go:96] Unable to get network stats from pid 9096: couldn't read network stats: failure opening /proc/9096/net/dev: open /proc/9096/net/dev: no such file or directory
févr. 09 05:37:19 ose3-node2.example.com origin-node[6053]: I0209 05:37:19.062320 6053 helpers.go:96] Unable to get network stats from pid 9457: couldn't read network stats: failure opening /proc/9457/net/dev: open /proc/9457/net/dev: no such file or directory
févr. 09 05:37:20 ose3-node2.example.com origin-node[6053]: I0209 05:37:20.605642 6053 helpers.go:96] Unable to get network stats from pid 9096: couldn't read network stats: failure opening /proc/9096/net/dev: open /proc/9096/net/dev: no such file or directory
+1.
These network stats issues are coming from the monitoring of the pods... they are not harmful, and there's not much we can do about them.