After successfully running the Ansible playbook to create a 5-node cluster, I'm having trouble getting a pod to start on one of the nodes.
kubectl describe pod test-345661524-ucrbu
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
1m 1m 1 {default-scheduler } Normal Scheduled Successfully assigned test-345661524-ucrbu to kubernetes-node-3
58s 58s 1 {kubelet kubernetes-node-3} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "POD" with RunContainerError: "runContainer: API error (404): failed to create endpoint k8s_POD.6059dfa2_test-345661524-ucrbu_default_54b24685-07d0-11e6-9aeb-fa163e8f1e63_b4b2c4db on network bridge: adding interface vethc4809b2 to bridge docker0 failed: could not find bridge docker0: route ip+net: no such network interface\n"
41s 41s 1 {kubelet kubernetes-node-3} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "POD" with RunContainerError: "runContainer: API error (404): failed to create endpoint k8s_POD.6059dfa2_test-345661524-ucrbu_default_54b24685-07d0-11e6-9aeb-fa163e8f1e63_509eaf7e on network bridge: adding interface vethdbd61a8 to bridge docker0 failed: could not find bridge docker0: route ip+net: no such network interface\n"
The kargo docs have a section of things to check for flannel, but as far as I can tell they all look correct:
cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.233.64.0/18
FLANNEL_SUBNET=10.233.116.1/24
FLANNEL_MTU=1350
FLANNEL_IPMASQ=false
ip a show dev flannel.1
6: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1350 qdisc noqueue state UNKNOWN
link/ether 1e:d5:82:46:d5:af brd ff:ff:ff:ff:ff:ff
inet 10.233.116.0/18 scope global flannel.1
valid_lft forever preferred_lft forever
inet6 fe80::1cd5:82ff:fe46:d5af/64 scope link
valid_lft forever preferred_lft forever
ps aux | grep docker
root 3775 0.0 0.0 112644 952 pts/0 S+ 14:49 0:00 grep --color=auto docker
root 24720 0.3 0.3 1851316 60960 ? Ssl 13:15 0:19 /usr/bin/docker daemon --bip=10.233.116.1/24 --mtu=1350
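One extra check worth doing (a quick sketch; the values below are copied from the outputs above rather than read live from the node): the docker daemon's `--bip` should match `FLANNEL_SUBNET` from /run/flannel/subnet.env, otherwise docker and flannel disagree about the bridge addressing.

```shell
# Values copied from the outputs above; on a real node you would read them
# from /run/flannel/subnet.env and the running docker command line.
subnet_env='FLANNEL_SUBNET=10.233.116.1/24'
docker_cmdline='/usr/bin/docker daemon --bip=10.233.116.1/24 --mtu=1350'

# Strip the key from the flannel line and pull --bip out of the daemon args.
flannel_subnet=${subnet_env#FLANNEL_SUBNET=}
docker_bip=$(printf '%s\n' "$docker_cmdline" | grep -o -- '--bip=[^ ]*')
docker_bip=${docker_bip#--bip=}

if [ "$flannel_subnet" = "$docker_bip" ]; then
  echo "OK: docker --bip ($docker_bip) matches FLANNEL_SUBNET"
else
  echo "MISMATCH: flannel=$flannel_subnet docker=$docker_bip"
fi
```

In my case they match, so the config itself looks consistent; the problem is that the docker0 bridge never got created.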
Is there anything else I should be checking? I'm not from the infra side, so I expect I'm missing something.
Using CentOS 7.2 and Docker 1.11.0.
docker version
Client:
Version: 1.11.0
API version: 1.23
Go version: go1.5.4
Git commit: 4dc5990
Built: Wed Apr 13 18:40:36 2016
OS/Arch: linux/amd64
Server:
Version: 1.11.0
API version: 1.23
Go version: go1.5.4
Git commit: 4dc5990
Built: Wed Apr 13 18:40:36 2016
OS/Arch: linux/amd64
Despite not knowing what I was doing, and after a fair bit of googling, I've got the pod running!
This blog helped: https://www.digitalocean.com/community/tutorials/how-to-install-and-configure-kubernetes-on-top-of-a-coreos-cluster
It has a section about the docker0 interface and flannel. I added the following two lines to docker.service on the node the test pod was scheduled to, and restarted the docker service... the pod started! Not sure this is all correct, but the fact that it started seems like a good sign.
ExecStartPre=-/usr/bin/ip link set dev docker0 down
ExecStartPre=-/usr/sbin/brctl delbr docker0
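(The leading `-` on those ExecStartPre lines tells systemd to ignore failures, so they're harmless when docker0 doesn't exist yet.) As an alternative to editing the packaged unit file directly, the same two lines could go into a systemd drop-in, which survives package upgrades. A sketch, writing to `./` purely for illustration; on a real node the drop-in directory would be /etc/systemd/system/docker.service.d, and the file name is my own choice:

```shell
# Standard drop-in location on a real node: /etc/systemd/system/docker.service.d
dropin_dir=./docker.service.d
mkdir -p "$dropin_dir"

# The file name "flannel-bridge.conf" is arbitrary; any *.conf works.
cat > "$dropin_dir/flannel-bridge.conf" <<'EOF'
[Service]
ExecStartPre=-/usr/bin/ip link set dev docker0 down
ExecStartPre=-/usr/sbin/brctl delbr docker0
EOF

cat "$dropin_dir/flannel-bridge.conf"
```

After writing the drop-in you'd still need `systemctl daemon-reload` and a docker restart, as below.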
OK, so the steps I took on each worker node:
vi /usr/lib/systemd/system/docker.service
and add those two lines so it looks like this:
[Unit]
Description=Docker Application Container Engine
Documentation=http://docs.docker.com
After=network.target
Wants=docker-storage-setup.service
[Service]
Type=notify
EnvironmentFile=-/etc/default/docker
ExecStartPre=-/usr/bin/ip link set dev docker0 down
ExecStartPre=-/usr/sbin/brctl delbr docker0
Environment=GOTRACEBACK=crash
ExecStart=/usr/bin/docker daemon \
$OPTIONS \
$DOCKER_STORAGE_OPTIONS \
$DOCKER_NETWORK_OPTIONS \
$INSECURE_REGISTRY
LimitNOFILE=1048576
LimitNPROC=1048576
LimitCORE=infinity
MountFlags=slave
TimeoutStartSec=1min
[Install]
WantedBy=multi-user.target
systemctl daemon-reload
systemctl restart docker
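To apply the reload/restart across all the workers in one go, something like the loop below would work (node names are placeholders from my cluster, and the ssh line is commented out so this sketch is safe to dry-run):

```shell
# Placeholder worker node names; substitute your own inventory.
for node in kubernetes-node-1 kubernetes-node-2 kubernetes-node-3; do
  echo "restarting docker on $node"
  # ssh "$node" 'systemctl daemon-reload && systemctl restart docker'
done
```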
Not sure if this is something that could be added to kargo, if it turns out to be an acceptable solution.
Maybe the [Unit] section needs an After=flannel.service too.
Actually, when doing the same on the other nodes I realised I don't need those two lines at all; I just need to restart the docker service and docker0 is created.
Hi @rawlingsj, I'm on holiday and won't be able to test next week.
I'll check it as soon as I'm back. I hope you'll find a way in the meantime.
Hi @Smana, no problem at all! It all seems fine after I restarted the docker service on each node, so all is good. I'll leave this open in case there's something you want to do when you're back.
Thanks!
I'm facing the same problem too, and I found the solution. PR's coming.