kind: network connectivity in Travis with kind

Created on 26 May 2019 · 16 comments · Source: kubernetes-sigs/kind

Hi,

Pods started in kind on Travis seem to have lost network connectivity:

https://travis-ci.com/lukasheinrich/examples/builds/113218001

sudo: required
language: go
go:
    - "1.12"

services:
  - docker

install: true

script: 
  - echo "Run your tests here"

before_install:
  # Download and install Kind and kubectl
  - GO111MODULE=on go get sigs.k8s.io/kind
  - kind create cluster --config kind-config.yaml
  - export KUBECONFIG="$(kind get kubeconfig-path --name="kind")"
  - curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl
  - chmod +x ./kubectl
  - curl http://google.com
  - docker run --rm -it alpine sh -c 'apk add curl;curl http://google.com'
  - ./kubectl run -it  hello --image alpine -- sh -c 'apk add curl;curl http://google.com'
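
For reference, the kind-config.yaml used above isn't shown in the issue. Since the logs later in the thread mention a kind-worker node, a minimal config with one worker would look roughly like this (a sketch using the v1alpha3 config API kind shipped around this time; the actual file in the repo may differ):

kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3
nodes:
- role: control-plane
- role: worker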

The last command in the script (the kubectl run) gives:

kubectl run --generator=deployment/apps.v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead.
If you don't see a command prompt, try pressing enter.
ERROR: http://dl-cdn.alpinelinux.org/alpine/v3.9/main: temporary error (try again later)
WARNING: Ignoring APKINDEX.b89edf6e.tar.gz: No such file or directory
fetch http://dl-cdn.alpinelinux.org/alpine/v3.9/community/x86_64/APKINDEX.tar.gz
ERROR: http://dl-cdn.alpinelinux.org/alpine/v3.9/community: temporary error (try again later)
WARNING: Ignoring APKINDEX.737f7e01.tar.gz: No such file or directory
ERROR: unsatisfiable constraints:
  curl (missing):
    required by: world[curl]
sh: curl: not found

This seems to have been introduced by a newer version of kind (or Travis), since I have successfully used this setup before. Docker itself seems to be unaffected (see the docker run above).

Any ideas what this might be?

If this should be moved to an issue at https://github.com/kind-ci please let me know.

kind/bug priority/critical-urgent

All 16 comments

@lukasheinrich can you add a command to get the logs from kind and store them somewhere?
kind export logs
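
In a Travis job it's convenient to hook this into an after_failure step so logs are collected automatically whenever the build breaks. A minimal sketch (kind export logs accepts an optional output directory; the paths are examples):

after_failure:
  - kind export logs ./kind-logs || true
  - tar czf kind-logs.tar.gz ./kind-logs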

@aojea here's a gist with the logs

https://gist.github.com/lukasheinrich/9804850552818b6cfc53906b7a7dcce9

(updated .travis.yml here: https://github.com/lukasheinrich/examples/blob/master/.travis.yml)

I notice some lines mentioning that the CNI plugin fails to initialize.

May 26 18:12:31 kind-worker containerd[46]: time="2019-05-26T18:12:31.786335741Z" level=error msg="Failed to load cni configuration" error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"
May 26 18:12:48 kind-worker containerd[46]: time="2019-05-26T18:12:48.262263971Z" level=error msg="Failed to load cni configuration" error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"
May 26 18:12:53 kind-worker containerd[46]: time="2019-05-26T18:12:53.433725012Z" level=error msg="Failed to load cni configuration" error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"
May 26 18:12:58 kind-worker containerd[46]: time="2019-05-26T18:12:58.434854145Z" level=error msg="Failed to load cni configuration" error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"

Maybe that's a hint?

I'm trying to reproduce it: https://travis-ci.org/aojea/examples

The CNI plugin eventually gets ready; those errors can be a hint, but in this case they are not. If the CNI plugin were not ready, the nodes wouldn't be in Ready status.

I'd like to check whether the problem is that DNS doesn't resolve, or that pods can't get traffic out of the container.

--- EDIT ---
It seems the pods are not able to resolve external DNS names: https://travis-ci.org/aojea/examples/jobs/537557493

./kubectl exec -ti busybox -- nslookup kubernetes.default
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name:      kubernetes.default
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local
./kubectl exec -ti busybox -- nslookup www.google.com
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
nslookup: can't resolve 'www.google.com'
command terminated with exit code 1
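
A quick way to separate the two failure modes is to query an external resolver directly from the same pod; busybox's nslookup accepts the server as a second argument. If this succeeds while the default lookup fails, pod egress is fine and the problem is the upstream DNS path (a sketch, assuming the same busybox pod):

./kubectl exec -ti busybox -- nslookup www.google.com 8.8.8.8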

@aojea thanks for looking into it. Can I provide any more info?

Thank you @lukasheinrich.

I made some progress on the issue: it's a problem resolving external domains. As you can see in the snippet below, the pod can reach external IPs but can't resolve external domain names.


$ ./kubectl exec -ti busybox -- nslookup kubernetes.default
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name:      kubernetes.default
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local
$ ./kubectl exec -ti busybox -- nslookup www.google.com || true
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
nslookup: can't resolve 'www.google.com'
command terminated with exit code 1
$ ./kubectl exec -ti busybox -- ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue 
    link/ether 02:aa:21:f3:44:6b brd ff:ff:ff:ff:ff:ff
    inet 10.244.1.2/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::aa:21ff:fef3:446b/64 scope link 
       valid_lft forever preferred_lft forever
$ ./kubectl exec -ti busybox -- traceroute 8.8.8.8
traceroute to 8.8.8.8 (8.8.8.8), 30 hops max, 46 byte packets
 1  10.244.1.1 (10.244.1.1)  0.004 ms  0.003 ms  0.001 ms
 2  172.17.0.1 (172.17.0.1)  0.001 ms  0.006 ms  0.001 ms
 3  10.10.1.5 (10.10.1.5)  1.549 ms  10.10.1.8 (10.10.1.8)  1.691 ms  1.465 ms
 4  *  *  *
 5  *  *  *
 6  *  *  *
 7  *  *  *
 8  *  *  *
 9  *  *  *
10  *  *  *
11  8.8.8.8 (8.8.8.8)  2.615 ms  2.088 ms  2.273 ms

It seems that the issue is related to the DNS server used in Travis:

$ docker exec kind-control-plane sh -c 'cat /etc/resolv.conf'
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 169.254.169.254
search c.travis-ci-prod-2.internal google.internal
options timeout:5
options attempts:5
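
169.254.169.254 is the link-local metadata resolver of the cloud environment Travis runs on (the google.internal search domain points at GCE). The node container itself resolves names fine through it, which can be confirmed with something like this sketch (assuming getent is present in the node image):

docker exec kind-control-plane getent hosts www.google.com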

@tao12345666333 has hit a similar issue in his patch to allow kind to create its own networks.

So, as a short-term hack, should

docker exec kind-control-plane sh -c 'echo "nameserver 8.8.8.8" >> /etc/resolv.conf'

work? Does kind take over its host's resolv.conf?

Thanks for tracking this down.

We had a long debate about the resolv.conf, which you can follow in this thread: https://github.com/kubernetes-sigs/kind/pull/484#issuecomment-489431439.

I was doing some tests and it seems that Travis doesn't permit overwriting the resolv.conf. As a workaround, we should try to see which of the documented setups works in Travis (see the sketch after the links):

https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/
https://kubernetes.io/docs/tasks/administer-cluster/dns-custom-nameservers/#coredns
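
Following the second link, one documented approach is to point CoreDNS at a reachable upstream resolver instead of the node's /etc/resolv.conf, by editing the forward directive in the coredns ConfigMap. A rough sketch, assuming the default Corefile contains "forward . /etc/resolv.conf" and using 8.8.8.8 as an example upstream:

./kubectl -n kube-system get configmap coredns -o yaml | sed 's#forward . /etc/resolv.conf#forward . 8.8.8.8#' | ./kubectl apply -f -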

Thanks. It's interesting, because kind definitely used to work (including networking) on Travis when I first set up the CI (in February). I'm not sure at which point it broke.

I've found the problem: the ip-masq-agent adds a rule that exempts traffic to the link-local network from masquerading, which breaks connectivity between the cluster DNS and the host DNS, since the host DNS IP address (169.254.169.254) belongs to that range.

-A IP-MASQ-AGENT -d 169.254.0.0/16 -m comment --comment "ip-masq-agent: local traffic is not subject to MASQUERADE" -j RETURN
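
The rule can be confirmed on a running node with plain iptables (node name from the cluster above):

docker exec kind-worker iptables -t nat -L IP-MASQ-AGENT -n --line-numbers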

$ for p in $(./kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name); do ./kubectl logs --namespace=kube-system $p; done
.:53
2019-05-27T11:19:36.353Z [INFO] CoreDNS-1.3.1
2019-05-27T11:19:36.353Z [INFO] linux/amd64, go1.11.4, 6b56a9c
CoreDNS-1.3.1
linux/amd64, go1.11.4, 6b56a9c
2019-05-27T11:19:36.353Z [INFO] plugin/reload: Running configuration MD5 = 599b9eb76b8c147408aed6a0bbe0f669
2019-05-27T11:19:42.355Z [ERROR] plugin/errors: 2 6195421070260342714.8357799154935943776. HINFO: read udp 10.244.0.3:52331->169.254.169.254:53: i/o timeout
2019-05-27T11:19:45.355Z [ERROR] plugin/errors: 2 6195421070260342714.8357799154935943776. HINFO: read udp 10.244.0.3:41512->169.254.169.254:53: i/o timeout
2019-05-27T11:19:46.354Z [ERROR] plugin/errors: 2 6195421070260342714.8357799154935943776. HINFO: read udp 10.244.0.3:49238->169.254.169.254:53: i/o timeout
2019-05-27T11:19:47.354Z [ERROR] plugin/errors: 2 6195421070260342714.8357799154935943776. HINFO: read udp 10.244.0.3:57052->169.254.169.254:53: i/o timeout
2019-05-27T11:19:50.355Z [ERROR] plugin/errors: 2 6195421070260342714.8357799154935943776. HINFO: read udp 10.244.0.3:49625->169.254.169.254:53: i/o timeout
2019-05-27T11:19:53.356Z [ERROR] plugin/errors: 2 6195421070260342714.8357799154935943776. HINFO: read udp 10.244.0.3:35345->169.254.169.254:53: i/o timeout
2019-05-27T11:19:56.357Z [ERROR] plugin/errors: 2 6195421070260342714.8357799154935943776. HINFO: read udp 10.244.0.3:59890->169.254.169.254:53: i/o timeout
2019-05-27T11:19:59.358Z [ERROR] plugin/errors: 2 6195421070260342714.8357799154935943776. HINFO: read udp 10.244.0.3:52423->169.254.169.254:53: i/o timeout
2019-05-27T11:20:01.214Z [ERROR] plugin/errors: 2 www.google.com. AAAA: read udp 10.244.0.3:37466->169.254.169.254:53: i/o timeout
2019-05-27T11:20:02.358Z [ERROR] plugin/errors: 2 6195421070260342714.8357799154935943776. HINFO: read udp 10.244.0.3:40731->169.254.169.254:53: i/o timeout
2019-05-27T11:20:05.217Z [ERROR] plugin/errors: 2 www.google.com.google.internal. AAAA: read udp 10.244.0.3:47136->169.254.169.254:53: i/o timeout
2019-05-27T11:20:05.359Z [ERROR] plugin/errors: 2 6195421070260342714.8357799154935943776. HINFO: read udp 10.244.0.3:50467->169.254.169.254:53: i/o timeout
2019-05-27T11:20:07.217Z [ERROR] plugin/errors: 2 www.google.com. AAAA: read udp 10.244.0.3:35281->169.254.169.254:53: i/o timeout
2019-05-27T11:20:11.220Z [ERROR] plugin/errors: 2 www.google.com.google.internal. AAAA: read udp 10.244.0.3:44330->169.254.169.254:53: i/o timeout
2019-05-27T11:20:13.222Z [ERROR] plugin/errors: 2 www.google.com. AAAA: read udp 10.244.0.3:58693->169.254.169.254:53: i/o timeout
2019-05-27T11:20:17.225Z [ERROR] plugin/errors: 2 www.google.com.google.internal. AAAA: read udp 10.244.0.3:38230->169.254.169.254:53: i/o timeout
2019-05-27T11:20:23.229Z [ERROR] plugin/errors: 2 www.google.com.google.internal. AAAA: read udp 10.244.0.3:55421->169.254.169.254:53: i/o timeout
2019-05-27T11:20:27.232Z [ERROR] plugin/errors: 2 www.google.com.c.eco-emissary-99515.internal. A: read udp 10.244.0.3:43721->169.254.169.254:53: i/o timeout
2019-05-27T11:20:29.233Z [ERROR] plugin/errors: 2 www.google.com.google.internal. A: read udp 10.244.0.3:50274->169.254.169.254:53: i/o timeout
2019-05-27T11:20:37.237Z [ERROR] plugin/errors: 2 www.google.com. A: read udp 10.244.0.3:42297->169.254.169.254:53: i/o timeout
2019-05-27T11:20:39.239Z [ERROR] plugin/errors: 2 www.google.com.c.eco-emissary-99515.internal. A: read udp 10.244.0.3:58562->169.254.169.254:53: i/o timeout
2019-05-27T11:20:41.241Z [ERROR] plugin/errors: 2 www.google.com.google.internal. A: read udp 10.244.0.3:38125->169.254.169.254:53: i/o timeout
2019-05-27T11:20:43.242Z [ERROR] plugin/errors: 2 www.google.com. A: read udp 10.244.0.3:47642->169.254.169.254:53: i/o timeout
2019-05-27T11:20:45.244Z [ERROR] plugin/errors: 2 www.google.com.c.eco-emissary-99515.internal. A: read udp 10.244.0.3:47169->169.254.169.254:53: i/o timeout
2019-05-27T11:20:47.245Z [ERROR] plugin/errors: 2 www.google.com.google.internal. A: read udp 10.244.0.3:51052->169.254.169.254:53: i/o timeout
.:53

/assign @aojea


@lukasheinrich the patch was merged, but until the new node images are published you have to build the node image yourself:

  - GO111MODULE=on go get sigs.k8s.io/kind
  - kind build node-image --type apt
  - kind create cluster --config kind-config.yaml --image kindest/node:latest

Please provide feedback.

Hi @aojea

Thanks! I tried adding the image build step to the Travis CI, but it doesn't seem to be enough yet. Am I missing a step?

https://github.com/lukasheinrich/examples/blob/master/.travis.yml#L24

@lukasheinrich use:

GO111MODULE=on go get -u -v sigs.k8s.io/kind@master

If that works for you, it's better to pin the version to the specific commit until there is a new release, in this case:

GO111MODULE=on go get -u -v sigs.k8s.io/kind@ab59991e3d0c45866efae345b170a47e193e9cdf
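
Putting it together, the before_install section with the pinned commit and the locally built node image looks roughly like this:

before_install:
  - GO111MODULE=on go get -u -v sigs.k8s.io/kind@ab59991e3d0c45866efae345b170a47e193e9cdf
  - kind build node-image --type apt
  - kind create cluster --config kind-config.yaml --image kindest/node:latest
  - export KUBECONFIG="$(kind get kubeconfig-path --name="kind")"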

Thanks! This seems to have worked :)
