What happened:
I ran the following command on an Ubuntu 18.04 host:
kind create cluster --image kindest/node:v1.14.10
The coredns pods kept crash looping:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
coredns-6dcc67dcbc-brtcn 0/1 CrashLoopBackOff 5 5m37s
coredns-6dcc67dcbc-t878n 0/1 CrashLoopBackOff 5 5m37s
etcd-kind-control-plane 1/1 Running 0 4m43s
kindnet-jhjck 1/1 Running 0 5m36s
kube-apiserver-kind-control-plane 1/1 Running 0 4m45s
kube-controller-manager-kind-control-plane 1/1 Running 0 4m45s
kube-proxy-5wtrf 1/1 Running 0 5m36s
kube-scheduler-kind-control-plane 1/1 Running 0 4m38s
$ kubectl logs -f coredns-6dcc67dcbc-brtcn
.:53
2020-05-31T07:00:56.606Z [INFO] CoreDNS-1.3.1
2020-05-31T07:00:56.606Z [INFO] linux/amd64, go1.11.4, 6b56a9c
CoreDNS-1.3.1
linux/amd64, go1.11.4, 6b56a9c
2020-05-31T07:00:56.606Z [INFO] plugin/reload: Running configuration MD5 = 599b9eb76b8c147408aed6a0bbe0f669
2020-05-31T07:00:57.607Z [FATAL] plugin/loop: Loop (127.0.0.1:54705 -> :53) detected for zone ".", see https://coredns.io/plugins/loop#troubleshooting. Query: "HINFO 7640624352083188946.3802943032990386465."
$ kubectl logs -f coredns-6dcc67dcbc-t878n
.:53
2020-05-31T07:00:59.668Z [INFO] CoreDNS-1.3.1
2020-05-31T07:00:59.668Z [INFO] linux/amd64, go1.11.4, 6b56a9c
CoreDNS-1.3.1
linux/amd64, go1.11.4, 6b56a9c
2020-05-31T07:00:59.668Z [INFO] plugin/reload: Running configuration MD5 = 599b9eb76b8c147408aed6a0bbe0f669
2020-05-31T07:00:59.669Z [FATAL] plugin/loop: Loop (127.0.0.1:50557 -> :53) detected for zone ".", see https://coredns.io/plugins/loop#troubleshooting. Query: "HINFO 2175379813594798799.1669182364528889982."
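For context, the loop plugin works by sending a random HINFO query (like the ones quoted above) to the configured upstream; if that query arrives back at CoreDNS itself, it reports a forwarding loop for zone "." and exits. If needed, the upstreams CoreDNS forwards to can be inspected through its ConfigMap:
$ kubectl -n kube-system get configmap coredns -o yaml
# the Corefile in this ConfigMap lists the configured plugins, including loop, and where queries for "." are forwarded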
What you expected to happen:
I expected the coredns pods to run without crash looping.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
kind version: kind v0.8.1 go1.14.2 linux/amd64
kubectl version: Client Version: v1.18.3
Server Version: v1.14.10
docker info:
Client:
Debug Mode: false
Plugins:
app: Docker Application (Docker Inc., v0.8.0)
buildx: Build with BuildKit (Docker Inc., v0.3.1-tp-docker)
Server:
Containers: 2
Running: 2
Paused: 0
Stopped: 0
Images: 113
Server Version: 19.03.10
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 7ad184331fa3e55e52b890ea95e65ba581ae3429
runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
init version: fec3683
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.15.0-101-generic
Operating System: Ubuntu 18.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 15.38GiB
Name: beast
ID: H747:N42V:6KTW:VQWA:RUJL:6U73:XTFK:VSPF:GWTL:KHMA:POO7:O2AN
Docker Root Dir: /var/lib/docker
Debug Mode: false
Username: tigerworks
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
/etc/os-release:
NAME="Ubuntu"
VERSION="18.04.4 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.4 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
This was brought up the other day in Slack:
[FATAL] plugin/loop: Loop (127.0.0.1:54705 -> :53)
The problem there was that the cluster had a localhost address as a nameserver in /etc/resolv.conf.
Can you paste the resolv.conf from one of the kind nodes? docker exec -it kind-control-plane cat /etc/resolv.conf
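On Ubuntu hosts, systemd-resolved typically points /etc/resolv.conf at a loopback stub resolver, which is the usual trigger for this loop. A minimal sketch of both checks, assuming the default node name kind-control-plane:
# on the host: look for a 127.x.x.x nameserver (e.g. systemd-resolved's 127.0.0.53 stub)
$ cat /etc/resolv.conf
# inside the kind node: the node-level resolv.conf that CoreDNS ends up forwarding to
$ docker exec -it kind-control-plane cat /etc/resolv.conf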
Are you using the current 1.14.10 image from the matching release notes?
The old 1.14.10 image, if cached, will not work; the @sha256 digest should be included.
I don't see this failure with the new image (and we shouldn't), so I strongly suspect that on this host kindest/node:v1.14.10 is resolving to the wrong, old image.
In that case, it should also be possible to work around this by removing that image and doing a fresh pull.
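A minimal sketch of that workaround, reusing the digest shown in the example below:
# remove the stale locally cached tag (assuming no existing kind cluster is still using it)
$ docker rmi kindest/node:v1.14.10
# recreate the cluster with the image pinned by digest
$ kind create cluster --image kindest/node:v1.14.10@sha256:6cd43ff41ae9f02bb46c8f455d5323819aec858b99534a290517ebc181b443c6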
[bentheelder@bentheelder-macbookpro:~/go/src/sigs.k8s.io/kind·2020-05-31T00:18:09-0700·master@9e8816b0]
$ kind create cluster --image kindest/node:v1.14.10@sha256:6cd43ff41ae9f02bb46c8f455d5323819aec858b99534a290517ebc181b443c6
Creating cluster "kind" ...
✓ Ensuring node image (kindest/node:v1.14.10) 🖼
✓ Preparing nodes 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
Set kubectl context to "kind-kind"
You can now use your cluster with:
kubectl cluster-info --context kind-kind
Have a question, bug, or feature request? Let us know! https://kind.sigs.k8s.io/#community 🙂
[bentheelder@bentheelder-macbookpro:~/go/src/sigs.k8s.io/kind·2020-05-31T10:34:34-0700·master@9e8816b0]
$ kubectl get po -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-6dcc67dcbc-2fkdn 1/1 Running 0 7m12s
kube-system coredns-6dcc67dcbc-7lt8d 1/1 Running 0 7m12s
kube-system etcd-kind-control-plane 1/1 Running 0 6m16s
kube-system kindnet-2lvqt 1/1 Running 0 7m12s
kube-system kube-apiserver-kind-control-plane 1/1 Running 0 6m28s
kube-system kube-controller-manager-kind-control-plane 1/1 Running 0 6m12s
kube-system kube-proxy-dv6pd 1/1 Running 0 7m12s
kube-system kube-scheduler-kind-control-plane 1/1 Running 0 6m9s
local-path-storage local-path-provisioner-56fcf95c58-clvvj 1/1 Running 0 7m12s
[bentheelder@bentheelder-macbookpro:~/go/src/sigs.k8s.io/kind·2020-05-31T10:41:59-0700·master@9e8816b0]
$ kind --version
kind version 0.9.0-alpha+14137443600a66
$ docker exec -it kind-control-plane cat /etc/resolv.conf
nameserver 127.0.0.11
nameserver 2001:558:feed::1
nameserver 2001:558:feed::2
options ndots:0
@BenTheElder, you are right. After updating the local kindest/node:v1.14.10 image, this issue was gone.
I think so.
$ docker pull kindest/node:v1.14.10
v1.14.10: Pulling from kindest/node
cbdc015de259: Already exists
35f8707632bb: Already exists
6b786d087f83: Already exists
31fb3cd05fdd: Already exists
859fef2ca79d: Already exists
ec47119c4869: Already exists
2319734322b0: Pull complete
Digest: sha256:6cd43ff41ae9f02bb46c8f455d5323819aec858b99534a290517ebc181b443c6
Status: Downloaded newer image for kindest/node:v1.14.10
docker.io/kindest/node:v1.14.10
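If in doubt, the locally cached digest can be compared against the one in the release notes with a standard docker command:
$ docker images --digests kindest/node
# the v1.14.10 row should show sha256:6cd43ff41ae9f02bb46c8f455d5323819aec858b99534a290517ebc181b443c6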
/close
seems there was an issue due to using legacy node images
thanks for reporting
😄
@aojea: Closing this issue.
In response to this:
/close
seems there was an issue due to using legacy node images
thanks for reporting
😄
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.