What happened:
I ran the following command on an Ubuntu 18.04 host:
kind create cluster --image kindest/node:v1.14.10
The coredns pods kept crash looping:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
coredns-6dcc67dcbc-brtcn 0/1 CrashLoopBackOff 5 5m37s
coredns-6dcc67dcbc-t878n 0/1 CrashLoopBackOff 5 5m37s
etcd-kind-control-plane 1/1 Running 0 4m43s
kindnet-jhjck 1/1 Running 0 5m36s
kube-apiserver-kind-control-plane 1/1 Running 0 4m45s
kube-controller-manager-kind-control-plane 1/1 Running 0 4m45s
kube-proxy-5wtrf 1/1 Running 0 5m36s
kube-scheduler-kind-control-plane 1/1 Running 0 4m38s
$ kubectl logs -f coredns-6dcc67dcbc-brtcn
.:53
2020-05-31T07:00:56.606Z [INFO] CoreDNS-1.3.1
2020-05-31T07:00:56.606Z [INFO] linux/amd64, go1.11.4, 6b56a9c
CoreDNS-1.3.1
linux/amd64, go1.11.4, 6b56a9c
2020-05-31T07:00:56.606Z [INFO] plugin/reload: Running configuration MD5 = 599b9eb76b8c147408aed6a0bbe0f669
2020-05-31T07:00:57.607Z [FATAL] plugin/loop: Loop (127.0.0.1:54705 -> :53) detected for zone ".", see https://coredns.io/plugins/loop#troubleshooting. Query: "HINFO 7640624352083188946.3802943032990386465."
$ kubectl logs -f coredns-6dcc67dcbc-t878n
.:53
2020-05-31T07:00:59.668Z [INFO] CoreDNS-1.3.1
2020-05-31T07:00:59.668Z [INFO] linux/amd64, go1.11.4, 6b56a9c
CoreDNS-1.3.1
linux/amd64, go1.11.4, 6b56a9c
2020-05-31T07:00:59.668Z [INFO] plugin/reload: Running configuration MD5 = 599b9eb76b8c147408aed6a0bbe0f669
2020-05-31T07:00:59.669Z [FATAL] plugin/loop: Loop (127.0.0.1:50557 -> :53) detected for zone ".", see https://coredns.io/plugins/loop#troubleshooting. Query: "HINFO 2175379813594798799.1669182364528889982."
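For context, the loop plugin works by sending a random HINFO query (like the ones quoted above) to the configured upstream; if that query arrives back at CoreDNS itself, it reports a forwarding loop for zone "." and exits. If needed, the upstreams CoreDNS forwards to can be inspected through its ConfigMap:
$ kubectl -n kube-system get configmap coredns -o yaml
# the Corefile in this ConfigMap lists the configured plugins, including loop, and where queries for "." are forwarded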
What you expected to happen:
I expected the coredns pods to run without crash looping.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
kind version: kind v0.8.1 go1.14.2 linux/amd64
kubectl version: Client Version: v1.18.3
Server Version: v1.14.10
docker info:
Client:
Debug Mode: false
Plugins:
app: Docker Application (Docker Inc., v0.8.0)
buildx: Build with BuildKit (Docker Inc., v0.3.1-tp-docker)
Server:
Containers: 2
Running: 2
Paused: 0
Stopped: 0
Images: 113
Server Version: 19.03.10
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 7ad184331fa3e55e52b890ea95e65ba581ae3429
runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
init version: fec3683
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.15.0-101-generic
Operating System: Ubuntu 18.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 15.38GiB
Name: beast
ID: H747:N42V:6KTW:VQWA:RUJL:6U73:XTFK:VSPF:GWTL:KHMA:POO7:O2AN
Docker Root Dir: /var/lib/docker
Debug Mode: false
Username: tigerworks
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
/etc/os-release:
NAME="Ubuntu"
VERSION="18.04.4 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.4 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
This was brought up the other day in Slack:
[FATAL] plugin/loop: Loop (127.0.0.1:54705 -> :53)
The problem there was that the cluster had a localhost address as a nameserver in /etc/resolv.conf.
Can you paste the resolv.conf from one of the kind nodes? docker exec -it kind-control-plane cat /etc/resolv.conf
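On Ubuntu hosts, systemd-resolved typically points /etc/resolv.conf at a loopback stub resolver, which is the usual trigger for this loop. A minimal sketch of both checks, assuming the default node name kind-control-plane:
# on the host: look for a 127.x.x.x nameserver (e.g. systemd-resolved's 127.0.0.53 stub)
$ cat /etc/resolv.conf
# inside the kind node: the node-level resolv.conf that CoreDNS ends up forwarding to
$ docker exec -it kind-control-plane cat /etc/resolv.conf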
Are you using the current 1.14.10 image from the matching release notes?
The old 1.14.10 image, if cached, will not work; the @sha256 digest should be included.
I don't see this failure with the new image (and we shouldn't), so I strongly suspect that on this host kindest/node:v1.14.10 is resolving to the wrong, old image.
In that case, it should also be possible to work around this by removing that image and doing a fresh pull.
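A minimal sketch of that workaround, reusing the digest shown in the example below:
# remove the stale locally cached tag (assuming no existing kind cluster is still using it)
$ docker rmi kindest/node:v1.14.10
# recreate the cluster with the image pinned by digest
$ kind create cluster --image kindest/node:v1.14.10@sha256:6cd43ff41ae9f02bb46c8f455d5323819aec858b99534a290517ebc181b443c6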
[bentheelder@bentheelder-macbookpro:~/go/src/sigs.k8s.io/kind·2020-05-31T00:18:09-0700·master@9e8816b0]
$ kind create cluster --image kindest/node:v1.14.10@sha256:6cd43ff41ae9f02bb46c8f455d5323819aec858b99534a290517ebc181b443c6
Creating cluster "kind" ...
✓ Ensuring node image (kindest/node:v1.14.10) 🖼
✓ Preparing nodes 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
Set kubectl context to "kind-kind"
You can now use your cluster with:
kubectl cluster-info --context kind-kind
Have a question, bug, or feature request? Let us know! https://kind.sigs.k8s.io/#community 🙂
[bentheelder@bentheelder-macbookpro:~/go/src/sigs.k8s.io/kind·2020-05-31T10:34:34-0700·master@9e8816b0]
$ kubectl get po -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-6dcc67dcbc-2fkdn 1/1 Running 0 7m12s
kube-system coredns-6dcc67dcbc-7lt8d 1/1 Running 0 7m12s
kube-system etcd-kind-control-plane 1/1 Running 0 6m16s
kube-system kindnet-2lvqt 1/1 Running 0 7m12s
kube-system kube-apiserver-kind-control-plane 1/1 Running 0 6m28s
kube-system kube-controller-manager-kind-control-plane 1/1 Running 0 6m12s
kube-system kube-proxy-dv6pd 1/1 Running 0 7m12s
kube-system kube-scheduler-kind-control-plane 1/1 Running 0 6m9s
local-path-storage local-path-provisioner-56fcf95c58-clvvj 1/1 Running 0 7m12s
[bentheelder@bentheelder-macbookpro:~/go/src/sigs.k8s.io/kind·2020-05-31T10:41:59-0700·master@9e8816b0]
$ kind --version
kind version 0.9.0-alpha+14137443600a66
$ docker exec -it kind-control-plane cat /etc/resolv.conf
nameserver 127.0.0.11
nameserver 2001:558:feed::1
nameserver 2001:558:feed::2
options ndots:0
@BenTheElder, you are right. After updating the local kindest/node:v1.14.10 image, this issue was gone.
I think so.
$ docker pull kindest/node:v1.14.10
v1.14.10: Pulling from kindest/node
cbdc015de259: Already exists
35f8707632bb: Already exists
6b786d087f83: Already exists
31fb3cd05fdd: Already exists
859fef2ca79d: Already exists
ec47119c4869: Already exists
2319734322b0: Pull complete
Digest: sha256:6cd43ff41ae9f02bb46c8f455d5323819aec858b99534a290517ebc181b443c6
Status: Downloaded newer image for kindest/node:v1.14.10
docker.io/kindest/node:v1.14.10
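If in doubt, the locally cached digest can be compared against the one in the release notes with a standard docker command:
$ docker images --digests kindest/node
# the v1.14.10 row should show sha256:6cd43ff41ae9f02bb46c8f455d5323819aec858b99534a290517ebc181b443c6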
/close
seems there was an issue due to using legacy node images
thanks for reporting
😄
@aojea: Closing this issue.
In response to this:
/close
seems there was an issue due to using legacy node images
thanks for reporting
😄
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.