Minikube: Hung at "Starting cluster components...": etcd CrashLoopBackOff / DNS misconfiguration

Created on 9 Jul 2018 · 5 comments · Source: kubernetes/minikube

This is a BUG REPORT, plus a suggested FEATURE REQUEST if my diagnosis is accurate.

- minikube version: v0.28.0
- OS: Ubuntu 16.04.4 LTS
- VM driver: VirtualBox (5.1.34_Ubuntur121010)
- ISO: minikube-v0.28.0.iso

What happened:
Hangs forever at "Starting cluster components...".

minikube ssh followed by docker ps -a yields two pieces of interesting information:

  1. The etcd container keeps exiting with code 1; its final log line is: 2018-07-09 03:32:11.847764 C | etcdmain: listen tcp <ip address of my terrible ISP's ad server for bad dns>:2380: bind: cannot assign requested address
  2. The kube-apiserver container keeps exiting with code 137, logging: hooks.go:188] PostStartHook "ca-registration" failed: unable to initialize client CA configmap: timed out waiting for the condition

What you expected to happen:
The minikube cluster starts... because etcd is able to start.

How to reproduce it (as minimally and precisely as possible):
Run minikube start and wait.

Output of minikube logs (if applicable):
repeating blocks of:

Jul 09 03:39:17 minikube kubelet[2703]: I0709 03:39:17.911778    2703 kuberuntime_manager.go:513] Container {Name:etcd Image:k8s.gcr.io/etcd-amd64:3.1.12 Command:[etcd --advertise-client-urls=https://127.0.0.1:2379 --data-dir=/data/minikube --trusted-ca-file=/var/lib/localkube/certs/etcd/ca.crt --peer-key-file=/var/lib/localkube/certs/etcd/peer.key --peer-cert-file=/var/lib/localkube/certs/etcd/peer.crt --peer-trusted-ca-file=/var/lib/localkube/certs/etcd/ca.crt --listen-client-urls=https://127.0.0.1:2379 --client-cert-auth=true --peer-client-cert-auth=true --cert-file=/var/lib/localkube/certs/etcd/server.crt --key-file=/var/lib/localkube/certs/etcd/server.key] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[]} VolumeMounts:[{Name:etcd-data ReadOnly:false MountPath:/data/minikube SubPath: MountPropagation:<nil>} {Name:etcd-certs ReadOnly:false MountPath:/var/lib/localkube/certs//etcd SubPath: MountPropagation:<nil>}] VolumeDevices:[] LivenessProbe:&Probe{Handler:Handler{Exec:&ExecAction{Command:[/bin/sh -ec ETCDCTL_API=3 etcdctl --endpoints=127.0.0.1:2379 --cacert=/var/lib/localkube/certs//etcd/ca.crt --cert=/var/lib/localkube/certs//etcd/healthcheck-client.crt --key=/var/lib/localkube/certs//etcd/healthcheck-client.key get foo],},HTTPGet:nil,TCPSocket:nil,},InitialDelaySeconds:15,TimeoutSeconds:15,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:8,} ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Jul 09 03:39:17 minikube kubelet[2703]: I0709 03:39:17.912117    2703 kuberuntime_manager.go:757] checking backoff for container "etcd" in pod "etcd-minikube_kube-system(5496bc5148cedeede1465cfa1b83a851)"
Jul 09 03:39:17 minikube kubelet[2703]: I0709 03:39:17.912493    2703 kuberuntime_manager.go:767] Back-off 5m0s restarting failed container=etcd pod=etcd-minikube_kube-system(5496bc5148cedeede1465cfa1b83a851)
Jul 09 03:39:17 minikube kubelet[2703]: E0709 03:39:17.912591    2703 pod_workers.go:186] Error syncing pod 5496bc5148cedeede1465cfa1b83a851 ("etcd-minikube_kube-system(5496bc5148cedeede1465cfa1b83a851)"), skipping: failed to "StartContainer" for "etcd" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=etcd pod=etcd-minikube_kube-system(5496bc5148cedeede1465cfa1b83a851)"

Anything else we need to know:
There are many minikube hang issues; a few suggest --bootstrapper=localkube, which does indeed solve my problem. This leads me to believe the problem is specific to the kubeadm bootstrapper.

Chasing down reports of etcd problems leads to https://github.com/kubernetes/kubernetes/issues/57709#issue-285110170 which details issues with DNS and etcd; it also offers the workaround:

Set etcd.extraArgs.listen-peer-urls=http://127.0.0.1:2380 in your kubeadm config file to force etcd to bind to the correct IP.
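
For reference, a minimal sketch of what that kubeadm config fragment might look like. This assumes the kubeadm.k8s.io/v1alpha1 MasterConfiguration API from this era of kubeadm; the apiVersion and kind are an assumption and differ in later releases:

```yaml
# Hypothetical kubeadm config fragment (API version is an assumption).
apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
etcd:
  extraArgs:
    # Forces etcd to bind its peer listener to loopback instead of
    # whatever "localhost" resolves to via the hijacking DNS server.
    listen-peer-urls: http://127.0.0.1:2380
```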

There is a similar issue on etcd: https://github.com/coreos/etcd/issues/9070#issue-284952478

... and another on K8s: https://github.com/kubernetes/kubernetes/issues/57870#issuecomment-355701753

Finally, https://github.com/kubernetes/minikube/issues/2917 is someone else with the same problem with minikube.

The FEATURE REQUEST:
Enhance the kubeadm bootstrapper to allow --extra-config operations on the etcd component and pass them through to the kubeadm config as appropriate; this would get closer to parity with the soon-to-be-removed localkube bootstrapper, which does allow etcd configuration.

With that in place, minikube start --extra-config=etcd.listen-peer-urls=http://127.0.0.1:2380 should work, and I believe it would insulate minikube users from the etcd DNS behaviors that cause these failures.

area/dns ev/hung-start kind/bug lifecycle/stale os/linux priority/awaiting-more-evidence


All 5 comments

I've just gone through the same issue trying to get a minikube installation running with --vm-driver=none, where my /etc/resolv.conf specifies external nameservers (I use DHCP). Running nslookup localhost resolves to an external IP, which causes problems wherever localhost is used in the configuration set up by kubeadm/minikube to connect to the Kubernetes API server, because the golang netdns=go resolver bypasses the nsswitch.conf lookup order (see https://github.com/kubernetes/kubernetes/issues/57709).

Hopefully for the benefit of other people having the same issue as me, here is what I have done to get a working minikube installation, running locally on Arch Linux.

Minikube Version = v0.28.2

Starting Minikube

I use the same startup shell script as defined here: https://github.com/kubernetes/minikube, with the following options when calling minikube start:

sudo -E minikube start --vm-driver=none --apiserver-ips=127.0.0.1 --alsologtostderr

By specifying 127.0.0.1 as an API server IP address, this IP address is baked into the generated certificates, which is key to changing localhost to 127.0.0.1 in the configuration files.

Run the minikube start script. It will hang after downloading and building, at the cluster-start step. While it is hanging, edit /etc/kubernetes/manifests/etcd.yaml and add - --listen-peer-urls=https://127.0.0.1:2380 to the list of arguments.
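
Concretely, the edited manifest ends up looking something like this. A sketch only: unrelated fields are elided, and the other flags are taken from the etcd command line shown in the kubelet logs above:

```yaml
# /etc/kubernetes/manifests/etcd.yaml (fragment; unrelated fields elided)
spec:
  containers:
  - name: etcd
    command:
    - etcd
    - --advertise-client-urls=https://127.0.0.1:2379
    - --listen-client-urls=https://127.0.0.1:2379
    - --listen-peer-urls=https://127.0.0.1:2380   # the line added by hand
```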

Separately, I have an issue (https://github.com/kubernetes/minikube/issues/2975) where some files in the .minikube directory remain accessible only by root. I run the following commands to fix this (after minikube has started, so the files exist):

sudo chown -R $USER $HOME/.minikube
sudo chgrp -R $USER $HOME/.minikube

The hanging script should now automatically complete successfully.

There are a number of configuration files that now need updating to point to 127.0.0.1 instead of localhost. These are the kubectl kubeconfig files that contain the address of the API server. The files to change are:

/var/lib/localkube/kubeconfig
/etc/kubernetes/admin.conf
/etc/kubernetes/controller-manager.conf
/etc/kubernetes/kubelet.conf
/etc/kubernetes/scheduler.conf
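
Those edits can be scripted. A sketch, assuming GNU sed; the fix_kubeconfig helper is hypothetical (not part of minikube), and it must be run as root for the system paths above:

```shell
# fix_kubeconfig: replace "localhost" with 127.0.0.1 in each file given,
# keeping a .bak backup of the original (GNU sed in-place syntax).
fix_kubeconfig() {
  for f in "$@"; do
    sed -i.bak 's/localhost/127.0.0.1/g' "$f"
  done
}

# Usage against the files listed above (as root), e.g.:
#   fix_kubeconfig /var/lib/localkube/kubeconfig /etc/kubernetes/admin.conf \
#     /etc/kubernetes/controller-manager.conf /etc/kubernetes/kubelet.conf \
#     /etc/kubernetes/scheduler.conf
```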

Reboot the machine; you should then find minikube and the Kubernetes pods all running. The minikube dashboard command also works, where before these changes it wouldn't work at all.

Deleting minikube

In order to work through these issues, I had to delete and re-start minikube several times. It isn't enough to just run minikube delete, you have to delete certain files and directories in order to re-start minikube successfully. I basically do this (and it seems to work):

minikube stop
minikube delete
rm -r ~/.kube
rm -r ~/.minikube
sudo rm /var/lib/kubeadm.yaml
sudo rm -r /etc/kubernetes

Once you have followed that process, then starting minikube requires all of the changes to configuration described above to be made again.

Hope this helps others having similar issues to me.

Upgrading to v0.30.0, I had to make some changes to the above process:

/var/lib/localkube/kubeconfig is now /var/lib/minikube/kubeconfig.

I additionally changed the local machine IP addresses to 127.0.0.1 in the following files:

  • In /var/lib/kubeadm.yaml.
    Changed advertiseAddress
  • In /etc/kubernetes/manifests/kube-apiserver.yaml
    Changed --advertise-address
    Changed livenessProbe:httpGet:host

To delete minikube, I added this command to the above:

sudo rm -r /var/lib/minikube

The issue is caused by the DNS resolver, so the easiest fix is to change the "search" section of /etc/resolv.conf to a non-existent domain.
That makes nslookup localhost return 127.0.0.1, which resolves this issue.
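
A sketch of that workaround. The set_bogus_search helper is hypothetical and parameterized on the file path for illustration; in practice the target is /etc/resolv.conf, edited as root, and the .invalid TLD is reserved so it can never resolve:

```shell
# Diagnostic from the comments above: with a hijacking resolver,
# "nslookup localhost" returns an external IP instead of 127.0.0.1:
#   nslookup localhost

# Workaround sketch: rewrite the "search" line so that wildcard DNS can
# never match "localhost.<search-domain>".
set_bogus_search() {
  sed -i 's/^search .*/search nonexistent.invalid/' "$1"
}
```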

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

This seems to have been fixed organically in one of the past releases. No longer happening on a near-identical setup with minikube 1.0.1. Since I opened this, I'll close it.
