[apiclient] Created API client, waiting for the control plane to become ready
BUG REPORT
kubeadm version (use kubeadm version):
kubeadm version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.4", GitCommit:"d6f433224538d4f9ca2f7ae19b252e6fcb66a3ae", GitTreeState:"clean", BuildDate:"2017-05-19T18:33:17Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Environment:
Kubernetes version (use kubectl version): Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.4", GitCommit:"d6f433224538d4f9ca2f7ae19b252e6fcb66a3ae", GitTreeState:"clean", BuildDate:"2017-05-19T18:44:27Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.4", GitCommit:"d6f433224538d4f9ca2f7ae19b252e6fcb66a3ae", GitTreeState:"clean", BuildDate:"2017-05-19T18:33:17Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
OS: CentOS Linux release 7.3.1611 (Core)
Kernel (uname -a): Linux tme-lnx1-centos 3.10.0-514.21.1.el7.x86_64 #1 SMP Thu May 25 17:04:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
What happened: The API server (kube-apiserver) failed to start when issuing kubeadm init.
What you expected to happen: kubeadm init completes successfully.
On the network I'm on, a number of subdomains have 'localhost' registered as a hostname. When the API server starts, it resolves localhost through the search domain as localhost.foo.domain.com and ends up using that IP address instead of the loopback address:
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ab6b449d952c gcr.io/google_containers/kube-apiserver-amd64@sha256:6d5aa429c2b0806e4b6d1d179054d6deee46eec0aabe7bd7bd6abff97be36ae7 "kube-apiserver --all" About a minute ago Exited (255) About a minute ago k8s_kube-apiserver_kube-apiserver-tme-lnx1-centos_kube-system_3a03da482c18faa7691e3f59fcfc1189_10
$ docker logs ab6b449d952c
E0613 19:23:20.246384 1 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *rbac.RoleBinding: Get https://localhost:6443/apis/rbac.authorization.k8s.io/v1beta1/rolebindings?resourceVersion=0: dial tcp 10.12.180.36:6443: getsockopt: connection refused
E0613 19:23:20.246417 1 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *api.LimitRange: Get https://localhost:6443/api/v1/limitranges?resourceVersion=0: dial tcp 10.12.180.36:6443: getsockopt: connection refused
E0613 19:23:20.246431 1 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *api.Namespace: Get https://localhost:6443/api/v1/namespaces?resourceVersion=0: dial tcp 10.12.180.36:6443: getsockopt: connection refused
E0613 19:23:20.246392 1 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *api.ResourceQuota: Get https://localhost:6443/api/v1/resourcequotas?resourceVersion=0: dial tcp 10.12.180.36:6443: getsockopt: connection refused
E0613 19:23:20.246372 1 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *api.ServiceAccount: Get https://localhost:6443/api/v1/serviceaccounts?resourceVersion=0: dial tcp 10.12.180.36:6443: getsockopt: connection refused
E0613 19:23:20.246405 1 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *storage.StorageClass: Get https://localhost:6443/apis/storage.k8s.io/v1beta1/storageclasses?resourceVersion=0: dial tcp 10.12.180.36:6443: getsockopt: connection refused
E0613 19:23:20.246681 1 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *api.Secret: Get https://localhost:6443/api/v1/secrets?resourceVersion=0: dial tcp 10.12.180.36:6443: getsockopt: connection refused
E0613 19:23:20.246703 1 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *rbac.ClusterRoleBinding: Get https://localhost:6443/apis/rbac.authorization.k8s.io/v1beta1/clusterrolebindings?resourceVersion=0: dial tcp 10.12.180.36:6443: getsockopt: connection refused
E0613 19:23:20.246806 1 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *rbac.Role: Get https://localhost:6443/apis/rbac.authorization.k8s.io/v1beta1/roles?resourceVersion=0: dial tcp 10.12.180.36:6443: getsockopt: connection refused
E0613 19:23:20.246880 1 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *rbac.ClusterRole: Get https://localhost:6443/apis/rbac.authorization.k8s.io/v1beta1/clusterroles?resourceVersion=0: dial tcp 10.12.180.36:6443: getsockopt: connection refused
[restful] 2017/06/13 19:23:20 log.go:30: [restful/swagger] listing is available at https://10.21.34.95:6443/swaggerapi/
[restful] 2017/06/13 19:23:20 log.go:30: [restful/swagger] https://10.21.34.95:6443/swaggerui/ is mapped to folder /swagger-ui/
I0613 19:23:20.381159 1 serve.go:79] Serving securely on 0.0.0.0:6443
W0613 19:23:20.383807 1 storage_extensions.go:127] third party resource sync failed: Get https://localhost:6443/apis/extensions/v1beta1/thirdpartyresources: dial tcp 10.12.180.36:6443: getsockopt: connection refused
F0613 19:23:20.383841 1 controller.go:128] Unable to perform initial IP allocation check: unable to refresh the service IP block: Get https://localhost:6443/api/v1/services: dial tcp 10.12.180.36:6443: getsockopt: connection refused
10.12.180.36 is localhost.foo.domain.com, and 10.21.34.95 is the correct public-facing IP address of the VM.
Removing the search and/or domain entries in /etc/resolv.conf that have a 'localhost' hostname registered resolves the problem, and kubeadm init succeeds after a kubeadm reset.
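For reference, this is roughly what the misresolution looks like from the shell (output abbreviated; the nameserver address is only illustrative, the rest matches the values above):

$ nslookup localhost
Server:         10.12.180.1
Address:        10.12.180.1#53

Non-authoritative answer:
Name:   localhost.foo.domain.com
Address: 10.12.180.36

On a healthy host, localhost resolves to 127.0.0.1 (or ::1), not to a routable address.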
I can confirm this issue: I hit it on three FirstVDS.ru KVMs (two with Ubuntu 16.04 and one with CentOS 7.3). @albpal also had it on myhosting.com. We got it fixed by removing the provider's DNS servers and installing dnsmasq. More details in https://github.com/kubernetes/kubeadm/issues/228#issuecomment-307158412 and the following comments.
I hope there is a fix on the kubeadm side, because sorting out what's going on after you're stuck at [apiclient] Created API client, waiting for the control plane to become ready is too hard.
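In case it helps others, the rough shape of the dnsmasq approach looks something like this (only a sketch, not the exact steps from the linked comment; 8.8.8.8 is just an example upstream, and details differ per distro, e.g. resolvconf may rewrite /etc/resolv.conf):

$ sudo apt-get install -y dnsmasq
$ echo 'server=8.8.8.8' | sudo tee -a /etc/dnsmasq.conf    # forward to a resolver that doesn't know the provider's bogus 'localhost.<domain>' records
$ sudo systemctl restart dnsmasq
$ echo 'nameserver 127.0.0.1' | sudo tee /etc/resolv.conf  # resolve through the local dnsmasq
$ nslookup localhost                                       # should now come back as 127.0.0.1

By default dnsmasq answers names it finds in /etc/hosts, so 'localhost' is served locally and the bogus upstream record never gets a chance.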
@kachkaev I'm thinking of adding a preflight check for this, i.e. fail fast if
nslookup localhost, nslookup localhost.$(hostname -d), or nslookup $(hostname) returns a non-loopback address or an address that doesn't exist in ip addr.
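In rough shell terms the idea is something like this (purely a sketch to show the intent, not actual kubeadm code, and the nslookup parsing is deliberately naive):

for name in localhost "localhost.$(hostname -d)" "$(hostname)"; do
  # first answer address; the DNS server's own "Address: x#53" line is skipped via the '#'
  addr=$(nslookup "$name" 2>/dev/null | awk '/^Address/ && $2 !~ /#/ { print $2; exit }')
  [ -z "$addr" ] && continue
  case "$addr" in 127.*|::1) continue ;; esac   # loopback is fine
  if ! ip addr | grep -qF " ${addr}/"; then
    echo "WARNING: ${name} resolves to ${addr}, which is neither loopback nor an address on this host"
  fi
done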
How does that sound to you?
Thanks to chatting with @drajen I now have a clearer understanding of the problem domain; I had never encountered such a setup myself before.
@luxas my k8s experience is only a couple of weeks, so I'm not sure I can be a good adviser here. But I agree that a preflight check would be a good start! It's also important to point people who fail the check to a good explanation of what's happening, so that they can either install dnsmasq (like I did) or configure their DNS server (if they have access to it).
If you fancy experimenting, I can share root access to two small KVMs on FirstVDS.ru, one with Ubuntu and the other with CentOS 7.3. I rented them to experiment with kubeadm and they are paid up for another couple of weeks in any case. Just DM me on Twitter or send an email.
Bump, also experiencing this.
Also, if 'localhost' appears in DNS after kubeadm has initted, the cluster behaves unexpectedly in communication between hosts.
I ran into this too. I am using Ubuntu, but it's pretty much the same problem.
One thing maybe worth pointing out is that it's not kubeadm that's failing per se; this seems to be a Kubernetes-in-general thing. I'm not even sure it is a bug in the API server, since it just uses the normal Go resolver, which takes (and indeed should take) /etc/resolv.conf into account.
If you have something invalid in there, it will obviously fail.
However, kubeadm could be more user-friendly by detecting such a misconfiguration and failing fast; that's something we're definitely going to take into account.
@luxas is there any open issue that you know of on the Kubernetes side?
@vascofg Not any that I know of, but there might be.
You can ask in #sig-api-machinery on Slack...
This will be fixed in v1.7 thanks to https://github.com/kubernetes/kubernetes/pull/46772 :tada:
Happy to pick this one up.
This is fixed in the latest v1.6 release thanks to https://github.com/kubernetes/kubernetes/pull/48875 and in v1.7 thanks to https://github.com/kubernetes/kubernetes/pull/46772
I'm leaving a comment about this here as I ran into the same issue with a machine called 'localhost' registered on the enterprise network. In my case, all I had to do was change the 'search' line in /etc/resolv.conf to start with 'localdomain' before any corporate domain names. This makes 'nslookup localhost' always search localhost.localdomain first, which resolves to the loopback address.
Although this is fixed in later releases, the software I was installing is a black-box K8s solution with no option for upgrading Kubernetes or altering any of the config.
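For anyone else stuck on a build without the fix, the change is literally just reordering the search line in /etc/resolv.conf (corp.example.com stands in for the real corporate domain and the nameserver is illustrative):

Before:
search corp.example.com
nameserver 10.0.0.2

After:
search localdomain corp.example.com
nameserver 10.0.0.2

With localdomain listed first, a lookup for the bare name localhost tries localhost.localdomain before localhost.corp.example.com, and (as described above) that name resolves to the loopback address, so the bogus corporate record is never reached.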