kubeadm join failing preflight checks

Created on 17 Jul 2017  路  21Comments  路  Source: kubernetes/kubeadm

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

Versions

kubeadm version: &version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.1", GitCommit:"1dc5c66f5dd61da08412a74221ecc79208c2165b", GitTreeState:"clean", BuildDate:"2017-07-14T01:48:01Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Kubernetes version Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.1", GitCommit:"1dc5c66f5dd61da08412a74221ecc79208c2165b", GitTreeState:"clean", BuildDate:"2017-07-14T02:00:46Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: AWS
  • OS (e.g. from /etc/os-release): CentOS Linux release 7.3.1611 (Core)

What happened?

[root@ip-192-168-99-11 ~]# kubeadm join --token random.16charactertoken 192.168.99.10:6443
[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[preflight] Running pre-flight checks
[preflight] WARNING: hostname "" could not be reached
[preflight] WARNING: hostname "" lookup : no such host
[preflight] Some fatal errors occurred:
    hostname "" a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
[preflight] If you know what you are doing, you can skip pre-flight checks with `--skip-preflight-checks`

What you expected to happen?

Normal completion of kubeadm join

How to reproduce it (as minimally and precisely as possible)?

Try to perform kubeadm join with v1.7..1

Anything else we need to know?

Skipping preflight checks allows the join to run normally.

I suspect the problem is this line:
https://github.com/kubernetes/kubernetes/blob/v1.7.1/cmd/kubeadm/app/preflight/checks.go#L544

It was not updated to match the changes to HostnameCheck in https://github.com/kubernetes/kubernetes/commit/e20f5a3d05d36399b9880e371fc52ef5264500cd
So it always passes in an empty string and always fails the updated HostnameCheck.

Most helpful comment

All 21 comments

I'm seeing this too.

Yeah as a workaround you can skip the preflight checks but the problem persists in 1.7.1

Same is happening for me.

Seeing this on 1.7.1. Even after providing node-name, it still complains with the same error. Workaround is to skip pre-flight checks. Looks like, the error is happening in this validation:

https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apimachinery/pkg/util/validation/validation.go#L110-L126

Thanks @stevenbower for the PR!

Problem persists even after v1.7.2

Confirmed, problem is still present.

Looks like the PR to fix it hasn't been merged to master yet: https://github.com/kubernetes/kubernetes/pull/49073

After it's merged, it will also need to make it onto a backport branch for a 1.7.x point release.

~@erhudy: It's a little confusing, but that change was already cherry picked onto the 1.7 branch (in https://github.com/kubernetes/kubernetes/pull/49094). The fix is released in 1.7.2.~

~It's still broken in master at the moment.~

Edit: nope, I'm wrong.

Edit Edit: I wasn't completely wrong, the purported fix was backported but the issue _is_ still in 1.7.2.

Oh, interesting. kubeadm 1.7.2 is still showing the behavior, so I'm going to take another look at it.

@erhudy ah, you're right. I crossed the streams.

Apologies for all the back and forth and confusion here. I did more digging and I do see the purported fix backported to v1.7.2: https://github.com/kubernetes/kubernetes/blob/v1.7.2/cmd/kubeadm/app/preflight/checks.go#L544

I also confirmed that 1.7.2 still has the problem. I think that means https://github.com/kubernetes/kubernetes/pull/49073 is not a complete fix for the problem.

cc @stevenbower @luxas

As far as I can tell, the problem is that in v1.7.0, the hostname check had the following structure:

    hostname := node.GetHostname("")
    for _, msg := range validation.ValidateNodeName(hostname, false) {
        errors = append(errors, fmt.Errorf("hostname \"%s\" %s", hostname, msg))
    }
    addr, err := net.LookupHost(hostname)

The node.GetHostname("") call gets passed to a method which then returns the value of os.Hostname(). This comes back, gets looked up via net.LookupHost(hostname), and passes.

In v1.7.2, the structure of this check changed. The hostnameCheck struct now has a nodeName field that is populated with the value of cfg.NodeName, cfg being an instance of kubeadmapi.NodeConfiguration. cfg is initialized to an unconfigured instance of the struct in NewCmdJoin, from which we (presumably) see the node name being reported as "", which then fails the preflight check.

Based on this, I think the correct fix is probably to populate cfg.NodeName with the value of node.GetHostname("") prior to running the preflight checks, as v1.7.0 did.

Does anyone see a problem with this analysis? This is easy enough to fix, and also to add an additional test around to verify that we're not passing blank strings as the node name.

@erhudy If you're able to confirm that it actually fixes the problem you're seeing, I don't think you'll get any pushback for such a patch. :-)

Right, just writing it up here in the event that there was some grander reason for restructuring it that I don't know about. I will work up a PR.

Just to add an additional issue when using the --config flag and setting nodeName in the config (to try and get around the error) the following error is given:

[root@node1 ~]# kubeadm join --config=/etc/kubernetes/kubeadm.yaml
[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[preflight] Running pre-flight checks
can not mix '--config' with other arguments

kubeadm.yaml

kind: NodeConfiguration
apiVersion: kubeadm.k8s.io/v1alpha1
nodeName: node1.k8s1.vmw.rmb938.me
token: nope
discoveryTokenAPIServers:
  - master.rmb938.me:6443

Pushed a PR against master that fixes this issue, both the blank default hostname and not registering the presence of --node-name. The PR can be trivially backported to 1.7.x (I tested against both master and 1.7.2).

Resulting this issue kube nodes cannot join master node cause of they are has the equal domain name, eg "" and "". Possibly I should open new issue.

Do we have an ETA for this bug to be fixed/merged/approved? 馃憤

For the moment I downgraded my workers to kubeadm-1.7.0 and they joined a master initialized with 1.7.2 just fine, and even reported node version 1.7.2. (CentOS 7 using the Kubernetes yum repo)

Was this page helpful?
0 / 5 - 0 ratings