Kubeadm: add error messaging that kubeadm init and join should not be called on the same machine

Created on 5 Jul 2018 · 13 comments · Source: kubernetes/kubeadm

EDIT: lubomir: see comments for the exact cause.

What keywords did you search in kubeadm issues before filing this one?

/etc/kubernetes/pki/ca.crt already exists

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

Versions

kubeadm version (use kubeadm version):
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"clean", BuildDate:"2018-05-21T09:17:39Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"clean", BuildDate:"2018-05-21T09:17:39Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration:
    Virtual Machine
  • OS (e.g. from /etc/os-release):
    NAME="Ubuntu"
    VERSION="16.04.4 LTS (Xenial Xerus)"
    ID=ubuntu
    ID_LIKE=debian
    PRETTY_NAME="Ubuntu 16.04.4 LTS"
    VERSION_ID="16.04"
    HOME_URL="http://www.ubuntu.com/"
    SUPPORT_URL="http://help.ubuntu.com/"
    BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
    VERSION_CODENAME=xenial
    UBUNTU_CODENAME=xenial
  • Kernel (e.g. uname -a):
    Linux kube-test 4.4.0-127-generic #153-Ubuntu SMP Sat May 19 10:58:46 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  • Others:
    @kubernetes/sig-cluster-lifecycle

What happened?

root@kube-test:~# kubeadm join 10.37.249.120:6443 --token <my token> --discovery-token-ca-cert-hash sha256:<sha token>
[preflight] Running pre-flight checks.
[preflight] Some fatal errors occurred:
    [ERROR Port-10250]: Port 10250 is in use
    [ERROR DirAvailable--etc-kubernetes-manifests]: /etc/kubernetes/manifests is not empty
    [ERROR FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists
    [ERROR FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`

What you expected to happen?

Join should work without the flag "--ignore-preflight-errors=All". Those "ERRORS" should be "WARNINGS".

How to reproduce it (as minimally and precisely as possible)?

as root

kubeadm init --apiserver-advertise-address=10.37.249.120 --pod-network-cidr=192.168.0.0/16 

as a regular user

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl apply -f https://docs.projectcalico.org/v3.0/getting-started/kubernetes/installation/hosted/kubeadm/1.7/calico.yaml
kubectl taint nodes --all node-role.kubernetes.io/master-

as root user

kubeadm join 10.37.249.120:6443 --token <my token> --discovery-token-ca-cert-hash sha256:<sha token>

Anything else we need to know?

Work around (note the ignore All errors)

kubeadm join 10.37.249.120:6443 --token <my token> --discovery-token-ca-cert-hash sha256:<sha token> --ignore-preflight-errors=All
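A narrower workaround is to skip only the failing checks: --ignore-preflight-errors also accepts a comma-separated list, and the names below are assumed to match the bracketed check names shown in the preflight output above:

kubeadm join 10.37.249.120:6443 --token <my token> --discovery-token-ca-cert-hash sha256:<sha token> --ignore-preflight-errors=Port-10250,DirAvailable--etc-kubernetes-manifests,FileAvailable--etc-kubernetes-pki-ca.crt,FileAvailable--etc-kubernetes-kubelet.conf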

When you see the error about

/etc/kubernetes/pki/ca.crt already exists

and you delete that file, it leads to this error

[discovery] Failed to request cluster info, will try again: [Get https://10.37.249.120:6443/api/v1/namespaces/kube-public/configmaps/cluster-info: dial tcp 10.37.249.120:6443: getsockopt: connection refused]
Labels: area/UX, good first issue, help wanted, kind/feature, lifecycle/frozen

Most helpful comment

^ that was the initial thought, as @kad and i saw a similar report recently.
could be a missing kubeadm reset on that particular node.

All 13 comments

Are you joining a node that you ran init on?

^ that was the initial thought, as @kad and i saw a similar report recently.
could be a missing kubeadm reset on that particular node.
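in practice that would look something like this before retrying the join (a minimal sketch, reusing the placeholders from the report above):

kubeadm reset
kubeadm join 10.37.249.120:6443 --token <my token> --discovery-token-ca-cert-hash sha256:<sha token>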

This was all on a single VM. Yes, init and join is run on the same node.

This was all on a single VM. Yes, init and join is run on the same node.

you need to run kubeadm init / join on two separate VMs or bare metal machines.

Is that a documentation defect? Why is that not stated with the "join" command?

Why does it work when I use the --ignore-preflight-errors=All flag?

Why does the join command not give an error like "you are not allowed to create a cluster on the same VM"?

Is that a documentation defect? Why is that not stated with the "join" command?

https://kubernetes.io/docs/concepts/architecture/nodes/
contains the following:

A node is a worker machine in Kubernetes, previously known as a minion. A node may be a VM or physical machine, depending on the cluster. Each node has the services necessary to run pods and is managed by the master components.

so we assume that our users know what a node is in the first place.

Why does it work when I use the --ignore-preflight-errors=All flag?

i don't think it works. it probably breaks somewhere and you need to look at the logs.

Why does the join command not give an error like "you are not allowed to create a cluster on the same VM"?

we could add error messages that join and init should not be run on the same machine, as long as we have a good way to detect that.

Is there any good way to detect that join and init have run on the same machine?

@xlgao-zju

Is there any good way to detect that join and init have run on the same machine?

there are some ways but none of them are that "good".

i can bring this up as an agenda item for the meeting today, as i've seen people do this by mistake, and our only way of knowing it happened looks like [ERROR DirAvailable...

i will get back to you on how to proceed with this.
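for illustration, a minimal sketch of one such heuristic, assuming the check would key off files that only kubeadm init creates (an assumption, not how kubeadm detects this today):

# hypothetical pre-join check: refuse to join a machine that still looks init'ed
if [ -f /etc/kubernetes/admin.conf ] || [ -f /etc/kubernetes/manifests/kube-apiserver.yaml ]; then
    echo "kubeadm init appears to have run on this machine; run 'kubeadm reset' or join from another machine" >&2
    exit 1
fi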

i don't think it works. it probably breaks somewhere and you need to look at the logs.

No, this does work. I was able to create a kube cluster on a single node using the flag above. Not a good idea for a production instance, but I just want a dev instance to play with. Not sure why I would be forced to use 2 VMs.

so you mean you created a single-node (master) cluster... that's fine and it should work.
the errors you are seeing are because you didn't call kubeadm reset on that node. always run that before kubeadm init / join...

about the flag and errors, we need to add better error messaging and i will write some proposals here later.
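for a single-node dev cluster the whole sequence is just the steps from the report, with no join at all (a sketch, reusing the same address and CIDR):

kubeadm reset
kubeadm init --apiserver-advertise-address=10.37.249.120 --pod-network-cidr=192.168.0.0/16
# then, as a regular user, copy admin.conf, apply the Calico manifest and remove the master taint as shown above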

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

/lifecycle frozen

/close due to lack of updates
feel free to re-open if necessary
