EDIT: lubomir: see comments for the exact cause.
/etc/kubernetes/pki/ca.crt already exists
BUG REPORT
kubeadm version (use kubeadm version):
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"clean", BuildDate:"2018-05-21T09:17:39Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Environment: single VM (kubeadm init and kubeadm join were both run on the same node)
root@kube-test:~# kubeadm join 10.37.249.120:6443 --token <my token> --discovery-token-ca-cert-hash sha256:<sha token>
[preflight] Running pre-flight checks.
[preflight] Some fatal errors occurred:
[ERROR Port-10250]: Port 10250 is in use
[ERROR DirAvailable--etc-kubernetes-manifests]: /etc/kubernetes/manifests is not empty
[ERROR FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists
[ERROR FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
Join should work without the "--ignore-preflight-errors=All" flag; those "ERRORS" should be "WARNINGS".
as root
kubeadm init --apiserver-advertise-address=10.37.249.120 --pod-network-cidr=192.168.0.0/16
as a regular user
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl apply -f https://docs.projectcalico.org/v3.0/getting-started/kubernetes/installation/hosted/kubeadm/1.7/calico.yaml
kubectl taint nodes --all node-role.kubernetes.io/master-
as root user
kubeadm join 10.37.249.120:6443 --token <my token> --discovery-token-ca-cert-hash sha256:<sha token>
Workaround (note the --ignore-preflight-errors=All flag):
kubeadm join 10.37.249.120:6443 --token <my token> --discovery-token-ca-cert-hash sha256:<sha token> --ignore-preflight-errors=All
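A narrower variant of that workaround (a sketch based on the check names shown in the preflight output above, with the same token placeholders) is to ignore only the specific checks that fired instead of all of them:
kubeadm join 10.37.249.120:6443 --token <my token> --discovery-token-ca-cert-hash sha256:<sha token> \
  --ignore-preflight-errors=Port-10250,DirAvailable--etc-kubernetes-manifests,FileAvailable--etc-kubernetes-pki-ca.crt,FileAvailable--etc-kubernetes-kubelet.conf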
When you see the error about
/etc/kubernetes/pki/ca.crt already exists
and you delete that file, it leads to this error
[discovery] Failed to request cluster info, will try again: [Get https://10.37.249.120:6443/api/v1/namespaces/kube-public/configmaps/cluster-info: dial tcp 10.37.249.120:6443: getsockopt: connection refused]
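To confirm that the control plane at that address really stopped serving (a quick manual check, not part of the original report), you can hit the API server's health endpoint and look at the kubelet logs on the node:
curl -k https://10.37.249.120:6443/healthz
journalctl -u kubelet --no-pager | tail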
Are you joining a node that you ran init on?
^ that was my initial thought, as @kad and I saw a similar report recently.
could be a missing kubeadm reset on that particular node.
This was all on a single VM. Yes, init and join were run on the same node.
you need to run kubeadm init / join on two separate VMs or bare metal machines.
Is that a documentation defect? Why is that not stated with the "join" command?
Why does it work when I use the --ignore-preflight-errors=All flag?
Why does the join command not give an error like "you are not allowed to create a cluster on the same VM"?
Is that a documentation defect? Why is that not stated with the "join" command?
https://kubernetes.io/docs/concepts/architecture/nodes/
contains the following:
A node is a worker machine in Kubernetes, previously known as a minion. A node may be a VM or physical machine, depending on the cluster. Each node has the services necessary to run pods and is managed by the master components.
so we assume that our users know what a node is in the first place.
Why does it work when I use the --ignore-preflight-errors=All flag?
i don't think it works. it probably breaks somewhere and you need to look at the logs.
Why does the join command not give an error like "you are not allowed to create a cluster on the same VM"?
we could add error messages that join and init should not be run on the same machine, as long as we have a good way to detect that.
Is there any good way to detect that join and init have run on the same machine?
@xlgao-zju
Is there any good way to detect that join and init have run on the same machine?
there are some ways but none of them are that "good".
i can bring this up as an agenda item for the meeting today, as i've seen people do this by mistake and our only way of knowing it happened looks like this: [ERROR DirAvailable...
i will get back to you on how to proceed with this.
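For reference, one crude manual check (purely illustrative, not something kubeadm does automatically) is to look for the same leftovers the preflight checks already trip over:
# any hits mean init or join has already been run on this machine; clean up with kubeadm reset
ls -d /etc/kubernetes/manifests/* /etc/kubernetes/pki/ca.crt /etc/kubernetes/kubelet.conf 2>/dev/null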
i don't think it works. it probably breaks somewhere and you need to look at the logs.
No, this does work. I was able to create a kube cluster on a single node, using the flag above. Not a good idea for a production instance, but I just want a dev instance to play with. Not sure why I would be forced to use two VMs.
so you mean you created a single-node (master node) cluster... that's fine and it should work.
the errors you are seeing are because you didn't call kubeadm reset on the same node first. always run that before kubeadm init / join...
about the flag and errors, we need to add better error messaging and i will write some proposals here later.
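Concretely, the clean sequence for this setup would look something like the following (a sketch reusing the commands from the report; kubeadm reset wipes the state a previous init or join left behind):
kubeadm reset
kubeadm init --apiserver-advertise-address=10.37.249.120 --pod-network-cidr=192.168.0.0/16
# join is only needed when adding a second machine as a worker:
kubeadm join 10.37.249.120:6443 --token <my token> --discovery-token-ca-cert-hash sha256:<sha token>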
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/lifecycle frozen
/close due to lack of updates
feel free to re-open if necessary