Problem:
Cannot create HA cluster by following steps provided under "Stacked control plane nodes."
Proposed Solution:
I am in the process of learning Kubernetes, so I don't know exactly what is wrong or how to fix it.
Page to Update:
https://kubernetes.io/...
Kubernetes Version:
1.12.0
First attempt:
Setting up the second control plane failed at this step:
kubeadm alpha phase kubelet write-env-file --config kubeadm-config.yaml
Output:
didn't recognize types with GroupVersionKind: [kubeadm.k8s.io/v1alpha3, Kind=ClusterConfiguration]
As a workaround, I replaced kubeadm.k8s.io/v1alpha3 with kubeadm.k8s.io/v1alpha2 and ClusterConfiguration with MasterConfiguration in the kubeadm-config.yaml
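For anyone applying the same workaround, a runnable sketch of that substitution (the stand-in config below is illustrative, not the poster's actual file):

```shell
# Build a tiny stand-in kubeadm config so the sketch is self-contained;
# on a real node you would edit your existing kubeadm-config.yaml.
printf '%s\n' \
  'apiVersion: kubeadm.k8s.io/v1alpha3' \
  'kind: ClusterConfiguration' \
  'kubernetesVersion: v1.12.0' > kubeadm-config.yaml

# Downgrade to the v1alpha2 schema, as described in the workaround above.
sed -i \
  -e 's|kubeadm.k8s.io/v1alpha3|kubeadm.k8s.io/v1alpha2|' \
  -e 's|ClusterConfiguration|MasterConfiguration|' \
  kubeadm-config.yaml

cat kubeadm-config.yaml
```

Note that v1alpha2 is the older (v1.11-era) config schema, so this is a stopgap rather than a fix.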
Second attempt:
kubeadm alpha phase mark-master --config kubeadm-config.yaml times out. I tried this several times while debugging and noticed that everything seems to break after running:
kubectl exec -n kube-system etcd-${CP0_HOSTNAME} -- etcdctl \
  --ca-file /etc/kubernetes/pki/etcd/ca.crt \
  --cert-file /etc/kubernetes/pki/etcd/peer.crt \
  --key-file /etc/kubernetes/pki/etcd/peer.key \
  --endpoints=https://${CP0_IP}:2379 \
  member add ${CP2_HOSTNAME} https://${CP2_IP}:2380
Running docker ps on the first control plane, I notice that new containers keep getting created (I assume they keep crashing).
Third attempt:
Tried the proposed solution in #9526. This time there are no errors or timeouts from executing the commands, but the kube controller managers and schedulers on the second and third control planes appear to be crashing:
kubectl get pods --all-namespaces
NAMESPACE     NAME                               READY   STATUS             RESTARTS   AGE
kube-system   calico-node-279cq                  2/2     Running            0          22m
kube-system   calico-node-4ggxh                  2/2     Running            0          22m
kube-system   calico-node-pzz6x                  2/2     Running            0          22m
kube-system   coredns-576cbf47c7-jf9v8           1/1     Running            0          48m
kube-system   coredns-576cbf47c7-t6x8j           1/1     Running            0          48m
kube-system   etcd-REDACTED                      1/1     Running            1          47m
kube-system   etcd-REDACTED                      1/1     Running            0          41m
kube-system   etcd-REDACTED                      1/1     Running            0          25m
kube-system   kube-apiserver-REDACTED            1/1     Running            0          47m
kube-system   kube-apiserver-REDACTED            1/1     Running            0          41m
kube-system   kube-apiserver-REDACTED            1/1     Running            0          25m
kube-system   kube-controller-manager-REDACTED   1/1     Running            1          47m
kube-system   kube-controller-manager-REDACTED   0/1     CrashLoopBackOff   13         42m
kube-system   kube-controller-manager-REDACTED   0/1     CrashLoopBackOff   9          26m
kube-system   kube-proxy-5z77x                   1/1     Running            0          48m
kube-system   kube-proxy-fljtd                   1/1     Running            0          26m
kube-system   kube-proxy-tlc2s                   1/1     Running            0          42m
kube-system   kube-scheduler-REDACTED            1/1     Running            1          47m
kube-system   kube-scheduler-REDACTED            0/1     CrashLoopBackOff   13         42m
kube-system   kube-scheduler-REDACTED            0/1     CrashLoopBackOff   10         26m
It looks like they have no configuration:
kubectl --namespace=kube-system logs kube-controller-manager-REDACTED
Flag --address has been deprecated, see --bind-address instead.
I1001 18:39:51.847438 1 serving.go:293] Generated self-signed cert (/var/run/kubernetes/kube-controller-manager.crt, /var/run/kubernetes/kube-controller-manager.key)
invalid configuration: no configuration has been provided
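That "no configuration has been provided" message is what client-go emits when the kubeconfig it is pointed at is empty or missing, so one thing worth checking on the crashing nodes is whether the component kubeconfigs exist. The paths below are the kubeadm defaults; treat this as a hedged sketch, not a confirmed diagnosis:

```shell
# Check that the kubeconfigs kubeadm normally writes for the controller
# manager and scheduler exist and are non-empty on this node.
for f in /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf; do
  if [ -s "$f" ]; then
    echo "ok: $f"
  else
    echo "missing or empty: $f"
  fi
done
```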
Anyone know any quick fixes for my issues? I'd be happy to provide more information, but I'm not sure where to look.
/kind bug
@kubernetes/sig-cluster-lifecycle
/sig cluster-lifecycle
I ran into the same issue using debian 9, docker 17.3.3, and flannel.
I looked at these docs to see if I could figure out more about the file. I ultimately skipped the step, as I wasn't able to find any kubelet config files written to cp0 that weren't also on the other two hosts.
I was able to get through the rest of it and ended up with an almost-working cluster: the control plane and etcd were working, and cp0 ran the local manifests but stayed in "NotReady" state because the CNI config was not present.
Tried again on CoreOS 1855.4.0 and had a similar experience. It seems like a circular dependency between the first host and the CNI plugin. It works if you add the file /etc/cni/net.d/10-flannel.conflist to cp0 manually.
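For reference, here is roughly what that file contains. This mirrors the conflist the flannel DaemonSet normally installs; treat it as a sketch and compare it against the kube-flannel manifest for your version before using it:

```json
{
  "name": "cbr0",
  "plugins": [
    {
      "type": "flannel",
      "delegate": {
        "hairpinMode": true,
        "isDefaultGateway": true
      }
    },
    {
      "type": "portmap",
      "capabilities": {
        "portMappings": true
      }
    }
  ]
}
```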
I had the same issue.
My context:
As a workaround, I did that step (kubeadm alpha phase kubelet write-env-file --config kubeadm-config.yaml) manually, using the file generated on cp0:
instead of running the step, copy the file /var/lib/kubelet/kubeadm-flags.env from cp0 to cp1 at the same location.
For the record, my file looks like this:
/var/lib/kubelet/kubeadm-flags.env
KUBELET_KUBEADM_ARGS=--cgroup-driver=cgroupfs --network-plugin=cni --resolv-conf=/run/systemd/resolve/resolv.conf
@eturpin @ztec @gclyatt: Just to clarify, you are seeing this when using v1.12.0 of kubeadm?
@detiber for me, yes.
@detiber
Yes. Using the package from http://apt.kubernetes.io/ kubernetes-xenial main.
os: Ubuntu 16.04.5
arch: amd64
Kubernetes version: 1.12.0
@detiber I saw this on v1.12.0 of kubeadm and just tried again with v1.12.1 and had same result with the write-env-file step.
I have the same issue on CentOS 7.5 with v1.12.1.
Can I just copy the file from CP0 to CP1 and move on?
I get a timeout when promoting CP1 to master if I do.
Also on CentOS 7.5 and v1.12.1, same issue as mentioned above.
I created /var/lib/kubelet/kubeadm-flags.env from cp0's copy, and now I can continue.
NAME                    STATUS     ROLES    AGE     VERSION
lt1-k8s-04.blah.co.za   Ready      master   37m     v1.12.1
lt1-k8s-05.blah.co.za   NotReady   master   5m25s   v1.12.1
/assign
I had the same issue :
OS: Ubuntu 16.04.4 LTS
Arch: amd64
Kubernetes version: 1.12.1
Issue on second master
I'm experiencing the same problem with Kubernetes 1.12.1 and Ubuntu 18.04.
Facing same issue with CentOS & Kubernetes 1.12.1
Please advise.
@pankajpandey9, the workaround outlined by @ztec a few posts up works.
For me, the workaround (manually creating the /var/lib/kubelet/kubeadm-flags.env file on the other nodes) allowed the rest of the commands in the documentation to complete.
However, the resulting cluster was still (mostly) broken. kube-controller-manager and kube-scheduler on the other nodes were in a continual crashing loop.
closing in favor of:
https://github.com/kubernetes/kubeadm/issues/1171
^ has the actual cause + solution defined too.
this is a kubeadm issue and we shouldn't track it in the website repo.
/close
@neolit123: Closing this issue.
In response to this:
closing in favor of:
https://github.com/kubernetes/kubeadm/issues/1171
^ has the actual cause + solution defined too.
this is a kubeadm issue and we shouldn't track it in the website repo.
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.