Kubeadm: Modifying a multi-node control plane can result in a control-plane node failing to join the cluster

Created on 21 Jun 2019 · 10 comments · Source: kubernetes/kubeadm

Is this a BUG REPORT or FEATURE REQUEST?

/kind bug

kubeadm v1.14.2, but I believe this has been an issue for a while and continues to be so.

What happened?

I had a 3-node stacked control plane: cp1, cp2, cp3.
I created a new node, cp4, and added it to the control plane.
I removed the etcd member running on cp1 from the etcd cluster.
I removed cp1 as a Kubernetes node and deleted the backing infrastructure.

I created a new node, cp5, and added it to the control plane.
I removed the etcd member running on cp2 from the etcd cluster.
I removed cp2 as a Kubernetes node and deleted the backing infrastructure.
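
For reference, each retirement above followed roughly the sketch below. This is a hedged sketch, not the exact commands: the etcd pod name, member ID, and node name are placeholders, and the etcdctl TLS flags assume the certificate paths that kubeadm generates by default.

# find the member ID of the node being retired, using a surviving etcd pod
kubectl -n kube-system exec etcd-cp2 -- sh -c \
  'ETCDCTL_API=3 etcdctl --endpoints https://127.0.0.1:2379 \
    --cacert /etc/kubernetes/pki/etcd/ca.crt \
    --cert /etc/kubernetes/pki/etcd/peer.crt \
    --key /etc/kubernetes/pki/etcd/peer.key \
    member list'

# remove that member from the etcd cluster
kubectl -n kube-system exec etcd-cp2 -- sh -c \
  'ETCDCTL_API=3 etcdctl --endpoints https://127.0.0.1:2379 \
    --cacert /etc/kubernetes/pki/etcd/ca.crt \
    --cert /etc/kubernetes/pki/etcd/peer.crt \
    --key /etc/kubernetes/pki/etcd/peer.key \
    member remove <MEMBER_ID>'

# remove the node from Kubernetes, then tear down the backing infrastructure
kubectl drain cp1 --ignore-daemonsets --delete-local-data
kubectl delete node cp1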

I created a third and final control-plane node (cp6), but during the kubeadm join --control-plane call I ended up with this error:

I0621 13:56:32.601586      73 patchnode.go:30] [patchnode] Uploading the CRI Socket information "/run/containerd/containerd.sock" to the Node API object "my-cluster-1561125359" as an annotation
I0621 13:56:33.110139      73 round_trippers.go:438] GET https://172.17.0.3:6443/api/v1/nodes/my-cluster-1561125359 404 Not Found in 8 milliseconds
*snip*
https://172.17.0.3:6443/api/v1/nodes/my-cluster-1561125359 200 OK in 53 milliseconds
I0621 13:56:45.687733      73 local.go:118] creating etcd client that connects to etcd pods
I0621 13:56:45.702601      73 round_trippers.go:438] GET https://172.17.0.3:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config 200 OK in 14 milliseconds
I0621 13:56:45.703024      73 etcd.go:139] etcd endpoints read from pods: https://172.17.0.4:2379,https://172.17.0.5:2379,https://172.17.0.8:2379,https://172.17.0.6:2379,https://172.17.0.6:2379
error execution phase control-plane-join/etcd: error creating local etcd static pod manifest file: error syncing endpoints with etc: dial tcp 172.17.0.4:2379: connect: connection refused
exit status 1
failed to join a control plane node with kubeadm
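
For context, the join on cp6 was invoked roughly as sketched below; the token, CA cert hash, and certificate key are placeholders, and if I recall correctly the control-plane flag is still spelled --experimental-control-plane on v1.14.

kubeadm join 172.17.0.3:6443 \
  --token <TOKEN> \
  --discovery-token-ca-cert-hash sha256:<HASH> \
  --control-plane \
  --certificate-key <CERT_KEY>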

I thought that was weird, so I looked into it. cp6 had the same IP address that cp2 had when it existed.
etcd wasn't running on cp6 yet (kubeadm had no way to know it was effectively talking to itself), so the connection was refused. Kubeadm used an outdated list of IP addresses from the config map:

root@ip-172-31-30-152:~# k get cm -n kube-system kubeadm-config -o yaml
*snip*
  ClusterStatus: |
    apiEndpoints:
      cp5:
        advertiseAddress: 172.17.0.8
        bindPort: 6443
      cp4:
        advertiseAddress: 172.17.0.6
        bindPort: 6443
      cp1:
        advertiseAddress: 172.17.0.6
        bindPort: 6443
      cp2:
        advertiseAddress: 172.17.0.4
        bindPort: 6443
      cp3:
        advertiseAddress: 172.17.0.5
        bindPort: 6443

As you can see, this list of endpoints is out of date. When I removed a node from the cluster, its entry should have been removed from this config map.
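
A hedged sketch of how to inspect and, as a stopgap, hand-clean those stale entries (the ConfigMap name and data key are the standard kubeadm ones):

# print only the ClusterStatus document
kubectl -n kube-system get configmap kubeadm-config -o jsonpath='{.data.ClusterStatus}'

# open the ConfigMap in an editor and delete the apiEndpoints entries for the retired nodes (cp1 and cp2 here)
kubectl -n kube-system edit configmap kubeadm-config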

I traced the code to https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/util/etcd/etcd.go#L78-L81 and, helpfully, the comment there shows the assumption being made that ClusterStatus is always up to date.

area/HA kind/bug priority/important-longterm

Most helpful comment

1.15 looks awesome. I'm using 1.14, but I think kubeadm reset will work for me. I'll give it a shot and let you know! Y'all have made some amazing progress while I wasn't looking 😄

All 10 comments

I had a 3-node stacked control plane: cp1, cp2, cp3.
I created a new node, cp4, and added it to the control plane.
I removed the etcd member running on cp1 from the etcd cluster.
I removed cp1 as a Kubernetes node and deleted the backing infrastructure.

I created a new node, cp5, and added it to the control plane.
I removed the etcd member running on cp2 from the etcd cluster.
I removed cp2 as a Kubernetes node and deleted the backing infrastructure.

is removing the node done with kubeadm reset?
are you seeing ClusterStatus being correctly updated after that?
is it possible to narrow this down with fewer steps?

I created a third and final control plane node (cp6) but during the kubeadm init --control-plane call I ended up with this error:

join here instead of init?

Kubeadm used an outdated list of IP addresses from the config map:

one of the .6 entries should have been removed during kubeadm reset.

something else to keep in mind is that concurrent mods to the ClusterStatus are not supported in 1.14.

This is not done with kubeadm reset. I was under the impression that was a developer tool and not for use in a production cluster, but perhaps what I'm trying to do isn't generally supported by Kubernetes?

You are right, that was supposed to be join, I will update that. Thank you for catching it.

I think it's possible to reproduce the behavior of ClusterStatus not getting updated by removing any control-plane node without using kubeadm reset:

  1. Create a control plane
  2. Join a second control plane
  3. Remove the second control plane (first remove the etcd member, then remove the node)
  4. Look at the ClusterStatus (a sketch of these steps follows below)
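
A hedged sketch of those steps (the load balancer address, token, CA cert hash, and certificate key are placeholders, and on v1.14 the upload-certs and control-plane flags carry an experimental- prefix):

# on the first node: create the control plane
kubeadm init --control-plane-endpoint <LB_ADDRESS>:6443 --upload-certs

# on the second node: join as an additional control-plane node
kubeadm join <LB_ADDRESS>:6443 --token <TOKEN> \
  --discovery-token-ca-cert-hash sha256:<HASH> \
  --control-plane --certificate-key <CERT_KEY>

# retire the second node: remove its etcd member (as in the etcdctl sketch above), then delete the node
kubectl delete node cp2

# the retired node's apiEndpoints entry should be gone, but it is still listed
kubectl -n kube-system get configmap kubeadm-config -o jsonpath='{.data.ClusterStatus}'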

I think kubeadm reset is a must at this point, because it has a phase that removes the entry from the ClusterStatus. 1.15 also broke reset into phases.
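
In case it helps, a hedged sketch of the phase-based reset on the control-plane node being retired (phase names as of v1.15; the kubeconfig path is the usual kubeadm default):

# list the available reset phases
kubeadm reset phase --help

# remove this node's member from the etcd cluster
kubeadm reset phase remove-etcd-member --kubeconfig /etc/kubernetes/admin.conf

# remove this node's entry from the ClusterStatus in the kubeadm-config ConfigMap
kubeadm reset phase update-cluster-status

# clean up the local kubelet and static pod state
kubeadm reset phase cleanup-node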

  • could you please try with kubeadm reset?
  • is something preventing you from doing that?

1.15 looks awesome. I'm using 1.14, but I think kubeadm reset will work for me. I'll give it a shot and let you know! Y'all have made some amazing progress while I wasn't looking 😄

@neolit123 it doesn't seem like kubeadm reset is cleaning up ClusterStatus as of v1.14.1. I can read through the code real quick to see if I'm doing it wrong, though.

OK, it looks like it should be cleaning up ClusterStatus; I'll keep playing.

Closing because from what I can tell, this is a supported workflow.

thanks for testing. let me know if you find a bug.

I logged this ticket to track having more HA e2e tests:
https://github.com/kubernetes/kubeadm/issues/1633

It was a bug in my code; kubeadm works as expected. Thank you!
