Cluster-api: KCP should continuously reconcile endpoints in kubeadm ConfigMap Status

Created on 3 Apr 2020 · 20 comments · Source: kubernetes-sigs/cluster-api

What did you expect to happen:
As of today, KCP updates the endpoints list in the KubeadmConfig.Status only when scaling down control plane nodes.

https://github.com/kubernetes-sigs/cluster-api/blob/3950a9c5614ebb0e08e35ec6e1100ac3c40e89a3/controlplane/kubeadm/controllers/scale.go#L146

However, there are cases where the list of endpoints might change outside of the scale-down sequence, e.g. when a machine dies or gets deleted.

Therefore, on top of what already exists, we should implement an additional reconciliation function, run during ReconcileNormal, that ensures the list of endpoints in KubeadmConfig.Status is in sync with the actual list of CP machines.
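
A minimal sketch of what such a step could look like; the Machine/WorkloadCluster stand-ins and the APIEndpointNodeNames/RemoveNodeFromKubeadmConfigMap methods below are illustrative simplifications, not the actual cluster-api API:

```go
package sketch

import "context"

// Simplified stand-ins for the real cluster-api types: the actual controller
// works with the internal WorkloadCluster interface and an owned Machine
// collection, and RemoveMachineFromKubeadmConfigMap takes a *clusterv1.Machine.
type Machine struct {
	NodeName string // from Machine.Status.NodeRef; empty if the node is not registered yet
}

type WorkloadCluster interface {
	// APIEndpointNodeNames returns the node names currently listed under
	// apiEndpoints in the kubeadm-config ConfigMap ClusterStatus.
	APIEndpointNodeNames(ctx context.Context) ([]string, error)
	// RemoveNodeFromKubeadmConfigMap drops a single node's endpoint entry.
	RemoveNodeFromKubeadmConfigMap(ctx context.Context, nodeName string) error
}

// reconcileKubeadmConfigEndpoints sketches the extra step proposed for
// ReconcileNormal: remove every apiEndpoints entry whose node no longer
// maps to an existing control plane Machine.
func reconcileKubeadmConfigEndpoints(ctx context.Context, wc WorkloadCluster, ownedMachines []Machine) error {
	// Build the set of node names backed by an existing CP machine.
	valid := map[string]bool{}
	for _, m := range ownedMachines {
		if m.NodeName != "" {
			valid[m.NodeName] = true
		}
	}

	// Drop every recorded endpoint whose node no longer has a machine.
	nodeNames, err := wc.APIEndpointNodeNames(ctx)
	if err != nil {
		return err
	}
	for _, name := range nodeNames {
		if !valid[name] {
			if err := wc.RemoveNodeFromKubeadmConfigMap(ctx, name); err != nil {
				return err
			}
		}
	}
	return nil
}
```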

Anything else you would like to add:
As of today, RemoveMachineFromKubeadmConfigMap updates the KubeadmConfig.Status regardless of whether the list has changed. This should be improved so that KubeadmConfig.Status is updated/patched only when the list actually changes.
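
A sketch of the suggested guard, assuming the apiEndpoints map has already been decoded from the ConfigMap; APIEndpoint and removeEndpointIfPresent are illustrative names, not actual cluster-api code:

```go
package sketch

// APIEndpoint mirrors the shape of an entry in ClusterStatus.apiEndpoints
// as stored in the kubeadm-config ConfigMap (illustrative, not imported).
type APIEndpoint struct {
	AdvertiseAddress string
	BindPort         int32
}

// removeEndpointIfPresent deletes nodeName from the endpoints map and reports
// whether anything changed, so the caller can skip the update when nothing did.
func removeEndpointIfPresent(endpoints map[string]APIEndpoint, nodeName string) bool {
	if _, ok := endpoints[nodeName]; !ok {
		return false // endpoint already absent: skip the update/patch entirely
	}
	delete(endpoints, nodeName)
	return true
}
```

The caller would then marshal and patch the ConfigMap only when this returns true.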

Environment:

  • Cluster-api version: v0.3.3

/kind bug
@vincepri @benmoss

area/control-plane help wanted kind/bug priority/important-soon

All 20 comments

/retitle KCP should continuously reconcile endpoints in kubeadm ConfigMap Status

/milestone v0.3.x

@fabriziopandini This is partially covered here: https://github.com/kubernetes-sigs/cluster-api/pull/2841
When we detect a machine deletion (the machine does not exist but ETCD still has it as a member), it is removed from the kubeadm ConfigMap (and from the ETCD member list).

I guess separating that logic by doing the same check starting from the kubeadm ConfigMap makes sense. This way, the ETCD check only cleans up ETCD members instead of also changing the kubeadm ConfigMap.
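
Roughly, the separated ETCD-side check would look something like the sketch below; EtcdClient and EtcdMember are hypothetical stand-ins for KCP's real etcd management code:

```go
package sketch

import "context"

// Hypothetical stand-ins for the etcd management client used by KCP.
type EtcdMember struct {
	ID   uint64
	Name string // member name, matching the node name
}

type EtcdClient interface {
	Members(ctx context.Context) ([]EtcdMember, error)
	RemoveMember(ctx context.Context, id uint64) error
}

// reconcileEtcdMembersSketch compares etcd members against the node names
// backed by existing Machines and removes only orphaned etcd members,
// leaving the kubeadm ConfigMap to its own reconcile loop.
func reconcileEtcdMembersSketch(ctx context.Context, etcd EtcdClient, machineNodeNames map[string]bool) error {
	members, err := etcd.Members(ctx)
	if err != nil {
		return err
	}
	for _, m := range members {
		if !machineNodeNames[m.Name] {
			// Machine is gone but etcd still lists the member: clean it up.
			if err := etcd.RemoveMember(ctx, m.ID); err != nil {
				return err
			}
		}
	}
	return nil
}
```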

I can combine this into that PR. WDYT?

Yes @sedefsavas, having a dedicated reconcile loop for the kubeadm ConfigMap is, IMO, more robust than managing changes to the ConfigMap as a derivative of other workflows (scale, delete).

Hopefully, this can lead to the simplification of other workflows as well, but this is not the primary goal of this issue and we will get there incrementally.

That said, considering #2841, I'll defer to @vincepri on whether to continue with this or bump the priority/punt.

Let's hold off on this until we have annotations removed from KCP; @benmoss has been working on it.

We can revisit the milestone once that's merged in. One reason I would like to have one refactor at a time is that we have few folks looking at/reviewing the codebase, and we should take our time to make sure the changes are correct and have been validated.

/priority important-soon

/milestone v0.3.9
/help

@vincepri:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/milestone v0.3.9
/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

/milestone v0.3.10

Is anyone interested in doing this? We're cutting it a little close at this point; moving the milestone.

/milestone v0.4.0

/area control-plane

@fabriziopandini I had a conversation with @sedefsavas about this, and she was of the view that I should wait for the KCP changes to be forward-ported to the main branch before introducing any other changes to the KCP controller.

cc: @vincepri

@srm09 I agree with @sedefsavas; I'm trying to unblock the latest forward-porting PRs.

@fabriziopandini Have the PRs been forward ported into the main branch?

@srm09 just to avoid misunderstanding, which PR are you talking about, #3977?

Going back to this issue: now that we have reconcileEtcdMembers being called outside of scale up/scale down, I think this issue can be closed.
@srm09 @vincepri opinions?

I guess reconcileEtcdMembers does make sure that the list of machines is in sync with the actual CP nodes, so for the purposes of this issue, the main concern seems to be taken care of.

/close

@fabriziopandini: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
