https://github.com/kubernetes/kubernetes/pull/67944 changed how kubeadm uses the kubeadm-config ConfigMap, which now gets updated on every `kubeadm join --control-plane`.
Therefore we should now implement ConfigMap locks on kubeadm-config before the update.
/cc @liztio
and assigned to the v1.12 milestone as per @timothysc's comment
bumping to 1.13
/cc @chuckha @detiber
JIC we have a race in cluster-api deployments.
There are a few hurdles that need to be handled before this can proceed.
One other thing that we'll want to consider for future implementation is that locking just around the ConfigMap update (even with fetching the latest ClusterStatus within the lock) is probably not sufficient, since there are other race conditions that could cause the configmap to change during the process (changes made to the ClusterConfiguration for example). The locking semantics will likely need to be extended to cover the entirety of the following workflows:
@detiber priority bump for 1.14 fixes.
/cc @fabriziopandini
feature freeze is next week and I don't think anyone will be able to work on this.
But the lack of CM locks can also be considered a bug.
@fabriziopandini @timothysc
I'm proposing to move this to the 1.15 milestone.
I can still work on this during this cycle, after I finish another PR in the works for unit testing the secret transfer during init/join. Depending on complexity/intrusiveness we can decide to postpone it to 1.15 if we see we are running out of time.
/assign
Is anyone actively working on this? I'd be happy to help here.
Hello @andrewrynhard! Yes, in fact I was going to look at this more in depth this weekend; I should have something ready in the next few days if everything goes fine.
@ereslibre Great! Any chance we would be able to cherry-pick for 1.14.1 or 1.14.2?
I don't know; we can discuss once we have the patch and evaluate its intrusiveness. Theoretically 1.14 HA support is not GA, so I don't know if we could strictly backport it, but personally I think it would make sense if the rules allow it.
I have done some preliminary work on this, and while I have it working with leader election I have some comments that maybe others can shed light on.
The leader election annotates the lock resource to record the current holder, so I don't think I can let the election annotate the original kubeadm-config ConfigMap; this is why I'm using a separate kubeadm-config-lock ConfigMap. I will give it a last go and possibly submit a PR tomorrow for early feedback. My main doubts are also around the leader election API itself, because I don't know if it fits this use case very well.
I am also wondering if, as an alternative, it would be possible to use a create-only resource (again TTL-lived) for exclusive access to the critical sections. E.g. a kubeadm-config-lock configmap (could be another kind of resource) that excludes by creation. Something like:
1. Try to create the kubeadm-config-lock configmap (annotated with the identity of the holder as well, TTL-lived for e.g. 5 minutes).
2. If the creation succeeds (201), continue to the critical section.
3. Delete the kubeadm-config-lock configmap if my identity == identity holder.

This has the benefit that if the client crashes or the network goes down and the last step is not performed by the client, the configmap will eventually be GC'ed.
added some comments on the PR.
my understanding is that we don't need a KEP for this, @fabriziopandini ?
https://github.com/kubernetes/enhancements/tree/master/keps/sig-cluster-lifecycle
I don't think so; it's not a feature on its own. It should be part of some HA KEP or be treated as a bug. HA cannot be deployed in a safe manner without this.
@neolit123 if I remember well the HA KEP already states that we want to support automation and parallel execution, so IMO this doesn't need a separated KEP.
thanks for confirming.
The PR is open for review, but I wrote a comment on https://github.com/kubernetes/kubernetes/pull/76241#issuecomment-484543261 because, unless I'm missing something, I think we don't need configmap locks (or leader election at all) for this. Please have a look and tell me if I'm missing anything. Thanks!