Kubeadm: Use configmap locks on kubeadm-config before the update.

Created on 6 Sep 2018 · 16 comments · Source: kubernetes/kubeadm

https://github.com/kubernetes/kubernetes/pull/67944 changed how kubeadm uses the kubeadm-config ConfigMap, which now gets updated at every kubeadm join --control-plane.

Therefore we should now implement configmap locks on kubeadm-config before the update.

Ref: code

area/HA area/upgrades help wanted kind/feature priority/important-soon

All 16 comments

/cc @liztio
and assigned to the v1.12 milestone as per @timothysc's comment

booting to 1.13

/cc @chuckha @detiber

JIC we have a race in cluster-api deployments.

There are a few hurdles that need to be handled before this can proceed.

  • Currently configmap locking is only exposed through leader election and the leader election interface doesn't expose a way to run a method that returns an error.
  • Creating a new configmap lock where the annotation does not already exist causes the configmap Data to be clobbered.
  • Currently when we update the kubeadm-config ConfigMap we are using an Update; this needs to be switched to a Patch to avoid clobbering the annotation that contains the lock information (see the sketch after this list).
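
As a rough illustration of the last point, here is a minimal sketch, assuming a recent client-go API, of updating only the ClusterStatus key of the kubeadm-config ConfigMap with a strategic merge patch so that annotations added by a lock are left untouched (the helper name and error handling are mine, not kubeadm code):

```go
// Minimal sketch: patch only the data key we own instead of replacing the
// whole ConfigMap, so annotations (e.g. lock metadata) are preserved.
package main

import (
	"context"
	"encoding/json"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// patchClusterStatus sends a strategic merge patch that touches only
// data.ClusterStatus; everything else in kubeadm-config is left alone.
func patchClusterStatus(ctx context.Context, client kubernetes.Interface, clusterStatusYAML string) error {
	patch := map[string]interface{}{
		"data": map[string]string{
			"ClusterStatus": clusterStatusYAML,
		},
	}
	patchBytes, err := json.Marshal(patch)
	if err != nil {
		return fmt.Errorf("failed to marshal kubeadm-config patch: %v", err)
	}

	_, err = client.CoreV1().ConfigMaps(metav1.NamespaceSystem).Patch(
		ctx, "kubeadm-config", types.StrategicMergePatchType, patchBytes, metav1.PatchOptions{})
	return err
}
```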

One other thing that we'll want to consider for future implementation is that locking just around the ConfigMap update (even with fetching the latest ClusterStatus within the lock) is probably not sufficient, since there are other race conditions that could cause the configmap to change during the process (for example, changes made to the ClusterConfiguration). The locking semantics will likely need to be extended to cover the entirety of the following workflows (a rough sketch follows the list):

  • kubeadm init
  • kubeadm config upload
  • kubeadm upgrade
  • kubeadm join --control-plane
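
Purely to illustrate that scope (the ConfigLock interface and function below are hypothetical, not existing kubeadm code), workflow-level locking could look roughly like this:

```go
// Hypothetical sketch: the lock is held for the whole workflow, not just
// around the single ConfigMap update.
package main

import "context"

// ConfigLock is a made-up interface; a concrete implementation could be the
// leader-election or create-only ConfigMap approaches discussed later in
// this thread.
type ConfigLock interface {
	Acquire(ctx context.Context) error
	Release(ctx context.Context) error
}

// runJoinControlPlane is illustrative only: the real join phases are more
// involved, but the point is that everything touching cluster configuration
// happens while the lock is held.
func runJoinControlPlane(ctx context.Context, lock ConfigLock) error {
	if err := lock.Acquire(ctx); err != nil {
		return err
	}
	defer lock.Release(ctx)

	// ...fetch the latest ClusterStatus, register this control-plane node,
	// upload the updated kubeadm-config, etc., all under the lock.
	return nil
}
```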

@detiber priority bump for 1.14 fixes.
/cc @fabriziopandini

Feature freeze is next week and I don't think anyone will be able to work on this.
But the lack of CM locks can also be considered a bug.

@fabriziopandini @timothysc
I'm proposing to move this to the 1.15 milestone.

I can still work on this one during this cycle, after I finish another PR in the works for unit testing secret transfer during init/join. Depending on complexity/intrusiveness, we can decide to postpone it to 1.15 if we see we are running out of time.

/assign

Is anyone actively working on this? I'd be happy to help here.

Hello @andrewrynhard! Yes, in fact I was going to look at this more in depth during this weekend; I should have something ready in the next few days if everything goes fine.

@ereslibre Great! Any chance we would be able to cherry-pick for 1.14.1 or 1.14.2?

I don't know; we can discuss when we have the patch and evaluate its intrusiveness. Theoretically 1.14 HA support is not GA, so I don't know if we could strictly backport it, but personally I think it would make sense if the rules allow it.

I have done some preliminary work on this, and while I have it working with leader election, I have some questions that maybe others can shed light on.

  1. Leader election looks more suited for long-lived leader resolution, right? Right now I'm using a config map resource lock with a configmap named kubeadm-config-lock.
  2. It seems that the election logic tries to Create the config map lock, so I don't think I can let the election annotate the original kubeadm-config config map; this is why I'm using the kubeadm-config-lock approach.
  3. If this "extra" config map is created by the election logic, it could live forever in the cluster when its deletion is left to the client. Ideally it should be owned by something TTL-lived, so it would be garbage collected if left behind because of a client/network error.

I will give it a last go and possibly submit a PR tomorrow for early feedback. My main doubts are around the main leader election API, because I don't know if it fits this use case very well.
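
For context, this is roughly how I'm wiring the config map resource lock through client-go's leader election right now (a sketch only: the durations and identity handling are placeholders):

```go
// Sketch of the current approach: a ConfigMap resource lock handled by
// client-go's leader election machinery.
package main

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

func runUnderConfigMapLock(ctx context.Context, client kubernetes.Interface, identity string, critical func(ctx context.Context)) {
	// The election logic creates/annotates this ConfigMap itself, which is
	// why a separate kubeadm-config-lock object is used instead of
	// annotating kubeadm-config directly.
	lock := &resourcelock.ConfigMapLock{
		ConfigMapMeta: metav1.ObjectMeta{
			Namespace: metav1.NamespaceSystem,
			Name:      "kubeadm-config-lock",
		},
		Client:     client.CoreV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: identity},
	}

	leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
		Lock:            lock,
		LeaseDuration:   15 * time.Second,
		RenewDeadline:   10 * time.Second,
		RetryPeriod:     2 * time.Second,
		ReleaseOnCancel: true,
		Callbacks: leaderelection.LeaderCallbacks{
			// Note: OnStartedLeading cannot return an error, which is one of
			// the hurdles mentioned earlier in this issue.
			OnStartedLeading: critical,
			OnStoppedLeading: func() {},
		},
	})
}
```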

I am also wondering if, as an alternative, it would be possible to use a create-only resource (again TTL-lived) for exclusive access in the critical sections, e.g. a kubeadm-config-lock configmap (it could be another kind of resource) that provides mutual exclusion through creation. Something like:

  1. Try to create the kubeadm-config-lock configmap (annotation with identity holder as well, TTL-lived for e.g. 5 minutes)

    1. If I got a successful response (201), continue to the critical section

    2. If I got an error response, try again

  2. Run critical section
  3. Remove the kubeadm-config-lock configmap if my identity == identity holder

This has the benefit that if the client crashes or the network goes down and the last step is not performed by the client, the configmap will eventually be garbage collected.
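
To make that concrete, here is a minimal sketch of the create-only flow using client-go; the annotation keys, the fixed 5-minute expiry, and the simple retry loop are assumptions rather than an agreed design, and a full implementation would also verify the holder identity and expiry before deleting or taking over an existing lock:

```go
// Minimal sketch of a create-only lock: Create either succeeds (we hold the
// lock) or fails with AlreadyExists (someone else does).
package main

import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

const lockName = "kubeadm-config-lock"

func withConfigLock(ctx context.Context, client kubernetes.Interface, identity string, critical func() error) error {
	lock := &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{
			Name:      lockName,
			Namespace: metav1.NamespaceSystem,
			Annotations: map[string]string{
				// Hypothetical annotation keys for the holder identity and expiry.
				"kubeadm.kubernetes.io/lock-holder": identity,
				"kubeadm.kubernetes.io/lock-expiry": time.Now().Add(5 * time.Minute).Format(time.RFC3339),
			},
		},
	}

	// Step 1: try to create the lock; AlreadyExists means another actor
	// holds it, so back off and try again.
	for {
		_, err := client.CoreV1().ConfigMaps(metav1.NamespaceSystem).Create(ctx, lock, metav1.CreateOptions{})
		if err == nil {
			break
		}
		if !apierrors.IsAlreadyExists(err) {
			return err
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(2 * time.Second):
		}
	}

	// Step 3: release the lock we created (a fuller implementation would
	// re-check the holder identity and honour the expiry here).
	defer client.CoreV1().ConfigMaps(metav1.NamespaceSystem).Delete(ctx, lockName, metav1.DeleteOptions{})

	// Step 2: run the critical section while holding the lock.
	return critical()
}
```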

Added some comments on the PR.
My understanding is that we don't need a KEP for this, @fabriziopandini?
https://github.com/kubernetes/enhancements/tree/master/keps/sig-cluster-lifecycle

I don't think it's a feature on its own. It should be part of some HA KEP or be treated as a "bug". HA cannot be deployed in a safe manner without this.

@neolit123 if I remember correctly, the HA KEP already states that we want to support automation and parallel execution, so IMO this doesn't need a separate KEP.

thanks for confirming.

The PR is open for review, but I wrote a comment on https://github.com/kubernetes/kubernetes/pull/76241#issuecomment-484543261 because unless I'm missing something I think we don't need configmap locks (or leader election at all) for this matter. Please, have a look and tell me if I'm missing anything. Thanks!
