kubeadm upgrade ignores a ConfigMap change to etcd's image repository if the image has the same version

Created on 1 Sep 2020 · 17 comments · Source: kubernetes/kubeadm

I checked if there was anything similar by searching for "etcd" or "image", but didn't find anything like this.

Is this a BUG REPORT or FEATURE REQUEST?

Bug report.

Versions

kubeadm version (use kubeadm version): master

Environment:

  • Kubernetes version (use kubectl version): 1.18 (not that it matters)
  • Cloud provider or hardware configuration: openstack (not that it matters)
  • OS (e.g. from /etc/os-release): SLES (not that it matters)
  • Kernel (e.g. uname -a): 5.3.18-24.9-default (not that it matters :p )
  • Others:

What happened?

If I change my image repository and then run an upgrade, and the etcd version in the new repository is the same, the static pod manifest is not updated to match the ConfigMap. No change is written to disk.

See also https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/phases/upgrade/staticpods.go#L325 for that logic (permalink: https://github.com/kubernetes/kubernetes/blob/e23d83eead3b5ae57731afb0209f4a2aaa4009dd/cmd/kubeadm/app/phases/upgrade/staticpods.go#L320-L326 ).

What you expected to happen?

For me, the static pod manifest should be rewritten whenever there is a change.
We should not only compare the version to prevent a downgrade, but also rewrite the manifest if its content differs from the expected content (we could hash the file's content and compare?).

How to reproduce it (as minimally and precisely as possible)?

1. Push an etcd image with the same version to two different repositories.
2. Change the ConfigMap to point to the new image repository.
3. Run kubeadm upgrade apply and observe that the manifest is not brought in sync with the ConfigMap.

Anything else we need to know?

This kind of problem might not be limited to etcd static pods.

kind/bug

Most helpful comment

PRs welcome btw @ereslibre @evrardjp

All 17 comments

@ereslibre I think you might be interested in this.

/kind bug

Could we SHA1 the contents of the marshaled pod, compare it to the SHA1 of the pod on disk, and continue if they differ? I can imagine sources of changes that could lead to "phantom" rewrites of the manifest in this case, though (e.g. the initial-cluster computation).
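For illustration, a minimal sketch of that idea in Go (not kubeadm's actual code; the function name, the use of SHA-256 instead of SHA-1, and the manifest path are all illustrative):

    package main

    import (
        "crypto/sha256"
        "fmt"
        "os"
    )

    // manifestChanged reports whether the desired manifest bytes differ from the
    // manifest currently stored at path. A missing file counts as "changed".
    func manifestChanged(desired []byte, path string) (bool, error) {
        current, err := os.ReadFile(path)
        if err != nil {
            if os.IsNotExist(err) {
                return true, nil
            }
            return false, err
        }
        return sha256.Sum256(desired) != sha256.Sum256(current), nil
    }

    func main() {
        changed, err := manifestChanged([]byte("apiVersion: v1\nkind: Pod\n"), "/etc/kubernetes/manifests/etcd.yaml")
        if err != nil {
            fmt.Println("error:", err)
            return
        }
        fmt.Println("manifest needs rewrite:", changed)
    }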

Hi.
Are you providing an etcd tag or only the repository?

I'd expect that if the user has passed a custom tag, 'apply' skips the etcd upgrade completely. It should still rewrite the etcd manifest with your custom repo if you only passed the repo. Otherwise we have a bug.

cc @rosti
I don't remember if we did the last changes in 1.18.

Are you providing an etcd tag or only the repository?

We are providing the kubeadm config with a slightly different imageRepository and the same etcd imageTag in this upgrade apply instance. https://github.com/kubernetes/kubernetes/blob/e23d83eead3b5ae57731afb0209f4a2aaa4009dd/cmd/kubeadm/app/phases/upgrade/staticpods.go#L320-L326 is skipping the etcd upgrade because the image tag of etcd didn't change (even though the imageRepository did in the configuration).

I think the image tag should not be passed, otherwise kubeadm thinks you
want a custom etcd. IIRC the logic is 'tag != ""' to skip upgrade.

Maybe we want to check if they match too.
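A rough sketch of the rule being recalled here, with made-up type and field names rather than the real kubeadm API: a non-empty custom tag is treated as "the user manages the etcd version" and skips the upgrade, while a repository-only change should still lead to a manifest rewrite:

    package main

    import "fmt"

    // EtcdConfig is a stand-in for the etcd-related fields of the kubeadm config.
    type EtcdConfig struct {
        ImageRepository string // custom registry, e.g. "registry.example.com/etcd"
        ImageTag        string // custom tag; empty means "use the bundled default"
    }

    // shouldSkipEtcdUpgrade mirrors the 'tag != ""' rule mentioned above.
    func shouldSkipEtcdUpgrade(cfg EtcdConfig) bool {
        return cfg.ImageTag != ""
    }

    func main() {
        custom := EtcdConfig{ImageRepository: "registry.example.com/etcd", ImageTag: "3.4.13-0"}
        repoOnly := EtcdConfig{ImageRepository: "registry.example.com/etcd"}

        fmt.Println("custom tag set, skip etcd upgrade:", shouldSkipEtcdUpgrade(custom))      // true
        fmt.Println("only repo changed, skip etcd upgrade:", shouldSkipEtcdUpgrade(repoOnly)) // false: rewrite the manifest
    }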

I think the image tag should not be passed, otherwise kubeadm thinks you
want a custom etcd

Yes, for kubeadm it's all custom images all over the place in our use case, because we override the version and the imageRepository in all components.

This bug was triggered because in this case we wanted to change the imageRepository of etcd while keeping the same imageTag. We always run kubeadm upgrade apply with --config, and it is in this configuration that we changed the imageRepository while keeping the same imageTag for etcd.

I don't think we need to checksum these. The problem here is in the following check.

    cmpResult, err := desiredEtcdVersion.Compare(currentEtcdVersionStr)
    if err != nil {
        return true, errors.Wrapf(err, "failed comparing the current etcd version %q to the desired one %q", currentEtcdVersionStr, desiredEtcdVersion)
    }
    if cmpResult < 1 { // The problem is here, this should be cmpResult < 0
        return false, errors.Errorf("the desired etcd version %q is not newer than the currently installed %q. Skipping etcd upgrade", desiredEtcdVersion, currentEtcdVersionStr)
    }

The idea behind this check in the first place is to prevent accidentally downgrading etcd and corrupting the data store.
The problem here is that this check skips regeneration of the static pod if the version is the same (thereby ignoring changes to other etcd-related fields in the config).
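A minimal sketch of the corrected guard (not a drop-in patch for staticpods.go; the helper name is illustrative): only a desired version strictly older than the current one aborts, so an equal version falls through and the manifest still gets regenerated:

    package main

    import (
        "fmt"

        "k8s.io/apimachinery/pkg/util/version"
    )

    // etcdDowngradeRequested returns an error only when the desired version is
    // strictly older than the currently installed one (cmpResult < 0), instead of
    // also rejecting the equal-version case (cmpResult < 1).
    func etcdDowngradeRequested(desired *version.Version, current string) error {
        cmpResult, err := desired.Compare(current)
        if err != nil {
            return fmt.Errorf("failed comparing the current etcd version %q to the desired one %q: %v", current, desired, err)
        }
        if cmpResult < 0 {
            return fmt.Errorf("the desired etcd version %q is older than the currently installed %q; refusing to downgrade", desired, current)
        }
        return nil
    }

    func main() {
        desired := version.MustParseSemantic("3.4.13")
        // Same version: no error, so the static pod manifest is still rewritten.
        fmt.Println(etcdDowngradeRequested(desired, "3.4.13"))
        // Older desired version: rejected to protect the data store.
        fmt.Println(etcdDowngradeRequested(desired, "3.4.14"))
    }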

so "apply" should regenerate the manifest (which also can restart etcd) even if the old tag == new tag?
(i.e. use < 0)

this guarantees that a new version of kubeadm that uses the old etcd can still apply certain manifest fixes that we introduced.

@neolit123 I think that's reasonable. +1 from me.

PRs welcome btw @ereslibre @evrardjp

Wanted to verify first with you, because there's still room for changes not being applied (e.g. if dataDir, extraArgs, serverCertSANs or peerCertSANs change in the configuration, they won't be persisted to disk unless the version changes) -- this is why I was proposing something that compared checksums of the marshaled configurations.
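A sketch of that alternative, with illustrative names: decode both the on-disk static pod and the freshly generated one and compare the decoded objects, so changes to fields like dataDir or extraArgs are caught even when the etcd version is unchanged:

    package main

    import (
        "fmt"
        "os"
        "reflect"

        corev1 "k8s.io/api/core/v1"
        "sigs.k8s.io/yaml"
    )

    // podsDiffer reports whether the desired static pod differs from the one
    // currently written at path.
    func podsDiffer(desired corev1.Pod, path string) (bool, error) {
        data, err := os.ReadFile(path)
        if err != nil {
            return false, err
        }
        var current corev1.Pod
        if err := yaml.Unmarshal(data, &current); err != nil {
            return false, err
        }
        return !reflect.DeepEqual(desired, current), nil
    }

    func main() {
        desired := corev1.Pod{} // in real code this would be the freshly generated etcd pod
        differ, err := podsDiffer(desired, "/etc/kubernetes/manifests/etcd.yaml")
        if err != nil {
            fmt.Println("error:", err)
            return
        }
        fmt.Println("manifest out of sync:", differ)
    }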

Either @evrardjp or I will submit a PR with what we discussed, though.

cc @rosti
I don't remember if we did the last changes in 1.18.

It's not 1.18 related :)

I don't think we need to checksum these. The problem here is in the following check.

  cmpResult, err := desiredEtcdVersion.Compare(currentEtcdVersionStr)
  if err != nil {
      return true, errors.Wrapf(err, "failed comparing the current etcd version %q to the desired one %q", currentEtcdVersionStr, desiredEtcdVersion)
  }
  if cmpResult < 1 { // The problem is here, this should be cmpResult < 0
      return false, errors.Errorf("the desired etcd version %q is not newer than the currently installed %q. Skipping etcd upgrade", desiredEtcdVersion, currentEtcdVersionStr)
  }

The idea behind this check in the first place is to prevent accidentally downgrading etcd and corrupting the data store.
The problem here is that this check skips regeneration of the static pod if the version is the same (thereby ignoring changes to other etcd-related fields in the config).

Correct, that's exactly what I meant in the bug report. Sorry if that wasn't clear; I tend to be a bit terse.

so "apply" should regenerate the manifest (which also can restart etcd) even if the old tag == new tag?
(i.e. use < 0)
this guarantees that a new version of kubeadm that uses the old etcd can still apply certain manifest fixes that we introduced.

Awesome, that's what I thought: as a first patch, allow the same version by changing those conditions in the relevant places. That's the first part; for me, the second part would be to actually update the files when the manifest changes, as proposed above by @ereslibre.

I opened the PR fixing this.

@evrardjp The way kubeadm works is to always regenerate static pods and deployments upon upgrade to carry over the current settings from the config maps. No detection should be required here.

Thanks for the info! Sounds logical.
