kubeadm upgrade ignores a ConfigMap change to etcd's image repository if the image has the same version

Created on 1 Sep 2020 · 17 comments · Source: kubernetes/kubeadm

I checked if there was anything similar by searching for "etcd" or "image", but didn't find anything like this.

Is this a BUG REPORT or FEATURE REQUEST?

Bug report.

Versions

kubeadm version (use kubeadm version): master

Environment:

  • Kubernetes version (use kubectl version): 1.18 (not that it matters)
  • Cloud provider or hardware configuration: openstack (not that it matters)
  • OS (e.g. from /etc/os-release): SLES (not that it matters)
  • Kernel (e.g. uname -a): 5.3.18-24.9-default (not that it matters :p )
  • Others:

What happened?

If I change my image repository and then run an upgrade, and the etcd version in the new repository is the same, the static pod manifest is not updated to match the ConfigMap. No change is written to disk.

See also https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/phases/upgrade/staticpods.go#L325 for that logic (permalink: https://github.com/kubernetes/kubernetes/blob/e23d83eead3b5ae57731afb0209f4a2aaa4009dd/cmd/kubeadm/app/phases/upgrade/staticpods.go#L320-L326 ).

What you expected to happen?

For me, the static pod manifest should be rewritten whenever there is a change.
We should not only compare the version to prevent a downgrade, but also rewrite the manifest if its content differs from the expected content (we could hash the file's content and compare?).

How to reproduce it (as minimally and precisely as possible)?

1. Push an etcd image with the same version to two different repositories.
2. Change the ConfigMap to point to the new image repository.
3. Run kubeadm upgrade apply and observe that the manifest is not brought in sync with the ConfigMap.

Anything else we need to know?

This kind of problem might not be limited to etcd static pods.

kind/bug

Most helpful comment

PRs welcome btw @ereslibre @evrardjp

All 17 comments

@ereslibre I think you might be interested in this.

/kind bug

Could we SHA1 the contents of the marshaled pod, compare it to the SHA1 of the pod on disk, and continue if they differ? I can imagine sources of changes that could lead to "phantom" rewrites of the manifest in this case, though (e.g. the initial-cluster computation).
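For illustration, a minimal sketch of that idea in Go (not kubeadm's actual code; the function name, the use of SHA-256 instead of SHA-1, and the manifest path are all illustrative):

    package main

    import (
        "crypto/sha256"
        "fmt"
        "os"
    )

    // manifestChanged reports whether the desired manifest bytes differ from the
    // manifest currently stored at path. A missing file counts as "changed".
    func manifestChanged(desired []byte, path string) (bool, error) {
        current, err := os.ReadFile(path)
        if err != nil {
            if os.IsNotExist(err) {
                return true, nil
            }
            return false, err
        }
        return sha256.Sum256(desired) != sha256.Sum256(current), nil
    }

    func main() {
        changed, err := manifestChanged([]byte("apiVersion: v1\nkind: Pod\n"), "/etc/kubernetes/manifests/etcd.yaml")
        if err != nil {
            fmt.Println("error:", err)
            return
        }
        fmt.Println("manifest needs rewrite:", changed)
    }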

Hi.
Are you providing an etcd tag or only the repository?

I'd expect that if the user has passed a custom tag, 'apply' skips the etcd upgrade completely. It should still rewrite the etcd manifest with your custom repo if you only passed the repo. Otherwise we have a bug.

cc @rosti
I don't remember if we did the last changes in 1.18.

Are you providing an etcd tag or only the repository?

We are providing the kubeadm config with a slightly different imageRepository and the same etcd imageTag in this upgrade apply instance. https://github.com/kubernetes/kubernetes/blob/e23d83eead3b5ae57731afb0209f4a2aaa4009dd/cmd/kubeadm/app/phases/upgrade/staticpods.go#L320-L326 is skipping the etcd upgrade because the image tag of etcd didn't change (even though the imageRepository did in the configuration).

I think the image tag should not be passed, otherwise kubeadm thinks you
want a custom etcd. IIRC the logic is 'tag != ""' to skip upgrade.

Maybe we want to check if they match too.
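A rough sketch of the rule being recalled here, with made-up type and field names rather than the real kubeadm API: a non-empty custom tag is treated as "the user manages the etcd version" and skips the upgrade, while a repository-only change should still lead to a manifest rewrite:

    package main

    import "fmt"

    // EtcdConfig is a stand-in for the etcd-related fields of the kubeadm config.
    type EtcdConfig struct {
        ImageRepository string // custom registry, e.g. "registry.example.com/etcd"
        ImageTag        string // custom tag; empty means "use the bundled default"
    }

    // shouldSkipEtcdUpgrade mirrors the 'tag != ""' rule mentioned above.
    func shouldSkipEtcdUpgrade(cfg EtcdConfig) bool {
        return cfg.ImageTag != ""
    }

    func main() {
        custom := EtcdConfig{ImageRepository: "registry.example.com/etcd", ImageTag: "3.4.13-0"}
        repoOnly := EtcdConfig{ImageRepository: "registry.example.com/etcd"}

        fmt.Println("custom tag set, skip etcd upgrade:", shouldSkipEtcdUpgrade(custom))      // true
        fmt.Println("only repo changed, skip etcd upgrade:", shouldSkipEtcdUpgrade(repoOnly)) // false: rewrite the manifest
    }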

I think the image tag should not be passed, otherwise kubeadm thinks you
want a custom etcd

Yes, for kubeadm it's all custom images all over the place in our use case, because we override the version and the imageRepository in all components.

This bug was triggered because in this case we wanted to change the imageRepository of etcd while keeping the same imageTag. We always run kubeadm upgrade apply with --config, and it is in this configuration that we changed the imageRepository while keeping the same imageTag for etcd.

I don't think we need to checksum these. The problem here is in the following check.

    cmpResult, err := desiredEtcdVersion.Compare(currentEtcdVersionStr)
    if err != nil {
        return true, errors.Wrapf(err, "failed comparing the current etcd version %q to the desired one %q", currentEtcdVersionStr, desiredEtcdVersion)
    }
    if cmpResult < 1 { // The problem is here, this should be cmpResult < 0
        return false, errors.Errorf("the desired etcd version %q is not newer than the currently installed %q. Skipping etcd upgrade", desiredEtcdVersion, currentEtcdVersionStr)
    }

The idea behind this check in the first place is to prevent accidentally downgrading etcd and corrupting the data store.
The problem here is that this check skips regeneration of the static pod if the version is the same (thereby ignoring changes to other etcd-related fields in the config).
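A minimal sketch of the corrected guard (not a drop-in patch for staticpods.go; the helper name is illustrative): only a desired version strictly older than the current one aborts, so an equal version falls through and the manifest still gets regenerated:

    package main

    import (
        "fmt"

        "k8s.io/apimachinery/pkg/util/version"
    )

    // etcdDowngradeRequested returns an error only when the desired version is
    // strictly older than the currently installed one (cmpResult < 0), instead of
    // also rejecting the equal-version case (cmpResult < 1).
    func etcdDowngradeRequested(desired *version.Version, current string) error {
        cmpResult, err := desired.Compare(current)
        if err != nil {
            return fmt.Errorf("failed comparing the current etcd version %q to the desired one %q: %v", current, desired, err)
        }
        if cmpResult < 0 {
            return fmt.Errorf("the desired etcd version %q is older than the currently installed %q; refusing to downgrade", desired, current)
        }
        return nil
    }

    func main() {
        desired := version.MustParseSemantic("3.4.13")
        // Same version: no error, so the static pod manifest is still rewritten.
        fmt.Println(etcdDowngradeRequested(desired, "3.4.13"))
        // Older desired version: rejected to protect the data store.
        fmt.Println(etcdDowngradeRequested(desired, "3.4.14"))
    }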

so "apply" should regenerate the manifest (which also can restart etcd) even if the old tag == new tag?
(i.e. use < 0)

this guarantees that a new version of kubeadm that uses the old etcd can still apply certain manifest fixes that we introduced.

@neolit123 I think that's reasonable. +1 from me.

PRs welcome btw @ereslibre @evrardjp

Wanted to verify first with you, because there's still room for changes not being applied (e.g. if dataDir, extraArgs, serverCertSANs or peerCertSANs change in the configuration, they won't be persisted to disk unless the version changes) -- this is why I was proposing something that compared checksums of the marshaled configurations.
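A sketch of that alternative, with illustrative names: decode both the on-disk static pod and the freshly generated one and compare the decoded objects, so changes to fields like dataDir or extraArgs are caught even when the etcd version is unchanged:

    package main

    import (
        "fmt"
        "os"
        "reflect"

        corev1 "k8s.io/api/core/v1"
        "sigs.k8s.io/yaml"
    )

    // podsDiffer reports whether the desired static pod differs from the one
    // currently written at path.
    func podsDiffer(desired corev1.Pod, path string) (bool, error) {
        data, err := os.ReadFile(path)
        if err != nil {
            return false, err
        }
        var current corev1.Pod
        if err := yaml.Unmarshal(data, &current); err != nil {
            return false, err
        }
        return !reflect.DeepEqual(desired, current), nil
    }

    func main() {
        desired := corev1.Pod{} // in real code this would be the freshly generated etcd pod
        differ, err := podsDiffer(desired, "/etc/kubernetes/manifests/etcd.yaml")
        if err != nil {
            fmt.Println("error:", err)
            return
        }
        fmt.Println("manifest out of sync:", differ)
    }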

Either @evrardjp or I will submit a PR with what we discussed, though.

cc @rosti
I don't remember if we did the last changes in 1.18.

It's not 1.18 related :)

I don't think we need to checksum these. The problem here is in the following check.

  cmpResult, err := desiredEtcdVersion.Compare(currentEtcdVersionStr)
  if err != nil {
      return true, errors.Wrapf(err, "failed comparing the current etcd version %q to the desired one %q", currentEtcdVersionStr, desiredEtcdVersion)
  }
  if cmpResult < 1 { // The problem is here, this should be cmpResult < 0
      return false, errors.Errorf("the desired etcd version %q is not newer than the currently installed %q. Skipping etcd upgrade", desiredEtcdVersion, currentEtcdVersionStr)
  }

The idea behind this check in the first place is to prevent accidentally downgrading etcd and corrupting the data store.
The problem here is that this check skips regeneration of the static pod if the version is the same (thereby ignoring changes to other etcd-related fields in the config).

Correct, that's exactly what I meant in the bug report. Sorry if that wasn't clear; I tend to be a bit terse.

so "apply" should regenerate the manifest (which also can restart etcd) even if the old tag == new tag?
(i.e. use < 0)
this guarantees that a new version of kubeadm that uses the old etcd can still apply certain manifest fixes that we introduced.

Awesome, that's what I thought: as a first patch, allow the same version by changing those conditions in the relevant places. That's the first part; for me, the second part would be to actually update the files when the manifest changes, as proposed above by @ereslibre.

I opened the PR fixing this.

@evrardjp The way kubeadm works is to always regenerate static pods and deployments upon upgrade to carry over the current settings from the config maps. No detection should be required here.

Thanks for the info! Sounds logical.
