Opening this ticket as per https://github.com/kubernetes/kubernetes/issues/89366
kubeadm version (use kubeadm version):
tested with kubeadm 1.15.5 and 1.16.4
Environment:
Kubernetes version (use kubectl version):
OS (e.g. from cat /etc/os-release):
Kernel (e.g. uname -a):

What happened:
I'm switching a kubeadm initialized cluster to use a private repo via the imageRepository key in the kubeadm-config.
What you expected to happen:
I expect all control plane images to use the new repo and for those pods to be re-created and pull down the images via the private repo.
After modifying the kubeadm-config.yaml file I've created, I run the command kubeadm upgrade apply --config kubeadm-config.yaml -f and the cluster performs the change.
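The only change that matters here is the top-level imageRepository field in the ClusterConfiguration; a trimmed sketch of the file (the full config is pasted further down):

apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: v1.16.4
imageRepository: my.repo.com:9999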
Afterwards, if I run the command kubectl describe pods -n kube-system | grep Image:, I see the following:
$ kubectl describe pods -n kube-system | grep Image:
...
Image: my.repo.com:9999/coredns:1.6.2
Image: my.repo.com:9999/coredns:1.6.2
Image: k8s.gcr.io/etcd:3.3.15-0
Image: my.repo.com:9999/kube-apiserver:v1.16.4
Image: my.repo.com:9999/kube-controller-manager:v1.16.4
Image: my.repo.com:9999/kube-proxy:v1.16.4
Image: my.repo.com:9999/kube-proxy:v1.16.4
Image: my.repo.com:9999/kube-proxy:v1.16.4
Image: my.repo.com:9999/kube-scheduler:v1.16.4
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
I'm led to believe this is a bug because if I perform the following command: kubeadm config images list --image-repository my.repo.com:9999, we can see that etcd is listed with the new repo name - and if I were to perform a pull, it would pull down etcd from the private repo as well.
$ kubeadm config images list --image-repository my.repo.com:9999
I0323 08:38:13.452370   30381 version.go:251] remote version is much newer: v1.17.4; falling back to: stable-1.16
my.repo.com:9999/kube-apiserver:v1.16.8
my.repo.com:9999/kube-controller-manager:v1.16.8
my.repo.com:9999/kube-scheduler:v1.16.8
my.repo.com:9999/kube-proxy:v1.16.8
my.repo.com:9999/pause:3.1
my.repo.com:9999/etcd:3.3.15-0
my.repo.com:9999/coredns:1.6.2
Other:
$ kubectl get cm -n kube-system kubeadm-config -o yaml
apiVersion: v1
data:
  ClusterConfiguration: |
    apiServer:
      extraArgs:
        authorization-mode: Node,RBAC
      timeoutForControlPlane: 4m0s
    apiVersion: kubeadm.k8s.io/v1beta2
    certificatesDir: /etc/kubernetes/pki
    clusterName: kubernetes
    controllerManager: {}
    dns:
      type: CoreDNS
    etcd:
      local:
        dataDir: /var/lib/etcd
    imageRepository: my.repo.com:9999
    kind: ClusterConfiguration
    kubernetesVersion: v1.16.4
    networking:
      dnsDomain: cluster.local
      podSubnet: 10.244.0.0/16
      serviceSubnet: 10.96.0.0/12
    scheduler: {}
  ClusterStatus: |
    apiEndpoints:
      vtorcordv01:
        advertiseAddress: x.x.x.x
        bindPort: 6443
    apiVersion: kubeadm.k8s.io/v1beta2
    kind: ClusterStatus
kind: ConfigMap
metadata:
  creationTimestamp: "2020-01-08T21:35:15Z"
  name: kubeadm-config
  namespace: kube-system
  resourceVersion: "9001918"
  selfLink: /api/v1/namespaces/kube-system/configmaps/kubeadm-config
  uid: c7b86ba7-b3a3-4659-9d02-651b41b9b6fd
`kubeadm-config.yaml`
$ kubeadm config view
apiServer:
  extraArgs:
    authorization-mode: Node,RBAC
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: my.repo.com:9999
kind: ClusterConfiguration
kubernetesVersion: v1.16.4
networking:
  dnsDomain: cluster.local
  podSubnet: 10.244.0.0/16
  serviceSubnet: 10.96.0.0/12
scheduler: {}
I'll also add that if I look in the `/etc/kubernetes/manifests` directory, all of the manifest files have been updated except `etcd.yaml`:
$ sudo cat kube-*.yaml etcd.yaml | grep image:
image: my.repo.com:9999/kube-apiserver:v1.16.4
image: my.repo.com:9999/kube-controller-manager:v1.16.4
image: my.repo.com:9999/kube-scheduler:v1.16.4
image: k8s.gcr.io/etcd:3.3.15-0
some notes:
- --config on upgrade is unsupported by the kubeadm team
- I will assign @ereslibre as they have a similar use case:
/assign @ereslibre
I undid the kubeadm upgrade apply --config kubeadm-config.yaml change and re-ran the upgrade in the supported manner: I edited the existing kubeadm-config ConfigMap and then ran kubeadm upgrade apply -f v1.16.4.
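In concrete terms, the supported flow amounts to something along these lines (same repo host as above):

# change imageRepository to my.repo.com:9999 in the stored ClusterConfiguration
kubectl edit cm -n kube-system kubeadm-config
# re-apply the same Kubernetes version; -f (--force) is needed because the target version equals the current one
sudo kubeadm upgrade apply -f v1.16.4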
I think I've isolated the issue; I should have been more careful. Re-reading the upgrade logs gives the following:
[upgrade/etcd] Non fatal issue encountered during upgrade: the desired etcd version for this Kubernetes version "v1.16.4" is "3.3.15-0", but the current etcd version is "3.3.15". Won't downgrade etcd, instead just continue
What I don't understand is that the version of etcd I'm running IS k8s.gcr.io/etcd:3.3.15-0:
$ kubectl describe pods -n kube-system | grep Image: | grep etcd
Image: k8s.gcr.io/etcd:3.3.15-0
@neolit123 - I did some digging and it seems like this is an additional symptom of https://github.com/kubernetes/kubeadm/issues/2058. Until this particular issue is addressed, using the upgrade command as a means of updating the imageRepository will trigger this, causing the repository not to get updated, which is incorrect.
As a workaround, I've tried manually editing /etc/kubernetes/manifests/etcd.yaml to point at the private repo, and this does redeploy the pod, but reviewing the events gives the following, so I'm not entirely sure this can be considered successful.
Warning Failed 117s kubelet, server01 Error: open /var/lib/kubelet/pods/b2b8801cb18d76a738104d106bcaa26e/etc-hosts: no such file or directory
Warning FailedCreatePodSandBox 117s kubelet, server01 Failed create pod sandbox: rpc error: code = Unknown desc = failed to create a sandbox for pod "etcd-server01": Error response from daemon: Conflict. The container name "/k8s_POD_etcd-server01-system_b2b8801cb18d76a738104d106bcaa26e_0" is already in use by container "5423513db248ad71087a71bc1921c0c49177c30891260090a2442b6bfe56c727". You have to remove (or rename) that container to be able to reuse that name.
Normal Pulled 116s (x2 over 117s) kubelet, server01 Container image "my.repo.com:9999/etcd:3.3.15-0" already present on machine
Normal Created 116s kubelet, server01 Created container etcd
Normal Started 116s kubelet, server01 Started container etcd
Edit: I should add that the etcd pod does come up and is ready, logs look clean too.
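For anyone trying the same workaround: the edit is a single line in the static pod manifest, and the kubelet watches /etc/kubernetes/manifests and recreates the pod when the file changes. A sketch, with the registry host as in this thread:

# /etc/kubernetes/manifests/etcd.yaml - only the image line changes
spec:
  containers:
  - name: etcd
    image: my.repo.com:9999/etcd:3.3.15-0   # was: k8s.gcr.io/etcd:3.3.15-0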
[upgrade/etcd] Non fatal issue encountered during upgrade: the desired etcd version for this Kubernetes version "v1.16.4" is "3.3.15-0", but the current etcd version is "3.3.15". Won't downgrade etcd, instead just continue
ok, I see what is happening now. this is caused by the -0 bug mentioned in #2058.
thanks for confirming.
@neolit123 - I did some digging and it seems like this is an additional symptom of #2058. Until this particular issue is addressed, using the upgrade command as a means of updating the imageRepository will trigger this, causing the repository not to get updated, which is incorrect.
without the -0 bug, --config reconfiguration of the repository should work fine, but --config is still unsupported (and in fact dangerous in some cases). let's fold this issue into #2058.
As a workaround, I've tried manually editing /etc/kubernetes/manifests/etcd.yaml to point at the private repo, and this does redeploy the pod, but reviewing the events gives the following, so I'm not entirely sure this can be considered successful.
looks like the manual edit was successful: Started container etcd
/reopen
@neolit123 Sorry to trouble you again, we may have been wrong. I just tested the upgrade in a 1.15.5 cluster and I do not encounter the -0 bug, but the etcd image still does not update its repo.
Here is the output from the upgrade; notice there is no mention of the non-fatal error for etcd.
$ sudo kubeadm upgrade apply v1.15.5
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
[upgrade/version] You have chosen to change the cluster version to "v1.15.5"
[upgrade/versions] Cluster version: v1.15.5
[upgrade/versions] kubeadm version: v1.15.5
[upgrade/confirm] Are you sure you want to proceed with the upgrade? [y/N]: y
[upgrade/prepull] Will prepull images for components [kube-apiserver kube-controller-manager kube-scheduler etcd]
[upgrade/prepull] Prepulling image for component etcd.
[upgrade/prepull] Prepulling image for component kube-apiserver.
[upgrade/prepull] Prepulling image for component kube-controller-manager.
[upgrade/prepull] Prepulling image for component kube-scheduler.
[apiclient] Found 1 Pods for label selector k8s-app=upgrade-prepull-kube-controller-manager
[apiclient] Found 0 Pods for label selector k8s-app=upgrade-prepull-kube-scheduler
[apiclient] Found 1 Pods for label selector k8s-app=upgrade-prepull-kube-apiserver
[apiclient] Found 0 Pods for label selector k8s-app=upgrade-prepull-etcd
[apiclient] Found 1 Pods for label selector k8s-app=upgrade-prepull-kube-scheduler
[apiclient] Found 1 Pods for label selector k8s-app=upgrade-prepull-etcd
[upgrade/prepull] Prepulled image for component kube-apiserver.
[upgrade/prepull] Prepulled image for component etcd.
[upgrade/prepull] Prepulled image for component kube-scheduler.
[upgrade/prepull] Prepulled image for component kube-controller-manager.
[upgrade/prepull] Successfully prepulled the images for all the control plane components
[upgrade/apply] Upgrading your Static Pod-hosted control plane to version "v1.15.5"...
Static pod: kube-apiserver-server01 hash: 9267a3edb3491a0ee73d627e3657d187
Static pod: kube-controller-manager-server01 hash: be3533d84c14ae17322afe7bb04742b6
Static pod: kube-scheduler-server01 hash: 131c3f63daec7c0750818f64a2f75d20
[upgrade/etcd] Upgrading to TLS for etcd
[upgrade/staticpods] Writing new Static Pod manifests to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests433449531"
[upgrade/staticpods] Preparing for "kube-apiserver" upgrade
[upgrade/staticpods] Renewing apiserver certificate
[upgrade/staticpods] Renewing apiserver-kubelet-client certificate
[upgrade/staticpods] Renewing front-proxy-client certificate
[upgrade/staticpods] Renewing apiserver-etcd-client certificate
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-apiserver.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2020-03-23-11-27-40/kube-apiserver.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
Static pod: kube-apiserver-server01 hash: 9267a3edb3491a0ee73d627e3657d187
Static pod: kube-apiserver-server01 hash: 13fe9a8a6452f9b8e8e3bd1c4a47bb11
[apiclient] Found 1 Pods for label selector component=kube-apiserver
[upgrade/staticpods] Component "kube-apiserver" upgraded successfully!
[upgrade/staticpods] Preparing for "kube-controller-manager" upgrade
[upgrade/staticpods] Renewing controller-manager.conf certificate
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-controller-manager.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2020-03-23-11-27-40/kube-controller-manager.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
Static pod: kube-controller-manager-server01 hash: be3533d84c14ae17322afe7bb04742b6
Static pod: kube-controller-manager-server01 hash: 90467a18f355c24674d48145b887587c
[apiclient] Found 1 Pods for label selector component=kube-controller-manager
[upgrade/staticpods] Component "kube-controller-manager" upgraded successfully!
[upgrade/staticpods] Preparing for "kube-scheduler" upgrade
[upgrade/staticpods] Renewing scheduler.conf certificate
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-scheduler.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2020-03-23-11-27-40/kube-scheduler.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
Static pod: kube-scheduler-server01 hash: 131c3f63daec7c0750818f64a2f75d20
Static pod: kube-scheduler-server01 hash: e3f95285b1258ca60dc8c86892e940b9
[apiclient] Found 1 Pods for label selector component=kube-scheduler
[upgrade/staticpods] Component "kube-scheduler" upgraded successfully!
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.15" in namespace kube-system with the configuration for the kubelets in the cluster
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.15" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.15.5". Enjoy!
And finally, I can see that the etcd pod that's running still uses the old repo:
$ kubectl describe pods -n kube-system | grep Image:
...
Image: my.repo.com:9999/coredns:1.3.1
Image: my.repo.com:9999/coredns:1.3.1
Image: k8s.gcr.io/etcd:3.3.10
Image: my.repo.com:9999/kube-apiserver:v1.15.5
Image: my.repo.com:9999/kube-controller-manager:v1.15.5
Image: my.repo.com:9999/kube-proxy:v1.15.5
Image: my.repo.com:9999/kube-proxy:v1.15.5
Image: my.repo.com:9999/kube-proxy:v1.15.5
Image: my.repo.com:9999/kube-scheduler:v1.15.5
Do we know if the imageRepository is being used for the etcd image? Should the /etc/kubernetes/manifests/etcd.yaml be upgraded as a result of the upgrade command?
Just a reminder that I'm no longer using the --config flag, which I understand is unsupported.
@jeanluclariviere: Reopened this issue.
Do we know if the imageRepository is being used for the etcd image?
it should be used unless the user specified overrides under the etcd: field.
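(For reference, a rough sketch of where those overrides live in the v1beta2 ClusterConfiguration; the override values below are placeholders:)

apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
imageRepository: my.repo.com:9999        # used for etcd too, unless overridden below
etcd:
  local:
    imageRepository: other.repo.example  # optional per-etcd override (placeholder)
    imageTag: 3.3.15-0                   # optional explicit etcd tag (placeholder)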
Should the /etc/kubernetes/manifests/etcd.yaml be upgraded as a result of the upgrade command?
if etcd.yaml on disk is using an image with registry X, the kubeadm upgrade command (without --config) should still technically generate a new etcd.yaml that uses the registry stored in the ClusterConfiguration and ignore the value in the yaml on disk.
I'm going to debug this later today.
Thanks for confirming, and I appreciate you looking into this - here I can see that all files are being updated during the upgrade except etcd.yaml:
$ ls -al
total 24
drwxr-xr-x. 2 root root 4096 Mar 23 11:35 .
drwxr-xr-x. 8 root root 4096 Dec 2 09:52 ..
-rw-------. 1 root root 1925 May 23 2019 etcd.yaml
-rw-------. 1 root root 2626 Mar 23 11:27 kube-apiserver.yaml
-rw-------. 1 root root 2504 Mar 23 11:27 kube-controller-manager.yaml
-rw-------. 1 root root 1008 Mar 23 11:27 kube-scheduler.yaml
we shouldn't be debugging 1.15.x as it is going out of support with the 1.18 release (maybe tomorrow).
I'm going to verify whether etcd.yaml is being updated with a custom registry for a 1.16->1.17 upgrade, where the etcd server version differs between releases:
https://github.com/kubernetes/kubernetes/blob/release-1.17/cmd/kubeadm/app/constants/constants.go#L422-L427
That makes sense; we're looking to get our clusters upgraded to 1.16/1.17 anyway, but the hope was to start leveraging our private image registries instead of preloading images on all the nodes - we're mostly on-prem / offline installs and upgrades.
my test:
- created a kubeadm cluster with 1.16.8 using this config:

apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: v1.16.8
imageRepository: k8s.gcr.io
networking:
  podSubnet: "192.168.0.0/16" # to be used with calico

- modified the kubeadm-config CM to have imageRepository: gcr.io/google_containers
- installed kubeadm 1.17.3
- ran kubeadm upgrade apply v1.17.3
- checked the images in the manifest files:

sudo grep "image:" /etc/kubernetes/manifests/ -rnI
/etc/kubernetes/manifests/kube-controller-manager.yaml:31: image: gcr.io/google_containers/kube-controller-manager:v1.17.3
/etc/kubernetes/manifests/kube-scheduler.yaml:19: image: gcr.io/google_containers/kube-scheduler:v1.17.3
/etc/kubernetes/manifests/kube-apiserver.yaml:40: image: gcr.io/google_containers/kube-apiserver:v1.17.3
/etc/kubernetes/manifests/etcd.yaml:31: image: gcr.io/google_containers/etcd:3.4.3-0
as you can see the registry was applied correctly on upgrade.
/priority awaiting-more-evidence
/remove-priority backlog
Thanks for the update. The only difference I see here is that you are upgrading to 1.17, whereas I am not performing an actual upgrade. My cluster is 1.16.4, and all I'm trying to do is update the config, not upgrade the cluster to a new version, so I'd simply run kubeadm upgrade apply -f v1.16.4.
ok, in the case of the same k8s version this can happen because kubeadm upgrade will skip the etcd upgrade and will not write a new etcd.yaml if the etcd version is the same (obviously the same between 1.16.4 and 1.16.4).
as mentioned already, kubeadm upgrade should not be used for cluster reconfiguration, and this is noted in our docs:
https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-upgrade/#kubeadm-upgrade-guidance
your alternative is:
- edit the kubeadm-config config map to update the imageRepository
- manually update the /etc/kubernetes/manifests files with the new image URLs
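a rough sketch of those two steps, using the registry from this thread as a placeholder (the sed pattern is only illustrative; double-check the manifests before and after):

# 1) update the repository stored in the cluster's ClusterConfiguration
kubectl edit cm -n kube-system kubeadm-config    # set imageRepository: my.repo.com:9999
# 2) point the static pod manifests at the new registry; the kubelet
#    recreates the static pods when the files change
sudo sed -i 's#k8s.gcr.io/#my.repo.com:9999/#g' /etc/kubernetes/manifests/*.yaml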