kubeadm upgrade diff loses configuration options

Created on 22 Jul 2018 · 26 comments · Source: kubernetes/kubeadm

What keywords did you search in kubeadm issues before filing this one?

diff, upgrade

Is this a BUG REPORT or FEATURE REQUEST?

Choose one: BUG REPORT

Versions

kubeadm version (use kubeadm version):

kubeadm version 
kubeadm version: &version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-17T18:50:16Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Kubernetes version (use kubectl version):
k version 
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.5", GitCommit:"32ac1c9073b132b8ba18aa830f46b77dcceb0723", GitTreeState:"clean", BuildDate:"2018-06-21T11:46:00Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.6", GitCommit:"6260bb08c46c31eea6cb538b34a9ceb3e406689c", GitTreeState:"clean", BuildDate:"2017-12-21T06:23:29Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: Bare metal (VM's in cloud)
  • OS (e.g. from /etc/os-release):
PRETTY_NAME="Debian GNU/Linux 9 (stretch)"
NAME="Debian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
  • Kernel (e.g. uname -a): Linux s03 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1 (2018-05-07) x86_64 GNU/Linux
  • Others:
kubeadm init \
  --pod-network-cidr=192.168.0.0/16 \
  --apiserver-advertise-address=10.20.0.100 \
  --apiserver-cert-extra-sans=XXXXX,XXXX

What happened?

I'm planning the upgrade from 1.11.0 to 1.11.1. I upgraded the deb packages on all nodes in the cluster and then ran kubeadm upgrade diff to see the differences. I noticed that some configuration options change in a way that will break the cluster, and some change in ways I don't understand:

  • the advertise IP address changes from my VPN IP to the public IP address -> the network will most likely break
  • the OIDC configuration options get lost -> dashboard SSO is lost
  • the certificate locations get changed -> I have no idea what is going to happen

What you expected to happen?

I expected the upgrade to be performed with minimal or no configuration changes.

How to reproduce it (as minimally and precisely as possible)?

Create a 1.11 cluster with OIDC values and a custom advertise IP, then try to upgrade. A config sketch is given below.
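For a concrete starting point, something like the following config should reproduce the setup. This is only a minimal sketch: kubeadm v1.11 uses the kubeadm.k8s.io/v1alpha2 MasterConfiguration format, and the OIDC/advertise values below are placeholders, not taken from this report.

```
# kubeadm.yaml -- illustrative values only
apiVersion: kubeadm.k8s.io/v1alpha2
kind: MasterConfiguration
kubernetesVersion: v1.11.0
api:
  advertiseAddress: 10.20.0.100          # custom (e.g. VPN) advertise IP
apiServerExtraArgs:
  oidc-issuer-url: https://auth.example.com/auth/realms/example
  oidc-client-id: kubernetes
  oidc-groups-claim: groups
networking:
  podSubnet: 192.168.0.0/16
```

Then `kubeadm init --config kubeadm.yaml`, upgrade the packages to 1.11.1, and run `kubeadm upgrade diff v1.11.1`.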

Anything else we need to know?

You are awesome! :)

kubeadm upgrade diff 
--- /etc/kubernetes/manifests/kube-scheduler.yaml
+++ new manifest
@@ -16,7 +16,7 @@
     - --address=127.0.0.1
     - --kubeconfig=/etc/kubernetes/scheduler.conf
     - --leader-elect=true
-    image: k8s.gcr.io/kube-scheduler-amd64:v1.11.0
+    image: k8s.gcr.io/kube-scheduler-amd64:v1.11.1
     imagePullPolicy: IfNotPresent
     livenessProbe:
       failureThreshold: 8
--- /etc/kubernetes/manifests/kube-apiserver.yaml
+++ new manifest
@@ -14,7 +14,7 @@
   - command:
     - kube-apiserver
     - --authorization-mode=Node,RBAC
-    - --advertise-address=10.20.0.100
+    - --advertise-address=REDACTED
     - --allow-privileged=true
     - --client-ca-file=/etc/kubernetes/pki/ca.crt
     - --disable-admission-plugins=PersistentVolumeLabel
@@ -40,15 +40,12 @@
     - --service-cluster-ip-range=10.96.0.0/12
     - --tls-cert-file=/etc/kubernetes/pki/apiserver.crt
     - --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
-    - --oidc-issuer-url=https://auth.REDACTED/auth/realms/gpi-infra
-    - --oidc-client-id=kubernetes
-    - --oidc-groups-claim=groups
-    image: k8s.gcr.io/kube-apiserver-amd64:v1.11.0
+    image: k8s.gcr.io/kube-apiserver-amd64:v1.11.1
     imagePullPolicy: IfNotPresent
     livenessProbe:
       failureThreshold: 8
       httpGet:
-        host: 10.20.0.100
+        host: REDACTED
         path: /healthz
         port: 6443
         scheme: HTTPS
--- /etc/kubernetes/manifests/kube-controller-manager.yaml
+++ new manifest
@@ -14,18 +14,15 @@
   - command:
     - kube-controller-manager
     - --address=127.0.0.1
-    - --allocate-node-cidrs=true
-    - --cluster-cidr=192.168.0.0/16
     - --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt
     - --cluster-signing-key-file=/etc/kubernetes/pki/ca.key
     - --controllers=*,bootstrapsigner,tokencleaner
     - --kubeconfig=/etc/kubernetes/controller-manager.conf
     - --leader-elect=true
-    - --node-cidr-mask-size=24
     - --root-ca-file=/etc/kubernetes/pki/ca.crt
     - --service-account-private-key-file=/etc/kubernetes/pki/sa.key
     - --use-service-account-credentials=true
-    image: k8s.gcr.io/kube-controller-manager-amd64:v1.11.0
+    image: k8s.gcr.io/kube-controller-manager-amd64:v1.11.1
     imagePullPolicy: IfNotPresent
     livenessProbe:
       failureThreshold: 8
@@ -41,6 +38,15 @@
       requests:
         cpu: 200m
     volumeMounts:
+    - mountPath: /usr/local/share/ca-certificates
+      name: usr-local-share-ca-certificates
+      readOnly: true
+    - mountPath: /etc/ca-certificates
+      name: etc-ca-certificates
+      readOnly: true
+    - mountPath: /etc/kubernetes/pki
+      name: k8s-certs
+      readOnly: true
     - mountPath: /etc/ssl/certs
       name: ca-certs
       readOnly: true
@@ -52,22 +58,9 @@
     - mountPath: /usr/share/ca-certificates
       name: usr-share-ca-certificates
       readOnly: true
-    - mountPath: /usr/local/share/ca-certificates
-      name: usr-local-share-ca-certificates
-      readOnly: true
-    - mountPath: /etc/ca-certificates
-      name: etc-ca-certificates
-      readOnly: true
-    - mountPath: /etc/kubernetes/pki
-      name: k8s-certs
-      readOnly: true
   hostNetwork: true
   priorityClassName: system-cluster-critical
   volumes:
-  - hostPath:
-      path: /usr/local/share/ca-certificates
-      type: DirectoryOrCreate
-    name: usr-local-share-ca-certificates
   - hostPath:
       path: /etc/ca-certificates
       type: DirectoryOrCreate
@@ -92,5 +85,9 @@
       path: /usr/share/ca-certificates
       type: DirectoryOrCreate
     name: usr-share-ca-certificates
+  - hostPath:
+      path: /usr/local/share/ca-certificates
+      type: DirectoryOrCreate
+    name: usr-local-share-ca-certificates
 status: {}
Labels: area/upgrades, help wanted, kind/bug, priority/important-longterm

All 26 comments

/assign @liztio @timothysc

@ieugen I'd recommend using the configuration migration utility prior to attempting the upgrade. The configuration file format has changed significantly from v1.10 -> v1.11, but folks have done a good job of testing that migration.

@timothysc I've installed 1.11 and I am upgrading to 1.11.1, so there should not be much to migrate.
I did use the utility and I got these results:

kubeadm config view > kubeadm-old.yaml
kubeadm config migrate --old-config kubeadm-old.yaml > kubeadm-new.yaml
diff kubeadm-old.yaml kubeadm-new.yaml 

10d9
<   oidc-issuer-url: https://REDACTED
12a12
>   oidc-issuer-url: https://REDACTED
17a18,25
> bootstrapTokens:
> - groups:
>   - system:bootstrappers:kubeadm:default-node-token
>   token: REDACTED
>   ttl: 24h0m0s
>   usages:
>   - signing
>   - authentication
137c145,150
< nodeRegistration: {}
---
> nodeRegistration:
>   criSocket: /var/run/dockershim.sock
>   name: m01
>   taints:
>   - effect: NoSchedule
>     key: node-role.kubernetes.io/master

Confirming. In my case (1.11.0 -> 1.11.1) it loses apiServerExtraArgs like etcd-cafile, feature-gates, etc., and replaces them with some defaults.

I can find the expected values inside the ConfigMap (key: MasterConfiguration) like this:
kubectl get configmap -n kube-system kubeadm-config -oyaml
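For reference, a quick way to dump only that key (standard kubectl usage, shown here as an illustration rather than something from this thread):

```
kubectl -n kube-system get configmap kubeadm-config \
  -o jsonpath='{.data.MasterConfiguration}'
```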

I've made the upgrade and it went smoothly, so I am a bit confused about this. I also rebooted the cluster (one node at a time, starting with the master) to see if there were any issues, and I did not see any.

I don't remember having to change anything after the upgrade, and I did not document it :(.

Regards,

We lost networking to the pods after the upgrade from 1.10.6 to 1.11.1. It looks like --cluster-cidr is no longer applied, as all our pods came up with IPs from 172.17.x.x and not 10.244.x.x, which is what is configured for flannel. How can we resolve this situation?

Even better:
At the moment I'm at v1.11.0, and kubeadm upgrade diff v1.11.0 gives me the same broken result:
```
kubeadm upgrade diff v1.11.0
--- /etc/kubernetes/manifests/kube-apiserver.yaml
+++ new manifest
@@ -14,17 +14,16 @@
   - command:
     - kube-apiserver
     - --authorization-mode=Node,RBAC
-    - --etcd-cafile=/opt/etcd/ca.pem
-    - --etcd-certfile=/opt/etcd/staging-cluster2node.pem
-    - --etcd-keyfile=/opt/etcd/staging-cluster2node-key.pem
-    - --feature-gates=PodPriority=false
     - --advertise-address=192.168.6.161
     - --allow-privileged=true
     - --client-ca-file=/etc/kubernetes/pki/ca.crt
     - --disable-admission-plugins=PersistentVolumeLabel
     - --enable-admission-plugins=NodeRestriction
     - --enable-bootstrap-token-auth=true
-    - --etcd-servers=https://192.168.6.161:2379,https://192.168.6.162:2379,https://192.168.6.163:2379
+    - --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
+    - --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
+    - --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
     - --insecure-port=0
     - --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt
     ....
```

@wizard580 In my case the upgrade went OK. No issues with the cluster (and I'm also running on top of a WireGuard VPN).

  • kubeadm upgrade apply v1.11.1 worked OK
  • kubeadm upgrade diff shows the same bad diff as in your case

I'll try tomorrow after a backup. But either way, the broken diff is a bug. From my perspective, a major one.

For us this is not only a broken diff, as the node CIDR really seems to get lost.

The upgrade with kubeadm upgrade apply v1.11.1 worked fine; the configs are not broken as far as I can see.
It generated unneeded etcd certs, but they are ignored by our configs/setup.

For us the upgrade also did not seem broken at first, but after uncordoning the upgraded nodes and draining the old nodes, our application went down right away because the pods all got the wrong IPs.

Confirming. We found similar issues: in our case IPVS was stuck with old service-to-pod mappings. Check the kube-proxy logs and you'll probably find a lot of errors about ipset. Rebooting the nodes helped us.
Observing...
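If you want to check for the same symptom, a rough sketch of what to look at (illustrative commands, not from this thread; in kubeadm clusters the kube-proxy pods carry the k8s-app=kube-proxy label):

```
# scan kube-proxy logs for ipset/IPVS errors
kubectl -n kube-system logs -l k8s-app=kube-proxy --tail=200 | grep -i ipset

# on a node, inspect the live IPVS service -> endpoint mappings
ipvsadm -Ln
```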

Can still reproduce this in the latest v1.12.0 alpha. Going to see if I can sort this out before the code freeze.

ETOOCOMPLICATED, punting to 1.13

Some updates:

I've made the upgrades to 1.11.2 and 1.11.3 without any issue. Every time I performed an upgrade, the diff showed it was dropping the information, but that does not actually seem to happen. At this point I believe it is just bad reporting.

@ieugen Minor upgrades were also not affected here, but every major upgrade (1.10.x -> 1.11.x) was!

We lost networking to the pods after the upgrade from 1.10.6 to 1.11.1. It looks like --cluster-cidr is no longer applied, as all our pods came up with IPs from 172.17.x.x and not 10.244.x.x, which is what is configured for flannel. How can we resolve this situation?

@mkretzer In my case the worker nodes' kubelet loses its network parameters during upgrades. My personal fix is:
echo "KUBELET_KUBEADM_ARGS=--cgroup-driver=cgroupfs --network-plugin=cni" > /var/lib/kubelet/kubeadm-flags.env

Since 1.11, /var/lib/kubelet/kubeadm-flags.env is a file that kubeadm init and join generate automatically at runtime each time:
https://kubernetes.io/docs/setup/independent/kubelet-integration/#the-kubelet-drop-in-file-for-systemd

If you write it:

  • before init/join, kubeadm will overwrite it and discard its contents.
  • after init/join, neither kubeadm nor the kubelet will use it.
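For context, this env file is consumed by the systemd drop-in that the kubeadm packages install, which passes $KUBELET_KUBEADM_ARGS to the kubelet. An abridged sketch of the 1.11-era /etc/systemd/system/kubelet.service.d/10-kubeadm.conf (exact contents vary by package version):

```
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# written by 'kubeadm init'/'kubeadm join' at runtime
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# place for user-supplied overrides
EnvironmentFile=-/etc/default/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS
```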

Since 1.11, /var/lib/kubelet/kubeadm-flags.env is a file that kubeadm init and join generate automatically at runtime each time:
https://kubernetes.io/docs/setup/independent/kubelet-integration/#the-kubelet-drop-in-file-for-systemd

If you write it:

* before `init/join`, kubeadm will overwrite it and discard its contents.

* after `init/join`, neither kubeadm nor the kubelet will use it.

That's great, but kubeadm init/join wasn't run during the cluster upgrade, and the cgroup/CNI args were lost on the worker nodes; that's why the pods had 172.0.0.x IPs.

That's great, but kubeadm init/join wasn't run during the cluster upgrade, and the cgroup/CNI args were lost on the worker nodes; that's why the pods had 172.0.0.x IPs.

That makes the issue valid.

On it.

@mkretzer In my case the worker nodes' kubelet loses its network parameters during upgrades. My personal fix is:
echo "KUBELET_KUBEADM_ARGS=--cgroup-driver=cgroupfs --network-plugin=cni" > /var/lib/kubelet/kubeadm-flags.env

That helped, thank you very much! For all our clusters: it's upgrade time! :-)

@neolit123

That makes the issue valid.

I've added my notes about this issue (upgrading the cluster) over here:
https://github.com/kubernetes/kubeadm/issues/1347#issuecomment-456739287

@adoerler it seems like the unit file issue you outlined here is a separate one:
https://github.com/kubernetes/kubeadm/issues/1347#issuecomment-456739287

But you are right, we do recommend using package managers in recent versions, and by using a package manager the unit file will be updated as well. I guess that was a problem in the ->1.12 upgrade doc.

As far as this issue goes, we are pushing a fix for a certain bug in our library for DIFF:
https://github.com/kubernetes/kubernetes/pull/73941

but this will only land in 1.14 and cannot be backported to older releases.

I'm going to have to close this issue, but if anyone finds a problem related to DIFF in 1.13 -> 1.14 upgrades, please feel free to open a new ticket.
