kubeadm upgrade diff loses configuration options

Created on 22 Jul 2018 · 26 comments · Source: kubernetes/kubeadm

What keywords did you search in kubeadm issues before filing this one?

diff, upgrade

Is this a BUG REPORT or FEATURE REQUEST?

Choose one: BUG REPORT

Versions

kubeadm version (use kubeadm version):

kubeadm version 
kubeadm version: &version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-17T18:50:16Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Kubernetes version (use kubectl version):
k version 
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.5", GitCommit:"32ac1c9073b132b8ba18aa830f46b77dcceb0723", GitTreeState:"clean", BuildDate:"2018-06-21T11:46:00Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.6", GitCommit:"6260bb08c46c31eea6cb538b34a9ceb3e406689c", GitTreeState:"clean", BuildDate:"2017-12-21T06:23:29Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: Bare metal (VM's in cloud)
  • OS (e.g. from /etc/os-release):
PRETTY_NAME="Debian GNU/Linux 9 (stretch)"
NAME="Debian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
  • Kernel (e.g. uname -a): Linux s03 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1 (2018-05-07) x86_64 GNU/Linux
  • Others:
kubeadm init \
  --pod-network-cidr=192.168.0.0/16 \
  --apiserver-advertise-address=10.20.0.100 \
  --apiserver-cert-extra-sans=XXXXX,XXXX

What happened?

I'm planning the upgrade from 1.11.0 to 1.11.1. I upgraded the deb packages on all nodes in the cluster and then ran kubeadm upgrade diff to see the differences. I noticed that some configuration options change in a way that will break the cluster, and some change in ways I don't understand:

  • the advertise IP address changes from my VPN IP to the public IP address -> the network will most likely break
  • the OIDC configuration options get lost -> dashboard SSO is lost
  • the certificate locations get changed -> I have no idea what is going to happen

What you expected to happen?

I expected the upgrade to be performed with minimal or no configuration changes.

How to reproduce it (as minimally and precisely as possible)?

Create a 1.11 cluster with OIDC values and a custom advertise IP, then try to upgrade. A config sketch is given below.
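For a concrete starting point, something like the following config should reproduce the setup. This is only a minimal sketch: kubeadm v1.11 uses the kubeadm.k8s.io/v1alpha2 MasterConfiguration format, and the OIDC/advertise values below are placeholders, not taken from this report.

```
# kubeadm.yaml -- illustrative values only
apiVersion: kubeadm.k8s.io/v1alpha2
kind: MasterConfiguration
kubernetesVersion: v1.11.0
api:
  advertiseAddress: 10.20.0.100          # custom (e.g. VPN) advertise IP
apiServerExtraArgs:
  oidc-issuer-url: https://auth.example.com/auth/realms/example
  oidc-client-id: kubernetes
  oidc-groups-claim: groups
networking:
  podSubnet: 192.168.0.0/16
```

Then `kubeadm init --config kubeadm.yaml`, upgrade the packages to 1.11.1, and run `kubeadm upgrade diff v1.11.1`.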

Anything else we need to know?

You are awesome! :)

kubeadm upgrade diff 
--- /etc/kubernetes/manifests/kube-scheduler.yaml
+++ new manifest
@@ -16,7 +16,7 @@
     - --address=127.0.0.1
     - --kubeconfig=/etc/kubernetes/scheduler.conf
     - --leader-elect=true
-    image: k8s.gcr.io/kube-scheduler-amd64:v1.11.0
+    image: k8s.gcr.io/kube-scheduler-amd64:v1.11.1
     imagePullPolicy: IfNotPresent
     livenessProbe:
       failureThreshold: 8
--- /etc/kubernetes/manifests/kube-apiserver.yaml
+++ new manifest
@@ -14,7 +14,7 @@
   - command:
     - kube-apiserver
     - --authorization-mode=Node,RBAC
-    - --advertise-address=10.20.0.100
+    - --advertise-address=REDACTED
     - --allow-privileged=true
     - --client-ca-file=/etc/kubernetes/pki/ca.crt
     - --disable-admission-plugins=PersistentVolumeLabel
@@ -40,15 +40,12 @@
     - --service-cluster-ip-range=10.96.0.0/12
     - --tls-cert-file=/etc/kubernetes/pki/apiserver.crt
     - --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
-    - --oidc-issuer-url=https://auth.REDACTED/auth/realms/gpi-infra
-    - --oidc-client-id=kubernetes
-    - --oidc-groups-claim=groups
-    image: k8s.gcr.io/kube-apiserver-amd64:v1.11.0
+    image: k8s.gcr.io/kube-apiserver-amd64:v1.11.1
     imagePullPolicy: IfNotPresent
     livenessProbe:
       failureThreshold: 8
       httpGet:
-        host: 10.20.0.100
+        host: REDACTED
         path: /healthz
         port: 6443
         scheme: HTTPS
--- /etc/kubernetes/manifests/kube-controller-manager.yaml
+++ new manifest
@@ -14,18 +14,15 @@
   - command:
     - kube-controller-manager
     - --address=127.0.0.1
-    - --allocate-node-cidrs=true
-    - --cluster-cidr=192.168.0.0/16
     - --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt
     - --cluster-signing-key-file=/etc/kubernetes/pki/ca.key
     - --controllers=*,bootstrapsigner,tokencleaner
     - --kubeconfig=/etc/kubernetes/controller-manager.conf
     - --leader-elect=true
-    - --node-cidr-mask-size=24
     - --root-ca-file=/etc/kubernetes/pki/ca.crt
     - --service-account-private-key-file=/etc/kubernetes/pki/sa.key
     - --use-service-account-credentials=true
-    image: k8s.gcr.io/kube-controller-manager-amd64:v1.11.0
+    image: k8s.gcr.io/kube-controller-manager-amd64:v1.11.1
     imagePullPolicy: IfNotPresent
     livenessProbe:
       failureThreshold: 8
@@ -41,6 +38,15 @@
       requests:
         cpu: 200m
     volumeMounts:
+    - mountPath: /usr/local/share/ca-certificates
+      name: usr-local-share-ca-certificates
+      readOnly: true
+    - mountPath: /etc/ca-certificates
+      name: etc-ca-certificates
+      readOnly: true
+    - mountPath: /etc/kubernetes/pki
+      name: k8s-certs
+      readOnly: true
     - mountPath: /etc/ssl/certs
       name: ca-certs
       readOnly: true
@@ -52,22 +58,9 @@
     - mountPath: /usr/share/ca-certificates
       name: usr-share-ca-certificates
       readOnly: true
-    - mountPath: /usr/local/share/ca-certificates
-      name: usr-local-share-ca-certificates
-      readOnly: true
-    - mountPath: /etc/ca-certificates
-      name: etc-ca-certificates
-      readOnly: true
-    - mountPath: /etc/kubernetes/pki
-      name: k8s-certs
-      readOnly: true
   hostNetwork: true
   priorityClassName: system-cluster-critical
   volumes:
-  - hostPath:
-      path: /usr/local/share/ca-certificates
-      type: DirectoryOrCreate
-    name: usr-local-share-ca-certificates
   - hostPath:
       path: /etc/ca-certificates
       type: DirectoryOrCreate
@@ -92,5 +85,9 @@
       path: /usr/share/ca-certificates
       type: DirectoryOrCreate
     name: usr-share-ca-certificates
+  - hostPath:
+      path: /usr/local/share/ca-certificates
+      type: DirectoryOrCreate
+    name: usr-local-share-ca-certificates
 status: {}
Labels: area/upgrades, help wanted, kind/bug, priority/important-longterm

All 26 comments

/assign @liztio @timothysc

@ieugen I'd recommend using the configuration migration utility prior to attempting the upgrade. The configuration file format has changed significantly from v1.10 -> v1.11, but folks have done a good job of testing that migration.

@timothysc I've installed 1.11 and I am upgrading to 1.11.1, so there should not be much to migrate.
I did use the utility and I got these results:

kubeadm config view > kubeadm-old.yaml
kubeadm config migrate --old-config kubeadm-old.yaml > kubeadm-new.yaml
diff kubeadm-old.yaml kubeadm-new.yaml 

10d9
<   oidc-issuer-url: https://REDACTED
12a12
>   oidc-issuer-url: https://REDACTED
17a18,25
> bootstrapTokens:
> - groups:
>   - system:bootstrappers:kubeadm:default-node-token
>   token: REDACTED
>   ttl: 24h0m0s
>   usages:
>   - signing
>   - authentication
137c145,150
< nodeRegistration: {}
---
> nodeRegistration:
>   criSocket: /var/run/dockershim.sock
>   name: m01
>   taints:
>   - effect: NoSchedule
>     key: node-role.kubernetes.io/master

Confirming. In my case (1.11.0 -> 1.11.1) it loses apiServerExtraArgs like etcd-cafile, feature-gates, etc., and replaces them with some defaults.

I can find the expected values inside the ConfigMap (key: MasterConfiguration) like this:
kubectl get configmap -n kube-system kubeadm-config -oyaml
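For reference, a quick way to dump only that key (standard kubectl usage, shown here as an illustration rather than something from this thread):

```
kubectl -n kube-system get configmap kubeadm-config \
  -o jsonpath='{.data.MasterConfiguration}'
```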

I've made the upgrade and it went smoothly, so I am a bit confused about this. I also rebooted the cluster (one node at a time, starting with the master) to see if there were any issues, and I did not see any.

I don't remember having to change anything after the upgrade, and I did not document it :(.

Regards,

We lost networking to the pods after the upgrade from 1.10.6 to 1.11.1. It looks like --cluster-cidr is no longer applied, as all our pods came up with IPs from 172.17.x.x and not 10.244.x.x, which is what is configured for flannel. How can we resolve this situation?

Even better:
At the moment I'm at v1.11.0, and kubeadm upgrade diff v1.11.0 gives me the same broken result:
```
kubeadm upgrade diff v1.11.0
--- /etc/kubernetes/manifests/kube-apiserver.yaml
+++ new manifest
@@ -14,17 +14,16 @@
   - command:
     - kube-apiserver
     - --authorization-mode=Node,RBAC
-    - --etcd-cafile=/opt/etcd/ca.pem
-    - --etcd-certfile=/opt/etcd/staging-cluster2node.pem
-    - --etcd-keyfile=/opt/etcd/staging-cluster2node-key.pem
-    - --feature-gates=PodPriority=false
     - --advertise-address=192.168.6.161
     - --allow-privileged=true
     - --client-ca-file=/etc/kubernetes/pki/ca.crt
     - --disable-admission-plugins=PersistentVolumeLabel
     - --enable-admission-plugins=NodeRestriction
     - --enable-bootstrap-token-auth=true
-    - --etcd-servers=https://192.168.6.161:2379,https://192.168.6.162:2379,https://192.168.6.163:2379
+    - --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
+    - --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
+    - --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
     - --insecure-port=0
     - --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt
     ....
```

@wizard580 In my case the upgrade went OK. No issues with the cluster (and I'm also running on top of a WireGuard VPN).

  • kubeadm upgrade apply v1.11.1 worked OK
  • kubeadm upgrade diff shows the same bad diff as in your case

I'll try tomorrow after a backup. But either way, the broken diff is a bug. From my perspective, a major one.

For us this is not only a broken diff, as the node CIDR really seems to get lost.

The upgrade with kubeadm upgrade apply v1.11.1 worked fine; the configs are not broken as far as I can see.
It generated unneeded etcd certs, but they are ignored by our configs/setup.

For us the upgrade also did not seem broken at first, but after uncordoning the upgraded nodes and draining the old nodes, our application went down right away because the pods all got the wrong IPs.

Confirming. We found similar issues: in our case IPVS was stuck with old service-to-pod mappings. Check the kube-proxy logs and you'll probably find a lot of errors about ipset. Rebooting the nodes helped us.
Observing...
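If you want to check for the same symptom, a rough sketch of what to look at (illustrative commands, not from this thread; in kubeadm clusters the kube-proxy pods carry the k8s-app=kube-proxy label):

```
# scan kube-proxy logs for ipset/IPVS errors
kubectl -n kube-system logs -l k8s-app=kube-proxy --tail=200 | grep -i ipset

# on a node, inspect the live IPVS service -> endpoint mappings
ipvsadm -Ln
```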

Can still reproduce this in the latest v1.12.0 alpha. Going to see if I can sort this out before the code freeze.

ETOOCOMPLICATED, punting to 1.13

Some updates:

I've made the upgrades to 1.11.2 and 1.11.3 without any issue. Every time I performed an upgrade, the diff showed it was dropping the information, but that does not actually seem to happen. At this point I believe it is just bad reporting.

@ieugen Minor upgrades were also not affected here, but every major upgrade (1.10.x -> 1.11.x) was!

We lost networking to the pods after the upgrade from 1.10.6 to 1.11.1. It looks like --cluster-cidr is no longer applied, as all our pods came up with IPs from 172.17.x.x and not 10.244.x.x, which is what is configured for flannel. How can we resolve this situation?

@mkretzer In my case the worker nodes' kubelet loses its network parameters during upgrades. My personal fix is:
echo "KUBELET_KUBEADM_ARGS=--cgroup-driver=cgroupfs --network-plugin=cni" > /var/lib/kubelet/kubeadm-flags.env

Since 1.11, /var/lib/kubelet/kubeadm-flags.env is a file that kubeadm init and join generate automatically at runtime each time:
https://kubernetes.io/docs/setup/independent/kubelet-integration/#the-kubelet-drop-in-file-for-systemd

If you write it:

  • before init/join, kubeadm will overwrite it and discard its contents.
  • after init/join, neither kubeadm nor the kubelet will use it.
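For context, this env file is consumed by the systemd drop-in that the kubeadm packages install, which passes $KUBELET_KUBEADM_ARGS to the kubelet. An abridged sketch of the 1.11-era /etc/systemd/system/kubelet.service.d/10-kubeadm.conf (exact contents vary by package version):

```
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# written by 'kubeadm init'/'kubeadm join' at runtime
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# place for user-supplied overrides
EnvironmentFile=-/etc/default/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS
```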

Since 1.11, /var/lib/kubelet/kubeadm-flags.env is a file that kubeadm init and join generate automatically at runtime each time:
https://kubernetes.io/docs/setup/independent/kubelet-integration/#the-kubelet-drop-in-file-for-systemd

If you write it:

* before `init/join`, kubeadm will overwrite it and discard its contents.

* after `init/join`, neither kubeadm nor the kubelet will use it.

That's great, but kubeadm init/join wasn't run during the cluster upgrade, and the cgroup/CNI args were lost on the worker nodes; that's why the pods had 172.0.0.x IPs.

That's great, but kubeadm init/join wasn't run during the cluster upgrade, and the cgroup/CNI args were lost on the worker nodes; that's why the pods had 172.0.0.x IPs.

That makes the issue valid.

On it.

@mkretzer In my case the worker nodes' kubelet loses its network parameters during upgrades. My personal fix is:
echo "KUBELET_KUBEADM_ARGS=--cgroup-driver=cgroupfs --network-plugin=cni" > /var/lib/kubelet/kubeadm-flags.env

That helped, thank you very much! For all our clusters: it's upgrade time! :-)

@neolit123

That makes the issue valid.

I've added my notes about this issue (upgrading the cluster) over here:
https://github.com/kubernetes/kubeadm/issues/1347#issuecomment-456739287

@adoerler it seems like the unit file issue you outlined here is a separate one:
https://github.com/kubernetes/kubeadm/issues/1347#issuecomment-456739287

But you are right, we do recommend using package managers in recent versions, and by using a package manager the unit file will be updated as well. I guess that was a problem in the ->1.12 upgrade doc.

As far as this issue goes, we are pushing a fix for a certain bug in our library for DIFF:
https://github.com/kubernetes/kubernetes/pull/73941

but this will only land in 1.14 and cannot be backported to older releases.

I'm going to have to close this issue, but if anyone finds a problem related to DIFF in 1.13 -> 1.14 upgrades, please feel free to open a new ticket.
