What steps did you take and what happened:
On a single-node control plane, I upgraded the Kubernetes version and the etcd and CoreDNS image tags by modifying the KCP object. A new node with the upgraded Kubernetes version is created and the old node is physically deleted, but the old Node object and all the pods that were on the old node are left dangling.
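For context, here is a minimal sketch (not from the issue) of the kind of in-place KCP edit described above, written with controller-runtime's client and an unstructured object so the field paths are explicit. The field paths assume the v1alpha3 KubeadmControlPlane schema; the object name, namespace, and tag values are illustrative placeholders.

```go
package main

import (
	"context"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

func main() {
	c, err := client.New(ctrl.GetConfigOrDie(), client.Options{})
	if err != nil {
		panic(err)
	}

	kcp := &unstructured.Unstructured{}
	kcp.SetGroupVersionKind(schema.GroupVersionKind{
		Group:   "controlplane.cluster.x-k8s.io",
		Version: "v1alpha3",
		Kind:    "KubeadmControlPlane",
	})

	ctx := context.Background()
	// Name and namespace are illustrative, not taken from the report.
	if err := c.Get(ctx, client.ObjectKey{Namespace: "default", Name: "cluster-tvnb5a-control-plane"}, kcp); err != nil {
		panic(err)
	}

	// Bump the Kubernetes version plus the etcd and CoreDNS image tags in one
	// update, which is the change that triggered the rollout in this issue.
	_ = unstructured.SetNestedField(kcp.Object, "v1.17.2", "spec", "version")
	_ = unstructured.SetNestedField(kcp.Object, "3.4.3-0", "spec", "kubeadmConfigSpec", "clusterConfiguration", "etcd", "local", "imageTag")
	_ = unstructured.SetNestedField(kcp.Object, "1.6.7", "spec", "kubeadmConfigSpec", "clusterConfiguration", "dns", "imageTag")

	if err := c.Update(ctx, kcp); err != nil {
		panic(err)
	}
}
```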
root@cluster-tvnb5a-cluster-tvnb5a-control-plane-cdrtr:/# kubectl get nodes -A
NAME                                                  STATUS     ROLES    AGE     VERSION
cluster-tvnb5a-cluster-tvnb5a-control-plane-9jqkj     NotReady   master   7m11s   v1.17.0
cluster-tvnb5a-cluster-tvnb5a-control-plane-cdrtr     Ready      master   6m20s   v1.17.2
cluster-tvnb5a-cluster-tvnb5a-md-0-66496845f6-kvh9v   Ready      <none>   6m51s   v1.17.0
root@cluster-tvnb5a-cluster-tvnb5a-control-plane-cdrtr:/# kubectl get pods -A
NAMESPACE     NAME                                                                         READY   STATUS             RESTARTS   AGE
kube-system   coredns-6955765f44-dpwvr                                                     1/1     Terminating        0          7m12s
kube-system   coredns-6955765f44-g4df6                                                     1/1     Terminating        0          7m12s
kube-system   coredns-7987b8d68f-f2d78                                                     1/1     Running            0          4m24s
kube-system   coredns-7987b8d68f-rftvl                                                     1/1     Running            0          4m23s
kube-system   etcd-cluster-tvnb5a-cluster-tvnb5a-control-plane-9jqkj                       0/1     CrashLoopBackOff   3          7m16s
kube-system   etcd-cluster-tvnb5a-cluster-tvnb5a-control-plane-cdrtr                       1/1     Running            0          6m19s
kube-system   kindnet-s6rvq                                                                1/1     Running            0          7m8s
kube-system   kindnet-wmxqx                                                                1/1     Running            0          6m37s
kube-system   kindnet-xvdks                                                                1/1     Running            0          7m12s
kube-system   kube-apiserver-cluster-tvnb5a-cluster-tvnb5a-control-plane-9jqkj             1/1     Running            0          7m16s
kube-system   kube-apiserver-cluster-tvnb5a-cluster-tvnb5a-control-plane-cdrtr             1/1     Running            0          6m36s
kube-system   kube-controller-manager-cluster-tvnb5a-cluster-tvnb5a-control-plane-9jqkj   1/1     Running            1          7m16s
kube-system   kube-controller-manager-cluster-tvnb5a-cluster-tvnb5a-control-plane-cdrtr   1/1     Running            0          6m35s
kube-system   kube-proxy-gfghs                                                             1/1     Running            0          4m22s
kube-system   kube-proxy-lj9lv                                                             1/1     Terminating        0          7m12s
kube-system   kube-proxy-qlgpw                                                             1/1     Running            0          3m57s
kube-system   kube-scheduler-cluster-tvnb5a-cluster-tvnb5a-control-plane-9jqkj             1/1     Running            1          7m16s
kube-system   kube-scheduler-cluster-tvnb5a-cluster-tvnb5a-control-plane-cdrtr             1/1     Running            0          6m35s
root@cluster-tvnb5a-cluster-tvnb5a-control-plane-cdrtr:/#
What did you expect to happen:
I expected KCP to remove the Node object and clean up the resources that were on that node.
/kind bug
/area control-plane
This seems like it needs to be investigated.
/priority critical-urgent
/milestone v0.3.4
I tried reproducing with CAPA and everything worked as it should: the new machine appears and joins the cluster, the old machine is deleted, and its node disappears when it is removed.
I am testing it on a clusterctl-initialized cluster with CAPD. I am investigating a bit more to see whether this is related to image tag upgrades; will update soon.
The e2e test (docker_upgrade_test.go) fails when I change the control plane replica number from 3 to 1.
The physical container is being deleted, but the Kubernetes cluster still has it in its node list.
E.g., below, test-upgrade-0-test-upgrade-0-cwcb2 is the old node.
root@test-upgrade-0-test-upgrade-0-kzhxq:/# kubectl get nodes -A
NAME                                                STATUS     ROLES    AGE     VERSION
test-upgrade-0-test-upgrade-0-cwcb2                 NotReady   master   8m21s   v1.16.3
test-upgrade-0-test-upgrade-0-kzhxq                 Ready      master   6m20s   v1.17.2
test-upgrade-0-test-upgrade-0-md-5b5fdfd689-8dfvw   Ready      <none>   7m49s   v1.16.3
➜ cluster-api git:(sss) ✗ docker ps
CONTAINER ID   IMAGE                          COMMAND                  CREATED          STATUS          PORTS                                  NAMES
f1afbb205656   kindest/node:v1.17.2           "/usr/local/bin/entr…"   7 minutes ago    Up 7 minutes    45697/tcp, 127.0.0.1:45697->6443/tcp   test-upgrade-0-test-upgrade-0-kzhxq
b1431d3879d4   kindest/node:v1.16.3           "/usr/local/bin/entr…"   8 minutes ago    Up 8 minutes                                           test-upgrade-0-test-upgrade-0-md-5b5fdfd689-8dfvw
ceadbb46bfb1   kindest/haproxy:2.1.1-alpine   "/docker-entrypoint.…"   9 minutes ago    Up 9 minutes    33733/tcp, 0.0.0.0:33733->6443/tcp     test-upgrade-0-lb
54f344d2666e   kindest/node:v1.17.2           "/usr/local/bin/entr…"   11 minutes ago   Up 11 minutes   127.0.0.1:63611->6443/tcp              docker-e2e-hjwvew-control-plane
I also tried upgrading just the Kubernetes version without touching the image tags. Same result: the old node is dangling. Since @benmoss confirmed that it is working on AWS, I suspect this is a CAPD issue.
@sedefsavas Scaling down KCP replicas isn't a supported use case. @detiber can you confirm?
Scale down is something that we had originally intended to support in the proposal (mainly as a prerequisite for upgrade); not sure if anything has changed since, though.
I was under the assumption that we were not allowing going from 1 replica -> 3 replicas -> 1 replica.
@vincepri @detiber Sorry for the confusion. I am not scaling down; I changed the hardcoded control plane replica number in the test from 3 to 1.
I see this issue with CAPV too. It is happening more often than not. @benmoss can you redo this test for CAPA too to see if it is consistently succeeding?
Is scale-down not working for this node? Can you turn up the logging in the controller and try to trace what happens?
Yes, in the scale down. It is failing to remove etcd member, hence machine is never deleted.
[manager] E0416 21:46:02.455746 8 scale.go:117] "msg"="Failed to remove etcd member for machine" "error"="failed to create etcd client: unable to create etcd client: context deadline exceeded" "cluster-nanme"="sedef" "name"="sedef" "namespace"="default"
I don't understand why it happens only sometimes though.
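To illustrate where that error message comes from, here is a minimal sketch of the shape of the failing call using go.etcd.io/etcd/clientv3. This is not CAPI's actual etcd plumbing (which reaches etcd through the workload cluster); the endpoint and member ID are illustrative placeholders. With a blocking dial, a member that never answers within DialTimeout surfaces as the "failed to create etcd client: ... context deadline exceeded" seen above.

```go
package main

import (
	"context"
	"fmt"
	"time"

	"go.etcd.io/etcd/clientv3"
	"google.golang.org/grpc"
)

func removeEtcdMember(endpoints []string, memberID uint64) error {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   endpoints,
		DialTimeout: 5 * time.Second,
		// Block on the dial so an unreachable member fails here, mirroring
		// the "failed to create etcd client" error in the log above.
		DialOptions: []grpc.DialOption{grpc.WithBlock()},
	})
	if err != nil {
		return fmt.Errorf("failed to create etcd client: %v", err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// Remove the member backing the machine being scaled down.
	if _, err := cli.MemberRemove(ctx, memberID); err != nil {
		return fmt.Errorf("failed to remove etcd member %x: %v", memberID, err)
	}
	return nil
}

func main() {
	if err := removeEtcdMember([]string{"https://127.0.0.1:2379"}, 0x1234); err != nil {
		fmt.Println(err)
	}
}
```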
Sounds like it could be a timing issue, it'd be great to have an exact trace when it fails
Can you give some more details of what the change you're making is? You're upgrading Kubernetes, etcd, and CoreDNS all at the same time?
No, only upgrading Kubernetes version.
I see different errors at different times. One of them is a panic during ForwardEtcdLeadership(); this happens rarely.
[manager] I0417 01:59:06.749227 7 controller.go:179] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="sedef" "kubeadmControlPlane"="sedef" "namespace"="default"
[manager] I0417 01:59:06.750460 7 controller.go:225] controllers/KubeadmControlPlane "msg"="Upgrading Control Plane" "cluster"="sedef" "kubeadmControlPlane"="sedef" "namespace"="default"
[manager] E0417 01:59:41.357403 7 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
[manager] goroutine 316 [running]:
[manager] k8s.io/apimachinery/pkg/util/runtime.logPanic(0x16b0ec0, 0x2712180)
[manager] /Users/ssavas/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:74 +0xa3
[manager] k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
[manager] /Users/ssavas/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:48 +0x82
[manager] panic(0x16b0ec0, 0x2712180)
[manager] /usr/local/Cellar/go/1.13.8/libexec/src/runtime/panic.go:679 +0x1b2
[manager] sigs.k8s.io/cluster-api/controlplane/kubeadm/internal.(*Workload).ForwardEtcdLeadership(0xc00e0051d0, 0x1b0da20, 0xc000046098, 0xc00d614900, 0xc00d614b20, 0xc00bc79788, 0x0)
[manager] /Users/ssavas/dev/capi/tilttest/cluster-api/controlplane/kubeadm/internal/workload_cluster_etcd.go:275 +0x1b9
[manager] sigs.k8s.io/cluster-api/controlplane/kubeadm/controllers.(*KubeadmControlPlaneReconciler).scaleDownControlPlane(0xc000120ba0, 0x1b0da20, 0xc000046098, 0xc00b70fc80, 0xc00c99eb00, 0xc00e857788, 0x0, 0x0, 0x0, 0x0)
[manager] /Users/ssavas/dev/capi/tilttest/cluster-api/controlplane/kubeadm/controllers/scale.go:103 +0x260
[manager] sigs.k8s.io/cluster-api/controlplane/kubeadm/controllers.(*KubeadmControlPlaneReconciler).upgradeControlPlane(0xc000120ba0, 0x1b0da20, 0xc000046098, 0xc00b70fc80, 0xc00c99eb00, 0xc00bc79788, 0x5, 0xc00012fdd0, 0x1, 0x1)
[manager] /Users/ssavas/dev/capi/tilttest/cluster-api/controlplane/kubeadm/controllers/upgrade.go:91 +0x69c
[manager] sigs.k8s.io/cluster-api/controlplane/kubeadm/controllers.(*KubeadmControlPlaneReconciler).reconcile(0xc000120ba0, 0x1b0da20, 0xc000046098, 0xc00b70fc80, 0xc00c99eb00, 0x0, 0x0, 0x0, 0xc000b34750)
[manager] /Users/ssavas/dev/capi/tilttest/cluster-api/controlplane/kubeadm/controllers/controller.go:226 +0x11c1
[manager] sigs.k8s.io/cluster-api/controlplane/kubeadm/controllers.(*KubeadmControlPlaneReconciler).Reconcile(0xc000120ba0, 0xc0007f1500, 0x7, 0xc0007f14f0, 0x5, 0xc00bc79c00, 0x4a817c800, 0x0, 0x0)
[manager] /Users/ssavas/dev/capi/tilttest/cluster-api/controlplane/kubeadm/controllers/controller.go:173 +0x632
[manager] sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc00071e180, 0x1712260, 0xc0037ced80, 0x0)
[manager] /Users/ssavas/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:256 +0x162
[manager] sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc00071e180, 0xc0008e4700)
[manager] /Users/ssavas/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:232 +0xcb
[manager] sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker(0xc00071e180)
[manager] /Users/ssavas/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211 +0x2b
[manager] k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc000314200)
[manager] /Users/ssavas/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152 +0x5e
[manager] k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000314200, 0x3b9aca00, 0x0, 0x1, 0xc00016d3e0)
[manager] /Users/ssavas/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153 +0xf8
[manager] k8s.io/apimachinery/pkg/util/wait.Until(0xc000314200, 0x3b9aca00, 0xc00016d3e0)
[manager] /Users/ssavas/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88 +0x4d
[manager] created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1
[manager] /Users/ssavas/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:193 +0x328
[manager] I0417 01:59:41.367953 7 controller.go:179] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="sedef" "kubeadmControlPlane"="sedef" "namespace"="default"
[manager] I0417 01:59:41.372939 7 controller.go:225] controllers/KubeadmControlPlane "msg"="Upgrading Control Plane" "cluster"="sedef" "kubeadmControlPlane"="sedef" "namespace"="default"
[manager] panic: runtime error: invalid memory address or nil pointer dereference [recovered]
[manager] panic: runtime error: invalid memory address or nil pointer dereference
[manager] [signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x152f099]
[manager] goroutine 316 [running]:
[manager] k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
[manager] /Users/ssavas/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:55 +0x105
[manager] panic(0x16b0ec0, 0x2712180)
[manager] /usr/local/Cellar/go/1.13.8/libexec/src/runtime/panic.go:679 +0x1b2
[manager] sigs.k8s.io/cluster-api/controlplane/kubeadm/internal.(*Workload).ForwardEtcdLeadership(0xc00e0051d0, 0x1b0da20, 0xc000046098, 0xc00d614900, 0xc00d614b20, 0xc00bc79788, 0x0)
[manager] /Users/ssavas/dev/capi/tilttest/cluster-api/controlplane/kubeadm/internal/workload_cluster_etcd.go:275 +0x1b9
[manager] sigs.k8s.io/cluster-api/controlplane/kubeadm/controllers.(*KubeadmControlPlaneReconciler).scaleDownControlPlane(0xc000120ba0, 0x1b0da20, 0xc000046098, 0xc00b70fc80, 0xc00c99eb00, 0xc00e857788, 0x0, 0x0, 0x0, 0x0)
[manager] /Users/ssavas/dev/capi/tilttest/cluster-api/controlplane/kubeadm/controllers/scale.go:103 +0x260
[manager] sigs.k8s.io/cluster-api/controlplane/kubeadm/controllers.(*KubeadmControlPlaneReconciler).upgradeControlPlane(0xc000120ba0, 0x1b0da20, 0xc000046098, 0xc00b70fc80, 0xc00c99eb00, 0xc00bc79788, 0x5, 0xc00012fdd0, 0x1, 0x1)
[manager] /Users/ssavas/dev/capi/tilttest/cluster-api/controlplane/kubeadm/controllers/upgrade.go:91 +0x69c
[manager] sigs.k8s.io/cluster-api/controlplane/kubeadm/controllers.(*KubeadmControlPlaneReconciler).reconcile(0xc000120ba0, 0x1b0da20, 0xc000046098, 0xc00b70fc80, 0xc00c99eb00, 0x0, 0x0, 0x0, 0xc000b34750)
[manager] /Users/ssavas/dev/capi/tilttest/cluster-api/controlplane/kubeadm/controllers/controller.go:226 +0x11c1
[manager] sigs.k8s.io/cluster-api/controlplane/kubeadm/controllers.(*KubeadmControlPlaneReconciler).Reconcile(0xc000120ba0, 0xc0007f1500, 0x7, 0xc0007f14f0, 0x5, 0xc00bc79c00, 0x4a817c800, 0x0, 0x0)
[manager] /Users/ssavas/dev/capi/tilttest/cluster-api/controlplane/kubeadm/controllers/controller.go:173 +0x632
It seems like that failure is related to the leaderCandidate not having a NodeRef yet, which is a little strange.
Aren't we waiting for all the nodes in the control plane to be ready before proceeding to delete the older machines?
I don't see a node-ready check; even without a CNI installed, 3-node control plane upgrades are working fine.
Can you run a test locally with a custom build, after adding a check that the NodeRef is there for the leaderCandidate, and see if that fixes it?
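A minimal sketch of that suggested guard, assuming the v1alpha3 Machine API; the function name, caller shape, and 20-second requeue interval are illustrative choices, not CAPI's actual code. The point is to bail out and requeue before ForwardEtcdLeadership can dereference a nil NodeRef:

```go
package controllers

import (
	"time"

	clusterv1 "sigs.k8s.io/cluster-api/api/v1alpha3"
	ctrl "sigs.k8s.io/controller-runtime"
)

// guardLeaderCandidate returns (requeue result, false) when the candidate's
// Machine has not been linked to its Node yet, instead of letting the
// scale-down path panic with a nil pointer dereference.
func guardLeaderCandidate(leaderCandidate *clusterv1.Machine) (ctrl.Result, bool) {
	if leaderCandidate == nil || leaderCandidate.Status.NodeRef == nil {
		return ctrl.Result{RequeueAfter: 20 * time.Second}, false
	}
	return ctrl.Result{}, true
}
```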
Another error I see, at the times when it does not panic during the upgrade, is an etcd remove-member error:
I0417 18:26:52.486997 7 controller.go:225] controllers/KubeadmControlPlane "msg"="Upgrading Control Plane" "cluster"="sedef" "kubeadmControlPlane"="sedef" "namespace"="default"
E0417 18:27:03.883410 7 controller.go:258] controller-runtime/controller "msg"="Reconciler error" "error"="failed to reconcile the remote kubelet RBAC role: failed to determine if kubelet config rbac role \"kubeadm:kubelet-config-1.17\" already exists: etcdserver: request timed out" "controller"="kubeadmcontrolplane" "request"={"Namespace":"default","Name":"sedef"}
I0417 18:27:03.883958 7 controller.go:179] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="sedef" "kubeadmControlPlane"="sedef" "namespace"="default"
I0417 18:27:03.884551 7 controller.go:225] controllers/KubeadmControlPlane "msg"="Upgrading Control Plane" "cluster"="sedef" "kubeadmControlPlane"="sedef" "namespace"="default"
I0417 18:27:14.322259 7 controller.go:179] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="sedef" "kubeadmControlPlane"="sedef" "namespace"="default"
I0417 18:27:14.322810 7 controller.go:225] controllers/KubeadmControlPlane "msg"="Upgrading Control Plane" "cluster"="sedef" "kubeadmControlPlane"="sedef" "namespace"="default"
I0417 18:27:25.408750 7 controller.go:179] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="sedef" "kubeadmControlPlane"="sedef" "namespace"="default"
I0417 18:27:25.409980 7 controller.go:225] controllers/KubeadmControlPlane "msg"="Upgrading Control Plane" "cluster"="sedef" "kubeadmControlPlane"="sedef" "namespace"="default"
I0417 18:27:49.643253 7 controller.go:179] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="sedef" "kubeadmControlPlane"="sedef" "namespace"="default"
I0417 18:27:49.643485 7 controller.go:225] controllers/KubeadmControlPlane "msg"="Upgrading Control Plane" "cluster"="sedef" "kubeadmControlPlane"="sedef" "namespace"="default"
I0417 18:28:09.644642 7 controller.go:179] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="sedef" "kubeadmControlPlane"="sedef" "namespace"="default"
I0417 18:28:09.645094 7 controller.go:225] controllers/KubeadmControlPlane "msg"="Upgrading Control Plane" "cluster"="sedef" "kubeadmControlPlane"="sedef" "namespace"="default"
I0417 18:28:59.460770 7 controller.go:179] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="sedef" "kubeadmControlPlane"="sedef" "namespace"="default"
I0417 18:28:59.461633 7 controller.go:225] controllers/KubeadmControlPlane "msg"="Upgrading Control Plane" "cluster"="sedef" "kubeadmControlPlane"="sedef" "namespace"="default"
I0417 18:29:44.932322 7 controller.go:179] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="sedef" "kubeadmControlPlane"="sedef" "namespace"="default"
I0417 18:29:44.932766 7 controller.go:225] controllers/KubeadmControlPlane "msg"="Upgrading Control Plane" "cluster"="sedef" "kubeadmControlPlane"="sedef" "namespace"="default"
I0417 18:30:31.972144 7 controller.go:179] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="sedef" "kubeadmControlPlane"="sedef" "namespace"="default"
I0417 18:30:31.973192 7 controller.go:225] controllers/KubeadmControlPlane "msg"="Upgrading Control Plane" "cluster"="sedef" "kubeadmControlPlane"="sedef" "namespace"="default"
I0417 18:31:02.415326 7 controller.go:179] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="sedef" "kubeadmControlPlane"="sedef" "namespace"="default"
I0417 18:31:02.415963 7 controller.go:225] controllers/KubeadmControlPlane "msg"="Upgrading Control Plane" "cluster"="sedef" "kubeadmControlPlane"="sedef" "namespace"="default"
I0417 18:31:29.956943 7 controller.go:179] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="sedef" "kubeadmControlPlane"="sedef" "namespace"="default"
I0417 18:31:29.957272 7 controller.go:225] controllers/KubeadmControlPlane "msg"="Upgrading Control Plane" "cluster"="sedef" "kubeadmControlPlane"="sedef" "namespace"="default"
I0417 18:31:59.924889 7 controller.go:179] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="sedef" "kubeadmControlPlane"="sedef" "namespace"="default"
I0417 18:31:59.925248 7 controller.go:225] controllers/KubeadmControlPlane "msg"="Upgrading Control Plane" "cluster"="sedef" "kubeadmControlPlane"="sedef" "namespace"="default"
E0417 18:32:32.872552 7 scale.go:108] "msg"="Failed to remove etcd member for machine" "error"="failed to create etcd client: unable to create etcd client: context deadline exceeded" "cluster-name"="sedef" "name"="sedef" "namespace"="default"
E0417 18:32:34.915981 7 controller.go:258] controller-runtime/controller "msg"="Reconciler error" "error"="failed to create etcd client: unable to create etcd client: context deadline exceeded" "controller"="kubeadmcontrolplane" "request"={"Namespace":"default","Name":"sedef"}
I0417 18:32:34.916224 7 controller.go:179] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="sedef" "kubeadmControlPlane"="sedef" "namespace"="default"
I0417 18:32:34.916528 7 controller.go:225] controllers/KubeadmControlPlane "msg"="Upgrading Control Plane" "cluster"="sedef" "kubeadmControlPlane"="sedef" "namespace"="default"
I0417 18:32:59.927359 7 controller.go:179] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="sedef" "kubeadmControlPlane"="sedef" "namespace"="default"
I0417 18:32:59.927820 7 controller.go:225] controllers/KubeadmControlPlane "msg"="Upgrading Control Plane" "cluster"="sedef" "kubeadmControlPlane"="sedef" "namespace"="default"
E0417 18:34:38.345803 7 controller.go:147] controllers/KubeadmControlPlane "msg"="Failed to update KubeadmControlPlane Status" "error"="Get https://192.168.5.16:6443/api/v1/nodes?labelSelector=node-role.kubernetes.io%2Fmaster%3D\u0026timeout=30s: net/http: request canceled (Client.Timeout exceeded while awaiting headers)" "cluster"="sedef" "kubeadmControlPlane"="sedef" "namespace"="default"
E0417 18:34:38.355740 7 controller.go:258] controller-runtime/controller "msg"="Reconciler error" "error"="Get https://192.168.5.16:6443/api/v1/nodes?labelSelector=node-role.kubernetes.io%2Fmaster%3D\u0026timeout=30s: net/http: request canceled (Client.Timeout exceeded while awaiting headers)" "controller"="kubeadmcontrolplane" "request"={"Namespace":"default","Name":"sedef"}
I0417 18:34:38.356737 7 controller.go:179] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="sedef" "kubeadmControlPlane"="sedef" "namespace"="default"
I0417 18:34:38.359137 7 controller.go:225] controllers/KubeadmControlPlane "msg"="Upgrading Control Plane" "cluster"="sedef" "kubeadmControlPlane"="sedef" "namespace"="default"
E0417 18:34:56.915696 7 controller.go:258] controller-runtime/controller "msg"="Reconciler error" "error"="failed to reconcile the remote kubelet RBAC role: failed to determine if kubelet config rbac role \"kubeadm:kubelet-config-1.17\" already exists: Get https://192.168.5.16:6443/apis/rbac.authorization.k8s.io/v1/namespaces/kube-system/roles/kubeadm:kubelet-config-1.17?timeout=30s: http2: server sent GOAWAY and closed the connection; LastStreamID=14267, ErrCode=NO_ERROR, debug=\"\"" "controller"="kubeadmcontrolplane" "request"={"Namespace":"default","Name":"sedef"}
I0417 18:34:56.916109 7 controller.go:179] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="sedef" "kubeadmControlPlane"="sedef" "namespace"="default"
I0417 18:34:56.916387 7 controller.go:225] controllers/KubeadmControlPlane "msg"="Upgrading Control Plane" "cluster"="sedef" "kubeadmControlPlane"="sedef" "namespace"="default"
W0417 18:35:14.844579 7 http.go:392] Error reading backend response: unexpected EOF
Now testing what @vincepri suggested.
If I had to guess, it seems like it's trying to remove the etcd member too soon
@vincepri Noderef is not the issue, we already wait for kube-api-server to be ready.
Tested, same result.
/assign
We shouldn't have panics though, so we need a check in place somewhere before doing that scale down
> we already wait for kube-api-server to be ready
We need to wait for the NodeRef to be on Machines
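A sketch of that broader precondition: before picking a scale-down target, verify that every owned control plane Machine already has a NodeRef. The helper name and the []*clusterv1.Machine shape are illustrative, not CAPI's actual internal collection type:

```go
package controllers

import clusterv1 "sigs.k8s.io/cluster-api/api/v1alpha3"

// allMachinesHaveNodeRefs reports whether every control plane Machine has
// been linked to its Node; if not, the caller should requeue and retry.
func allMachinesHaveNodeRefs(machines []*clusterv1.Machine) bool {
	for _, m := range machines {
		if m.Status.NodeRef == nil {
			return false
		}
	}
	return true
}
```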
> Yes, in the scale down. It is failing to remove etcd member, hence machine is never deleted.
> [manager] E0416 21:46:02.455746 8 scale.go:117] "msg"="Failed to remove etcd member for machine" "error"="failed to create etcd client: unable to create etcd client: context deadline exceeded" "cluster-nanme"="sedef" "name"="sedef" "namespace"="default"
> I don't understand why it happens only sometimes though.
I have seen that consistently in metal3-dev-env.
In the Kubernetes version upgrade process, both during scale up and during scale down (scale up has to complete in order to reach the scale down phase).
@sedefsavas Do you have any update on this issue or some timeline?
@Xenwar There is a PR currently open that has more information, #2958. This is considered release-blocking; we need to have a fix before v0.3.4 is cut.
@vincepri Thanks, will follow the issue.