kops version
Version 1.12.1
During the kops upgrade:
k8s: 1.11.9 -> 1.12.8
image: kope.io/k8s-1.11-debian-stretch-amd64-hvm-ebs-2018-08-17 -> kope.io/k8s-1.12-debian-stretch-amd64-hvm-ebs-2019-05-13
Running on AWS. After running kops upgrade cluster --yes, I ran:
kops rolling-update cluster --yes
NAME               STATUS       NEEDUPDATE  READY  MIN  MAX  NODES
bastions           NeedsUpdate  1           0      1    1    0
master-us-east-2a  NeedsUpdate  1           0      1    1    1
master-us-east-2b  NeedsUpdate  1           0      1    1    1
master-us-east-2c  NeedsUpdate  1           0      1    1    1
nodes              NeedsUpdate  2           0      1    2    2
I0619 10:13:54.722629 25392 instancegroups.go:301] Stopping instance "i-0133631b3142fe6e2", in group "bastions.us-east-2.k8s.redactedinc.com" (this may take a while).
I0619 10:13:55.290878 25392 instancegroups.go:198] waiting for 5m0s after terminating instance
I0619 10:18:55.282281 25392 instancegroups.go:202] Deleted a bastion instance, i-0133631b3142fe6e2, and continuing with rolling-update.
I0619 10:18:58.151787 25392 instancegroups.go:165] Draining the node: "ip-10-40-63-102.us-east-2.compute.internal".
node/ip-10-40-63-102.us-east-2.compute.internal cordoned
node/ip-10-40-63-102.us-east-2.compute.internal cordoned
WARNING: Ignoring DaemonSet-managed pods: lacework-agent-snxnp, weave-net-n95tf; Deleting pods with local storage: kubernetes-dashboard-6c664cf6c5-6pvrv
pod/dns-controller-779bbdc6dd-shpkr evicted
pod/kubernetes-dashboard-6c664cf6c5-6pvrv evicted
I0619 10:19:04.649307 25392 instancegroups.go:358] Waiting for 1m30s for pods to stabilize after draining.
I0619 10:20:34.647492 25392 instancegroups.go:185] deleting node "ip-10-40-63-102.us-east-2.compute.internal" from kubernetes
I0619 10:20:34.762982 25392 instancegroups.go:299] Stopping instance "i-0b0b71a57df7da5c8", node "ip-10-40-63-102.us-east-2.compute.internal", in group "master-us-east-2a.masters.us-east-2.k8s.redactedinc.com" (this may take a while).
I0619 10:20:35.595988 25392 instancegroups.go:198] waiting for 5m0s after terminating instance
I0619 10:25:35.588130 25392 instancegroups.go:209] Validating the cluster.
I0619 10:25:38.189710 25392 instancegroups.go:273] Cluster did not pass validation, will try again in "30s" until duration "5m0s" expires: machine "i-0d31990e514939c2c" has not yet joined cluster.
I0619 10:26:10.016482 25392 instancegroups.go:273] Cluster did not pass validation, will try again in "30s" until duration "5m0s" expires: machine "i-0d31990e514939c2c" has not yet joined cluster.
I0619 10:26:40.289218 25392 instancegroups.go:273] Cluster did not pass validation, will try again in "30s" until duration "5m0s" expires: machine "i-0d31990e514939c2c" has not yet joined cluster.
I0619 10:27:10.197925 25392 instancegroups.go:273] Cluster did not pass validation, will try again in "30s" until duration "5m0s" expires: machine "i-0d31990e514939c2c" has not yet joined cluster.
I0619 10:27:40.076651 25392 instancegroups.go:273] Cluster did not pass validation, will try again in "30s" until duration "5m0s" expires: machine "i-0d31990e514939c2c" has not yet joined cluster.
I0619 10:28:10.179322 25392 instancegroups.go:273] Cluster did not pass validation, will try again in "30s" until duration "5m0s" expires: machine "i-0d31990e514939c2c" has not yet joined cluster.
I0619 10:28:39.913678 25392 instancegroups.go:273] Cluster did not pass validation, will try again in "30s" until duration "5m0s" expires: machine "i-0d31990e514939c2c" has not yet joined cluster.
I0619 10:29:10.133790 25392 instancegroups.go:273] Cluster did not pass validation, will try again in "30s" until duration "5m0s" expires: machine "i-0d31990e514939c2c" has not yet joined cluster.
I0619 10:29:40.084387 25392 instancegroups.go:273] Cluster did not pass validation, will try again in "30s" until duration "5m0s" expires: machine "i-0d31990e514939c2c" has not yet joined cluster.
I0619 10:30:10.499483 25392 instancegroups.go:273] Cluster did not pass validation, will try again in "30s" until duration "5m0s" expires: machine "i-0d31990e514939c2c" has not yet joined cluster.
E0619 10:30:38.180754 25392 instancegroups.go:214] Cluster did not validate within 5m0s
master not healthy after update, stopping rolling-update: "error validating cluster after removing a node: cluster did not validate within a duration of \"5m0s\""
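(Side note for anyone debugging the same failure: the check above can be repeated by hand from the workstation; kops validate cluster runs essentially the same validation the rolling update waits on, assuming the usual kops state-store environment is configured.)
# Re-run the validation manually while investigating (illustrative):
kops validate cluster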
A new master node was created on AWS and boots, but it never joins the cluster.
After SSHing into the master I see:
systemctl status kubelet
● kubelet.service - Kubernetes Kubelet Server
Loaded: loaded (/lib/systemd/system/kubelet.service; static; vendor preset: enabled)
Active: active (running) since Wed 2019-06-19 17:54:28 UTC; 14min ago
Docs: https://github.com/kubernetes/kubernetes
Main PID: 2607 (kubelet)
Tasks: 17 (limit: 4915)
Memory: 55.8M
CPU: 17.193s
CGroup: /system.slice/kubelet.service
└─2607 /usr/local/bin/kubelet --allow-privileged=true --anonymous-auth=false --cgroup-root=/ --client-ca-file=/srv/kubernetes/ca.crt --cloud-provider=aws --cluster-dns=100.64.0.10 --cluster-domain=cluster.local --enable-debugging-handlers=true --eviction-hard=memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5%,imagefs.available<10%,imagefs.inodesFree<5% --feature-gates=ExperimentalCriticalPodAnnotation=true --hostname-override=ip-10-40-42-180.us-east-2.compute.internal --kubeconfig=/var/lib/kubelet/kubeconfig --network-plugin=cni --node-labels=kops.k8s.io/instancegroup=master-us-east-2a,kubernetes.io/role=master,node-role.kubernetes.io/master= --non-masquerade-cidr=100.64.0.0/10 --pod-infra-container-image=k8s.gcr.io/pause-amd64:3.0 --pod-manifest-path=/etc/kubernetes/manifests --register-schedulable=true --register-with-taints=node-role.kubernetes.io/master=:NoSchedule --v=2 --cni-bin-dir=/opt/cni/bin/ --cni-conf-dir=/etc/cni/net.d/
Jun 19 18:08:56 ip-10-40-42-180 kubelet[2607]: E0619 18:08:56.513836 2607 kubelet.go:2236] node "ip-10-40-42-180.us-east-2.compute.internal" not found
Jun 19 18:08:56 ip-10-40-42-180 kubelet[2607]: E0619 18:08:56.614643 2607 kubelet.go:2236] node "ip-10-40-42-180.us-east-2.compute.internal" not found
Jun 19 18:08:56 ip-10-40-42-180 kubelet[2607]: E0619 18:08:56.715408 2607 kubelet.go:2236] node "ip-10-40-42-180.us-east-2.compute.internal" not found
Jun 19 18:08:56 ip-10-40-42-180 kubelet[2607]: E0619 18:08:56.816111 2607 kubelet.go:2236] node "ip-10-40-42-180.us-east-2.compute.internal" not found
Jun 19 18:08:56 ip-10-40-42-180 kubelet[2607]: E0619 18:08:56.916789 2607 kubelet.go:2236] node "ip-10-40-42-180.us-east-2.compute.internal" not found
Jun 19 18:08:56 ip-10-40-42-180 kubelet[2607]: I0619 18:08:56.960509 2607 prober.go:111] Liveness probe for "kube-apiserver-ip-10-40-42-180.us-east-2.compute.internal_kube-system(958e40c338777b454bee6c4539a87db3):kube-apiserver" failed (failure): Get http://127.0.0.1:8080/healthz: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Jun 19 18:08:57 ip-10-40-42-180 kubelet[2607]: E0619 18:08:57.017549 2607 kubelet.go:2236] node "ip-10-40-42-180.us-east-2.compute.internal" not found
Jun 19 18:08:57 ip-10-40-42-180 kubelet[2607]: E0619 18:08:57.118275 2607 kubelet.go:2236] node "ip-10-40-42-180.us-east-2.compute.internal" not found
Jun 19 18:08:57 ip-10-40-42-180 kubelet[2607]: E0619 18:08:57.219166 2607 kubelet.go:2236] node "ip-10-40-42-180.us-east-2.compute.internal" not found
Jun 19 18:08:57 ip-10-40-42-180 kubelet[2607]: E0619 18:08:57.319979 2607 kubelet.go:2236] node "ip-10-40-42-180.us-east-2.compute.internal" not found
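For reference, a few quick checks on the new master that help narrow this down (illustrative commands, assuming the standard kops layout where etcd-manager writes to /var/log/etcd.log; the healthz URL is the one the kubelet liveness probe above is timing out on):
# Are the static control-plane containers running at all?
sudo docker ps | grep -E 'etcd|apiserver'
# The local healthz endpoint the liveness probe hits:
curl -sv http://127.0.0.1:8080/healthz
# etcd-manager output for the main etcd cluster:
sudo tail -n 100 /var/log/etcd.log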
Cluster manifest:
apiVersion: kops/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: 2018-09-18T09:31:13Z
  name: us-east-2.k8s.redactedinc.com
spec:
  api:
    loadBalancer:
      type: Public
  authorization:
    rbac: {}
  channel: stable
  cloudLabels:
    Dept: ops
  cloudProvider: aws
  configBase: s3://us-east-2-il-kops-state-store/us-east-2.k8s.redactedinc.com
  etcdClusters:
  - etcdMembers:
    - instanceGroup: master-us-east-2a
      name: a
    - instanceGroup: master-us-east-2b
      name: b
    - instanceGroup: master-us-east-2c
      name: c
    name: main
  - etcdMembers:
    - instanceGroup: master-us-east-2a
      name: a
    - instanceGroup: master-us-east-2b
      name: b
    - instanceGroup: master-us-east-2c
      name: c
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
  kubelet:
    anonymousAuth: false
  kubernetesApiAccess:
  - 0.0.0.0/0
  kubernetesVersion: 1.12.8
  masterInternalName: api.internal.us-east-2.k8s.redactedinc.com
  masterPublicName: api.us-east-2.k8s.redactedinc.com
  networkCIDR: 10.40.0.0/16
  networkID: vpc-08bcbfdee2514b305
  networking:
    weave:
      mtu: 8912
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 0.0.0.0/0
  subnets:
  - cidr: 10.40.32.0/19
    name: us-east-2a
    type: Private
    zone: us-east-2a
  - cidr: 10.40.64.0/19
    name: us-east-2b
    type: Private
    zone: us-east-2b
  - cidr: 10.40.96.0/19
    name: us-east-2c
    type: Private
    zone: us-east-2c
  - cidr: 10.40.0.0/22
    name: utility-us-east-2a
    type: Utility
    zone: us-east-2a
  - cidr: 10.40.4.0/22
    name: utility-us-east-2b
    type: Utility
    zone: us-east-2b
  - cidr: 10.40.8.0/22
    name: utility-us-east-2c
    type: Utility
    zone: us-east-2c
  topology:
    bastion:
      bastionPublicName: bastion.us-east-2.k8s.redactedinc.com
    dns:
      type: Public
    masters: private
    nodes: private
---
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-09-18T09:31:14Z
  labels:
    kops.k8s.io/cluster: us-east-2.k8s.redactedinc.com
  name: bastions
spec:
  image: kope.io/k8s-1.12-debian-stretch-amd64-hvm-ebs-2019-05-13
  machineType: t2.micro
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: bastions
  role: Bastion
  subnets:
  - utility-us-east-2a
  - utility-us-east-2b
  - utility-us-east-2c
---
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-09-18T09:31:13Z
  labels:
    kops.k8s.io/cluster: us-east-2.k8s.redactedinc.com
  name: master-us-east-2a
spec:
  image: kope.io/k8s-1.12-debian-stretch-amd64-hvm-ebs-2019-05-13
  machineType: t3.medium
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-us-east-2a
  role: Master
  subnets:
  - us-east-2a
---
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-09-18T09:31:13Z
  labels:
    kops.k8s.io/cluster: us-east-2.k8s.redactedinc.com
  name: master-us-east-2b
spec:
  image: kope.io/k8s-1.12-debian-stretch-amd64-hvm-ebs-2019-05-13
  machineType: t3.medium
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-us-east-2b
  role: Master
  subnets:
  - us-east-2b
---
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-09-18T09:31:13Z
  labels:
    kops.k8s.io/cluster: us-east-2.k8s.redactedinc.com
  name: master-us-east-2c
spec:
  image: kope.io/k8s-1.12-debian-stretch-amd64-hvm-ebs-2019-05-13
  machineType: t3.medium
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-us-east-2c
  role: Master
  subnets:
  - us-east-2c
---
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-09-18T09:31:14Z
  labels:
    kops.k8s.io/cluster: us-east-2.k8s.redactedinc.com
  name: nodes
spec:
  image: kope.io/k8s-1.12-debian-stretch-amd64-hvm-ebs-2019-05-13
  machineType: r5a.large
  maxSize: 2
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: nodes
  role: Node
  subnets:
  - us-east-2a
Please let me know if there is any other info I can provide. This is pretty concerning, as our cluster is now in a bad state with a missing master. Any suggestions are greatly welcome!
From /var/log/etcd.log -- it looks like it might be an etcd version mismatch? Shouldn't kops be aware of this and handle it as part of the upgrade command?
W0619 18:16:55.567966 3754 etcdclusterstate.go:97] unable to find node for member "etcd-c"; using default clientURLs [http://etcd-c.internal.us-east-2.k8s.redactedinc.com:4001]
W0619 18:16:55.588756 3754 etcdclusterstate.go:97] unable to find node for member "etcd-b"; using default clientURLs [http://etcd-b.internal.us-east-2.k8s.redactedinc.com:4001]
I0619 18:16:55.590672 3754 controller.go:276] etcd cluster state: etcdClusterState
members:
{"name":"etcd-c","peerURLs":["http://etcd-c.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["http://etcd-c.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"50114dd36d2f3dc3"}
{"name":"etcd-a","peerURLs":["https://etcd-a.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["https://etcd-a.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"c0029d9ef59e42bc"}
{"name":"etcd-b","peerURLs":["http://etcd-b.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["http://etcd-b.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"e0ad5a0dbcdd8e79"}
peers:
etcdClusterPeerInfo{peer=peer{id:"etcd-a" endpoints:"10.40.42.180:3996" }, info=cluster_name:"etcd" node_configuration:<name:"etcd-a" peer_urls:"https://etcd-a.internal.us-east-2.k8s.redactedinc.com:2380" client_urls:"https://etcd-a.internal.us-east-2.k8s.redactedinc.com:4001" quarantined_client_urls:"https://etcd-a.internal.us-east-2.k8s.redactedinc.com:3994" > etcd_state:<cluster:<desired_cluster_size:3 cluster_token:"etcd-cluster-token-etcd" nodes:<name:"etcd-a" peer_urls:"https://etcd-a.internal.us-east-2.k8s.redactedinc.com:2380" client_urls:"https://0.0.0.0:4001" quarantined_client_urls:"http://0.0.0.0:3994" tls_enabled:true > nodes:<name:"etcd-b" peer_urls:"http://etcd-b.internal.us-east-2.k8s.redactedinc.com:2380" client_urls:"http://0.0.0.0:4001" quarantined_client_urls:"http://0.0.0.0:3994" > nodes:<name:"etcd-c" peer_urls:"http://etcd-c.internal.us-east-2.k8s.redactedinc.com:2380" client_urls:"http://0.0.0.0:4001" quarantined_client_urls:"http://0.0.0.0:3994" > > etcd_version:"2.2.1" > }
I0619 18:16:55.590782 3754 controller.go:277] etcd cluster members: map[50114dd36d2f3dc3:{"name":"etcd-c","peerURLs":["http://etcd-c.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["http://etcd-c.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"50114dd36d2f3dc3"} c0029d9ef59e42bc:{"name":"etcd-a","peerURLs":["https://etcd-a.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["https://etcd-a.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"c0029d9ef59e42bc"} e0ad5a0dbcdd8e79:{"name":"etcd-b","peerURLs":["http://etcd-b.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["http://etcd-b.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"e0ad5a0dbcdd8e79"}]
I0619 18:16:55.590804 3754 controller.go:615] sending member map to all peers: members:<name:"etcd-a" dns:"etcd-a.internal.us-east-2.k8s.redactedinc.com" addresses:"10.40.42.180" >
I0619 18:16:55.590979 3754 etcdserver.go:222] updating hosts: map[10.40.42.180:[etcd-a.internal.us-east-2.k8s.redactedinc.com]]
I0619 18:16:55.590993 3754 hosts.go:84] hosts update: primary=map[10.40.42.180:[etcd-a.internal.us-east-2.k8s.redactedinc.com]], fallbacks=map[etcd-a.internal.us-east-2.k8s.redactedinc.com:[10.40.42.180 10.40.42.180] etcd-b.internal.us-east-2.k8s.redactedinc.com:[10.40.66.188 10.40.66.188] etcd-c.internal.us-east-2.k8s.redactedinc.com:[10.40.126.31 10.40.126.31]], final=map[10.40.126.31:[etcd-c.internal.us-east-2.k8s.redactedinc.com etcd-c.internal.us-east-2.k8s.redactedinc.com] 10.40.42.180:[etcd-a.internal.us-east-2.k8s.redactedinc.com] 10.40.66.188:[etcd-b.internal.us-east-2.k8s.redactedinc.com etcd-b.internal.us-east-2.k8s.redactedinc.com]]
I0619 18:16:55.600464 3754 commands.go:22] not refreshing commands - TTL not hit
I0619 18:16:55.600490 3754 s3fs.go:220] Reading file "s3://us-east-2-il-kops-state-store/us-east-2.k8s.redactedinc.com/backups/etcd/main/control/etcd-cluster-created"
I0619 18:16:55.609470 3754 controller.go:369] spec member_count:3 etcd_version:"3.2.24"
I0619 18:16:55.609531 3754 controller.go:417] mismatched version for peer peer{id:"etcd-a" endpoints:"10.40.42.180:3996" }: want "3.2.24", have "2.2.1"
I0619 18:16:55.609556 3754 controller.go:423] can't do in-place upgrade from "2.2.1" -> "3.2.24"
I0619 18:16:55.609576 3754 controller.go:517] detected that we need to upgrade/downgrade etcd
I0619 18:16:55.609590 3754 controller.go:526] upgrade/downgrade needed, but we don't have sufficient peers
I0619 18:16:58.043759 3754 peers.go:281] connecting to peer "etcd-b" with TLS policy, servername="etcd-manager-server-etcd-b"
W0619 18:16:58.048338 3754 peers.go:325] unable to grpc-ping discovered peer 10.40.66.188:3996: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
I0619 18:16:58.048367 3754 peers.go:347] was not able to connect to peer etcd-b: map[10.40.66.188:3996:true]
W0619 18:16:58.048381 3754 peers.go:215] unexpected error from peer intercommunications: unable to connect to peer etcd-b
I0619 18:16:58.147534 3754 peers.go:281] connecting to peer "etcd-c" with TLS policy, servername="etcd-manager-server-etcd-c"
W0619 18:16:58.152316 3754 peers.go:325] unable to grpc-ping discovered peer 10.40.126.31:3996: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
I0619 18:16:58.152345 3754 peers.go:347] was not able to connect to peer etcd-c: map[10.40.126.31:3996:true]
W0619 18:16:58.152360 3754 peers.go:215] unexpected error from peer intercommunications: unable to connect to peer etcd-c
I0619 18:17:03.048498 3754 peers.go:281] connecting to peer "etcd-b" with TLS policy, servername="etcd-manager-server-etcd-b"
W0619 18:17:03.049586 3754 peers.go:325] unable to grpc-ping discovered peer 10.40.66.188:3996: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
I0619 18:17:03.049603 3754 peers.go:347] was not able to connect to peer etcd-b: map[10.40.66.188:3996:true]
W0619 18:17:03.049617 3754 peers.go:215] unexpected error from peer intercommunications: unable to connect to peer etcd-b
I0619 18:17:03.152485 3754 peers.go:281] connecting to peer "etcd-c" with TLS policy, servername="etcd-manager-server-etcd-c"
W0619 18:17:03.153952 3754 peers.go:325] unable to grpc-ping discovered peer 10.40.126.31:3996: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
I0619 18:17:03.153973 3754 peers.go:347] was not able to connect to peer etcd-c: map[10.40.126.31:3996:true]
W0619 18:17:03.156241 3754 peers.go:215] unexpected error from peer intercommunications: unable to connect to peer etcd-c
I0619 18:17:05.450434 3754 volumes.go:85] AWS API Request: ec2/DescribeVolumes
I0619 18:17:05.543514 3754 volumes.go:85] AWS API Request: ec2/DescribeInstances
I0619 18:17:05.586794 3754 hosts.go:84] hosts update: primary=map[10.40.42.180:[etcd-a.internal.us-east-2.k8s.redactedinc.com]], fallbacks=map[etcd-a.internal.us-east-2.k8s.redactedinc.com:[10.40.42.180 10.40.42.180] etcd-b.internal.us-east-2.k8s.redactedinc.com:[10.40.66.188 10.40.66.188] etcd-c.internal.us-east-2.k8s.redactedinc.com:[10.40.126.31 10.40.126.31]], final=map[10.40.126.31:[etcd-c.internal.us-east-2.k8s.redactedinc.com etcd-c.internal.us-east-2.k8s.redactedinc.com] 10.40.42.180:[etcd-a.internal.us-east-2.k8s.redactedinc.com] 10.40.66.188:[etcd-b.internal.us-east-2.k8s.redactedinc.com etcd-b.internal.us-east-2.k8s.redactedinc.com]]
I0619 18:17:05.616264 3754 controller.go:173] starting controller iteration
I0619 18:17:05.616303 3754 controller.go:269] I am leader with token "rgD81JOa6aHA3wsBY3qGfQ"
W0619 18:17:05.889836 3754 etcdclusterstate.go:97] unable to find node for member "etcd-b"; using default clientURLs [http://etcd-b.internal.us-east-2.k8s.redactedinc.com:4001]
W0619 18:17:05.896361 3754 etcdclusterstate.go:97] unable to find node for member "etcd-c"; using default clientURLs [http://etcd-c.internal.us-east-2.k8s.redactedinc.com:4001]
I0619 18:17:05.900282 3754 controller.go:276] etcd cluster state: etcdClusterState
members:
{"name":"etcd-c","peerURLs":["http://etcd-c.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["http://etcd-c.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"50114dd36d2f3dc3"}
{"name":"etcd-a","peerURLs":["https://etcd-a.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["https://etcd-a.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"c0029d9ef59e42bc"}
{"name":"etcd-b","peerURLs":["http://etcd-b.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["http://etcd-b.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"e0ad5a0dbcdd8e79"}
peers:
etcdClusterPeerInfo{peer=peer{id:"etcd-a" endpoints:"10.40.42.180:3996" }, info=cluster_name:"etcd" node_configuration:<name:"etcd-a" peer_urls:"https://etcd-a.internal.us-east-2.k8s.redactedinc.com:2380" client_urls:"https://etcd-a.internal.us-east-2.k8s.redactedinc.com:4001" quarantined_client_urls:"https://etcd-a.internal.us-east-2.k8s.redactedinc.com:3994" > etcd_state:<cluster:<desired_cluster_size:3 cluster_token:"etcd-cluster-token-etcd" nodes:<name:"etcd-a" peer_urls:"https://etcd-a.internal.us-east-2.k8s.redactedinc.com:2380" client_urls:"https://0.0.0.0:4001" quarantined_client_urls:"http://0.0.0.0:3994" tls_enabled:true > nodes:<name:"etcd-b" peer_urls:"http://etcd-b.internal.us-east-2.k8s.redactedinc.com:2380" client_urls:"http://0.0.0.0:4001" quarantined_client_urls:"http://0.0.0.0:3994" > nodes:<name:"etcd-c" peer_urls:"http://etcd-c.internal.us-east-2.k8s.redactedinc.com:2380" client_urls:"http://0.0.0.0:4001" quarantined_client_urls:"http://0.0.0.0:3994" > > etcd_version:"2.2.1" > }
I0619 18:17:05.900398 3754 controller.go:277] etcd cluster members: map[50114dd36d2f3dc3:{"name":"etcd-c","peerURLs":["http://etcd-c.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["http://etcd-c.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"50114dd36d2f3dc3"} c0029d9ef59e42bc:{"name":"etcd-a","peerURLs":["https://etcd-a.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["https://etcd-a.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"c0029d9ef59e42bc"} e0ad5a0dbcdd8e79:{"name":"etcd-b","peerURLs":["http://etcd-b.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["http://etcd-b.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"e0ad5a0dbcdd8e79"}]
I0619 18:17:05.900656 3754 controller.go:615] sending member map to all peers: members:<name:"etcd-a" dns:"etcd-a.internal.us-east-2.k8s.redactedinc.com" addresses:"10.40.42.180" >
I0619 18:17:05.900971 3754 etcdserver.go:222] updating hosts: map[10.40.42.180:[etcd-a.internal.us-east-2.k8s.redactedinc.com]]
I0619 18:17:05.900990 3754 hosts.go:84] hosts update: primary=map[10.40.42.180:[etcd-a.internal.us-east-2.k8s.redactedinc.com]], fallbacks=map[etcd-a.internal.us-east-2.k8s.redactedinc.com:[10.40.42.180 10.40.42.180] etcd-b.internal.us-east-2.k8s.redactedinc.com:[10.40.66.188 10.40.66.188] etcd-c.internal.us-east-2.k8s.redactedinc.com:[10.40.126.31 10.40.126.31]], final=map[10.40.126.31:[etcd-c.internal.us-east-2.k8s.redactedinc.com etcd-c.internal.us-east-2.k8s.redactedinc.com] 10.40.42.180:[etcd-a.internal.us-east-2.k8s.redactedinc.com] 10.40.66.188:[etcd-b.internal.us-east-2.k8s.redactedinc.com etcd-b.internal.us-east-2.k8s.redactedinc.com]]
I0619 18:17:05.901060 3754 hosts.go:181] skipping update of unchanged /etc/hosts
I0619 18:17:05.901141 3754 commands.go:22] not refreshing commands - TTL not hit
I0619 18:17:05.901154 3754 s3fs.go:220] Reading file "s3://us-east-2-il-kops-state-store/us-east-2.k8s.redactedinc.com/backups/etcd/main/control/etcd-cluster-created"
I0619 18:17:05.908400 3754 controller.go:369] spec member_count:3 etcd_version:"3.2.24"
I0619 18:17:05.908454 3754 controller.go:417] mismatched version for peer peer{id:"etcd-a" endpoints:"10.40.42.180:3996" }: want "3.2.24", have "2.2.1"
I0619 18:17:05.908479 3754 controller.go:423] can't do in-place upgrade from "2.2.1" -> "3.2.24"
I0619 18:17:05.908500 3754 controller.go:517] detected that we need to upgrade/downgrade etcd
I0619 18:17:05.908513 3754 controller.go:526] upgrade/downgrade needed, but we don't have sufficient peers
I0619 18:17:08.052250 3754 peers.go:281] connecting to peer "etcd-b" with TLS policy, servername="etcd-manager-server-etcd-b"
W0619 18:17:08.053181 3754 peers.go:325] unable to grpc-ping discovered peer 10.40.66.188:3996: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
I0619 18:17:08.053197 3754 peers.go:347] was not able to connect to peer etcd-b: map[10.40.66.188:3996:true]
W0619 18:17:08.053209 3754 peers.go:215] unexpected error from peer intercommunications: unable to connect to peer etcd-b
I0619 18:17:08.156376 3754 peers.go:281] connecting to peer "etcd-c" with TLS policy, servername="etcd-manager-server-etcd-c"
W0619 18:17:08.157656 3754 peers.go:325] unable to grpc-ping discovered peer 10.40.126.31:3996: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
I0619 18:17:08.157673 3754 peers.go:347] was not able to connect to peer etcd-c: map[10.40.126.31:3996:true]
W0619 18:17:08.157686 3754 peers.go:215] unexpected error from peer intercommunications: unable to connect to peer etcd-c
I0619 18:17:13.056516 3754 peers.go:281] connecting to peer "etcd-b" with TLS policy, servername="etcd-manager-server-etcd-b"
W0619 18:17:13.060307 3754 peers.go:325] unable to grpc-ping discovered peer 10.40.66.188:3996: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
I0619 18:17:13.060331 3754 peers.go:347] was not able to connect to peer etcd-b: map[10.40.66.188:3996:true]
W0619 18:17:13.060343 3754 peers.go:215] unexpected error from peer intercommunications: unable to connect to peer etcd-b
I0619 18:17:13.160261 3754 peers.go:281] connecting to peer "etcd-c" with TLS policy, servername="etcd-manager-server-etcd-c"
W0619 18:17:13.161570 3754 peers.go:325] unable to grpc-ping discovered peer 10.40.126.31:3996: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
I0619 18:17:13.161588 3754 peers.go:347] was not able to connect to peer etcd-c: map[10.40.126.31:3996:true]
W0619 18:17:13.161602 3754 peers.go:215] unexpected error from peer intercommunications: unable to connect to peer etcd-c
I0619 18:17:15.916254 3754 controller.go:173] starting controller iteration
I0619 18:17:15.916299 3754 controller.go:269] I am leader with token "rgD81JOa6aHA3wsBY3qGfQ"
W0619 18:17:15.932160 3754 etcdclusterstate.go:97] unable to find node for member "etcd-c"; using default clientURLs [http://etcd-c.internal.us-east-2.k8s.redactedinc.com:4001]
W0619 18:17:15.978855 3754 etcdclusterstate.go:97] unable to find node for member "etcd-b"; using default clientURLs [http://etcd-b.internal.us-east-2.k8s.redactedinc.com:4001]
I0619 18:17:15.988353 3754 controller.go:276] etcd cluster state: etcdClusterState
members:
{"name":"etcd-c","peerURLs":["http://etcd-c.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["http://etcd-c.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"50114dd36d2f3dc3"}
{"name":"etcd-a","peerURLs":["https://etcd-a.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["https://etcd-a.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"c0029d9ef59e42bc"}
{"name":"etcd-b","peerURLs":["http://etcd-b.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["http://etcd-b.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"e0ad5a0dbcdd8e79"}
peers:
etcdClusterPeerInfo{peer=peer{id:"etcd-a" endpoints:"10.40.42.180:3996" }, info=cluster_name:"etcd" node_configuration:<name:"etcd-a" peer_urls:"https://etcd-a.internal.us-east-2.k8s.redactedinc.com:2380" client_urls:"https://etcd-a.internal.us-east-2.k8s.redactedinc.com:4001" quarantined_client_urls:"https://etcd-a.internal.us-east-2.k8s.redactedinc.com:3994" > etcd_state:<cluster:<desired_cluster_size:3 cluster_token:"etcd-cluster-token-etcd" nodes:<name:"etcd-a" peer_urls:"https://etcd-a.internal.us-east-2.k8s.redactedinc.com:2380" client_urls:"https://0.0.0.0:4001" quarantined_client_urls:"http://0.0.0.0:3994" tls_enabled:true > nodes:<name:"etcd-b" peer_urls:"http://etcd-b.internal.us-east-2.k8s.redactedinc.com:2380" client_urls:"http://0.0.0.0:4001" quarantined_client_urls:"http://0.0.0.0:3994" > nodes:<name:"etcd-c" peer_urls:"http://etcd-c.internal.us-east-2.k8s.redactedinc.com:2380" client_urls:"http://0.0.0.0:4001" quarantined_client_urls:"http://0.0.0.0:3994" > > etcd_version:"2.2.1" > }
I0619 18:17:15.988471 3754 controller.go:277] etcd cluster members: map[50114dd36d2f3dc3:{"name":"etcd-c","peerURLs":["http://etcd-c.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["http://etcd-c.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"50114dd36d2f3dc3"} c0029d9ef59e42bc:{"name":"etcd-a","peerURLs":["https://etcd-a.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["https://etcd-a.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"c0029d9ef59e42bc"} e0ad5a0dbcdd8e79:{"name":"etcd-b","peerURLs":["http://etcd-b.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["http://etcd-b.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"e0ad5a0dbcdd8e79"}]
I0619 18:17:15.988493 3754 controller.go:615] sending member map to all peers: members:<name:"etcd-a" dns:"etcd-a.internal.us-east-2.k8s.redactedinc.com" addresses:"10.40.42.180" >
I0619 18:17:15.988684 3754 etcdserver.go:222] updating hosts: map[10.40.42.180:[etcd-a.internal.us-east-2.k8s.redactedinc.com]]
I0619 18:17:15.988699 3754 hosts.go:84] hosts update: primary=map[10.40.42.180:[etcd-a.internal.us-east-2.k8s.redactedinc.com]], fallbacks=map[etcd-a.internal.us-east-2.k8s.redactedinc.com:[10.40.42.180 10.40.42.180] etcd-b.internal.us-east-2.k8s.redactedinc.com:[10.40.66.188 10.40.66.188] etcd-c.internal.us-east-2.k8s.redactedinc.com:[10.40.126.31 10.40.126.31]], final=map[10.40.126.31:[etcd-c.internal.us-east-2.k8s.redactedinc.com etcd-c.internal.us-east-2.k8s.redactedinc.com] 10.40.42.180:[etcd-a.internal.us-east-2.k8s.redactedinc.com] 10.40.66.188:[etcd-b.internal.us-east-2.k8s.redactedinc.com etcd-b.internal.us-east-2.k8s.redactedinc.com]]
I0619 18:17:15.995143 3754 commands.go:22] not refreshing commands - TTL not hit
I0619 18:17:15.995167 3754 s3fs.go:220] Reading file "s3://us-east-2-il-kops-state-store/us-east-2.k8s.redactedinc.com/backups/etcd/main/control/etcd-cluster-created"
I0619 18:17:16.002575 3754 controller.go:369] spec member_count:3 etcd_version:"3.2.24"
I0619 18:17:16.002630 3754 controller.go:417] mismatched version for peer peer{id:"etcd-a" endpoints:"10.40.42.180:3996" }: want "3.2.24", have "2.2.1"
I0619 18:17:16.002655 3754 controller.go:423] can't do in-place upgrade from "2.2.1" -> "3.2.24"
I0619 18:17:16.002674 3754 controller.go:517] detected that we need to upgrade/downgrade etcd
I0619 18:17:16.002688 3754 controller.go:526] upgrade/downgrade needed, but we don't have sufficient peers
I0619 18:17:18.060468 3754 peers.go:281] connecting to peer "etcd-b" with TLS policy, servername="etcd-manager-server-etcd-b"
W0619 18:17:18.061439 3754 peers.go:325] unable to grpc-ping discovered peer 10.40.66.188:3996: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
I0619 18:17:18.061456 3754 peers.go:347] was not able to connect to peer etcd-b: map[10.40.66.188:3996:true]
W0619 18:17:18.061467 3754 peers.go:215] unexpected error from peer intercommunications: unable to connect to peer etcd-b
I0619 18:17:18.161739 3754 peers.go:281] connecting to peer "etcd-c" with TLS policy, servername="etcd-manager-server-etcd-c"
W0619 18:17:18.164142 3754 peers.go:325] unable to grpc-ping discovered peer 10.40.126.31:3996: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
I0619 18:17:18.164162 3754 peers.go:347] was not able to connect to peer etcd-c: map[10.40.126.31:3996:true]
W0619 18:17:18.164195 3754 peers.go:215] unexpected error from peer intercommunications: unable to connect to peer etcd-c
I0619 18:17:23.061592 3754 peers.go:281] connecting to peer "etcd-b" with TLS policy, servername="etcd-manager-server-etcd-b"
W0619 18:17:23.062497 3754 peers.go:325] unable to grpc-ping discovered peer 10.40.66.188:3996: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
I0619 18:17:23.062513 3754 peers.go:347] was not able to connect to peer etcd-b: map[10.40.66.188:3996:true]
W0619 18:17:23.062524 3754 peers.go:215] unexpected error from peer intercommunications: unable to connect to peer etcd-b
I0619 18:17:23.165198 3754 peers.go:281] connecting to peer "etcd-c" with TLS policy, servername="etcd-manager-server-etcd-c"
W0619 18:17:23.166648 3754 peers.go:325] unable to grpc-ping discovered peer 10.40.126.31:3996: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
I0619 18:17:23.166665 3754 peers.go:347] was not able to connect to peer etcd-c: map[10.40.126.31:3996:true]
W0619 18:17:23.166686 3754 peers.go:215] unexpected error from peer intercommunications: unable to connect to peer etcd-c
I0619 18:17:26.003897 3754 controller.go:173] starting controller iteration
I0619 18:17:26.003934 3754 controller.go:269] I am leader with token "rgD81JOa6aHA3wsBY3qGfQ"
W0619 18:17:26.024674 3754 etcdclusterstate.go:97] unable to find node for member "etcd-c"; using default clientURLs [http://etcd-c.internal.us-east-2.k8s.redactedinc.com:4001]
W0619 18:17:26.040406 3754 etcdclusterstate.go:97] unable to find node for member "etcd-b"; using default clientURLs [http://etcd-b.internal.us-east-2.k8s.redactedinc.com:4001]
I0619 18:17:26.046479 3754 controller.go:276] etcd cluster state: etcdClusterState
members:
{"name":"etcd-c","peerURLs":["http://etcd-c.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["http://etcd-c.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"50114dd36d2f3dc3"}
{"name":"etcd-a","peerURLs":["https://etcd-a.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["https://etcd-a.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"c0029d9ef59e42bc"}
{"name":"etcd-b","peerURLs":["http://etcd-b.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["http://etcd-b.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"e0ad5a0dbcdd8e79"}
peers:
etcdClusterPeerInfo{peer=peer{id:"etcd-a" endpoints:"10.40.42.180:3996" }, info=cluster_name:"etcd" node_configuration:<name:"etcd-a" peer_urls:"https://etcd-a.internal.us-east-2.k8s.redactedinc.com:2380" client_urls:"https://etcd-a.internal.us-east-2.k8s.redactedinc.com:4001" quarantined_client_urls:"https://etcd-a.internal.us-east-2.k8s.redactedinc.com:3994" > etcd_state:<cluster:<desired_cluster_size:3 cluster_token:"etcd-cluster-token-etcd" nodes:<name:"etcd-a" peer_urls:"https://etcd-a.internal.us-east-2.k8s.redactedinc.com:2380" client_urls:"https://0.0.0.0:4001" quarantined_client_urls:"http://0.0.0.0:3994" tls_enabled:true > nodes:<name:"etcd-b" peer_urls:"http://etcd-b.internal.us-east-2.k8s.redactedinc.com:2380" client_urls:"http://0.0.0.0:4001" quarantined_client_urls:"http://0.0.0.0:3994" > nodes:<name:"etcd-c" peer_urls:"http://etcd-c.internal.us-east-2.k8s.redactedinc.com:2380" client_urls:"http://0.0.0.0:4001" quarantined_client_urls:"http://0.0.0.0:3994" > > etcd_version:"2.2.1" > }
I0619 18:17:26.046583 3754 controller.go:277] etcd cluster members: map[50114dd36d2f3dc3:{"name":"etcd-c","peerURLs":["http://etcd-c.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["http://etcd-c.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"50114dd36d2f3dc3"} c0029d9ef59e42bc:{"name":"etcd-a","peerURLs":["https://etcd-a.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["https://etcd-a.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"c0029d9ef59e42bc"} e0ad5a0dbcdd8e79:{"name":"etcd-b","peerURLs":["http://etcd-b.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["http://etcd-b.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"e0ad5a0dbcdd8e79"}]
I0619 18:17:26.046605 3754 controller.go:615] sending member map to all peers: members:<name:"etcd-a" dns:"etcd-a.internal.us-east-2.k8s.redactedinc.com" addresses:"10.40.42.180" >
I0619 18:17:26.046776 3754 etcdserver.go:222] updating hosts: map[10.40.42.180:[etcd-a.internal.us-east-2.k8s.redactedinc.com]]
I0619 18:17:26.046790 3754 hosts.go:84] hosts update: primary=map[10.40.42.180:[etcd-a.internal.us-east-2.k8s.redactedinc.com]], fallbacks=map[etcd-a.internal.us-east-2.k8s.redactedinc.com:[10.40.42.180 10.40.42.180] etcd-b.internal.us-east-2.k8s.redactedinc.com:[10.40.66.188 10.40.66.188] etcd-c.internal.us-east-2.k8s.redactedinc.com:[10.40.126.31 10.40.126.31]], final=map[10.40.126.31:[etcd-c.internal.us-east-2.k8s.redactedinc.com etcd-c.internal.us-east-2.k8s.redactedinc.com] 10.40.42.180:[etcd-a.internal.us-east-2.k8s.redactedinc.com] 10.40.66.188:[etcd-b.internal.us-east-2.k8s.redactedinc.com etcd-b.internal.us-east-2.k8s.redactedinc.com]]
I0619 18:17:26.051305 3754 commands.go:22] not refreshing commands - TTL not hit
I0619 18:17:26.051542 3754 s3fs.go:220] Reading file "s3://us-east-2-il-kops-state-store/us-east-2.k8s.redactedinc.com/backups/etcd/main/control/etcd-cluster-created"
I0619 18:17:26.059599 3754 controller.go:369] spec member_count:3 etcd_version:"3.2.24"
I0619 18:17:26.059807 3754 controller.go:417] mismatched version for peer peer{id:"etcd-a" endpoints:"10.40.42.180:3996" }: want "3.2.24", have "2.2.1"
I0619 18:17:26.059947 3754 controller.go:423] can't do in-place upgrade from "2.2.1" -> "3.2.24"
I0619 18:17:26.060067 3754 controller.go:517] detected that we need to upgrade/downgrade etcd
I0619 18:17:26.060157 3754 controller.go:526] upgrade/downgrade needed, but we don't have sufficient peers
I0619 18:17:28.064265 3754 peers.go:281] connecting to peer "etcd-b" with TLS policy, servername="etcd-manager-server-etcd-b"
W0619 18:17:28.068313 3754 peers.go:325] unable to grpc-ping discovered peer 10.40.66.188:3996: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
I0619 18:17:28.068340 3754 peers.go:347] was not able to connect to peer etcd-b: map[10.40.66.188:3996:true]
W0619 18:17:28.068354 3754 peers.go:215] unexpected error from peer intercommunications: unable to connect to peer etcd-b
I0619 18:17:28.168276 3754 peers.go:281] connecting to peer "etcd-c" with TLS policy, servername="etcd-manager-server-etcd-c"
W0619 18:17:28.169593 3754 peers.go:325] unable to grpc-ping discovered peer 10.40.126.31:3996: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
I0619 18:17:28.169610 3754 peers.go:347] was not able to connect to peer etcd-c: map[10.40.126.31:3996:true]
W0619 18:17:28.169621 3754 peers.go:215] unexpected error from peer intercommunications: unable to connect to peer etcd-c
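My (possibly wrong) reading of the above: the new master's etcd-manager wants to run etcd 3.2.24, but the existing cluster state is still etcd 2.2.1, and it refuses to migrate because it cannot reach the other two peers on port 3996 (presumably because etcd-b and etcd-c are still the old, not-yet-rolled masters that aren't running etcd-manager). An illustrative connectivity check from the new master, using the peer IPs and port from the log above:
# Can the new master reach the other etcd-manager peers?
nc -zv -w 3 10.40.66.188 3996   # etcd-b
nc -zv -w 3 10.40.126.31 3996   # etcd-c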
I ran kops rolling-update cluster --yes two more times to force the other masters to be recreated.
The API server was unavailable between the final old master being removed and the new one starting up and etcd resyncing, which took approximately 15 minutes (expected, but a heads-up to anyone who finds this thread).
Not sure whether this is a bug or a feature request, but it feels like kops should be able to detect this problem during upgrades and handle it, or at least advise accordingly.
Hi @richstokes,
I believe the kops 1.12 release notes mention the disruption to masters caused by the etcd migration. They suggest running kops rolling-update cluster --cloudonly --instance-group-roles master --master-interval=1s --yes, terminating all 3 masters and forcing them to be recreated at once to minimize the downtime. If you have any suggestions on how to improve the visibility of this information, or the overall UX of the k8s 1.11 -> 1.12 upgrade, it would be much appreciated!
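For reference, the overall sequence from the notes looks roughly like this (illustrative, not a verbatim copy of the docs; the rolling-update flags are the ones quoted above):
kops upgrade cluster --yes
kops update cluster --yes
# Roll all masters at once so etcd-manager can migrate etcd with full quorum:
kops rolling-update cluster --cloudonly --instance-group-roles master --master-interval=1s --yes
# Then roll the remaining nodes as usual:
kops rolling-update cluster --yes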
Thanks @rifelpet - there's a lesson in here for me about checking the release notes (and about the benefit of upgrading a test environment first). Previous kops upgrades had been so smooth that it didn't occur to me.
The only real UX feedback is that if kops plans to apply a "breaking change" like this (even a temporary, intentional one), it would be helpful to display that at the time of running the cluster upgrade commands.
Thanks
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@fejta-bot: Closing this issue.
In response to this:
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.