Kops: Master unable to join cluster after kops upgrade (k8s: 1.11.9 > 1.12.8)

Created on 19 Jun 2019  ·  8 Comments  ·  Source: kubernetes/kops

kops version
Version 1.12.1

During kops upgrade:

k8s: 1.11.9 -> 1.12.8
image: kope.io/k8s-1.11-debian-stretch-amd64-hvm-ebs-2018-08-17 -> kope.io/k8s-1.12-debian-stretch-amd64-hvm-ebs-2019-05-13

Running on AWS. After running kops upgrade cluster --yes, I started the rolling update:

kops rolling-update cluster --yes
NAME            STATUS      NEEDUPDATE  READY   MIN MAX NODES
bastions        NeedsUpdate 1       0   1   1   0
master-us-east-2a   NeedsUpdate 1       0   1   1   1
master-us-east-2b   NeedsUpdate 1       0   1   1   1
master-us-east-2c   NeedsUpdate 1       0   1   1   1
nodes           NeedsUpdate 2       0   1   2   2
I0619 10:13:54.722629   25392 instancegroups.go:301] Stopping instance "i-0133631b3142fe6e2", in group "bastions.us-east-2.k8s.redactedinc.com" (this may take a while).
I0619 10:13:55.290878   25392 instancegroups.go:198] waiting for 5m0s after terminating instance
I0619 10:18:55.282281   25392 instancegroups.go:202] Deleted a bastion instance, i-0133631b3142fe6e2, and continuing with rolling-update.
I0619 10:18:58.151787   25392 instancegroups.go:165] Draining the node: "ip-10-40-63-102.us-east-2.compute.internal".
node/ip-10-40-63-102.us-east-2.compute.internal cordoned
node/ip-10-40-63-102.us-east-2.compute.internal cordoned
WARNING: Ignoring DaemonSet-managed pods: lacework-agent-snxnp, weave-net-n95tf; Deleting pods with local storage: kubernetes-dashboard-6c664cf6c5-6pvrv
pod/dns-controller-779bbdc6dd-shpkr evicted
pod/kubernetes-dashboard-6c664cf6c5-6pvrv evicted
I0619 10:19:04.649307   25392 instancegroups.go:358] Waiting for 1m30s for pods to stabilize after draining.
I0619 10:20:34.647492   25392 instancegroups.go:185] deleting node "ip-10-40-63-102.us-east-2.compute.internal" from kubernetes
I0619 10:20:34.762982   25392 instancegroups.go:299] Stopping instance "i-0b0b71a57df7da5c8", node "ip-10-40-63-102.us-east-2.compute.internal", in group "master-us-east-2a.masters.us-east-2.k8s.redactedinc.com" (this may take a while).
I0619 10:20:35.595988   25392 instancegroups.go:198] waiting for 5m0s after terminating instance
I0619 10:25:35.588130   25392 instancegroups.go:209] Validating the cluster.
I0619 10:25:38.189710   25392 instancegroups.go:273] Cluster did not pass validation, will try again in "30s" until duration "5m0s" expires: machine "i-0d31990e514939c2c" has not yet joined cluster.
I0619 10:26:10.016482   25392 instancegroups.go:273] Cluster did not pass validation, will try again in "30s" until duration "5m0s" expires: machine "i-0d31990e514939c2c" has not yet joined cluster.
I0619 10:26:40.289218   25392 instancegroups.go:273] Cluster did not pass validation, will try again in "30s" until duration "5m0s" expires: machine "i-0d31990e514939c2c" has not yet joined cluster.
I0619 10:27:10.197925   25392 instancegroups.go:273] Cluster did not pass validation, will try again in "30s" until duration "5m0s" expires: machine "i-0d31990e514939c2c" has not yet joined cluster.
I0619 10:27:40.076651   25392 instancegroups.go:273] Cluster did not pass validation, will try again in "30s" until duration "5m0s" expires: machine "i-0d31990e514939c2c" has not yet joined cluster.
I0619 10:28:10.179322   25392 instancegroups.go:273] Cluster did not pass validation, will try again in "30s" until duration "5m0s" expires: machine "i-0d31990e514939c2c" has not yet joined cluster.
I0619 10:28:39.913678   25392 instancegroups.go:273] Cluster did not pass validation, will try again in "30s" until duration "5m0s" expires: machine "i-0d31990e514939c2c" has not yet joined cluster.
I0619 10:29:10.133790   25392 instancegroups.go:273] Cluster did not pass validation, will try again in "30s" until duration "5m0s" expires: machine "i-0d31990e514939c2c" has not yet joined cluster.
I0619 10:29:40.084387   25392 instancegroups.go:273] Cluster did not pass validation, will try again in "30s" until duration "5m0s" expires: machine "i-0d31990e514939c2c" has not yet joined cluster.
I0619 10:30:10.499483   25392 instancegroups.go:273] Cluster did not pass validation, will try again in "30s" until duration "5m0s" expires: machine "i-0d31990e514939c2c" has not yet joined cluster.
E0619 10:30:38.180754   25392 instancegroups.go:214] Cluster did not validate within 5m0s

master not healthy after update, stopping rolling-update: "error validating cluster after removing a node: cluster did not validate within a duration of \"5m0s\""

A new master node is created on AWS and boots, but it never joins the cluster.

After SSHing into the master I see:

systemctl status kubelet
● kubelet.service - Kubernetes Kubelet Server
   Loaded: loaded (/lib/systemd/system/kubelet.service; static; vendor preset: enabled)
   Active: active (running) since Wed 2019-06-19 17:54:28 UTC; 14min ago
     Docs: https://github.com/kubernetes/kubernetes
 Main PID: 2607 (kubelet)
    Tasks: 17 (limit: 4915)
   Memory: 55.8M
      CPU: 17.193s
   CGroup: /system.slice/kubelet.service
           └─2607 /usr/local/bin/kubelet --allow-privileged=true --anonymous-auth=false --cgroup-root=/ --client-ca-file=/srv/kubernetes/ca.crt --cloud-provider=aws --cluster-dns=100.64.0.10 --cluster-domain=cluster.local --enable-debugging-handlers=true --eviction-hard=memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5%,imagefs.available<10%,imagefs.inodesFree<5% --feature-gates=ExperimentalCriticalPodAnnotation=true --hostname-override=ip-10-40-42-180.us-east-2.compute.internal --kubeconfig=/var/lib/kubelet/kubeconfig --network-plugin=cni --node-labels=kops.k8s.io/instancegroup=master-us-east-2a,kubernetes.io/role=master,node-role.kubernetes.io/master= --non-masquerade-cidr=100.64.0.0/10 --pod-infra-container-image=k8s.gcr.io/pause-amd64:3.0 --pod-manifest-path=/etc/kubernetes/manifests --register-schedulable=true --register-with-taints=node-role.kubernetes.io/master=:NoSchedule --v=2 --cni-bin-dir=/opt/cni/bin/ --cni-conf-dir=/etc/cni/net.d/

Jun 19 18:08:56 ip-10-40-42-180 kubelet[2607]: E0619 18:08:56.513836    2607 kubelet.go:2236] node "ip-10-40-42-180.us-east-2.compute.internal" not found
Jun 19 18:08:56 ip-10-40-42-180 kubelet[2607]: E0619 18:08:56.614643    2607 kubelet.go:2236] node "ip-10-40-42-180.us-east-2.compute.internal" not found
Jun 19 18:08:56 ip-10-40-42-180 kubelet[2607]: E0619 18:08:56.715408    2607 kubelet.go:2236] node "ip-10-40-42-180.us-east-2.compute.internal" not found
Jun 19 18:08:56 ip-10-40-42-180 kubelet[2607]: E0619 18:08:56.816111    2607 kubelet.go:2236] node "ip-10-40-42-180.us-east-2.compute.internal" not found
Jun 19 18:08:56 ip-10-40-42-180 kubelet[2607]: E0619 18:08:56.916789    2607 kubelet.go:2236] node "ip-10-40-42-180.us-east-2.compute.internal" not found
Jun 19 18:08:56 ip-10-40-42-180 kubelet[2607]: I0619 18:08:56.960509    2607 prober.go:111] Liveness probe for "kube-apiserver-ip-10-40-42-180.us-east-2.compute.internal_kube-system(958e40c338777b454bee6c4539a87db3):kube-apiserver" failed (failure): Get http://127.0.0.1:8080/healthz: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Jun 19 18:08:57 ip-10-40-42-180 kubelet[2607]: E0619 18:08:57.017549    2607 kubelet.go:2236] node "ip-10-40-42-180.us-east-2.compute.internal" not found
Jun 19 18:08:57 ip-10-40-42-180 kubelet[2607]: E0619 18:08:57.118275    2607 kubelet.go:2236] node "ip-10-40-42-180.us-east-2.compute.internal" not found
Jun 19 18:08:57 ip-10-40-42-180 kubelet[2607]: E0619 18:08:57.219166    2607 kubelet.go:2236] node "ip-10-40-42-180.us-east-2.compute.internal" not found
Jun 19 18:08:57 ip-10-40-42-180 kubelet[2607]: E0619 18:08:57.319979    2607 kubelet.go:2236] node "ip-10-40-42-180.us-east-2.compute.internal" not found
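
(For anyone hitting the same symptom: the kubelet itself is running; it is the control plane behind it that is failing. Below is a rough sketch of the follow-up checks on the new master that surface the real problem. The log paths are the usual locations on the kops Debian image and are an assumption; adjust for your setup.)

docker ps | grep -E 'etcd|kube-apiserver'    # are the etcd-manager and kube-apiserver containers running at all?
curl http://127.0.0.1:8080/healthz           # same endpoint the liveness probe above is timing out on
sudo tail -n 50 /var/log/kube-apiserver.log  # kube-apiserver log (kops writes the static pod logs under /var/log)
sudo tail -n 50 /var/log/etcd.log            # etcd-manager log for the "main" cluster; an excerpt appears later in this thread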

Cluster manifest:

apiVersion: kops/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: 2018-09-18T09:31:13Z
  name: us-east-2.k8s.redactedinc.com
spec:
  api:
    loadBalancer:
      type: Public
  authorization:
    rbac: {}
  channel: stable
  cloudLabels:
    Dept: ops
  cloudProvider: aws
  configBase: s3://us-east-2-il-kops-state-store/us-east-2.k8s.redactedinc.com
  etcdClusters:
  - etcdMembers:
    - instanceGroup: master-us-east-2a
      name: a
    - instanceGroup: master-us-east-2b
      name: b
    - instanceGroup: master-us-east-2c
      name: c
    name: main
  - etcdMembers:
    - instanceGroup: master-us-east-2a
      name: a
    - instanceGroup: master-us-east-2b
      name: b
    - instanceGroup: master-us-east-2c
      name: c
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
  kubelet:
    anonymousAuth: false
  kubernetesApiAccess:
  - 0.0.0.0/0
  kubernetesVersion: 1.12.8
  masterInternalName: api.internal.us-east-2.k8s.redactedinc.com
  masterPublicName: api.us-east-2.k8s.redactedinc.com
  networkCIDR: 10.40.0.0/16
  networkID: vpc-08bcbfdee2514b305
  networking:
    weave:
      mtu: 8912
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 0.0.0.0/0
  subnets:
  - cidr: 10.40.32.0/19
    name: us-east-2a
    type: Private
    zone: us-east-2a
  - cidr: 10.40.64.0/19
    name: us-east-2b
    type: Private
    zone: us-east-2b
  - cidr: 10.40.96.0/19
    name: us-east-2c
    type: Private
    zone: us-east-2c
  - cidr: 10.40.0.0/22
    name: utility-us-east-2a
    type: Utility
    zone: us-east-2a
  - cidr: 10.40.4.0/22
    name: utility-us-east-2b
    type: Utility
    zone: us-east-2b
  - cidr: 10.40.8.0/22
    name: utility-us-east-2c
    type: Utility
    zone: us-east-2c
  topology:
    bastion:
      bastionPublicName: bastion.us-east-2.k8s.redactedinc.com
    dns:
      type: Public
    masters: private
    nodes: private

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-09-18T09:31:14Z
  labels:
    kops.k8s.io/cluster: us-east-2.k8s.redactedinc.com
  name: bastions
spec:
  image: kope.io/k8s-1.12-debian-stretch-amd64-hvm-ebs-2019-05-13
  machineType: t2.micro
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: bastions
  role: Bastion
  subnets:
  - utility-us-east-2a
  - utility-us-east-2b
  - utility-us-east-2c

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-09-18T09:31:13Z
  labels:
    kops.k8s.io/cluster: us-east-2.k8s.redactedinc.com
  name: master-us-east-2a
spec:
  image: kope.io/k8s-1.12-debian-stretch-amd64-hvm-ebs-2019-05-13
  machineType: t3.medium
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-us-east-2a
  role: Master
  subnets:
  - us-east-2a

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-09-18T09:31:13Z
  labels:
    kops.k8s.io/cluster: us-east-2.k8s.redactedinc.com
  name: master-us-east-2b
spec:
  image: kope.io/k8s-1.12-debian-stretch-amd64-hvm-ebs-2019-05-13
  machineType: t3.medium
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-us-east-2b
  role: Master
  subnets:
  - us-east-2b

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-09-18T09:31:13Z
  labels:
    kops.k8s.io/cluster: us-east-2.k8s.redactedinc.com
  name: master-us-east-2c
spec:
  image: kope.io/k8s-1.12-debian-stretch-amd64-hvm-ebs-2019-05-13
  machineType: t3.medium
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-us-east-2c
  role: Master
  subnets:
  - us-east-2c

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-09-18T09:31:14Z
  labels:
    kops.k8s.io/cluster: us-east-2.k8s.redactedinc.com
  name: nodes
spec:
  image: kope.io/k8s-1.12-debian-stretch-amd64-hvm-ebs-2019-05-13
  machineType: r5a.large
  maxSize: 2
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: nodes
  role: Node
  subnets:
  - us-east-2a
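
(Side note: the stored spec above has no explicit etcd version or provider. A quick way to double-check what kops upgrade actually wrote to the state store is something along these lines; kops get cluster -o yaml is standard, but the exact field names to look for, provider and version, are an assumption based on the kops cluster spec docs, and 3.2.24 is the target version that shows up in the etcd-manager logs further down this thread.)

kops get cluster us-east-2.k8s.redactedinc.com -o yaml | grep -A 10 'etcdClusters:'
# check whether the main and events clusters now carry explicit provider/version fields
# (e.g. provider: Manager, version: 3.2.24); absent fields mean kops is applying its 1.12 defaults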

Please let me know if there is any other info I can provide. This is pretty concerning, as our cluster is now in a bad state with a missing master. Any suggestions are greatly welcomed!

Labels: lifecycle/rotten

All 8 comments

From /var/log/etcd.log, it looks like an etcd version mismatch. Shouldn't kops be aware of this and handle it as part of the upgrade command?

W0619 18:16:55.567966    3754 etcdclusterstate.go:97] unable to find node for member "etcd-c"; using default clientURLs [http://etcd-c.internal.us-east-2.k8s.redactedinc.com:4001]
W0619 18:16:55.588756    3754 etcdclusterstate.go:97] unable to find node for member "etcd-b"; using default clientURLs [http://etcd-b.internal.us-east-2.k8s.redactedinc.com:4001]
I0619 18:16:55.590672    3754 controller.go:276] etcd cluster state: etcdClusterState
  members:
    {"name":"etcd-c","peerURLs":["http://etcd-c.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["http://etcd-c.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"50114dd36d2f3dc3"}
    {"name":"etcd-a","peerURLs":["https://etcd-a.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["https://etcd-a.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"c0029d9ef59e42bc"}
    {"name":"etcd-b","peerURLs":["http://etcd-b.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["http://etcd-b.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"e0ad5a0dbcdd8e79"}
  peers:
    etcdClusterPeerInfo{peer=peer{id:"etcd-a" endpoints:"10.40.42.180:3996" }, info=cluster_name:"etcd" node_configuration:<name:"etcd-a" peer_urls:"https://etcd-a.internal.us-east-2.k8s.redactedinc.com:2380" client_urls:"https://etcd-a.internal.us-east-2.k8s.redactedinc.com:4001" quarantined_client_urls:"https://etcd-a.internal.us-east-2.k8s.redactedinc.com:3994" > etcd_state:<cluster:<desired_cluster_size:3 cluster_token:"etcd-cluster-token-etcd" nodes:<name:"etcd-a" peer_urls:"https://etcd-a.internal.us-east-2.k8s.redactedinc.com:2380" client_urls:"https://0.0.0.0:4001" quarantined_client_urls:"http://0.0.0.0:3994" tls_enabled:true > nodes:<name:"etcd-b" peer_urls:"http://etcd-b.internal.us-east-2.k8s.redactedinc.com:2380" client_urls:"http://0.0.0.0:4001" quarantined_client_urls:"http://0.0.0.0:3994" > nodes:<name:"etcd-c" peer_urls:"http://etcd-c.internal.us-east-2.k8s.redactedinc.com:2380" client_urls:"http://0.0.0.0:4001" quarantined_client_urls:"http://0.0.0.0:3994" > > etcd_version:"2.2.1" > }
I0619 18:16:55.590782    3754 controller.go:277] etcd cluster members: map[50114dd36d2f3dc3:{"name":"etcd-c","peerURLs":["http://etcd-c.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["http://etcd-c.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"50114dd36d2f3dc3"} c0029d9ef59e42bc:{"name":"etcd-a","peerURLs":["https://etcd-a.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["https://etcd-a.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"c0029d9ef59e42bc"} e0ad5a0dbcdd8e79:{"name":"etcd-b","peerURLs":["http://etcd-b.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["http://etcd-b.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"e0ad5a0dbcdd8e79"}]
I0619 18:16:55.590804    3754 controller.go:615] sending member map to all peers: members:<name:"etcd-a" dns:"etcd-a.internal.us-east-2.k8s.redactedinc.com" addresses:"10.40.42.180" > 
I0619 18:16:55.590979    3754 etcdserver.go:222] updating hosts: map[10.40.42.180:[etcd-a.internal.us-east-2.k8s.redactedinc.com]]
I0619 18:16:55.590993    3754 hosts.go:84] hosts update: primary=map[10.40.42.180:[etcd-a.internal.us-east-2.k8s.redactedinc.com]], fallbacks=map[etcd-a.internal.us-east-2.k8s.redactedinc.com:[10.40.42.180 10.40.42.180] etcd-b.internal.us-east-2.k8s.redactedinc.com:[10.40.66.188 10.40.66.188] etcd-c.internal.us-east-2.k8s.redactedinc.com:[10.40.126.31 10.40.126.31]], final=map[10.40.126.31:[etcd-c.internal.us-east-2.k8s.redactedinc.com etcd-c.internal.us-east-2.k8s.redactedinc.com] 10.40.42.180:[etcd-a.internal.us-east-2.k8s.redactedinc.com] 10.40.66.188:[etcd-b.internal.us-east-2.k8s.redactedinc.com etcd-b.internal.us-east-2.k8s.redactedinc.com]]
I0619 18:16:55.600464    3754 commands.go:22] not refreshing commands - TTL not hit
I0619 18:16:55.600490    3754 s3fs.go:220] Reading file "s3://us-east-2-il-kops-state-store/us-east-2.k8s.redactedinc.com/backups/etcd/main/control/etcd-cluster-created"
I0619 18:16:55.609470    3754 controller.go:369] spec member_count:3 etcd_version:"3.2.24" 
I0619 18:16:55.609531    3754 controller.go:417] mismatched version for peer peer{id:"etcd-a" endpoints:"10.40.42.180:3996" }: want "3.2.24", have "2.2.1"
I0619 18:16:55.609556    3754 controller.go:423] can't do in-place upgrade from "2.2.1" -> "3.2.24"
I0619 18:16:55.609576    3754 controller.go:517] detected that we need to upgrade/downgrade etcd
I0619 18:16:55.609590    3754 controller.go:526] upgrade/downgrade needed, but we don't have sufficient peers
I0619 18:16:58.043759    3754 peers.go:281] connecting to peer "etcd-b" with TLS policy, servername="etcd-manager-server-etcd-b"
W0619 18:16:58.048338    3754 peers.go:325] unable to grpc-ping discovered peer 10.40.66.188:3996: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
I0619 18:16:58.048367    3754 peers.go:347] was not able to connect to peer etcd-b: map[10.40.66.188:3996:true]
W0619 18:16:58.048381    3754 peers.go:215] unexpected error from peer intercommunications: unable to connect to peer etcd-b
I0619 18:16:58.147534    3754 peers.go:281] connecting to peer "etcd-c" with TLS policy, servername="etcd-manager-server-etcd-c"
W0619 18:16:58.152316    3754 peers.go:325] unable to grpc-ping discovered peer 10.40.126.31:3996: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
I0619 18:16:58.152345    3754 peers.go:347] was not able to connect to peer etcd-c: map[10.40.126.31:3996:true]
W0619 18:16:58.152360    3754 peers.go:215] unexpected error from peer intercommunications: unable to connect to peer etcd-c
I0619 18:17:03.048498    3754 peers.go:281] connecting to peer "etcd-b" with TLS policy, servername="etcd-manager-server-etcd-b"
W0619 18:17:03.049586    3754 peers.go:325] unable to grpc-ping discovered peer 10.40.66.188:3996: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
I0619 18:17:03.049603    3754 peers.go:347] was not able to connect to peer etcd-b: map[10.40.66.188:3996:true]
W0619 18:17:03.049617    3754 peers.go:215] unexpected error from peer intercommunications: unable to connect to peer etcd-b
I0619 18:17:03.152485    3754 peers.go:281] connecting to peer "etcd-c" with TLS policy, servername="etcd-manager-server-etcd-c"
W0619 18:17:03.153952    3754 peers.go:325] unable to grpc-ping discovered peer 10.40.126.31:3996: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
I0619 18:17:03.153973    3754 peers.go:347] was not able to connect to peer etcd-c: map[10.40.126.31:3996:true]
W0619 18:17:03.156241    3754 peers.go:215] unexpected error from peer intercommunications: unable to connect to peer etcd-c
I0619 18:17:05.450434    3754 volumes.go:85] AWS API Request: ec2/DescribeVolumes
I0619 18:17:05.543514    3754 volumes.go:85] AWS API Request: ec2/DescribeInstances
I0619 18:17:05.586794    3754 hosts.go:84] hosts update: primary=map[10.40.42.180:[etcd-a.internal.us-east-2.k8s.redactedinc.com]], fallbacks=map[etcd-a.internal.us-east-2.k8s.redactedinc.com:[10.40.42.180 10.40.42.180] etcd-b.internal.us-east-2.k8s.redactedinc.com:[10.40.66.188 10.40.66.188] etcd-c.internal.us-east-2.k8s.redactedinc.com:[10.40.126.31 10.40.126.31]], final=map[10.40.126.31:[etcd-c.internal.us-east-2.k8s.redactedinc.com etcd-c.internal.us-east-2.k8s.redactedinc.com] 10.40.42.180:[etcd-a.internal.us-east-2.k8s.redactedinc.com] 10.40.66.188:[etcd-b.internal.us-east-2.k8s.redactedinc.com etcd-b.internal.us-east-2.k8s.redactedinc.com]]
I0619 18:17:05.616264    3754 controller.go:173] starting controller iteration
I0619 18:17:05.616303    3754 controller.go:269] I am leader with token "rgD81JOa6aHA3wsBY3qGfQ"
W0619 18:17:05.889836    3754 etcdclusterstate.go:97] unable to find node for member "etcd-b"; using default clientURLs [http://etcd-b.internal.us-east-2.k8s.redactedinc.com:4001]
W0619 18:17:05.896361    3754 etcdclusterstate.go:97] unable to find node for member "etcd-c"; using default clientURLs [http://etcd-c.internal.us-east-2.k8s.redactedinc.com:4001]
I0619 18:17:05.900282    3754 controller.go:276] etcd cluster state: etcdClusterState
  members:
    {"name":"etcd-c","peerURLs":["http://etcd-c.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["http://etcd-c.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"50114dd36d2f3dc3"}
    {"name":"etcd-a","peerURLs":["https://etcd-a.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["https://etcd-a.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"c0029d9ef59e42bc"}
    {"name":"etcd-b","peerURLs":["http://etcd-b.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["http://etcd-b.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"e0ad5a0dbcdd8e79"}
  peers:
    etcdClusterPeerInfo{peer=peer{id:"etcd-a" endpoints:"10.40.42.180:3996" }, info=cluster_name:"etcd" node_configuration:<name:"etcd-a" peer_urls:"https://etcd-a.internal.us-east-2.k8s.redactedinc.com:2380" client_urls:"https://etcd-a.internal.us-east-2.k8s.redactedinc.com:4001" quarantined_client_urls:"https://etcd-a.internal.us-east-2.k8s.redactedinc.com:3994" > etcd_state:<cluster:<desired_cluster_size:3 cluster_token:"etcd-cluster-token-etcd" nodes:<name:"etcd-a" peer_urls:"https://etcd-a.internal.us-east-2.k8s.redactedinc.com:2380" client_urls:"https://0.0.0.0:4001" quarantined_client_urls:"http://0.0.0.0:3994" tls_enabled:true > nodes:<name:"etcd-b" peer_urls:"http://etcd-b.internal.us-east-2.k8s.redactedinc.com:2380" client_urls:"http://0.0.0.0:4001" quarantined_client_urls:"http://0.0.0.0:3994" > nodes:<name:"etcd-c" peer_urls:"http://etcd-c.internal.us-east-2.k8s.redactedinc.com:2380" client_urls:"http://0.0.0.0:4001" quarantined_client_urls:"http://0.0.0.0:3994" > > etcd_version:"2.2.1" > }
I0619 18:17:05.900398    3754 controller.go:277] etcd cluster members: map[50114dd36d2f3dc3:{"name":"etcd-c","peerURLs":["http://etcd-c.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["http://etcd-c.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"50114dd36d2f3dc3"} c0029d9ef59e42bc:{"name":"etcd-a","peerURLs":["https://etcd-a.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["https://etcd-a.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"c0029d9ef59e42bc"} e0ad5a0dbcdd8e79:{"name":"etcd-b","peerURLs":["http://etcd-b.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["http://etcd-b.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"e0ad5a0dbcdd8e79"}]
I0619 18:17:05.900656    3754 controller.go:615] sending member map to all peers: members:<name:"etcd-a" dns:"etcd-a.internal.us-east-2.k8s.redactedinc.com" addresses:"10.40.42.180" > 
I0619 18:17:05.900971    3754 etcdserver.go:222] updating hosts: map[10.40.42.180:[etcd-a.internal.us-east-2.k8s.redactedinc.com]]
I0619 18:17:05.900990    3754 hosts.go:84] hosts update: primary=map[10.40.42.180:[etcd-a.internal.us-east-2.k8s.redactedinc.com]], fallbacks=map[etcd-a.internal.us-east-2.k8s.redactedinc.com:[10.40.42.180 10.40.42.180] etcd-b.internal.us-east-2.k8s.redactedinc.com:[10.40.66.188 10.40.66.188] etcd-c.internal.us-east-2.k8s.redactedinc.com:[10.40.126.31 10.40.126.31]], final=map[10.40.126.31:[etcd-c.internal.us-east-2.k8s.redactedinc.com etcd-c.internal.us-east-2.k8s.redactedinc.com] 10.40.42.180:[etcd-a.internal.us-east-2.k8s.redactedinc.com] 10.40.66.188:[etcd-b.internal.us-east-2.k8s.redactedinc.com etcd-b.internal.us-east-2.k8s.redactedinc.com]]
I0619 18:17:05.901060    3754 hosts.go:181] skipping update of unchanged /etc/hosts
I0619 18:17:05.901141    3754 commands.go:22] not refreshing commands - TTL not hit
I0619 18:17:05.901154    3754 s3fs.go:220] Reading file "s3://us-east-2-il-kops-state-store/us-east-2.k8s.redactedinc.com/backups/etcd/main/control/etcd-cluster-created"
I0619 18:17:05.908400    3754 controller.go:369] spec member_count:3 etcd_version:"3.2.24" 
I0619 18:17:05.908454    3754 controller.go:417] mismatched version for peer peer{id:"etcd-a" endpoints:"10.40.42.180:3996" }: want "3.2.24", have "2.2.1"
I0619 18:17:05.908479    3754 controller.go:423] can't do in-place upgrade from "2.2.1" -> "3.2.24"
I0619 18:17:05.908500    3754 controller.go:517] detected that we need to upgrade/downgrade etcd
I0619 18:17:05.908513    3754 controller.go:526] upgrade/downgrade needed, but we don't have sufficient peers
I0619 18:17:08.052250    3754 peers.go:281] connecting to peer "etcd-b" with TLS policy, servername="etcd-manager-server-etcd-b"
W0619 18:17:08.053181    3754 peers.go:325] unable to grpc-ping discovered peer 10.40.66.188:3996: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
I0619 18:17:08.053197    3754 peers.go:347] was not able to connect to peer etcd-b: map[10.40.66.188:3996:true]
W0619 18:17:08.053209    3754 peers.go:215] unexpected error from peer intercommunications: unable to connect to peer etcd-b
I0619 18:17:08.156376    3754 peers.go:281] connecting to peer "etcd-c" with TLS policy, servername="etcd-manager-server-etcd-c"
W0619 18:17:08.157656    3754 peers.go:325] unable to grpc-ping discovered peer 10.40.126.31:3996: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
I0619 18:17:08.157673    3754 peers.go:347] was not able to connect to peer etcd-c: map[10.40.126.31:3996:true]
W0619 18:17:08.157686    3754 peers.go:215] unexpected error from peer intercommunications: unable to connect to peer etcd-c
I0619 18:17:13.056516    3754 peers.go:281] connecting to peer "etcd-b" with TLS policy, servername="etcd-manager-server-etcd-b"
W0619 18:17:13.060307    3754 peers.go:325] unable to grpc-ping discovered peer 10.40.66.188:3996: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
I0619 18:17:13.060331    3754 peers.go:347] was not able to connect to peer etcd-b: map[10.40.66.188:3996:true]
W0619 18:17:13.060343    3754 peers.go:215] unexpected error from peer intercommunications: unable to connect to peer etcd-b
I0619 18:17:13.160261    3754 peers.go:281] connecting to peer "etcd-c" with TLS policy, servername="etcd-manager-server-etcd-c"
W0619 18:17:13.161570    3754 peers.go:325] unable to grpc-ping discovered peer 10.40.126.31:3996: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
I0619 18:17:13.161588    3754 peers.go:347] was not able to connect to peer etcd-c: map[10.40.126.31:3996:true]
W0619 18:17:13.161602    3754 peers.go:215] unexpected error from peer intercommunications: unable to connect to peer etcd-c
I0619 18:17:15.916254    3754 controller.go:173] starting controller iteration
I0619 18:17:15.916299    3754 controller.go:269] I am leader with token "rgD81JOa6aHA3wsBY3qGfQ"
W0619 18:17:15.932160    3754 etcdclusterstate.go:97] unable to find node for member "etcd-c"; using default clientURLs [http://etcd-c.internal.us-east-2.k8s.redactedinc.com:4001]
W0619 18:17:15.978855    3754 etcdclusterstate.go:97] unable to find node for member "etcd-b"; using default clientURLs [http://etcd-b.internal.us-east-2.k8s.redactedinc.com:4001]
I0619 18:17:15.988353    3754 controller.go:276] etcd cluster state: etcdClusterState
  members:
    {"name":"etcd-c","peerURLs":["http://etcd-c.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["http://etcd-c.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"50114dd36d2f3dc3"}
    {"name":"etcd-a","peerURLs":["https://etcd-a.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["https://etcd-a.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"c0029d9ef59e42bc"}
    {"name":"etcd-b","peerURLs":["http://etcd-b.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["http://etcd-b.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"e0ad5a0dbcdd8e79"}
  peers:
    etcdClusterPeerInfo{peer=peer{id:"etcd-a" endpoints:"10.40.42.180:3996" }, info=cluster_name:"etcd" node_configuration:<name:"etcd-a" peer_urls:"https://etcd-a.internal.us-east-2.k8s.redactedinc.com:2380" client_urls:"https://etcd-a.internal.us-east-2.k8s.redactedinc.com:4001" quarantined_client_urls:"https://etcd-a.internal.us-east-2.k8s.redactedinc.com:3994" > etcd_state:<cluster:<desired_cluster_size:3 cluster_token:"etcd-cluster-token-etcd" nodes:<name:"etcd-a" peer_urls:"https://etcd-a.internal.us-east-2.k8s.redactedinc.com:2380" client_urls:"https://0.0.0.0:4001" quarantined_client_urls:"http://0.0.0.0:3994" tls_enabled:true > nodes:<name:"etcd-b" peer_urls:"http://etcd-b.internal.us-east-2.k8s.redactedinc.com:2380" client_urls:"http://0.0.0.0:4001" quarantined_client_urls:"http://0.0.0.0:3994" > nodes:<name:"etcd-c" peer_urls:"http://etcd-c.internal.us-east-2.k8s.redactedinc.com:2380" client_urls:"http://0.0.0.0:4001" quarantined_client_urls:"http://0.0.0.0:3994" > > etcd_version:"2.2.1" > }
I0619 18:17:15.988471    3754 controller.go:277] etcd cluster members: map[50114dd36d2f3dc3:{"name":"etcd-c","peerURLs":["http://etcd-c.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["http://etcd-c.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"50114dd36d2f3dc3"} c0029d9ef59e42bc:{"name":"etcd-a","peerURLs":["https://etcd-a.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["https://etcd-a.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"c0029d9ef59e42bc"} e0ad5a0dbcdd8e79:{"name":"etcd-b","peerURLs":["http://etcd-b.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["http://etcd-b.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"e0ad5a0dbcdd8e79"}]
I0619 18:17:15.988493    3754 controller.go:615] sending member map to all peers: members:<name:"etcd-a" dns:"etcd-a.internal.us-east-2.k8s.redactedinc.com" addresses:"10.40.42.180" > 
I0619 18:17:15.988684    3754 etcdserver.go:222] updating hosts: map[10.40.42.180:[etcd-a.internal.us-east-2.k8s.redactedinc.com]]
I0619 18:17:15.988699    3754 hosts.go:84] hosts update: primary=map[10.40.42.180:[etcd-a.internal.us-east-2.k8s.redactedinc.com]], fallbacks=map[etcd-a.internal.us-east-2.k8s.redactedinc.com:[10.40.42.180 10.40.42.180] etcd-b.internal.us-east-2.k8s.redactedinc.com:[10.40.66.188 10.40.66.188] etcd-c.internal.us-east-2.k8s.redactedinc.com:[10.40.126.31 10.40.126.31]], final=map[10.40.126.31:[etcd-c.internal.us-east-2.k8s.redactedinc.com etcd-c.internal.us-east-2.k8s.redactedinc.com] 10.40.42.180:[etcd-a.internal.us-east-2.k8s.redactedinc.com] 10.40.66.188:[etcd-b.internal.us-east-2.k8s.redactedinc.com etcd-b.internal.us-east-2.k8s.redactedinc.com]]
I0619 18:17:15.995143    3754 commands.go:22] not refreshing commands - TTL not hit
I0619 18:17:15.995167    3754 s3fs.go:220] Reading file "s3://us-east-2-il-kops-state-store/us-east-2.k8s.redactedinc.com/backups/etcd/main/control/etcd-cluster-created"
I0619 18:17:16.002575    3754 controller.go:369] spec member_count:3 etcd_version:"3.2.24" 
I0619 18:17:16.002630    3754 controller.go:417] mismatched version for peer peer{id:"etcd-a" endpoints:"10.40.42.180:3996" }: want "3.2.24", have "2.2.1"
I0619 18:17:16.002655    3754 controller.go:423] can't do in-place upgrade from "2.2.1" -> "3.2.24"
I0619 18:17:16.002674    3754 controller.go:517] detected that we need to upgrade/downgrade etcd
I0619 18:17:16.002688    3754 controller.go:526] upgrade/downgrade needed, but we don't have sufficient peers
I0619 18:17:18.060468    3754 peers.go:281] connecting to peer "etcd-b" with TLS policy, servername="etcd-manager-server-etcd-b"
W0619 18:17:18.061439    3754 peers.go:325] unable to grpc-ping discovered peer 10.40.66.188:3996: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
I0619 18:17:18.061456    3754 peers.go:347] was not able to connect to peer etcd-b: map[10.40.66.188:3996:true]
W0619 18:17:18.061467    3754 peers.go:215] unexpected error from peer intercommunications: unable to connect to peer etcd-b
I0619 18:17:18.161739    3754 peers.go:281] connecting to peer "etcd-c" with TLS policy, servername="etcd-manager-server-etcd-c"
W0619 18:17:18.164142    3754 peers.go:325] unable to grpc-ping discovered peer 10.40.126.31:3996: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
I0619 18:17:18.164162    3754 peers.go:347] was not able to connect to peer etcd-c: map[10.40.126.31:3996:true]
W0619 18:17:18.164195    3754 peers.go:215] unexpected error from peer intercommunications: unable to connect to peer etcd-c
I0619 18:17:23.061592    3754 peers.go:281] connecting to peer "etcd-b" with TLS policy, servername="etcd-manager-server-etcd-b"
W0619 18:17:23.062497    3754 peers.go:325] unable to grpc-ping discovered peer 10.40.66.188:3996: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
I0619 18:17:23.062513    3754 peers.go:347] was not able to connect to peer etcd-b: map[10.40.66.188:3996:true]
W0619 18:17:23.062524    3754 peers.go:215] unexpected error from peer intercommunications: unable to connect to peer etcd-b
I0619 18:17:23.165198    3754 peers.go:281] connecting to peer "etcd-c" with TLS policy, servername="etcd-manager-server-etcd-c"
W0619 18:17:23.166648    3754 peers.go:325] unable to grpc-ping discovered peer 10.40.126.31:3996: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
I0619 18:17:23.166665    3754 peers.go:347] was not able to connect to peer etcd-c: map[10.40.126.31:3996:true]
W0619 18:17:23.166686    3754 peers.go:215] unexpected error from peer intercommunications: unable to connect to peer etcd-c
I0619 18:17:26.003897    3754 controller.go:173] starting controller iteration
I0619 18:17:26.003934    3754 controller.go:269] I am leader with token "rgD81JOa6aHA3wsBY3qGfQ"
W0619 18:17:26.024674    3754 etcdclusterstate.go:97] unable to find node for member "etcd-c"; using default clientURLs [http://etcd-c.internal.us-east-2.k8s.redactedinc.com:4001]
W0619 18:17:26.040406    3754 etcdclusterstate.go:97] unable to find node for member "etcd-b"; using default clientURLs [http://etcd-b.internal.us-east-2.k8s.redactedinc.com:4001]
I0619 18:17:26.046479    3754 controller.go:276] etcd cluster state: etcdClusterState
  members:
    {"name":"etcd-c","peerURLs":["http://etcd-c.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["http://etcd-c.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"50114dd36d2f3dc3"}
    {"name":"etcd-a","peerURLs":["https://etcd-a.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["https://etcd-a.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"c0029d9ef59e42bc"}
    {"name":"etcd-b","peerURLs":["http://etcd-b.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["http://etcd-b.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"e0ad5a0dbcdd8e79"}
  peers:
    etcdClusterPeerInfo{peer=peer{id:"etcd-a" endpoints:"10.40.42.180:3996" }, info=cluster_name:"etcd" node_configuration:<name:"etcd-a" peer_urls:"https://etcd-a.internal.us-east-2.k8s.redactedinc.com:2380" client_urls:"https://etcd-a.internal.us-east-2.k8s.redactedinc.com:4001" quarantined_client_urls:"https://etcd-a.internal.us-east-2.k8s.redactedinc.com:3994" > etcd_state:<cluster:<desired_cluster_size:3 cluster_token:"etcd-cluster-token-etcd" nodes:<name:"etcd-a" peer_urls:"https://etcd-a.internal.us-east-2.k8s.redactedinc.com:2380" client_urls:"https://0.0.0.0:4001" quarantined_client_urls:"http://0.0.0.0:3994" tls_enabled:true > nodes:<name:"etcd-b" peer_urls:"http://etcd-b.internal.us-east-2.k8s.redactedinc.com:2380" client_urls:"http://0.0.0.0:4001" quarantined_client_urls:"http://0.0.0.0:3994" > nodes:<name:"etcd-c" peer_urls:"http://etcd-c.internal.us-east-2.k8s.redactedinc.com:2380" client_urls:"http://0.0.0.0:4001" quarantined_client_urls:"http://0.0.0.0:3994" > > etcd_version:"2.2.1" > }
I0619 18:17:26.046583    3754 controller.go:277] etcd cluster members: map[50114dd36d2f3dc3:{"name":"etcd-c","peerURLs":["http://etcd-c.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["http://etcd-c.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"50114dd36d2f3dc3"} c0029d9ef59e42bc:{"name":"etcd-a","peerURLs":["https://etcd-a.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["https://etcd-a.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"c0029d9ef59e42bc"} e0ad5a0dbcdd8e79:{"name":"etcd-b","peerURLs":["http://etcd-b.internal.us-east-2.k8s.redactedinc.com:2380"],"endpoints":["http://etcd-b.internal.us-east-2.k8s.redactedinc.com:4001"],"ID":"e0ad5a0dbcdd8e79"}]
I0619 18:17:26.046605    3754 controller.go:615] sending member map to all peers: members:<name:"etcd-a" dns:"etcd-a.internal.us-east-2.k8s.redactedinc.com" addresses:"10.40.42.180" > 
I0619 18:17:26.046776    3754 etcdserver.go:222] updating hosts: map[10.40.42.180:[etcd-a.internal.us-east-2.k8s.redactedinc.com]]
I0619 18:17:26.046790    3754 hosts.go:84] hosts update: primary=map[10.40.42.180:[etcd-a.internal.us-east-2.k8s.redactedinc.com]], fallbacks=map[etcd-a.internal.us-east-2.k8s.redactedinc.com:[10.40.42.180 10.40.42.180] etcd-b.internal.us-east-2.k8s.redactedinc.com:[10.40.66.188 10.40.66.188] etcd-c.internal.us-east-2.k8s.redactedinc.com:[10.40.126.31 10.40.126.31]], final=map[10.40.126.31:[etcd-c.internal.us-east-2.k8s.redactedinc.com etcd-c.internal.us-east-2.k8s.redactedinc.com] 10.40.42.180:[etcd-a.internal.us-east-2.k8s.redactedinc.com] 10.40.66.188:[etcd-b.internal.us-east-2.k8s.redactedinc.com etcd-b.internal.us-east-2.k8s.redactedinc.com]]
I0619 18:17:26.051305    3754 commands.go:22] not refreshing commands - TTL not hit
I0619 18:17:26.051542    3754 s3fs.go:220] Reading file "s3://us-east-2-il-kops-state-store/us-east-2.k8s.redactedinc.com/backups/etcd/main/control/etcd-cluster-created"
I0619 18:17:26.059599    3754 controller.go:369] spec member_count:3 etcd_version:"3.2.24" 
I0619 18:17:26.059807    3754 controller.go:417] mismatched version for peer peer{id:"etcd-a" endpoints:"10.40.42.180:3996" }: want "3.2.24", have "2.2.1"
I0619 18:17:26.059947    3754 controller.go:423] can't do in-place upgrade from "2.2.1" -> "3.2.24"
I0619 18:17:26.060067    3754 controller.go:517] detected that we need to upgrade/downgrade etcd
I0619 18:17:26.060157    3754 controller.go:526] upgrade/downgrade needed, but we don't have sufficient peers
I0619 18:17:28.064265    3754 peers.go:281] connecting to peer "etcd-b" with TLS policy, servername="etcd-manager-server-etcd-b"
W0619 18:17:28.068313    3754 peers.go:325] unable to grpc-ping discovered peer 10.40.66.188:3996: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
I0619 18:17:28.068340    3754 peers.go:347] was not able to connect to peer etcd-b: map[10.40.66.188:3996:true]
W0619 18:17:28.068354    3754 peers.go:215] unexpected error from peer intercommunications: unable to connect to peer etcd-b
I0619 18:17:28.168276    3754 peers.go:281] connecting to peer "etcd-c" with TLS policy, servername="etcd-manager-server-etcd-c"
W0619 18:17:28.169593    3754 peers.go:325] unable to grpc-ping discovered peer 10.40.126.31:3996: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
I0619 18:17:28.169610    3754 peers.go:347] was not able to connect to peer etcd-c: map[10.40.126.31:3996:true]
W0619 18:17:28.169621    3754 peers.go:215] unexpected error from peer intercommunications: unable to connect to peer etcd-c

I ran kops rolling-update cluster --yes two more times to force the other masters to be recreated.

The API server was unavailable between the last old master being terminated and the new one starting and etcd resyncing, which took approximately 15 minutes (expected, but a heads-up for anyone who finds this thread).

Not sure whether this is a bug or a feature request, but it feels like kops should be able to detect this situation during upgrades and either handle it or advise accordingly.
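
(A trivial way to watch for the API server coming back during that window, assuming kubectl on a workstation is already pointed at this cluster; the 15-second interval is arbitrary.)

until kubectl get nodes >/dev/null 2>&1; do sleep 15; done    # loop until the apiserver answers again
kops validate cluster --name us-east-2.k8s.redactedinc.com    # then confirm all masters and nodes have rejoined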

Hi @richstokes

I believe the Kops 1.12 Release Notes mention the disruption to masters due to the etcd migration. They suggest running kops rolling-update cluster --cloudonly --instance-group-roles master --master-interval=1s --yes, terminating all 3 masters and forcing them to be recreated all at once to minimize the downtime. If you have any suggestions on how to improve the visibility of this information or the overall UX of the k8s 1.11 -> 1.12 upgrade, they would be much appreciated!
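
(Breaking that one-liner out with what each flag does, as I understand them from kops --help; treat this as a sketch rather than an official procedure.)

# --cloudonly: only talk to the cloud provider (AWS); skip draining and validating via the (currently unhealthy) Kubernetes API
# --instance-group-roles master: restrict the rolling update to the master instance groups
# --master-interval=1s: don't wait between masters, so all three are terminated and recreated at roughly the same time
kops rolling-update cluster --cloudonly --instance-group-roles master --master-interval=1s --yes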

Thanks @rifelpet - there's a lesson in here for me about checking the release notes. Previous kops upgrades had been so smooth that it didn't occur to me. (And about the benefit of upgrading a test environment first.)

The only real UX feedback is that if kops plans to apply a "breaking change" like this (even if temporary and intentional), it would be helpful to surface it at the time the cluster upgrade commands are run.

Thanks

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
