Has anyone done a migration from HA to single master and if so, what exactly were the steps and were there any problems to be aware of?
I've never done it myself, but I've heard people say that if you follow this guide in reverse it should work:
https://github.com/kubernetes/kops/blob/master/docs/single-to-multi-master.md
I would try it on a test cluster first, with a backup.
Thanks! Our migration was a success.
For posterity (mileage may vary): ours was an A/B/C-zoned cluster and we wished to drop the A and C zones:
1) back up etcd (see the sketch after this list)
2) drop the etcd zones in the cluster manifest
3) drop the A/C zones in the worker instance group
4) cloud-only rolling restart
5) drop the A/C masters (note: this takes effect immediately)
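For step 1, a minimal backup sketch run on a master node. The data-dir path and etcd version are assumptions; older kops clusters ran etcd v2, so the v2-style backup subcommand is shown (with etcd3 you would use ETCDCTL_API=3 etcdctl snapshot save instead):
# etcd v2-style offline copy of the data dir (paths are assumptions)
$ sudo etcdctl backup --data-dir /var/etcd/data --backup-dir /tmp/etcd-backup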
:warning: Your k8s etcd services might be unable to start, and the kube-api service on the master probably won't work. To fix that, you need to reinitialize the etcd clusters by adding the -force-new-cluster option to the etcd manifests on the master node:
$ sudo systemctl stop protokube
$ sudo systemctl stop kubelet
Add -force-new-cluster to /etc/kubernetes/manifests/etcd.manifest
Add -force-new-cluster to /etc/kubernetes/manifests/etcd-events.manifest
$ sudo systemctl start kubelet
$ sudo systemctl start protokube
etcd should be sane now as a single-member cluster.
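Illustratively, the edit just appends the flag to the etcd command in each static pod manifest. The exact shape of the kops-generated manifest differs by version, so treat this fragment as a sketch rather than the literal file:
# fragment of /etc/kubernetes/manifests/etcd.manifest (illustrative)
containers:
- name: etcd-container
  command:
  - /usr/local/bin/etcd
  - -force-new-cluster   # added: restart as a new single-member cluster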
Another "gotcha" for us was that some of our pvc-linked ebs volumes were stranded in the abandoned zones. To fix this, we took snapshots of the volumes in A/C and created new volumes from those snapshots in zone B with the same tags as their source volumes. Then we manually edited the pv manifests to point at the new volume id in zone B. Our pvc-dependent services came back online at that point.
@danielkrainas's steps almost got us working. One gotcha was that we needed to start protokube only after successfully starting kubelet with the modified etcd manifests. Protokube rewrites those manifests during its startup, which would wipe out the addition of -force-new-cluster.
@fullsailor thanks! I had written the steps from my terminal history and memory and bungled them. I updated my comment.
Very helpful guys, thank you!
For anyone coming here via Google: the steps are a little different now, since etcd-manager was introduced in kops 1.12, and the main and events etcd clusters are backed up to S3 (the same bucket as KOPS_STATE_STORE) automatically and regularly.
So if your cluster was created with kops 1.12 or newer, you probably need the following steps instead:
$ kops edit cluster
In the etcdClusters section, remove etcdMembers items so that only one instanceGroup remains for each of main and events, e.g.:
etcdClusters:
- etcdMembers:
  - instanceGroup: master-ap-southeast-1a
    name: a
  name: main
- etcdMembers:
  - instanceGroup: master-ap-southeast-1a
    name: a
  name: events
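For contrast, the original HA section would have looked roughly like this; the b and c member names and instance groups are illustrative, not taken from this thread:
etcdClusters:
- etcdMembers:
  - instanceGroup: master-ap-southeast-1a
    name: a
  - instanceGroup: master-ap-southeast-1b
    name: b
  - instanceGroup: master-ap-southeast-1c
    name: c
  name: main
(and the same three members under events)
With the section trimmed down to one member, apply the change: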
$ kops update cluster --yes
$ kops rolling-update cluster --yes
$ kops delete ig master-xxxxxx-1b
$ kops delete ig master-xxxxxx-1c
This action cannot be undone, and it will delete the 2 master nodes immediately.
Now that 2 of your 3 master nodes are deleted, the k8s etcd services may be failing and the kube-api service will be unreachable. It is normal for your kops and kubectl commands to stop working after this step. This is the tricky part: you need to SSH into the remaining master node for the next steps.
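The SSH step itself might look like this; the admin user name and key path are assumptions (kops' default user depends on the base image):
$ ssh -i ~/.ssh/id_rsa admin@<master-node-public-ip>
Once on the master, stop the services: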
$ sudo systemctl stop protokube
$ sudo systemctl stop kubelet
Download the etcd-manager-ctl tool (if you are using a different etcd-manager version, adjust the download link accordingly):
$ wget https://github.com/kopeio/etcd-manager/releases/download/3.0.20190930/etcd-manager-ctl-linux-amd64
$ mv etcd-manager-ctl-linux-amd64 etcd-manager-ctl
$ chmod +x etcd-manager-ctl
$ mv etcd-manager-ctl /usr/local/bin/
Restore the backups from S3 (see the official docs):
$ etcd-manager-ctl -backup-store=s3://<kops s3 bucket name>/<cluster name>/backups/etcd/main list-backups
$ etcd-manager-ctl -backup-store=s3://<kops s3 bucket name>/<cluster name>/backups/etcd/main restore-backup 2019-10-16T09:42:37Z-000001
# do the same for events
$ etcd-manager-ctl -backup-store=s3://<kops s3 bucket name>/<cluster name>/backups/etcd/events list-backups
$ etcd-manager-ctl -backup-store=s3://<kops s3 bucket name>/<cluster name>/backups/etcd/events restore-backup 2019-10-16T09:42:37Z-000001
This does not start the restore immediately; to trigger it you need to restart etcd: kill the related containers and start kubelet.
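Assuming a Docker runtime on the master (an assumption; use the equivalent crictl commands otherwise), killing the etcd-related containers looks roughly like:
$ sudo docker ps | grep etcd      # find the running etcd / etcd-manager containers
$ sudo docker stop <container-id> # stop each one; they are recreated once kubelet is back
Then bring the services back up: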
$ sudo systemctl start kubelet
$ sudo systemctl start protokube
Wait for the restore to finish; then kubectl get nodes and kops validate cluster should work again. If not, you can simply terminate the EC2 instance of the remaining master node in the AWS console: a new master node will be created by the Auto Scaling Group, and the etcd cluster will be restored from the S3 backups.
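If you prefer the CLI over the console for that last-resort step, terminating the instance might look like this (the instance ID is a placeholder):
$ aws ec2 terminate-instances --instance-ids <master-instance-id>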