Kops: question - HA to single master migration

Created on 2 Aug 2018 · 7 comments · Source: kubernetes/kops

Has anyone done a migration from HA to single master and if so, what exactly were the steps and were there any problems to be aware of?

lifecycle/stale


All 7 comments

I've never done it myself, but I've heard some people say that if you follow this guide in reverse it should work:

https://github.com/kubernetes/kops/blob/master/docs/single-to-multi-master.md

I would do it on a test cluster first with a backup
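If you do take an etcd backup first on a legacy (pre-etcd-manager) cluster, a rough sketch from the master looks like this; the data directories are only examples and depend on where your etcd volumes are mounted:

$ sudo etcdctl backup --data-dir <main etcd data dir> --backup-dir /tmp/etcd-main-backup
$ sudo etcdctl backup --data-dir <events etcd data dir> --backup-dir /tmp/etcd-events-backup
# copy the backup directories somewhere off the node (e.g. to S3) before proceeding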

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Thanks! Our migration was a success.

For posterity (mileage may vary). Ours was an A/B/C-zoned cluster and we wanted to drop the A and C zones:

1) backup etcd
2) drop etcd zones in cluster manifest.
3) drop A/C zones in worker instance group
4) cloud-only rolling update (kops commands for steps 2-4 are sketched below)
5) drop the A/C masters (note: this takes effect immediately)
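For steps 2-4, the kops side is roughly this (cluster and instance group names are placeholders; "nodes" is just the default worker instance group name):

$ kops edit cluster                              # remove the A/C etcdMembers
$ kops edit ig nodes                             # remove the A/C zones/subnets from the worker instance group
$ kops update cluster --yes
$ kops rolling-update cluster --cloudonly --yes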

:warning: Your etcd services might be unable to start and the kube-apiserver on the master probably won't work. To fix that, you need to reinitialize the etcd clusters by adding the -force-new-cluster option to the etcd manifests on the master node.

  • ssh to master
  • sudo systemctl stop protokube
  • sudo systemctl stop kubelet
  • add -force-new-cluster to /etc/kubernetes/manifests/etcd.manifest
  • add -force-new-cluster to /etc/kubernetes/manifests/etcd-events.manifest
  • sudo systemctl start kubelet
  • sudo systemctl start protokube

etcd should be sane now as a single member cluster.
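A quick sanity check from the master (this assumes the legacy etcd v2 setup with main on port 4001 and events on 4002, and that etcdctl is available on the host or inside the etcd container; adjust ports to your setup):

$ etcdctl --endpoints http://127.0.0.1:4001 cluster-health
$ etcdctl --endpoints http://127.0.0.1:4001 member list
$ etcdctl --endpoints http://127.0.0.1:4002 cluster-health   # events cluster
# each should report exactly one healthy member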

Another "gotcha" for us was that some of our pvc-linked ebs volumes were stranded in the abandoned zones. To fix this, we took snapshots of the volumes in A/C and created new volumes from those snapshots in zone B with the same tags as their source volumes. Then we manually edited the pv manifests to point at the new volume id in zone B. Our pvc-dependent services came back online at that point.

@danielkrainas's steps almost got us working. One gotcha was that we needed to start protokube after successfully starting kubelet with the modified etcd manifests. Protokube rewrites those manifests during its startup, which would wipe out the addition of -force-new-cluster.

@fullsailor thanks! I had written the steps from my terminal history and memory and bungled them. I've updated my comment.

Very helpful guys, thank you!

For anyone coming here via Google, the steps are a little different now, since etcd-manager was introduced in kops 1.12 and the main and events etcd clusters are backed up to S3 (the same bucket as KOPS_STATE_STORE) automatically and regularly.

So if your cluster was created with kops 1.12 or newer, the steps look more like this:

  1. Delete the extra etcd zones in the cluster spec
$ kops edit cluster

In the etcdClusters section, remove etcdMembers entries so that only one instanceGroup remains for both main and events, e.g.:

  etcdClusters:
  - etcdMembers:
    - instanceGroup: master-ap-southeast-1a
      name: a
    name: main
  - etcdMembers:
    - instanceGroup: master-ap-southeast-1a
      name: a
    name: events
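To double-check the edited spec before applying it:

$ kops get cluster <cluster name> -o yaml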
  2. Apply the changes
$ kops update cluster --yes
$ kops rolling-update cluster --yes
  3. Remove the two extra master instance groups
$ kops delete ig master-xxxxxx-1b
$ kops delete ig master-xxxxxx-1c

This action cannot be undone, and it will delete the 2 master nodes immediately.
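Before running the deletes, it is worth confirming that recent etcd backups actually exist in the state store (bucket and cluster name are placeholders):

$ aws s3 ls s3://<kops s3 bucket name>/<cluster name>/backups/etcd/main/
$ aws s3 ls s3://<kops s3 bucket name>/<cluster name>/backups/etcd/events/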

Now that 2 of your 3 master nodes are deleted, the etcd services will likely be failing and the kube-apiserver will be unreachable. It is normal for your kops and kubectl commands to stop working after this step.

  4. Restart the etcd cluster with a single master node
    This is the tricky part. SSH into the remaining master node, then:
$ sudo systemctl stop protokube
$ sudo systemctl stop kubelet

Download the etcd-manager-ctl tool. If you are running a different etcd-manager version, adjust the download link accordingly:

$ wget https://github.com/kopeio/etcd-manager/releases/download/3.0.20190930/etcd-manager-ctl-linux-amd64
$ mv etcd-manager-ctl-linux-amd64 etcd-manager-ctl
$ chmod +x etcd-manager-ctl
$ mv etcd-manager-ctl /usr/local/bin/

Restore the backups from S3. See the official docs for details:

$ etcd-manager-ctl -backup-store=s3://<kops s3 bucket name>/<cluster name>/backups/etcd/main list-backups
$ etcd-manager-ctl -backup-store=s3://<kops s3 bucket name>/<cluster name>/backups/etcd/main restore-backup 2019-10-16T09:42:37Z-000001
# do the same for events
$ etcd-manager-ctl -backup-store=s3://<kops s3 bucket name>/<cluster name>/backups/etcd/events list-backups
$ etcd-manager-ctl -backup-store=s3://<kops s3 bucket name>/<cluster name>/backups/etcd/events restore-backup 2019-10-16T09:42:37Z-000001

These commands only schedule the restore; it does not start immediately. You need to restart etcd by killing the related containers and then starting kubelet and protokube again:
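Killing the etcd containers looks roughly like this, assuming Docker is the container runtime (the name filter is an assumption; check what your etcd/etcd-manager containers are actually called):

$ sudo docker ps | grep etcd
$ sudo docker kill $(sudo docker ps -q --filter name=etcd)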

$ sudo systemctl start kubelet
$ sudo systemctl start protokube

Wait for the restore to finish; then kubectl get nodes and kops validate cluster should work again. If not, you can simply terminate the EC2 instance of the remaining master node in the AWS console; the Auto Scaling Group will create a new master node, and the etcd cluster will be restored from the S3 backups.
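If you do end up recycling the master, terminating it from the CLI is simply (instance ID is a placeholder):

$ aws ec2 terminate-instances --instance-ids <master instance id>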
