Has anyone done a migration from HA to single master and if so, what exactly were the steps and were there any problems to be aware of?
I've never done it myself, but I've heard people say that if you follow this guide in reverse it should work:
https://github.com/kubernetes/kops/blob/master/docs/single-to-multi-master.md
I would try it on a test cluster first, with a backup.
Thanks! Our migration was a success.
For posterity (mileage may vary): ours was an A/B/C-zoned cluster and we wished to drop the A and C zones:
1) back up etcd (see the sketch after this list)
2) drop the etcd zones in the cluster manifest
3) drop the A/C zones in the worker instance group
4) cloud-only rolling restart
5) drop the A/C masters (note: this takes effect immediately)
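For step 1, a minimal backup sketch run on a master node. The data-dir path and etcd version are assumptions; older kops clusters ran etcd v2, so the v2-style backup subcommand is shown (with etcd3 you would use ETCDCTL_API=3 etcdctl snapshot save instead):
# etcd v2-style offline copy of the data dir (paths are assumptions)
$ sudo etcdctl backup --data-dir /var/etcd/data --backup-dir /tmp/etcd-backup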
:warning: Your k8s etcd services might be unable to start, and the kube-api service on the master probably won't work. To fix that, you need to reinitialize the etcd clusters by adding the -force-new-cluster option to the etcd manifests on the master node:
$ sudo systemctl stop protokube
$ sudo systemctl stop kubelet
Add -force-new-cluster to /etc/kubernetes/manifests/etcd.manifest
Add -force-new-cluster to /etc/kubernetes/manifests/etcd-events.manifest
$ sudo systemctl start kubelet
$ sudo systemctl start protokube
etcd should be sane now as a single-member cluster.
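Illustratively, the edit just appends the flag to the etcd command in each static pod manifest. The exact shape of the kops-generated manifest differs by version, so treat this fragment as a sketch rather than the literal file:
# fragment of /etc/kubernetes/manifests/etcd.manifest (illustrative)
containers:
- name: etcd-container
  command:
  - /usr/local/bin/etcd
  - -force-new-cluster   # added: restart as a new single-member cluster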
Another "gotcha" for us was that some of our pvc-linked ebs volumes were stranded in the abandoned zones. To fix this, we took snapshots of the volumes in A/C and created new volumes from those snapshots in zone B with the same tags as their source volumes. Then we manually edited the pv manifests to point at the new volume id in zone B. Our pvc-dependent services came back online at that point.
@danielkrainas's steps almost got us working. One gotcha was that we needed to start protokube only after successfully starting kubelet with the modified etcd manifests. Protokube rewrites those manifests during its startup, which would wipe out the addition of -force-new-cluster.
@fullsailor thanks! I had written the steps from my terminal history and memory and bungled them. I updated my comment.
Very helpful guys, thank you!
For anyone coming here via Google: the steps are a little different now, since etcd-manager was introduced in kops 1.12, and the main and events etcd clusters are backed up to S3 (the same bucket as KOPS_STATE_STORE) automatically and regularly.
So if your cluster was created with kops 1.12 or newer, you probably need the following steps instead:
$ kops edit cluster
In the etcdClusters section, remove etcdMembers items so that only one instanceGroup remains for each of main and events, e.g.:
etcdClusters:
- etcdMembers:
  - instanceGroup: master-ap-southeast-1a
    name: a
  name: main
- etcdMembers:
  - instanceGroup: master-ap-southeast-1a
    name: a
  name: events
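For contrast, the original HA section would have looked roughly like this; the b and c member names and instance groups are illustrative, not taken from this thread:
etcdClusters:
- etcdMembers:
  - instanceGroup: master-ap-southeast-1a
    name: a
  - instanceGroup: master-ap-southeast-1b
    name: b
  - instanceGroup: master-ap-southeast-1c
    name: c
  name: main
(and the same three members under events)
With the section trimmed down to one member, apply the change: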
$ kops update cluster --yes
$ kops rolling-update cluster --yes
$ kops delete ig master-xxxxxx-1b
$ kops delete ig master-xxxxxx-1c
This action cannot be undone, and it will delete the 2 master nodes immediately.
Now that 2 of your 3 master nodes are deleted, the k8s etcd services may be failing and the kube-api service will be unreachable. It is normal for your kops and kubectl commands to stop working after this step. This is the tricky part: you need to SSH into the remaining master node for the next steps.
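The SSH step itself might look like this; the admin user name and key path are assumptions (kops' default user depends on the base image):
$ ssh -i ~/.ssh/id_rsa admin@<master-node-public-ip>
Once on the master, stop the services: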
$ sudo systemctl stop protokube
$ sudo systemctl stop kubelet
Download the etcd-manager-ctl tool (if you are using a different etcd-manager version, adjust the download link accordingly):
$ wget https://github.com/kopeio/etcd-manager/releases/download/3.0.20190930/etcd-manager-ctl-linux-amd64
$ mv etcd-manager-ctl-linux-amd64 etcd-manager-ctl
$ chmod +x etcd-manager-ctl
$ mv etcd-manager-ctl /usr/local/bin/
Restore the backups from S3 (see the official docs):
$ etcd-manager-ctl -backup-store=s3://<kops s3 bucket name>/<cluster name>/backups/etcd/main list-backups
$ etcd-manager-ctl -backup-store=s3://<kops s3 bucket name>/<cluster name>/backups/etcd/main restore-backup 2019-10-16T09:42:37Z-000001
# do the same for events
$ etcd-manager-ctl -backup-store=s3://<kops s3 bucket name>/<cluster name>/backups/etcd/events list-backups
$ etcd-manager-ctl -backup-store=s3://<kops s3 bucket name>/<cluster name>/backups/etcd/events restore-backup 2019-10-16T09:42:37Z-000001
This does not start the restore immediately; to trigger it you need to restart etcd: kill the related containers and start kubelet.
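Assuming a Docker runtime on the master (an assumption; use the equivalent crictl commands otherwise), killing the etcd-related containers looks roughly like:
$ sudo docker ps | grep etcd      # find the running etcd / etcd-manager containers
$ sudo docker stop <container-id> # stop each one; they are recreated once kubelet is back
Then bring the services back up: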
$ sudo systemctl start kubelet
$ sudo systemctl start protokube
Wait for the restore to finish; then kubectl get nodes and kops validate cluster should work again. If not, you can simply terminate the EC2 instance of the remaining master node in the AWS console: a new master node will be created by the Auto Scaling Group, and the etcd cluster will be restored from the S3 backups.
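If you prefer the CLI over the console for that last-resort step, terminating the instance might look like this (the instance ID is a placeholder):
$ aws ec2 terminate-instances --instance-ids <master-instance-id>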