eksctl: Support Upgrading Existing EKS Kubernetes Clusters

Created on 17 Dec 2018 · 16 Comments · Source: weaveworks/eksctl

Why do you want this feature?
EKS currently has clusters running Kubernetes 1.10; this would add a mechanism for upgrading existing clusters to 1.11.

What feature/behavior/change do you want?
I'd like to start a conversation about best practices for how we should support this.

  1. Should we upgrade the nodes automatically?
  2. How should we do this to reduce human error? https://github.com/aws/containers-roadmap/issues/57

This is the extension from #344

Label: area/upgrades

All 16 comments

This would be awesome because this sucks.

Let's write down semi-manual instructions first (see https://github.com/weaveworks/eksctl/issues/357#issuecomment-450477481); it should become clear from there what needs automating.

cc @tiffanyfay

What @tiffanyfay and I have (see the command sketch after this list):

  • [ ] Create update control plane command (see eks.UpdateClusterVersion)
  • [x] Create new node group
  • [x] Add shared security group to new and old ASG
  • [x] Delete old nodegroup (#593)

    • [x] ~taint all nodes (equivalent command: kubectl taint nodes -l alpha.eksctl.io/nodegroup-name=<name> key=value:NoSchedule)~ - not needed, as cordon implies this

    • [x] drain all nodes (equivalent command: kubectl drain -l alpha.eksctl.io/nodegroup-name=<name> --ignore-daemonsets --delete-local-data) [@errordeveloper is refactoring the drain client code in kubernetes/kubernetes#72827, but we could use either kubernetes/autoscaler or openshift/kubernetes-drain]

    • [x] remove IAM role from the aws-auth configmap [implemented by @tiffanyfay in #528]

  • [ ] Implement update cluster command

  • [ ] Check if cluster-autoscaler is installed; if so, scale it down to 0 [needs to be implemented]
  • [ ] Scale kube-dns up by 1 [needs to be implemented]
  • [ ] If cluster-autoscaler was installed, scale it back to its original size [needs to be implemented]
  • [ ] Deploy CoreDNS [needs to be implemented]
  • [ ] Delete kube-dns [needs to be implemented]
  • [ ] Update kube-proxy [needs to be implemented]
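
To make the checklist concrete, here is a rough end-to-end sketch of the semi-manual flow using the commands mentioned above. Cluster/nodegroup names are placeholders, the kube-system deployment names are assumptions about a typical install, and the control-plane step uses the raw AWS CLI since the eksctl command doesn't exist yet:

```bash
# 1. Upgrade the control plane (this is what eks.UpdateClusterVersion wraps):
aws eks update-cluster-version --name <cluster> --kubernetes-version 1.11

# 2. Create a replacement nodegroup, which comes up on the new version:
eksctl create nodegroup --cluster=<cluster>

# 3. Pause cluster-autoscaler and give kube-dns headroom
#    (deployment names assume the usual kube-system installs):
kubectl -n kube-system scale deployment cluster-autoscaler --replicas=0
kubectl -n kube-system scale deployment kube-dns --replicas=2  # i.e. +1

# 4. Drain the old nodes (drain cordons them, so no explicit taint is needed):
kubectl drain -l alpha.eksctl.io/nodegroup-name=<old-nodegroup> \
  --ignore-daemonsets --delete-local-data

# 5. Delete the old nodegroup; removing its IAM role from the aws-auth
#    configmap is covered by #528/#593:
eksctl delete nodegroup --cluster=<cluster> --name=<old-nodegroup>
```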

We might also have to upgrade kube-proxy from 1.10 to 1.11. Need more info.

If going from 1.10 to 1.11, then also swap kube-dns for CoreDNS.

Good point @mrichman.

What @tiffanyfay and I have:

  1. Create new node group: eksctl create nodegroup
  2. Add the new SG as ingress to the old SG, and the old SG as ingress to the new SG

What is this supposed to accomplish?

  3. Check if cluster-autoscaler is installed; if so, scale it down to 0
  4. Scale kube-dns up by 1
  5. Taint all old nodes: kubectl taint nodes node_name key=value:NoSchedule

Not sure this is really needed; drain accomplishes this as far as I know.

  6. Drain all nodes: kubectl drain node_name --ignore-daemonsets --delete-local-data

Why is '--ignore-daemonsets' needed here?

  7. Once all nodes are drained, remove the added SG ingress
  8. Delete the old node group and remove its IAM role from the aws-auth configmap
  9. If cluster-autoscaler is installed, scale it back to its original size

By the way, does it work with multiple ASGs?

  10. Scale kube-dns down by 1

We might also have to upgrade kube-proxy from 1.10 to 1.11. Need more info.

As Mark mentioned, there is going to be a flip to CoreDNS. Is there some kind of official EKS method for this?

@errordeveloper for the 1.11 upgrade, I don't believe so. I'll talk with the team.

And if/when we are good with the steps, I'll work on an update API/command when I'm back to work next week.

We also need to update kube-proxy in the list above.

https://docs.aws.amazon.com/eks/latest/userguide/coredns.html
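
Going by that guide, the swap amounts to roughly the following sketch (the CoreDNS manifest itself comes from the AWS documentation; coredns.yaml is a placeholder for it):

```bash
# Deploy CoreDNS behind the existing kube-dns Service, per the EKS guide:
kubectl apply -f coredns.yaml

# Scale the old kube-dns deployment down, and delete it once CoreDNS
# is confirmed to be answering DNS queries:
kubectl -n kube-system scale deployment kube-dns --replicas=0
kubectl -n kube-system delete deployment kube-dns
```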

Answering my own questions.

Why is '--ignore-daemonsets' needed here?

So one cannot normally delete daemonset-owned pods. I still don't get why, but anyway...

By the way, does it work with multiple ASGs?

Yes, cluster autoscaler is capable of discovering nodegroups.

I am still not clear on why we need to wire up a temporary SG. And what does key=value:NoSchedule accomplish that cordon/drain doesn't already?

Yeah, --ignore-daemonsets is necessary or kubectl drain won't work; I didn't look into the full background for why. In reality it doesn't matter for DaemonSets anyway: as soon as your new ASG came up and was available, the DaemonSet pods would have been scheduled there automatically.

I am still not clear on why we need to wire up a temporary SG. And what does key=value:NoSchedule accomplish that cordon/drain doesn't already?

The temp SG connection between the two ASGs allows cross-service traffic while you drain nodes. So if you have pods running on both sets of ASGs and a service on the new ASG tries to route to a pod running on the old ASG, it can still make the connection during the switch.
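
For illustration, that temporary plumbing could be wired up with the AWS CLI along these lines (SG IDs are placeholders; this is a sketch of the idea, not what eksctl does):

```bash
# Allow all traffic between the old and new nodegroups' security groups,
# so pods on either set of nodes can reach each other during the drain:
aws ec2 authorize-security-group-ingress --group-id <old-sg> \
  --ip-permissions 'IpProtocol=-1,UserIdGroupPairs=[{GroupId=<new-sg>}]'
aws ec2 authorize-security-group-ingress --group-id <new-sg> \
  --ip-permissions 'IpProtocol=-1,UserIdGroupPairs=[{GroupId=<old-sg>}]'

# Once the old nodes are fully drained, revoke the temporary rules:
aws ec2 revoke-security-group-ingress --group-id <old-sg> \
  --ip-permissions 'IpProtocol=-1,UserIdGroupPairs=[{GroupId=<new-sg>}]'
aws ec2 revoke-security-group-ingress --group-id <new-sg> \
  --ip-permissions 'IpProtocol=-1,UserIdGroupPairs=[{GroupId=<old-sg>}]'
```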

The cordon/drain vs. NoSchedule distinction is very nuanced. If you cordon, it will start to remove the pods from Services, so doing that takes down your environment if you haven't already moved the workloads somewhere else. So instead we just apply NoSchedule, which makes the new nodes the only schedulable ones, then drain, which moves the pods to the new instances.

Make sense?

Thanks, Chris! Do we strictly need the temporary SG? At the moment we are still debating what level of isolation nodegroups should have (see #419), but I think if there is no isolation (for ordinary ports), we don't need the temporary SG, unless I am missing something?

A short summary on #419: I'm going to work on adding a shared SG for all nodes, so that all node groups are actually equal; there will be options to enable isolation for those who need it. Adding this SG also means that we will have to add plumbing/mechanics for making changes to the cluster stack, which will help future work on upgrades in general.

We should turn https://github.com/weaveworks/eksctl/issues/348#issuecomment-453336225 into an actual proposal and write down basic CLI design. I think we are pretty close to having this implemented.
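
As a starting point for that proposal, one hypothetical shape for the CLI (command and flag names are illustrative only, not an existing eksctl interface) could be:

```bash
# Upgrade the control plane to the target version:
eksctl update cluster --name=<cluster> --version=1.11

# Roll the nodes: create a replacement nodegroup, then retire the old one
# (delete would cordon/drain and clean up aws-auth, per the checklist above):
eksctl create nodegroup --cluster=<cluster>
eksctl delete nodegroup --cluster=<cluster> --name=<old-nodegroup>
```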

@errordeveloper would you call this done? I think we should close.

Yes, I think it is!
