Terraform-aws-eks: Nodes didn't get automatically updated after a version upgrade.

Created on 25 Jun 2019 · 9 comments · Source: terraform-aws-modules/terraform-aws-eks

I have issues

I'm submitting a...

[ ] bug report
[ ] feature request
[x] support request
[ ] kudos, thank you, warm fuzzy

What is the current behavior?

I had a running EKS 1.12 cluster with a single worker group (two worker nodes), and I updated cluster_version to 1.13. The control-plane update worked, and the launch configuration's AMI was updated to the correct version, but the worker nodes weren't replaced automatically. I had to manually scale the group down to 0 and back up to the desired capacity for the change to take effect.

I am not sure whether this is a bug or just the expected behaviour, so it would be really nice if someone could look into this and provide guidance. A section about cluster and worker group upgrades would be awesome! 💯

As a side note, and probably a separate issue, kube-proxy and CoreDNS weren't automatically updated to the matching versions either. It would be awesome if that could be handled automatically as well; I might be able to contribute code if someone can guide me through what's required.

If this is a bug, how to reproduce? Please include a code sample if relevant.

As I've mentioned above, I am not entirely sure this is a bug or just the expected behaviour. My Terraform configuration was initially the following:

module "my-eks" {
  source = "terraform-aws-modules/eks/aws"

  cluster_create_timeout = "30m"
  cluster_delete_timeout = "30m"

  cluster_enabled_log_types = [
    "api",
  ]

  cluster_endpoint_private_access = true
  cluster_endpoint_public_access  = true
  cluster_name                    = "my-eks"
  cluster_version                 = "1.12" // Later changed to "1.13".

  subnets = [
    aws_subnet.my-eks-internal-001.id,
    aws_subnet.my-eks-internal-002.id,
    aws_subnet.my-eks-internal-003.id,
    aws_subnet.my-eks-external-001.id,
    aws_subnet.my-eks-external-002.id,
    aws_subnet.my-eks-external-003.id,
  ]

  vpc_id = aws_vpc.my-eks.id

  worker_groups = [
    {
      asg_desired_capacity = 2
      asg_min_size         = 1
      asg_max_size         = 3
      instance_type        = "m5.large"
      name                 = "my-eks-wg-0"

      subnets = [
        aws_subnet.my-eks-internal-001.id,
        aws_subnet.my-eks-internal-002.id,
        aws_subnet.my-eks-internal-003.id,
      ]
    }
  ]

  write_kubeconfig = true
}

What's the expected behavior?

I was expecting the old worker nodes to be replaced by two new worker nodes running the correct AMI automatically.

Are you able to fix this problem and submit a PR? Link here if you have already.

N/A.

Environment details

Affected module version: v5.0.0.
OS: macOS 10.14.5.
Terraform version: v0.12.2.

Any other relevant info

N/A.

Source

bmcustodio

All 9 comments

I am not sure whether this is a bug or just the expected behaviour

It's expected behaviour. Updating nodes is a different process for different workloads, so we don't attempt to control it in this module.

a section about cluster and worker group upgrades would be awesome!

You're right. Feel free to create a PR to add some details 😃

kube-proxy and CoreDNS didn't get automatically updated to the relevant versions. It would be awesome if that could be handled automatically as well

It could definitely be automated, but it won't be part of this module. Details in https://github.com/terraform-aws-modules/terraform-aws-eks/issues/99

max-rocket-internet on 25 Jun 2019
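
Until that is automated elsewhere, the kube-proxy part of the update can be done by hand. The following is a minimal sketch, not the module's or AWS's procedure: it assumes the EKS default layout (a kube-proxy DaemonSet in kube-system whose first container is named kube-proxy), and the helper names with_tag and update_kube_proxy are hypothetical. The target tag is deliberately left as a parameter; take it from the AWS EKS upgrade documentation for your cluster version.

```shell
# with_tag IMAGE TAG: print IMAGE with its tag replaced by TAG.
# Only the last path segment is checked for a tag, so a registry port
# (registry:5000/...) is not mistaken for one.
# Digest references (image@sha256:...) are not handled.
with_tag() {
  local image="$1" tag="$2" repo
  case "${image##*/}" in
    *:*) repo="${image%:*}" ;;  # drop the existing ":tag"
    *)   repo="$image" ;;       # untagged image
  esac
  printf '%s:%s\n' "$repo" "$tag"
}

# update_kube_proxy TAG: read the DaemonSet's current image (assumes
# kube-proxy is the first container) and switch the kube-proxy container
# to the same repository with tag TAG.
update_kube_proxy() {
  local tag="$1" current
  current="$(kubectl get daemonset kube-proxy -n kube-system \
    -o jsonpath='{.spec.template.spec.containers[0].image}')" || return 1
  kubectl set image daemonset/kube-proxy -n kube-system \
    "kube-proxy=$(with_tag "$current" "$tag")"
}
```

Usage: update_kube_proxy <tag from the AWS docs>. CoreDNS runs as a Deployment (deployment/coredns in kube-system) and could be bumped the same way, but check the AWS upgrade notes for your version before changing it.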

@max-rocket-internet, could you share some best practices for rolling upgrades of the ASG nodes when EKS is upgraded? So far, I can only think of having two worker ASGs and manually changing the desired capacity.

kim0 on 30 Jun 2019

Here's what I do:

1. Ensure you have cluster-autoscaler running.
2. Apply the Terraform changes that update the ASG's launch configuration (LC) to the new AMI.
3. Drain one old-version node: kubectl drain --force --ignore-daemonsets --delete-local-data ip-xxxxxxx.eu-west-1.compute.internal
4. Wait until the workload is rescheduled.
5. cluster-autoscaler will create new nodes when required. These new nodes will use the new AMI version.
6. Repeat steps 3-5 until all old-version nodes are drained.
7. cluster-autoscaler will terminate the old nodes automatically after 10-60 minutes.

🚀

max-rocket-internet on 1 Jul 2019
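
The drain loop above can be sketched as a small shell function. This is an illustrative sketch, not the commenter's script: it assumes cluster-autoscaler is already running and the new launch configuration has been applied, the helper name drain_nodes_one_by_one is hypothetical, and the fixed pause stands in for "wait until the workload is rescheduled", where a real rollout would check that the evicted pods are running again.

```shell
# drain_nodes_one_by_one PAUSE_SECONDS NODE...: drain the given old-AMI
# nodes one at a time, sleeping between nodes so evicted pods can be
# rescheduled and cluster-autoscaler can add new-AMI nodes if capacity
# runs short. Stops at the first failed drain.
drain_nodes_one_by_one() {
  local pause="$1" node
  shift
  for node in "$@"; do
    # Same flags as in the comment above; kubectl 1.20 and later replace
    # --delete-local-data with --delete-emptydir-data.
    kubectl drain --force --ignore-daemonsets --delete-local-data "$node" || return 1
    sleep "$pause"
  done
}
```

kubectl drain cordons each node first, so drained nodes show up as SchedulingDisabled, which is what cluster-autoscaler later acts on when it removes them.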

Thanks @max-rocket-internet, it's helpful to know this is the "standard" approach. I would think setting termination_policies = ["OldestLaunchConfiguration"] would help, as a hint to CA about which nodes to delete?

kim0 on 1 Jul 2019

I would think setting termination_policies = ["OldestLaunchConfiguration"] would help ?

Essentially, you don't want the ASG doing anything at all, because it doesn't gracefully drain the node; it just shuts the instance down. This is much too aggressive. That's why you use cluster-autoscaler and kubectl drain: draining asks the pods to stop, respects all the timeout and shutdown settings in the pods (e.g. terminationGracePeriodSeconds and lifecycle settings), and stops any further pods from being scheduled on the node.

During normal autoscaling, cluster-autoscaler will also drain a node before telling the ASG to terminate that specific node.

max-rocket-internet on 1 Jul 2019

Thanks @max-rocket-internet, I understand what you just mentioned. What's not clear to me is what makes CA prefer to kill the nodes I've just drained. I assume CA has no visibility that those nodes were started from an older LC, or does it? Thanks!

kim0 on 1 Jul 2019

I assume CA has no visibility that those nodes are started from an older LC, or does it ?

No, it doesn't. You are choosing to drain a node because it's an old one, as shown here:

$ kubectl get nodes
NAME                                             STATUS   ROLES    AGE     VERSION
ip-10-6-22-158.ap-southeast-1.compute.internal   Ready    <none>   23d     v1.12.7
ip-10-6-22-221.ap-southeast-1.compute.internal   Ready    <none>   31d     v1.11.9

Then CA will eventually terminate that node because its status is SchedulingDisabled, as shown here:

NAME                                        STATUS                     ROLES    AGE   VERSION
ip-10-0-27-15.eu-west-1.compute.internal    Ready,SchedulingDisabled   <none>   34d   v1.12.7

CA will gracefully terminate nodes that are SchedulingDisabled, as well as nodes that are no longer needed for the cluster's resource requests.

max-rocket-internet on 1 Jul 2019
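
Spotting the old and the cordoned nodes in that output can be scripted. A minimal sketch, assuming the default kubectl get nodes --no-headers columns shown above (NAME, STATUS, ROLES, AGE, VERSION); old_version_nodes and cordoned_nodes are hypothetical helper names, and the VERSION string is compared exactly, so any build suffix must be part of the target.

```shell
# old_version_nodes TARGET: read `kubectl get nodes --no-headers` output on
# stdin and print the names of nodes whose kubelet VERSION (column 5) is
# not TARGET, i.e. the candidates to drain after an upgrade.
old_version_nodes() {
  awk -v target="$1" '$5 != target { print $1 }'
}

# cordoned_nodes: print nodes whose STATUS (column 2) contains
# SchedulingDisabled, i.e. the drained nodes that cluster-autoscaler
# is expected to terminate.
cordoned_nodes() {
  awk '$2 ~ /SchedulingDisabled/ { print $1 }'
}
```

For example, piping the first listing above through old_version_nodes v1.12.7 prints only ip-10-6-22-221.ap-southeast-1.compute.internal; in a live cluster the input would come from kubectl get nodes --no-headers.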

Thanks a ton, @max-rocket-internet. It would be really awesome if nodes got a k8s label with the launch configuration version, the AMI ID, etc., so that one could easily evict all nodes matching the old label (useful with a large number of nodes). Is that possible today?

PS: I'm happy to send a docs PR on autoscaling.md summarizing everything you mentioned here!

kim0 on 2 Jul 2019

if nodes got a k8s label that is the launch-configuration version, or the ami-id

Yeah that's a good idea. PR welcome 😃

PS: I'm happy to send a docs PR on autoscaling.md summarizing everything you mentioned here!

Please do 💯

max-rocket-internet on 3 Jul 2019
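
Until such a label exists in the module, a similar effect can be approximated by labelling nodes yourself. A minimal sketch under stated assumptions: the label key example.com/ami-id and both helper names are hypothetical; the flag would have to be added to kubelet's arguments at boot (for example through the worker group's kubelet_extra_args, if your module version has that key); and it relies on kubectl cordon accepting a label selector via -l.

```shell
# Hypothetical label key recording which AMI a node booted from.
AMI_LABEL="example.com/ami-id"

# kubelet_label_flag AMI_ID: kubelet argument that sets the label at
# registration time, e.g. appended to the worker group's kubelet args.
kubelet_label_flag() {
  printf '%s\n' "--node-labels=${AMI_LABEL}=$1"
}

# cordon_nodes_not_on NEW_AMI_ID: cordon every node not labelled with the
# new AMI. A "key!=value" selector also matches nodes that lack the key,
# so nodes created before the label was introduced are cordoned as well.
cordon_nodes_not_on() {
  kubectl cordon -l "${AMI_LABEL}!=$1"
}
```

Cordoning all old nodes at once only stops new pods from landing on them; they would still be drained one at a time as in the steps above, so the cluster keeps enough capacity while pods move.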
