Autoscaler: Unable to get instance type from launch config or launch template - v1.13.6

Created on 8 Aug 2019 · 16 Comments · Source: kubernetes/autoscaler

We have started to use launch-templates-mixed in our Kubernetes 1.13 cluster on EKS and, apparently, since we upgraded to cluster-autoscaler 1.13.6 we have started to face this issue with the autoscaling groups that use launch-templates-mixed:
Unable to build proper template node for $AUTOSCALING_GROUP_NAME: Unable to get instance type from launch config or launch template

When rolling back to cluster-autoscaler 1.13.1, this problem does not appear at all.

Here are some logs from 1.13.6
logs-ca-1-13-6.txt

area/provider/aws

All 16 comments

/cc @Jeffwan @jaypipes
This seems very specific to the AWS provider.

/area provider/aws

https://github.com/kubernetes/autoscaler/commit/4bd79c83531d449e9ed1451cd030a9f34cc3e470 introduced the tracking of launch template or config. The AwsManager.buildInstanceType function hasn't changed substantially since July last year (well before 1.13.1).

This error will be returned if the ASG has no launch configuration name or launch template name+version.
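
The condition described above can be sketched roughly as follows. This is a minimal illustration, not the cluster-autoscaler's actual code: the types `asgInfo` and `launchTemplate` and the function `templateSource` are hypothetical names invented for this example, and only the error text is taken from this issue.

```go
package main

import (
	"errors"
	"fmt"
)

// launchTemplate is a hypothetical stand-in for a launch template reference.
type launchTemplate struct {
	Name    string
	Version string
}

// asgInfo is a hypothetical, simplified view of an Auto Scaling group.
type asgInfo struct {
	Name                    string
	LaunchConfigurationName string
	LaunchTemplate          launchTemplate
}

// errNoInstanceType carries the message reported in this issue.
var errNoInstanceType = errors.New("Unable to get instance type from launch config or launch template")

// templateSource describes where the instance type would be read from, or
// returns errNoInstanceType when the group has neither a launch configuration
// name nor a launch template name plus version.
func templateSource(asg asgInfo) (string, error) {
	if asg.LaunchConfigurationName != "" {
		return "launch configuration " + asg.LaunchConfigurationName, nil
	}
	if asg.LaunchTemplate.Name != "" && asg.LaunchTemplate.Version != "" {
		return "launch template " + asg.LaunchTemplate.Name + " (" + asg.LaunchTemplate.Version + ")", nil
	}
	return "", errNoInstanceType
}

func main() {
	groups := []asgInfo{
		{Name: "with-launch-config", LaunchConfigurationName: "lc-1"},
		{Name: "with-launch-template", LaunchTemplate: launchTemplate{Name: "lt-1", Version: "1"}},
		{Name: "neither-field-set"}, // both top-level fields are empty
	}
	for _, g := range groups {
		src, err := templateSource(g)
		if err != nil {
			fmt.Printf("%s: %v\n", g.Name, err)
			continue
		}
		fmt.Printf("%s: %s\n", g.Name, src)
	}
}
```

In this sketch, a group whose top-level launch configuration and launch template fields are both empty ends up in the error branch.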

What does your ASG look like?

Also, I see this warning in your logs:

W0808 12:28:18.713213       1 aws_manager.go:194] Found multiple availability zones for ASG "v1-13-testing-eu-tools-blue20190807074452737800000026"; using eu-west-1b

Please note that multi-AZ ASGs are not supported. Please see the second bullet point here:

https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler/cloudprovider/aws#common-notes-and-gotchas

@jaypipes Thanks for replying.
Multi-AZ ASGs can lead to nodes being terminated by AWS instead of cluster-autoscaler, but they don't cause any problem with cluster-autoscaler terminating and launching new instances. We have been running our ASGs in multiple AZs for a very long time.

My ASG looks like this:

{
    "AutoScalingGroups": [
        {
            "AutoScalingGroupARN": "$ARN",
            "ServiceLinkedRoleARN": "$RoleARN",
            "TargetGroupARNs": [],
            "SuspendedProcesses": [
                {
                    "ProcessName": "AZRebalance",
                    "SuspensionReason": "User suspended at 2019-08-07T07:45:04Z"
                }
            ],
            "DesiredCapacity": 0,
            "MixedInstancesPolicy": {
                "InstancesDistribution": {
                    "OnDemandBaseCapacity": 0,
                    "SpotInstancePools": 4,
                    "OnDemandPercentageAboveBaseCapacity": 0,
                    "SpotMaxPrice": "$PRICE",
                    "SpotAllocationStrategy": "lowest-price",
                    "OnDemandAllocationStrategy": "prioritized"
                },
                "LaunchTemplate": {
                    "LaunchTemplateSpecification": {
                        "LaunchTemplateName": "$LAUNCHTEMPLATENAME",
                        "Version": "$Latest",
                        "LaunchTemplateId": "$LAUNCHTEMPLATEID"
                    },
                    "Overrides": [
                        {
                            "InstanceType": "m5.2xlarge"
                        },
                        {
                            "InstanceType": "c5.2xlarge"
                        },
                        {
                            "InstanceType": "r5.2xlarge"
                        },
                        {
                            "InstanceType": "t3.2xlarge"
                        }
                    ]
                }
            },
            "EnabledMetrics": [],
            "Tags": [
                {
                    "ResourceType": "auto-scaling-group",
                    "ResourceId": "$AUTOSCALINGGROUPNAME",
                    "PropagateAtLaunch": true,
                    "Value": "blue",
                    "Key": "Deployment"
                },
                {
                    "ResourceType": "auto-scaling-group",
                    "ResourceId": "$AUTOSCALINGGROUPNAME",
                    "PropagateAtLaunch": true,
                    "Value": "v1-13-testing-eu",
                    "Key": "KubernetesCluster"
                },
                {
                    "ResourceType": "auto-scaling-group",
                    "ResourceId": "$AUTOSCALINGGROUPNAME",
                    "PropagateAtLaunch": true,
                    "Value": "v1-13-testing-eu-tools-blue-eks_asg",
                    "Key": "Name"
                },
                {
                    "ResourceType": "auto-scaling-group",
                    "ResourceId": "$AUTOSCALINGGROUPNAME",
                    "PropagateAtLaunch": true,
                    "Value": "tools",
                    "Key": "Usage"
                },
                {
                    "ResourceType": "auto-scaling-group",
                    "ResourceId": "$AUTOSCALINGGROUPNAME",
                    "PropagateAtLaunch": true,
                    "Value": "testing",
                    "Key": "environment"
                },
                {
                    "ResourceType": "auto-scaling-group",
                    "ResourceId": "$AUTOSCALINGGROUPNAME",
                    "PropagateAtLaunch": false,
                    "Value": "true",
                    "Key": "k8s.io/cluster-autoscaler/enabled"
                },
                {
                    "ResourceType": "auto-scaling-group",
                    "ResourceId": "$AUTOSCALINGGROUPNAME",
                    "PropagateAtLaunch": false,
                    "Value": "250Gi",
                    "Key": "k8s.io/cluster-autoscaler/node-template/resources/ephemeral-storage"
                },
                {
                    "ResourceType": "auto-scaling-group",
                    "ResourceId": "$AUTOSCALINGGROUPNAME",
                    "PropagateAtLaunch": false,
                    "Value": "",
                    "Key": "k8s.io/cluster-autoscaler/v1-13-testing-eu"
                },
                {
                    "ResourceType": "auto-scaling-group",
                    "ResourceId": "$AUTOSCALINGGROUPNAME",
                    "PropagateAtLaunch": true,
                    "Value": "owned",
                    "Key": "kubernetes.io/cluster/v1-13-testing-eu"
                },
                {
                    "ResourceType": "auto-scaling-group",
                    "ResourceId": "$AUTOSCALINGGROUPNAME",
                    "PropagateAtLaunch": true,
                    "Value": "1",
                    "Key": "terraform"
                }
            ],
            "AutoScalingGroupName": "$AUTOSCALINGGROUPNAME",
            "DefaultCooldown": 300,
            "MinSize": 0,
            "Instances": [],
            "MaxSize": 30,
            "VPCZoneIdentifier": "$SUBNETS_IDS",
            "HealthCheckGracePeriod": 300,
            "TerminationPolicies": [
                "Default"
            ],
            "LoadBalancerNames": [],
            "CreatedTime": "2019-08-07T07:45:03.937Z",
            "AvailabilityZones": [
                "eu-west-1b",
                "eu-west-1c",
                "eu-west-1a"
            ],
            "HealthCheckType": "EC2",
            "NewInstancesProtectedFromScaleIn": false
        }
    ]
}

In Terraform it looks pretty much like this example: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/v4.0.2/examples/spot_instances/main.tf#L38-L60

Thanks again for your help
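
To show where the instance types live in an ASG description like the one above, here is a small Go sketch that decodes a trimmed-down copy of that JSON. The struct `describeOutput`, the constant `sample` and the function `overrideTypes` are hypothetical names for this example; the JSON field names are the ones shown in the description above. Note that the group has no top-level `LaunchConfigurationName`: the template and the instance types sit under `MixedInstancesPolicy`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// describeOutput mirrors only the fields of the ASG description used here.
type describeOutput struct {
	AutoScalingGroups []struct {
		AutoScalingGroupName    string
		LaunchConfigurationName string
		MixedInstancesPolicy    *struct {
			LaunchTemplate struct {
				Overrides []struct {
					InstanceType string
				}
			}
		}
	}
}

// sample is a trimmed-down copy of the ASG description posted in this thread.
const sample = `{
  "AutoScalingGroups": [
    {
      "AutoScalingGroupName": "$AUTOSCALINGGROUPNAME",
      "DesiredCapacity": 0,
      "MixedInstancesPolicy": {
        "LaunchTemplate": {
          "LaunchTemplateSpecification": {
            "LaunchTemplateName": "$LAUNCHTEMPLATENAME",
            "Version": "$Latest"
          },
          "Overrides": [
            {"InstanceType": "m5.2xlarge"},
            {"InstanceType": "c5.2xlarge"},
            {"InstanceType": "r5.2xlarge"},
            {"InstanceType": "t3.2xlarge"}
          ]
        }
      }
    }
  ]
}`

// overrideTypes returns, per ASG name, the instance types listed under
// MixedInstancesPolicy.LaunchTemplate.Overrides. Groups without a mixed
// instances policy are skipped.
func overrideTypes(data []byte) (map[string][]string, error) {
	var out describeOutput
	if err := json.Unmarshal(data, &out); err != nil {
		return nil, err
	}
	result := make(map[string][]string)
	for _, g := range out.AutoScalingGroups {
		if g.MixedInstancesPolicy == nil {
			continue
		}
		for _, o := range g.MixedInstancesPolicy.LaunchTemplate.Overrides {
			result[g.AutoScalingGroupName] = append(result[g.AutoScalingGroupName], o.InstanceType)
		}
	}
	return result, nil
}

func main() {
	types, err := overrideTypes([]byte(sample))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for name, list := range types {
		fmt.Println(name, list)
	}
}
```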

Thanks very much for providing that info @mmingorance-dh. Is that ASG the same one from the log file you linked above? I notice the name of the ASG is different between the log file and the ASG JSON info above ("v1-13-testing-eu-tools-blue" vs "v1-13-testing-eu-tools-blue-eks_asg")

@jaypipes "v1-13-testing-eu-tools-blue" is the name of the ASG and "v1-13-testing-eu-tools-blue-eks_asg" is the value of the tag called Name

@mmingorance-dh You will actually see the exact same issue on v1.13.1. You only notice it on the newer versions because GetNodeInfosForGroups was moved from ScaleUp to RunOnce.

1.13.1
https://github.com/kubernetes/autoscaler/blob/6402c460cd0b0a735585ced05f23c62a8d7a2124/cluster-autoscaler/core/scale_up.go#L265-L267

1.13.5
https://github.com/kubernetes/autoscaler/blob/909c5aaa60cf7946833ce30627ff7118a7a1cab7/cluster-autoscaler/core/static_autoscaler.go#L149-L150

In v1.13.1, CA only looks for the TemplateInfo of a node group when it scales that node group up.

We didn't handle the MixedInstancePolicy case here:
https://github.com/kubernetes/autoscaler/blob/8303a2355e55dc7a416a8a80359207b774556d51/cluster-autoscaler/cloudprovider/aws/aws_manager.go#L316-L324

Every scale-up from 0 in a MixedInstancePolicy group is affected. If you already have at least one instance in that ASG, CA won't look at the ASG template and will use the existing node as a template instead.
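
The behavior described above can be pictured with a short Go sketch. It is only an illustration under stated assumptions: `nodeGroup`, `templateInstanceType` and `errNoTemplate` are hypothetical names, and the real template-node logic in CA builds a full node object rather than returning an instance type string.

```go
package main

import (
	"errors"
	"fmt"
)

// nodeGroup is a hypothetical, simplified node group.
type nodeGroup struct {
	Name string
	// RunningInstanceTypes lists the instance types of nodes already in the group.
	RunningInstanceTypes []string
	// TemplateInstanceType is whatever could be read from the ASG's launch
	// config or launch template; it is empty when nothing could be read.
	TemplateInstanceType string
}

// errNoTemplate is a hypothetical error for this sketch.
var errNoTemplate = errors.New("unable to build template node")

// templateInstanceType prefers an existing node as the template and falls
// back to the ASG template only when the group is empty (scale-up from 0).
func templateInstanceType(g nodeGroup) (string, error) {
	if len(g.RunningInstanceTypes) > 0 {
		return g.RunningInstanceTypes[0], nil
	}
	if g.TemplateInstanceType != "" {
		return g.TemplateInstanceType, nil
	}
	return "", errNoTemplate
}

func main() {
	groups := []nodeGroup{
		{Name: "mixed-with-one-node", RunningInstanceTypes: []string{"m5.2xlarge"}},
		{Name: "mixed-at-zero"}, // no nodes and no readable template
	}
	for _, g := range groups {
		t, err := templateInstanceType(g)
		if err != nil {
			fmt.Printf("%s: %v\n", g.Name, err)
			continue
		}
		fmt.Printf("%s: template from %s\n", g.Name, t)
	}
}
```

In this sketch, the group with a running node never reaches the template lookup, which is why only groups at 0 instances fail.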

1.13 actually doesn't support MixedInstancePolicy. See https://github.com/kubernetes/autoscaler/pull/2019#issuecomment-492750444: we only backported it to 1.14. The main reason is that the AWS SDK version with MixedInstancePolicy support is not compatible with the 1.13 version.

Please use 1.14 instead. I can have a PR to add minimum version support in MixedInstancePolicy documentation here.

https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/MixedInstancePolicy.md

@Jeffwan is it a good idea to run CA 1.14 on a Kubernetes 1.13 cluster?
According to the CA documentation, you are advised to run the same version of CA as Kubernetes.
I don't have any problem with running a higher version than Kubernetes as long as it doesn't cause any problems.

@mmingorance-dh We don't have official compatibility tests across versions, and we don't recommend using a different CA version.

  1. You can wait for EKS 1.14 and then upgrade your CA.
  2. You can take the risk of running a higher CA version, but we don't know whether there are any compatibility issues. If you decide to run it and run into problems, please feel free to report them to us.

The reason I added a down emoji is that you are suggesting that using a CA that is untested for a specific k8s version is a plausible scenario, and more people than just the author will read this thread. In other words, the decision to use a different version appears to be taken too lightly.

Unless someone is fully certain it will not introduce stability issues, @mmingorance-dh should either switch to a cluster on version 1.14 or higher, or take responsibility for the "please double check change log." decision (including the fact that 1.14 was tested only with the 1.14 k8s API by the project owner).

IMHO "I can have a PR to add minimum version support in MixedInstancePolicy documentation here." is the real fix.

@chskdh Thanks for the feedback. As the documentation mentions, we don't recommend using a different version.

  1. As the user mentioned, they are on EKS, and EKS hasn't released 1.14 yet, so upgrading the cluster to 1.14 is not an option.
  2. It's hard to backport the change to 1.13. If users want to take the risk, they can try the version that supports the feature.

Based on this, I suggested it as a hack. I agree that other users will see it, so I will clarify it in the original comment. Please also help review the PR here: https://github.com/kubernetes/autoscaler/pull/2248

@Jeffwan @mmingorance-dh I believe this issue can be closed out, yes?

@jaypipes from my side, yes.
We decided not to use launch-template-mixed until we upgrade to Kubernetes 1.14 and therefore use the feature available in CA 1.14

Thanks!

Original request has been supported in #2248. We can close this issue and feel free to reopen it.

/close

@Jeffwan: Closing this issue.

Every scale-up from 0 in a MixedInstancePolicy group is affected. If you already have at least one instance in that ASG, CA won't look at the ASG template and will use the existing node as a template instead.

Hi @Jeffwan, can you tell me why this was moved from ScaleUp to RunOnce?
I found that this is the commit that removed getNodeInfosForGroups from the ScaleUp function, but it doesn't explain why: https://github.com/kubernetes/autoscaler/commit/85a83b62bdd19826b20bf9dbb90bffb3005f8346#diff-bc8c6b18c73f226ec22e18c29d96d94c

How should a user handle updates to the launch template (new AMI version, user-data updates, etc.)?

I found this issue while trying to understand why Zalando forked CA for their own implementation (see https://github.com/zalando-incubator/autoscaler/blob/zalando-cluster-autoscaler/cluster-autoscaler/ZALANDO_CHANGES.md#more-robust-template-node-generation)

Thanks
