Terraform-provider-aws: Regression: aws_appautoscaling_policy with multiple step_adjustment blocks fails for scaling down

Created on 26 Oct 2017  ·  13 Comments  ·  Source: hashicorp/terraform-provider-aws

Adding 2 or more step_adjustment blocks to the step_scaling_policy_configuration block fails when scaling down. This previously worked in Terraform 0.9.x, before the step_scaling_policy_configuration block was added.

Terraform Version

Terraform v0.10.8
aws provider versions tested: 0.1.4, 1.0.0, 1.1.0, 1.2.0

Affected Resource(s)

  • aws_appautoscaling_policy

Terraform Configuration Files

# Scaling up with multiple step_adjustment blocks works fine:
resource "aws_appautoscaling_policy" "add_capacity" {
  name               = "test-${aws_appautoscaling_target.test.id}_upscaling"
  resource_id        = "spot-fleet-request/${aws_spot_fleet_request.test.id}"
  scalable_dimension = "ec2:spot-fleet-request:TargetCapacity"
  service_namespace  = "ec2"

  step_scaling_policy_configuration {
    adjustment_type         = "PercentChangeInCapacity"
    cooldown                = 600
    metric_aggregation_type = "Average"

    step_adjustment {
      metric_interval_upper_bound = 5
      metric_interval_lower_bound = 0
      scaling_adjustment          = 5
    }

    step_adjustment {
      metric_interval_upper_bound = 10
      metric_interval_lower_bound = 5
      scaling_adjustment          = 15
    }

    step_adjustment {
      metric_interval_lower_bound = 10
      scaling_adjustment          = 30
    }
  }
}

# Scaling down with multiple step_adjustment blocks fails:
resource "aws_appautoscaling_policy" "reduce_capacity" {
  name               = "test-${aws_appautoscaling_target.test.id}_downscaling"
  resource_id        = "spot-fleet-request/${aws_spot_fleet_request.test.id}"
  scalable_dimension = "ec2:spot-fleet-request:TargetCapacity"
  service_namespace  = "ec2"

  step_scaling_policy_configuration {
    adjustment_type         = "PercentChangeInCapacity"
    cooldown                = 1800
    metric_aggregation_type = "Average"

    step_adjustment {
      metric_interval_upper_bound = 0
      metric_interval_lower_bound = -45
      scaling_adjustment          = -5
    }

    // step_adjustment {
    //   metric_interval_upper_bound = -45
    //   scaling_adjustment          = -20
    // }
  }
}

Expected Behavior

With one step_adjustment the reduce_capacity policy works fine. When the second step_adjustment block is uncommented in the reduce_capacity block it should still work.

Actual Behavior

Error applying plan:

2 error(s) occurred:

* aws_appautoscaling_policy.reduce_capacity: 1 error(s) occurred:

* aws_appautoscaling_policy.reduce_capacity: Failed to update scaling policy: ValidationException: Both lower and upper bounds of a step adjustment cannot be left unspecified
    status code: 400, request id: xxxx
* aws_appautoscaling_policy.reduce_capacity: 1 error(s) occurred:

* aws_appautoscaling_policy.reduce_capacity: Failed to update scaling policy: ValidationException: Both lower and upper bounds of a step adjustment cannot be left unspecified
    status code: 400, request id: xxxx

terraform plan reports the following diff when uncommenting the second block:

aws_appautoscaling_policy.reduce_capacity: Modifying... (ID: subnet-xxx_spo...yyy-_downscaling)
  step_scaling_policy_configuration.0.step_adjustment.#:                                 "1" => "2"
  step_scaling_policy_configuration.0.step_adjustment.xxx.metric_interval_lower_bound:   "-45" => "-45"
  step_scaling_policy_configuration.0.step_adjustment.xxx.metric_interval_upper_bound:   "0" => "0"
  step_scaling_policy_configuration.0.step_adjustment.xxx.scaling_adjustment:            "-10" => "-10"
  step_scaling_policy_configuration.0.step_adjustment.yyy.metric_interval_lower_bound:   "" => "-1"
  step_scaling_policy_configuration.0.step_adjustment.yyy.metric_interval_upper_bound:   "" => "-45"
  step_scaling_policy_configuration.0.step_adjustment.yyy.scaling_adjustment:            "" => "-20"

My guess is that the issue is related to the omitted metric_interval_lower_bound in the second block being translated to -1. That works fine for the scaling up block (since all values are positive in that case) but fails for scaling down (where all values are negative, so -1 can't be recognized as a special case). I can confirm that the missing metric_interval_upper_bound value in the scaling up policy also translates to -1 in the plan output.
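
To make the guess above concrete, here is a minimal Go sketch. It is not the provider's code: `unsetSentinel`, `boundFromSentinel` and `describe` are hypothetical names, and the only thing taken from the report is the -1 shown in the plan output. It shows why a numeric sentinel becomes ambiguous once negative bounds are legitimate values.

```go
package main

import "fmt"

// unsetSentinel is a hypothetical stand-in for the -1 that appears in the
// plan output when a bound is omitted.
const unsetSentinel = -1.0

// boundFromSentinel returns nil when v equals the sentinel and a pointer to
// the value otherwise. This is an assumed decoding rule, for illustration.
func boundFromSentinel(v float64) *float64 {
	if v == unsetSentinel {
		return nil
	}
	return &v
}

// describe renders an optional bound for printing.
func describe(p *float64) string {
	if p == nil {
		return "unset"
	}
	return fmt.Sprintf("%g", *p)
}

func main() {
	// Omitted bound: the default fills in -1, which decodes as unset.
	fmt.Println(describe(boundFromSentinel(unsetSentinel)))
	// Explicit -1 written by the user: indistinguishable from the default.
	fmt.Println(describe(boundFromSentinel(-1)))
	// Any other negative bound is passed through.
	fmt.Println(describe(boundFromSentinel(-45)))
}
```

With only positive bounds the sentinel never collides with a real value, which would be consistent with the scaling up policy working.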

Steps to Reproduce

Please list the steps required to reproduce the issue, for example:

  1. terraform apply
bug service/applicationautoscaling

All 13 comments

Still failing in Terraform 0.11.0 with aws provider version 1.3.1

Still fails in 0.11.1 with provider 1.5

This issue renders service scale-in largely unusable. It appears particularly challenging due to issues mentioned in #5471. There are very few schema.TypeFloat instances in the provider which might be used as examples for how to handle the default/nil case.

Given that any lower/upper bound is valid, picking an arbitrary value outside of the valid range might be an acceptable workaround. The CloudWatch docs on PutMetricData state that:

Values must be in the range of 8.515920e-109 to 1.174271e+108 (Base 10) or 2e-360 to 2e360 (Base 2).

Given that TypeFloat represents float64, and that the math constants are:

        MaxFloat64             = 1.797693134862315708145274237317043567981e+308 // 2**1023 * (2**53 - 1) / 2**52
        SmallestNonzeroFloat64 = 4.940656458412465441765687928682213723651e-324 // 1 / 2**(1023 - 1 + 52)

Selecting an arbitrary pair of values, such as 5e+110 and 5e-110 or something similar, would be a valid float64 but outside of the acceptable range of the CloudWatch API. (5 avoids round-off errors, which would make equality testing easier. Alternatively, the checker could use the CloudWatch limits and avoid an equality test.)

HCL supports parsing scientific notation for floats, so these values could be explicitly specified if someone needed to.

The downside to this approach, aside from requiring magic numbers, is that the defaults would need to be converted to nil before passing to the API.
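
A rough Go sketch of that idea, under stated assumptions: `unsetBound` and `boundOrNil` are hypothetical names, the default of 5e+110 is the example value from above, and the check uses the CloudWatch upper limit quoted above rather than an equality test.

```go
package main

import (
	"fmt"
	"math"
)

// cloudWatchMax is the upper limit quoted from the PutMetricData docs.
const cloudWatchMax = 1.174271e+108

// unsetBound is a hypothetical schema default: a valid float64 that lies
// outside the range CloudWatch accepts.
const unsetBound = 5e+110

// boundOrNil converts the magic default back to nil before an API call.
// Any magnitude above the CloudWatch limit is treated as "not specified".
func boundOrNil(v float64) *float64 {
	if math.Abs(v) > cloudWatchMax {
		return nil
	}
	return &v
}

func main() {
	for _, v := range []float64{unsetBound, 0, -45} {
		if p := boundOrNil(v); p == nil {
			fmt.Printf("%g -> nil\n", v)
		} else {
			fmt.Printf("%g -> %g\n", v, *p)
		}
	}
}
```

Using a range check instead of `v == unsetBound` also covers a user who writes a slightly different out-of-range value by hand.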

Actually, it looks like there's a cleaner workaround--essentially reverting to the earlier implementation. The resource_aws_autoscaling_policy implementation works. It uses TypeString rather than TypeFloat for the lower and upper bounds, with no explicit Default. Within expandStepAdjustments, the string is coerced to a float. This code is still present in resource_aws_appautoscaling_policy but it is superseded by the newer float handling case.
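
The string-based handling could look roughly like the following. This is a sketch of the technique described above, not a copy of expandStepAdjustments; `parseBound` is a hypothetical helper name. An empty string means the bound was omitted, so no sentinel value is needed.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseBound coerces an optional string bound to *float64.
// The empty string (no Default set on the schema field) maps to nil.
func parseBound(s string) (*float64, error) {
	if s == "" {
		return nil, nil
	}
	f, err := strconv.ParseFloat(s, 64)
	if err != nil {
		return nil, fmt.Errorf("bound %q is not a float: %w", s, err)
	}
	return &f, nil
}

func main() {
	for _, s := range []string{"", "-45", "0", "-1"} {
		p, err := parseBound(s)
		switch {
		case err != nil:
			fmt.Println("error:", err)
		case p == nil:
			fmt.Printf("%q -> unset\n", s)
		default:
			fmt.Printf("%q -> %g\n", s, *p)
		}
	}
}
```

Because "" is not a valid number, every numeric value, including 0 and -1, stays available as a real bound.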

@toddlucas I've put together https://github.com/terraform-providers/terraform-provider-aws/pull/3480 and it works as you've described! Thanks for the comment, it would have taken me much longer to fix had you not weighed in!

That's excellent news, @nathanielks! Thanks so much for taking the time to fix it. I haven't been able to get provider cross compiling set up so I was unable to test it.

Would you believe in all this time it never occurred to me to just use the deprecated syntax? Works like a charm until @nathanielks's PR is merged :)

@WolverineFan oh, neat! I tried using strings but it still didn't work for me 🤔 Could you post an example?

Is it possible to use this workaround on ECS services, which don't have an autoscaling_group and I don't believe can have one?

For that matter, this bug has been open for 6 months. Isn't autoscaling considered fairly core functionality? Isn't that a big part of the reason why people move to cloud computing and infrastructure as code? I like to think someone might consider broken autoscaling to be something of a high priority but 6 months with broken autoscaling seems excessive. Is there any workaround that will work with an ECS service or do I just have to manually set all of the autoscaling up in the console?

@sgendler-stem I agree with you. It is major missing functionality. A PR was submitted by @nathanielks five weeks ago and looks great. A number of new test cases were added as well. I don't know how to get HashiCorp to prioritize it.

The fix for this has been merged and will release with version 1.40.0 of the AWS provider, likely middle of next week.

This has been released in version 1.40.0 of the AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!
