Terraform-provider-aws: Terraform fails to destroy autoscaling group if scale in protection is enabled

Created on 20 Jul 2018 · 9 Comments · Source: hashicorp/terraform-provider-aws

_This issue was originally opened by @erikselin as hashicorp/terraform#18507. It was migrated here as a result of the provider split. The original body of the issue is below._


Terraform Version

0.11.7

Terraform Configuration Files

resource "aws_autoscaling_group" "foobar" {
  ...
  protect_from_scale_in = true
}

Crash Output

...
aws_autoscaling_group.foobar: Still destroying... (ID: foobar, 8m50s elapsed)
aws_autoscaling_group.foobar: Still destroying... (ID: foobar, 9m0s elapsed)
aws_autoscaling_group.foobar: Still destroying... (ID: foobar, 9m10s elapsed)
aws_autoscaling_group.foobar: Still destroying... (ID: foobar, 9m20s elapsed)
aws_autoscaling_group.foobar: Still destroying... (ID: foobar, 9m30s elapsed)
aws_autoscaling_group.foobar: Still destroying... (ID: foobar, 9m40s elapsed)
aws_autoscaling_group.foobar: Still destroying... (ID: foobar, 9m50s elapsed)
aws_autoscaling_group.foobar: Still destroying... (ID: foobar, 10m0s elapsed)

Error: Error applying plan:

1 error(s) occurred:

* aws_autoscaling_group.foobar (destroy): 1 error(s) occurred:

* aws_autoscaling_group.foobar: group still has 7 instances

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.

Expected Behavior

Terraform should have terminated the instances associated with the aws_autoscaling_group and then destroyed the aws_autoscaling_group.

Actual Behavior

  1. Terraform sets aws_autoscaling_group min, max and desired instance count to 0.
  2. Terraform waits for the aws_autoscaling_group instances to terminate.
  3. No instance terminates because protect_from_scale_in = true.
  4. Terraform errors due to timeout.

Steps to Reproduce

  1. Add aws_autoscaling_group to Terraform with protect_from_scale_in = true.
  2. Apply and ensure aws_autoscaling_group has at least one instance.
  3. Attempt to remove aws_autoscaling_group from Terraform.

Labels: enhancement, service/autoscaling

All 9 comments

@erikselin Does force_delete = true help solve this for you?

Hi @erikselin 👋

Given that Terraform is designed to be declarative, it seems like the behavior you're expecting conflicts with itself:

protect_from_scale_in = true
Terraform should have terminated the instances associated with the aws_autoscaling_group and then destroyed the aws_autoscaling_group.

Since protect_from_scale_in is an API-provided method of ensuring instances are not destroyed unexpectedly, I would personally disagree with deleting instances when that parameter is enabled and instead recommend disabling it first before destroying the Terraform resource.

As @tomelliff mentioned above, force_delete might be an option in your scenario, but it can leave dangling resources:

force_delete - (Optional) Allows deleting the autoscaling group without waiting for all instances in the pool to terminate. You can force an autoscaling group to delete even if it's in the process of scaling a resource. Normally, Terraform drains all the instances before deleting the group. This bypasses that behavior and potentially leaves resources dangling.

Perhaps a better ask here then may be to add logic into the Terraform resource that errors out immediately if protect_from_scale_in is enabled? What do you think?

@bflad what about setting NewInstancesProtectedFromScaleIn to false when destroying?

It does feel a bit odd but if people want to protect Terraform from destroying the ASG then I think they should really be using prevent_destroy.

Also I'm a little unsure about that comment in the docs on force_delete as the comment in the SDK says:

// Specifies that the group will be deleted along with all instances associated
// with the group, without waiting for all instances to be terminated. This
// parameter also deletes any lifecycle actions associated with the group.

which should mean that all resources are cleaned up properly. It could be that that's not actually the case though :/
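As a hedged illustration of the prevent_destroy suggestion above, here is a minimal sketch that reuses the resource name from the original report and elides every other argument. It only shows how the two settings sit side by side: protect_from_scale_in concerns instances inside the group, while prevent_destroy concerns whether Terraform may destroy the resource at all.

```hcl
resource "aws_autoscaling_group" "foobar" {
  # other settings...

  # Newly launched instances are protected from scale-in events.
  protect_from_scale_in = true

  lifecycle {
    # Terraform returns an error for any plan that would destroy this resource.
    prevent_destroy = true
  }
}
```

With prevent_destroy set, the lifecycle block has to be changed or removed before terraform destroy can remove the group.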

@bflad I think the error logic might actually be a great solution here. If I had received an actionable error message explaining the issue instead of a timeout error after 10 minutes, I don't think I would have opened an issue or considered this a bug :)

@tomelliff is correct in my opinion. Scale in protection is different from "I don't need this anymore, please delete it". In the absence of literally anything else, an error would be helpful, but that requires human intervention to remove scale in protection, which negates the entire point of terraform. prevent_destroy seems like the most correct path forward to me.

This issue is also being hit in the EKS module: https://github.com/terraform-aws-modules/terraform-aws-eks/issues/176 and I am currently thinking of workarounds.

The idea I had was to use a local-exec or remote-exec provisioner with when = "destroy" that either deletes the remaining instances or disables scale-in protection on the ASGs. Does anybody have any better ideas about mitigating this? I feel like I am missing something.
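A rough, untested sketch of the destroy-time provisioner idea above, not what the EKS module actually does. It assumes Terraform 0.12 or later syntax (0.11 writes when = "destroy"), a POSIX shell, and an AWS CLI that is installed and has credentials on the machine running Terraform. The resource name is reused from the original report and the other arguments are elided.

```hcl
resource "aws_autoscaling_group" "foobar" {
  # other settings...
  protect_from_scale_in = true

  # Runs before Terraform starts destroying the group.
  provisioner "local-exec" {
    when    = destroy
    command = <<-EOT
      # Collect the IDs of the instances currently in the group.
      ids=$(aws autoscaling describe-auto-scaling-groups \
        --auto-scaling-group-names "${self.name}" \
        --query 'AutoScalingGroups[0].Instances[].InstanceId' \
        --output text)

      # Remove scale-in protection so the group can drain them.
      if [ -n "$ids" ]; then
        aws autoscaling set-instance-protection \
          --auto-scaling-group-name "${self.name}" \
          --instance-ids $ids \
          --no-protected-from-scale-in
      fi
    EOT
  }
}
```

The sketch removes protection rather than terminating instances directly, so the provider's normal drain step still does the termination. If the provisioner command fails, Terraform stops and the destroy does not proceed.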

Any update on this? I'm still hitting this issue with Terraform 0.12.28 and AWS provider 2.70.

It seems the issue is still not resolved. Please let me know if I am missing anything here. Destroying the ASG with terraform destroy is kind of a pain :(

module.nlb-asg.module.asg.aws_autoscaling_group.asg[0]: Still destroying... [id=SYDMED-DEV1-asg, 9m20s elapsed]
module.nlb-asg.module.asg.aws_autoscaling_group.asg[0]: Still destroying... [id=SYDMED-DEV1-asg, 9m30s elapsed]
module.nlb-asg.module.asg.aws_autoscaling_group.asg[0]: Still destroying... [id=SYDMED-DEV1-asg, 9m40s elapsed]
module.nlb-asg.module.asg.aws_autoscaling_group.asg[0]: Still destroying... [id=SYDMED-DEV1-asg, 9m50s elapsed]
module.nlb-asg.module.asg.aws_autoscaling_group.asg[0]: Still destroying... [id=SYDMED-DEV1-asg, 10m0s elapsed]

Error: Error draining autoscaling group: Group still has 1 instances

For what it is worth, I am still facing this issue on

Terraform Version

terraform --version
Terraform v0.13.5

In particular, it looks like even with both force_delete=true and protect_from_scale_in=false (or any of their 4 combinations, for that matter), terraform destroy does not work properly:

resource "aws_autoscaling_group" "ecs_cluster" {
  # other settings...
  # (force_delete) NOTE! see https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/autoscaling_group#force_delete
  force_delete = true
  protect_from_scale_in = false
}

Related Issues

A possibly related issue is that the force_delete flag might not be set properly. This seems unlikely, and I have no logs for it.

Workaround

Using the AWS CLI, I can forcibly terminate the auto-scaling group, after which terraform destroy properly cleans up the remaining instances. I am aware that force_delete and doing it via the AWS CLI could leave dangling resources; however, this seems to be the only way I can reliably get terraform destroy to work.
