Terraform-provider-aws: Wait for ALB Target Group health threshold

Created on 13 Jun 2017 · 6 Comments · Source: hashicorp/terraform-provider-aws

_This issue was originally opened by @apparentlymart as hashicorp/terraform#11451. It was migrated here as part of the provider split. The original body of the issue is below._

There's a common pattern with autoscaling groups in Terraform of having the aws_autoscaling_group resource wait until a certain number of instances are healthy in an associated ELB before considering the autoscaling group as created.

This pattern doesn't work for the new Application Load Balancer because there's an extra indirection in the form of a target group.

A naive attempt to use the usual pattern with ALB might look like this:

resource "aws_alb" "example" {
  # ...
}

resource "aws_alb_target_group" "example" {
  name = "example-${var.version_id}"
  # ...

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "example" {
  name = "example-${var.version_id}"
  target_group_arns = ["${aws_alb_target_group.example.arn}"]
  min_elb_capacity = 2
  # ...

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_alb_listener" "example" {
  load_balancer_arn = "${aws_alb.example.arn}"
  # ...

  default_action {
    target_group_arn = "${aws_alb_target_group.example.arn}"
    type = "forward"
  }

  # Don't update target_group_arn to new target group until
  # the autoscaling group has healthy instances.
  depends_on = ["aws_autoscaling_group.example"]
}

The problem here is the non-obvious dependency cycle that causes a deadlock: the min_elb_capacity attribute on aws_autoscaling_group causes that resource to block completion until it sees instances in the ELB, but the instances can't get into the ELB until the aws_alb_listener resource is updated to include the new target group.

The DescribeTargetHealth operation allows us to ask for the healthcheck status of the members of a target group without first attaching the target group to an ALB.

Terraform could detect that target_group_arns is set on the aws_autoscaling_group resource and prefer to use the ALB DescribeTargetHealth operation instead of the ELB status operation, which would then allow the subsequent aws_alb_listener update to complete only once the target group has become healthy enough to satisfy the constraint.

It seems that we could do this without changing the configuration schema at all: wait_for_elb_capacity can simply be treated as the minimum required value of the sum (number of healthy classic ELB instances + number of distinct healthy instances across all ALB target groups), so the user does not need to think about whether a classic ELB or an ALB is providing the capacity. An instance that is healthy in more than one target group would count only once.

enhancement service/elbv2

hashibot

👍8

Most helpful comment

@pdedmon You have no idea how long I was fighting these weird 504s between deployments! Removing the autoscaling_attachment and recreating the ASG did the trick for good. 200 OKs only from now on! 💯

minac on 7 Mar 2018

👍5 🎉1

All 6 comments

Slight issue: the DescribeTargetHealth operation reports a status of unused until the target group is attached to an ALB.

As a potential workaround, the new target group could be attached using a rule for a host or path that is not expected to be triggered. Then, once the minimum expected healthy count is confirmed, it could replace the default rule.

matt-deboer on 15 Oct 2017
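
For illustration, the attachment half of that workaround might look roughly like the fragment below. It is only a sketch: the resource name "warmup", the priority, and the host name are hypothetical, and the condition syntax is the older field/values form from provider releases of that period. It does not address ordering. In the configuration from the issue body, the listener still has depends_on pointing at the autoscaling group, so the relationship between this rule, the autoscaling group, and the final switch of the default action would still need to be worked out.

```hcl
# Hypothetical rule that attaches the new target group to the ALB
# without sending it real traffic, so its targets stop reporting "unused".
resource "aws_alb_listener_rule" "warmup" {
  listener_arn = "${aws_alb_listener.example.arn}"
  priority     = 99 # assumed value; any free priority would do

  action {
    type             = "forward"
    target_group_arn = "${aws_alb_target_group.example.arn}"
  }

  condition {
    # Assumed placeholder host that no real client is expected to request.
    field  = "host-header"
    values = ["warmup.example.invalid"]
  }
}
```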

Does this mean that this fix was not working? https://github.com/hashicorp/terraform/pull/10243

salvianreynaldi on 30 Nov 2017

@salvianreynaldi I can say for sure that "wait_for_elb_capacity" doesn't work for me.

jcomeaux on 11 Dec 2017

@salvianreynaldi @jcomeaux - That fix works fine for me, but only if I use the target_group_arns attribute on the autoscaling_group resource. Using an autoscaling_attachment resource instead results in a short period of downtime.
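
As a rough sketch of the difference described in that comment, the first resource below attaches the target group inline, so the capacity wait can see it. The commented-out resource is the separate-attachment form that the comment reports as causing brief downtime. The capacity value of 2 is carried over from the example in the issue body, and the alb_target_group_arn attribute name is what provider releases of that period used, so treat it as an assumption for other versions.

```hcl
# Inline attachment: the autoscaling group itself knows about the target
# group, so wait_for_elb_capacity can take its healthy targets into account.
resource "aws_autoscaling_group" "example" {
  name                  = "example-${var.version_id}"
  target_group_arns     = ["${aws_alb_target_group.example.arn}"]
  wait_for_elb_capacity = 2
  # ...

  lifecycle {
    create_before_destroy = true
  }
}

# Separate attachment, reported in the comment above as causing a short
# period of downtime. Shown commented out for comparison only.
# resource "aws_autoscaling_attachment" "example" {
#   autoscaling_group_name = "${aws_autoscaling_group.example.id}"
#   alb_target_group_arn   = "${aws_alb_target_group.example.arn}"
# }
```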