Terraform-provider-aws: Unable to define http/s health checks for Network Loadbalancer

Created on 19 Dec 2017  ·  28 Comments  ·  Source: hashicorp/terraform-provider-aws

Terraform Version

Terraform 0.11.1 with AWS Provider 1.6

Affected Resource(s)

  • aws_lb_target_group

Terraform Configuration Files

resource "aws_lb_target_group" "tcp" {
  # ...
  protocol = "TCP"
  # ...
  health_check {
    healthy_threshold   = 2
    unhealthy_threshold = 2
    timeout             = "10"
    port                = "443"
    path                = "/healthz"
    protocol            = "HTTPS"
    interval            = 30
    matcher             = "200-399"
  }
}

Debug Output

Please provide a link to a GitHub Gist containing the complete debug output: https://www.terraform.io/docs/internals/debugging.html. Please do NOT paste the debug output in the issue; just paste a link to the Gist.

Expected Behavior

Set up the expected network load balancer, as in 1.5.

Actual Behavior

* module.lb_internal_master.aws_lb_target_group.tcp: 1 error(s) occurred:

* module.lb_internal_master.aws_lb_target_group.tcp: arn:aws:elasticloadbalancing:eu-central-1:191844718867:targetgroup/openshift-tg-internal-master-443/652998c18664d76d: custom matcher is not supported for target_groups with TCP protocol

Steps to Reproduce

  1. terraform apply

Important Factoids

N/A

References

N/A

bug regression

All 28 comments

Hi, same issue here with the path.

Looking at the AWS doc: http://docs.aws.amazon.com/fr_fr/elasticloadbalancing/latest/APIReference/API_CreateTargetGroup.html

The HTTP health check parameters path and matcher should be available for Network Load Balancers.

Might be related to https://github.com/terraform-providers/terraform-provider-aws/commit/a6d126668344bd20c61965e941fae6aaf09cc451

I just found this error myself.

Terraform 0.11.1

  • provider.aws: version = "~> 1.6"

Example code

resource "aws_lb_target_group" "testexternal" {
  name     = "testexternal"
  protocol = "TCP"
  port     = 22
  vpc_id   = "${aws_vpc.bla.id}"

  health_check {
    healthy_threshold   = 2
    unhealthy_threshold = 2
    interval            = 10
  }
}

resource "aws_lb" "testexternal" {
  name                        = "testserver"

  load_balancer_type          = "network"
  internal                    = false
  subnets                     = ["${module.subnet.ELB-subnet-ids}"]
  enable_deletion_protection  = true
}

resource "aws_lb_listener" "testexternal" {
  load_balancer_arn = "${aws_lb.testexternal.arn}"
  protocol          = "TCP"
  port              = "22"

  default_action {
    target_group_arn = "${aws_lb_target_group.testexternal.arn}"
    type             = "forward"
  }
}

resource "aws_lb_target_group_attachment" "testexternal" {
  target_group_arn = "${aws_lb_target_group.testexternal.arn}"
  target_id        = "${aws_instance.bla-002.id}"
  port             = 22
}

I ran into this issue as well.

My theory is that it is checking the rules based on the target group protocol (which in this case is TCP) when it should be checking against the health check protocol (which is HTTPS). Both path and matcher are not applicable to TCP health checks, but they are to HTTP and HTTPS health checks, even if the target group's protocol is TCP.

I might also mention that even version 1.5 had some wonky behavior regarding health checks on network load balancers / TCP target groups. This is because network load balancers have certain restrictions on the health checks that application load balancers do not: The unhealthy threshold must equal the healthy threshold, timeout is fixed to 10 and cannot be changed, interval has only two possible values (10 and 30), and matcher (if applicable) is fixed to 200-399.

Unfortunately, terraform would still try to manage these parameters even if they weren't supplied. This led to errors in some cases. In others, it would work upon creation, but then a subsequent apply would detect changes to those parameters and attempt to fix them, resulting in errors. This made it necessary to specify all the parameters and make sure they were valid for network load balancers. For instance, matcher = "200-399" had to be specified in this case in order to avoid errors, even though matcher is always that value and cannot be changed. However, now in 1.6, terraform won't let you specify matcher. I haven't had a chance to see what happens in 1.6 when you don't specify a matcher, though, because in this case we need to specify a path, and that is not allowed now either. So we have to revert to 1.5 for now to work around this.

In any case, the rules will have to take into account not only the protocol of the target group (which determines whether it's an alb or nlb) but also the protocol of the health check.
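As a rough illustration of the restrictions described above, a TCP target group with an HTTPS health check that stays within them might look like the sketch below. The resource name, the `vpc_id` variable, and the port and path values are placeholders rather than a real setup. Per the reports in this thread, provider 1.5 accepted this shape, while 1.6 rejects `path`, `matcher`, and `timeout` here.

```hcl
# Placeholder input; not part of the original report.
variable "vpc_id" {}

resource "aws_lb_target_group" "example" {
  name     = "example-nlb-tg"
  port     = 443
  protocol = "TCP" # target group protocol: makes this an NLB target group
  vpc_id   = "${var.vpc_id}"

  health_check {
    protocol            = "HTTPS" # health check protocol, separate from the one above
    port                = 443
    path                = "/healthz"
    healthy_threshold   = 2
    unhealthy_threshold = 2         # must equal healthy_threshold
    timeout             = 10        # fixed value for NLBs, per the comment above
    interval            = 30        # only 10 or 30 are accepted
    matcher             = "200-399" # fixed value for NLBs, per the comment above
  }
}
```

The point of the sketch is that two different protocol settings are in play, which is why validating only against the target group protocol is not enough.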

Hi all! Sorry for the regression here.

It seems that this is caused by the additional validation checks added in #2380. The goal of these changes was to catch more errors at plan time that were previously only caught at apply time, regarding the various subtle differences between application and network load balancers.

@deftflux is correct that the validation code is checking the target group protocol to recognize if a given target group is an application or network target group, but indeed it does seem like the health check protocol is the correct thing to check for this case to match, per the relevant API documentation which describes this particular property (Matcher) as being for "HTTP/HTTPS health checks" rather than for network load balancers in particular.

It seems that the same bug exists for HealthCheckPath. The docs also seem to disagree with our implementation about the timeout attribute: we currently seem to permit it only for HTTP/HTTPS target groups, but the docs suggest it can work for all target groups, with a different range of valid values and a different default depending on the health check protocol.

After playing with this some more it seems like the current validation _is_ correct here, per what's enforced by the underlying API. After weakening the check in the provider, I see the following error from the remote API during apply:

InvalidConfigurationRequest: Custom health check matchers are not supported for health checks for target groups with the TCP protocol

The logic I'd implemented -- based on the documentation -- was to allow custom health check matchers if the _healthcheck protocol_ is HTTP, but it seems that there is an undocumented additional restriction that Matcher may not be set for TCP target groups, regardless of the healthcheck protocol.

However, I see that you all saw _something_ working prior to this validation being added, so I'm now trying to figure out what the old implementation (prior to 1.6) was actually doing in this scenario that was allowing it to work.

As far as I can tell, this was only working before because it was totally ignoring these attributes:

https://github.com/terraform-providers/terraform-provider-aws/blob/840a82babd3ef0deed25ca7e06104f998577bbab/aws/resource_aws_lb_target_group.go#L218-L224

So while indeed this wasn't an _error_ before, it seems like it was never actually _working_. In principle we could restore the previous behavior of just silently ignoring these arguments for TCP target groups, but that seems counter to Terraform's usual goal of doing what it says it will do or failing loudly if it can't.

Given that these attributes were not functional before anyway, I'd like to propose that we move forward with these additional checks in place (arguably it was a bug that these checks were not present before) and require removing these previously-non-functional attributes from configuration when upgrading to 1.6 and above. Of course we ideally would've noticed this change in behavior and included it in the 1.6 changelog, which we can do now retroactively although it won't be visible within the v1.6.0 tag's version of the changelog since that is now frozen.

Please let me know if any of you have a use-case where including these arguments even though they are ignored is important; we can then think about how we might strike a compromise to retain the now-more-correct validation while still making those use-cases work.

Sorry for the accidental undocumented compatibility break here! :confounded:

Thanks for looking into this @apparentlymart !

I did a little testing, and there is still a problem with 1.6. Consider this test configuration:

resource "aws_lb_target_group" "foo" {
    name = "tf-nlb-health-check-test"
    protocol = "TCP"
    port = "1234"
    vpc_id = "${local.vpc}"

    health_check {
        protocol = "HTTPS"
        port = 12345
        #path = "/custom/path"
        #matcher = "200-399"
        interval = 30
        #timeout = 10
        healthy_threshold = 3
        unhealthy_threshold = 3
    }
}

The lines that are commented out are the ones that I used in 1.5 but that are considered invalid now in 1.6. In 1.5 with those lines uncommented, it works for both creation and subsequent plan and apply.

With those lines commented out, 1.6 will create the target group successfully. However, subsequent plan or apply will produce the following error:

aws_lb_target_group.foo: Refreshing state... (ID: arn:aws:elasticloadbalancing:us-east-1:...nlb-health-check-test/f37488d894f4b0a6)

Error: Error running plan: 1 error(s) occurred:

* aws_lb_target_group.foo: 1 error(s) occurred:

* aws_lb_target_group.foo: arn:aws:elasticloadbalancing:us-east-1:365567845318:targetgroup/tf-nlb-health-check-test/f37488d894f4b0a6: custom matcher is not supported for target_groups with TCP protocol

Apparently, terraform is detecting a change in the matcher, presumably because normally the default matcher is assumed by terraform to be 200, but for TCP target groups, AWS locks it to 200-399. That's why previously, explicitly specifying the correct default fixed the problem, but that workaround is no longer possible in 1.6. (However, it is strange that this error happens while refreshing the state rather than when applying a change.)

So it looks like these attributes are being correctly ignored when creating a TCP target group, but not when managing an existing TCP target group.

As far as requiring the removal of these attributes, I suppose it is a bit of a bug that we had to explicitly specify them previously. But we could still maintain backwards compatibility if we allow the only valid value to be specified, as in the commented lines above. I would definitely be in favor of at least fixing it so that we do not have to specify those attributes, however.

I'm currently experiencing the same problem.

I created an aws_lb_target_group in v1.5.0 with matcher = "200-399".
When I upgraded to v1.6.0 I got the following message, even _after_ removing the matcher attribute:

custom matcher is not supported for target_groups with TCP protocol

When I downgrade to v1.5.0 again and remove the matcher attribute I get:

Error modifying Target Group: ValidationError: Health check matcher HTTP code cannot be empty

How is an upgrade supposed to happen with the current implementation?
Currently the only option I have is to stay on v1.5.0 with matcher set.

You are right, @arminbuerkle. Currently in 1.6, there is no way to have an HTTP/S health check for a TCP target group without getting errors on subsequent plan or apply.

Thanks for that extra detail @deftflux, and sorry for the silence while I was on my holiday break.

Indeed it does seem like there is an issue here based on what you described. I suppose what's happening here is that we're reading back some server-provided defaults from the API that are then causing validation to fail.

Probably the best solution for that would be to add some extra checks to the Read implementation to force the relevant attributes to be saved as empty regardless of what the API returns, so that the values in state stay consistent with the empty values we now require in the configuration.

I guess I may have gotten a bit lost in the discussion here, but I'm hitting this issue as well. Via both awscli and the Console UI, I _can_ create an NLB with an HTTP health check against a custom path. Assuming the AWS Console UI is correct: a TCP target group can have an HTTP health check, and the configurable properties on it are Path, Port and Healthy Threshold. Unhealthy Threshold, Timeout, Interval and Matcher ("Success Codes" in the UI) are grayed out and have fixed values.

Hi @jantman yes, you are correct, you can create TCP Target Groups with HTTP health checks, and Terraform should be letting us too, but isn't right now.

What you can't do is create a TCP Target Group with a TCP health check and then change the health check to HTTP (likewise for any protocol change, even HTTP -> HTTPS). Changing the health check protocol requires destroying and recreating the Target Group.

Likewise, unhealthy threshold, timeout, interval, and success codes (matcher) can be set on creation, but changing any of them requires the Target Group to be recreated (which seems harsh!).

Another note about 'interval': in the UI, HTTP/HTTPS health checks let you set the interval freely, but TCP health checks only allow 10 seconds or 30 seconds. That might be an API restriction too, though I guess Terraform could leave that validation up to AWS.

This has been released in terraform-provider-aws version 1.7.0. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

PR #2906 lets you create HTTP/HTTPS health checks now 🎉, but it doesn't fully resolve validation. The protocol can be changed between HTTP and HTTPS, but changing to or from TCP should trigger a recreation plan for the Target Group.

```
* module.nlb.aws_lb_target_group.nlb[0]: 1 error(s) occurred:
* aws_lb_target_group.nlb.0: Error modifying Target Group: InvalidConfigurationRequest: You cannot change the health check protocol for a target group with the TCP protocol
  status code: 400, request id: 1234567-f7db-11e7-8d49-f37d7e7f4cf3
```

@whereisaaron I just ran into an issue switching from TCP to HTTP/HTTPS (the target group needs to be re-created and Terraform doesn't know that); was a separate issue ever opened to address that?

@iancward no, sorry, I don't know of any issue for this bug where recreation isn't triggered. I currently have to intervene manually.

@whereisaaron it doesn't let me change to a custom health check with the TCP protocol, while the AWS console does. So I don't think this is something rejected by the AWS APIs.

Unfortunately, the problem persists with Terraform v0.11.11

resource "aws_lb_target_group" "hapee_nlb_target" {
  name = "hapee-test-nlb-tg"

  vpc_id = "${aws_vpc.default.id}"

  port     = 443
  protocol = "TCP"

  health_check {
    interval            = 30
    path                = "/haproxy_status"
    port                = 8080
    timeout             = 5
    healthy_threshold   = 3
    unhealthy_threshold = 3
    matcher             = "200,202"
  }
}

aws_lb_target_group.hapee_nlb_target: Error creating LB Target Group: InvalidConfigurationRequest: Custom health check timeouts are not supported for health checks for target groups with the TCP protocol

@MichalPloski yes, this looks to still be validated along with other similar parameters: https://github.com/terraform-providers/terraform-provider-aws/blob/d0edc835f07ef347937892b691b9ab0a602b2372/aws/resource_aws_lb_target_group.go#L673

That check technically allows 0 to be set for a TCP health check, but unfortunately the timeout parameter then fails validation against the allowed range of 2-60.

With terraform-provider-aws = 1.38.0:

Error: aws_lb_target_group.my_tg: expected health_check.0.timeout to be in the range (2 - 60), got 0

Why is this closed? It still seems not to be working. Or maybe there is a workaround?

Whoops, never mind. I can see in the AWS UI that you actually can't change the timeout at all (greyed out).

@red8888 It is still an issue for me because I would like to re-use the aws_lb_target_group resource for both HTTP and TCP protocols (as a module, for example). I should be able to set the health_check timeout argument to 0 or 10 for a TCP TG but with this bug, Terraform will always throw an error when health_check timeout is set when protocol is TCP.
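For the module use case described above, one possible approach on Terraform 0.12 or later is to leave `timeout` unset whenever the target group protocol is TCP. This is only a sketch: the variable names and the resource name are hypothetical, it relies on `null` meaning "argument omitted" (a Terraform 0.12+ feature, so it does not apply to 0.11), and it has not been checked against any particular provider version.

```hcl
# Hypothetical module inputs; the names are illustrative only.
variable "vpc_id" {
  type = string
}

variable "protocol" {
  type    = string
  default = "TCP"
}

variable "health_check_timeout" {
  type    = number
  default = 5
}

resource "aws_lb_target_group" "shared" {
  name     = "shared-tg"
  port     = 443
  protocol = var.protocol
  vpc_id   = var.vpc_id

  health_check {
    protocol            = var.protocol
    interval            = 30
    healthy_threshold   = 3
    unhealthy_threshold = 3

    # Assumption: a null value leaves the argument unset, so a custom
    # timeout is only passed along for non-TCP target groups.
    timeout = var.protocol == "TCP" ? null : var.health_check_timeout
  }
}
```

The same conditional pattern could in principle be applied to other arguments that are rejected for TCP, but that is untested here.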

Is there a workaround for this? Why was this closed?

I am using provider.aws: version = "~> 2.23", and I am still facing this issue. Let me know how the issue can be resolved, or whether there is any workaround.

Had the same issue. TF version 0.11.4. Provider 2.27.0. Solved it by setting matcher to 200-399. Anything else would fail with the same error the rest are having.

I have made the changes you suggested, but I am still facing the same issue. Below is the NLB target group resource:

resource "aws_lb_target_group" "nlb_target" {
  name     = "nlb-target"
  port     = XXX
  protocol = "TCP"
  vpc_id   = "XXX"

  health_check {
    healthy_threshold   = 3
    unhealthy_threshold = 3
    timeout             = 6
    protocol            = "HTTPS"
    port                = XXX
    path                = "XXX"
    interval            = 30
    matcher             = "200,399"
  }
}

Let me know if I need to configure any additional params.

I have fixed the issue by setting the values below:
timeout = 10
matcher = "200-399"
interval = 30
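Put together with the resource posted earlier, the reported fix would look roughly like the sketch below. The four variables are hypothetical stand-ins for the values shown as XXX above; everything else is taken from the two comments.

```hcl
# Hypothetical inputs standing in for the values elided as XXX above.
variable "vpc_id" {}
variable "target_port" {}
variable "health_check_port" {}
variable "health_check_path" {}

resource "aws_lb_target_group" "nlb_target" {
  name     = "nlb-target"
  port     = "${var.target_port}"
  protocol = "TCP"
  vpc_id   = "${var.vpc_id}"

  health_check {
    protocol            = "HTTPS"
    port                = "${var.health_check_port}"
    path                = "${var.health_check_path}"
    healthy_threshold   = 3
    unhealthy_threshold = 3
    timeout             = 10        # reported fix: was 6
    interval            = 30
    matcher             = "200-399" # reported fix: was "200,399"
  }
}
```

These are the same fixed values that earlier comments in this thread describe as the only ones a TCP target group accepts.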

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!
