Terraform-provider-aws: api_gatewayv2 integration throws 500 unless I detach/reattach

Created on 5 Jun 2020 · 12 Comments · Source: hashicorp/terraform-provider-aws

Community Note

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform Version

Using hashicorp/terraform:light (latest), so I assume it's 0.12.26 :)

Affected Resource(s)

aws_apigatewayv2_integration

Terraform Configuration Files

resource "aws_apigatewayv2_api" "sc_api" {
  name          = "lambda-sc-kicker-api"
  protocol_type = "HTTP"
}

# TODO: There is something weird about this integration.
# After this resource is deployed, detach it and then re-attach it in the console and it works fine.
# Must be some kind of missing property.
resource "aws_apigatewayv2_integration" "sc_api_integration" {
  api_id           = aws_apigatewayv2_api.sc_api.id
  integration_type = "AWS_PROXY"

  description            = "Lambda SC Kicker"
  integration_method     = "POST"
  integration_uri        = module.lambda_sc_kicker.lambda_sc_kicker_invoke_arn
  payload_format_version = "2.0"
}

resource "aws_apigatewayv2_route" "sc_api_create_route" {
  api_id    = aws_apigatewayv2_api.sc_api.id
  route_key = "POST /create"
  target    = "integrations/${aws_apigatewayv2_integration.sc_api_integration.id}"
}

resource "aws_apigatewayv2_stage" "sc_api_default_stage" {
  api_id      = aws_apigatewayv2_api.sc_api.id
  name        = "$default"
  auto_deploy = true
}

Debug Output

Panic Output

Expected Behavior

Should have deployed a working API gateway

Actual Behavior

The integration doesn't work properly. When trying to use it, it throws an Internal Server Error 500. If I go to the console and detach the integration, then reattach it, it works fine.

Steps to Reproduce

terraform apply

Important Factoids

Running terraform in a Codebuild job

References

#0000

needs-triage service/apigatewayv2

Source

Vermyndax

👍2

Most helpful comment

@Vermyndax - I have been investigating this issue further in order to find a workaround and I think I have found the problem...

The issue is around Lambda permissions. You will note that if you detach and re-attach the integration, the permissions for your Lambda function will change. Generic code to set the permission is as follows:

resource "aws_lambda_permission" "this" {
  statement_id  = "AllowExecutionFromAPIGateway"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.this.function_name // add a reference to your function name here
  principal     = "apigateway.amazonaws.com"

  # The /*/* suffix allows invocation from any stage and any route
  # (method and resource path) of this API Gateway HTTP API.
  # For more detail see https://docs.aws.amazon.com/lambda/latest/dg/services-apigateway.html
  source_arn = "${aws_apigatewayv2_api.sc_api.execution_arn}/*/*"
}

Suggest that the issue can be closed...

adatoo on 1 Jul 2020

👍2

All 12 comments

@Vermyndax Thanks for raising this; I can't reproduce getting an HTTP 500 when applying the example configuration, but a subsequent terraform plan shows:

$ terraform12 plan
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.

aws_apigatewayv2_api.sc_api: Refreshing state... [id=ekmqhtlds3]
aws_apigatewayv2_integration.sc_api_integration: Refreshing state... [id=jhsl38o]
aws_apigatewayv2_stage.sc_api_default_stage: Refreshing state... [id=$default]
aws_apigatewayv2_route.sc_api_create_route: Refreshing state... [id=l00v2m1]

------------------------------------------------------------------------

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # aws_apigatewayv2_integration.sc_api_integration will be updated in-place
  ~ resource "aws_apigatewayv2_integration" "sc_api_integration" {
        api_id                 = "ekmqhtlds3"
        connection_type        = "INTERNET"
        description            = "Lambda SC Kicker"
        id                     = "jhsl38o"
        integration_method     = "POST"
        integration_type       = "AWS_PROXY"
        integration_uri        = "arn:aws:apigateway:us-west-2:lambda:path/2015-03-31/functions/arn:aws:lambda:us-west-2:123456789012:function:example-apigw/invocations"
      + passthrough_behavior   = "WHEN_NO_MATCH"
        payload_format_version = "2.0"
        request_templates      = {}
        timeout_milliseconds   = 29000
    }

  # aws_apigatewayv2_stage.sc_api_default_stage will be updated in-place
  ~ resource "aws_apigatewayv2_stage" "sc_api_default_stage" {
        api_id          = "ekmqhtlds3"
        arn             = "arn:aws:apigateway:us-west-2::/apis/ekmqhtlds3/stages/$default"
        auto_deploy     = true
      - deployment_id   = "y2dz0a" -> null
        id              = "$default"
        invoke_url      = "https://ekmqhtlds3.execute-api.us-west-2.amazonaws.com/"
        name            = "$default"
        stage_variables = {}
        tags            = {}

        default_route_settings {
            data_trace_enabled       = false
            detailed_metrics_enabled = false
            throttling_burst_limit   = 0
            throttling_rate_limit    = 0
        }
    }

Plan: 0 to add, 2 to change, 0 to destroy.

------------------------------------------------------------------------

Note: You didn't specify an "-out" parameter to save this plan, so Terraform
can't guarantee that exactly these actions will be performed if
"terraform apply" is subsequently run.

The integration passthrough_behavior diff is addressed in https://github.com/terraform-providers/terraform-provider-aws/pull/13062. I'll investigate the stage deployment_id diff.
Which version of the AWS Provider are you using?

The perpetual diff on the stage looks like a bug caused by not handling a calculated deployment_id for auto_deploy stages.

ewbankkit on 5 Jun 2020

👍1

Using AWS provider 2.65. Thanks for checking into this :)

Vermyndax on 7 Jun 2020

We see the same passthrough_behavior = "WHEN_NO_MATCH" diff in every plan.

bvaradinov-c on 10 Jun 2020

FYI, I have been doing some heavy modification to the lambda script that required a destroy/recreate. On recreate, the integration says it was attached in the console from the API's viewpoint, but the Lambda console disagreed. I went into the API gateway console and manually detached the integration, then reattached it and everything was fine again.

Vermyndax on 12 Jun 2020

I am having a similar issue. Like @Vermyndax, if I detach and reattach the integration, everything is resolved and works. I also see the same perpetual diffs from the terraform plan command. The diff on the aws_apigatewayv2_integration resource can be suppressed by adding the following to that resource:

  lifecycle {
    ignore_changes = [passthrough_behavior]
  }

A similar lifecycle block can be applied to the aws_apigatewayv2_stage resource.
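For the stage, a minimal sketch might look like the following. It assumes the attribute to ignore is deployment_id, since that is the attribute showing the perpetual diff in the plan output earlier in this thread; the resource itself is the one from the original configuration. This only hides the diff and is not a fix for the underlying provider behavior.

```hcl
resource "aws_apigatewayv2_stage" "sc_api_default_stage" {
  api_id      = aws_apigatewayv2_api.sc_api.id
  name        = "$default"
  auto_deploy = true

  # Assumption: with auto_deploy enabled, API Gateway manages the
  # deployment itself, so ignore the deployment_id it computes.
  lifecycle {
    ignore_changes = [deployment_id]
  }
}
```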

adatoo on 30 Jun 2020

👍1

Thanks, @adatoo looks like the workaround did the trick!

Vermyndax on 1 Jul 2020

I added an aws_lambda_permission like that, and my API started working.

Then I removed the permission, reapplied, and it stayed working! In fact, now I cannot figure out how to get it to go back to broken again. The aws_lambda_permission will not go away.

warrenstephens on 21 Jul 2020

Suggest that you delete the API Gateway completely and re-apply the stack. That should clear everything...

adatoo on 21 Jul 2020

👍1

Thanks @adatoo! That worked, but are you saying it is a permanent fix, or will the glitchiness return to this permission for the gatewayv2 situation?

BTW, my short term objective was accomplished, which was adding vital info to the access_log_settings of the stage -- which now looks like this (with the permission error being intentional at this point):

{
    "requestId": "QCv0PiP2oAMEPYw=",
    "ip": "71.69.181.213",
    "requestTime": "21/Jul/2020:21:37:31 +0000",
    "httpMethod": "POST",
    "routeKey": "$default",
    "status": "500",
    "protocol": "HTTP/1.1",
    "integrationErrorMessage": "The IAM role configured on the integration or API Gateway doesn't have permissions to call the integration. Check the permissions and try again.",
    "responseLength": "35"
}

I wish that I had found and added $context.integrationErrorMessage to the access_log_settings when I started with my gatewayv2 exploration.
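As a rough sketch of how such logging could be configured, the stage from the original configuration can be given an access_log_settings block. The log group resource and its name here are hypothetical, and the mapping of each JSON key to a $context variable is an assumption inferred from the log entry above, not the exact format used in this thread.

```hcl
# Hypothetical log group to receive the access logs.
resource "aws_cloudwatch_log_group" "sc_api_access" {
  name = "/aws/apigateway/lambda-sc-kicker-api"
}

resource "aws_apigatewayv2_stage" "sc_api_default_stage" {
  api_id      = aws_apigatewayv2_api.sc_api.id
  name        = "$default"
  auto_deploy = true

  access_log_settings {
    destination_arn = aws_cloudwatch_log_group.sc_api_access.arn

    # Assumed key-to-variable mapping; integrationErrorMessage is the
    # field that surfaces the missing Lambda permission.
    format = jsonencode({
      requestId               = "$context.requestId"
      ip                      = "$context.identity.sourceIp"
      requestTime             = "$context.requestTime"
      httpMethod              = "$context.httpMethod"
      routeKey                = "$context.routeKey"
      status                  = "$context.status"
      protocol                = "$context.protocol"
      integrationErrorMessage = "$context.integrationErrorMessage"
      responseLength          = "$context.responseLength"
    })
  }
}
```

With this in place, a failed invocation caused by a missing aws_lambda_permission should show up in the log group with the integration error message rather than only a bare HTTP 500.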

warrenstephens on 22 Jul 2020

Hi @warrenstephens. I suspect that this issue will persist.

My reasoning is that the permission is implicitly declared when attaching the integration via the AWS Console and I suspect this creates some form of cached copy. I don't understand the inner workings sufficiently to be able to comment.

Whilst deleting and re-creating the API gateway did clear it, I would guess the minimum you would need to do to clear this would be to delete the integration in addition to the permission.

adatoo on 22 Jul 2020

👍1

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!