Terraform-provider-aws: aws_api_gateway_deployment doesn't get updated after changes

Created on 13 Jun 2017 · 18 Comments · Source: hashicorp/terraform-provider-aws

_This issue was originally opened by @blalor as hashicorp/terraform#6613. It was migrated here as part of the provider split. The original body of the issue is below._

aws_api_gateway_deployment doesn't get updated after a dependent resource changes. For example, if I change an aws_api_gateway_integration resource to modify the request_templates property, a new aws_api_gateway_deployment should be triggered. Since that doesn't happen, the stage specified in the deployment continues to use the old configuration. depends_on doesn't seem to be sufficient; I tried capturing that dependency and it didn't work, and as I understand it, depends_on only captures ordering.

A workaround is to taint the aws_api_gateway_deployment resource after a successful apply and then re-run apply.

bug service/apigateway upstream-terraform

hashibot

👍109 😕13

Most helpful comment

Sharing our workaround for this issue. Initially we tried hashing the whole API definition file but then we ran into the issue that changes to variable values did not trigger a new deployment. For example, we had a lambda integration where the function name changed. The aws_api_gateway_integration resource got updated correctly with the new invocation ARN but a new aws_api_gateway_deployment was not created since the API tf file had not changed. The same would go for any parametrised value that the API uses.

Currently we follow the pattern below for our APIs. Here we build a hash from the full JSON representations of all the resources that would affect the deployment, which means resource, methods, integrations, the various response types and the models. Maybe there are more types that need to go in here, but these are the ones we have found necessary.

We need to keep this list updated when we add resources, which leaves room for mistakes, but this is the approach that gives us new deployments at the right time without creating them unnecessarily.

resource "aws_api_gateway_deployment" "demo" {
  rest_api_id = aws_api_gateway_rest_api.demo.id

  variables = {
    // For new changes to the API to be correctly deployed, they need to
    // be detected by terraform as a trigger to recreate the aws_api_gateway_deployment.
    // This is because AWS keeps a "working copy" of the API resources which does not
    // go live until a new aws_api_gateway_deployment is created.
    // Here we use a dummy stage variable to force a new aws_api_gateway_deployment.
    // We want it to detect if any of the API-defining resources have changed so we
    // hash all of their configurations.
    // IMPORTANT: This list must include all API resources that define the "content" of
    // the rest API. That means anything except for aws_api_gateway_rest_api,
    // aws_api_gateway_stage, aws_api_gateway_base_path_mapping, that are higher-level
    // resources. Any change to a part of the API not included in this list might not
    // trigger creation of a new aws_api_gateway_deployment and thus might not be fully deployed.
    trigger_hash = sha1(join(",", [
      jsonencode(aws_api_gateway_resource.demo),
      jsonencode(aws_api_gateway_method.demo_get),
      jsonencode(aws_api_gateway_integration.demo_get),
      jsonencode(aws_api_gateway_integration_response.demo_get_200),
      jsonencode(aws_api_gateway_integration_response.demo_get_400),
      jsonencode(aws_api_gateway_integration_response.demo_get_500),
      jsonencode(aws_api_gateway_method_response.demo_get_200),
      jsonencode(aws_api_gateway_method_response.demo_get_400),
      jsonencode(aws_api_gateway_method_response.demo_get_500),
      jsonencode(aws_api_gateway_model.demo_request_body),
      jsonencode(aws_api_gateway_model.demo_api_response),
      jsonencode(aws_api_gateway_model.demo_error_response),
      //
      // Etc. ...
      //
    ]))
  }

  lifecycle {
    create_before_destroy = true
  }
}

staffan-einarsson on 18 Sep 2019

👍11

All 18 comments

+1 :)

andrevdh on 3 Jul 2017

👎9

Hitting this right now as well, took me a long time to realise I had to deploy the API in the console to see all my changes ;)

CumpsD on 18 Feb 2018

👍3

A seemingly decent suggestion from the comments on the original ticket was to add a 'triggers' map on the deployment resource: https://github.com/hashicorp/terraform/issues/6613#issuecomment-252275369 - this seems like maybe the least painful quick fix to the issue.

tabacco on 5 Jun 2018

👍1

Do we have a fix for this now?

asrivastava2017 on 9 Aug 2018

Sharing this here as well since the original was closed:

Using stage_description seems like a good work-around to solve this issue in the short term.

resource "aws_api_gateway_deployment" "example" {
  rest_api_id = "${aws_api_gateway_rest_api.example.id}"
  stage_name  = "test"

  # Force re-deployments if any dependencies change
  # https://github.com/hashicorp/terraform/issues/6613
  # https://github.com/terraform-providers/terraform-provider-aws/issues/162
  stage_description = <<DESCRIPTION
${aws_api_gateway_resource.example.id}
${aws_api_gateway_method.example.id}
${aws_api_gateway_integration.example.id}
DESCRIPTION

  depends_on = [
    "aws_api_gateway_integration.example",
  ]
}

nguse on 11 Sep 2018

👍4

If I'm understanding correctly, the mentioned "triggers" map idea isn't available to use here. It's just a proposal from one of the Terraform core developers for something that _could_ be implemented as an interim solution, correct?

Looking at the stage_description approach, then, I found that I can't simply rely on the id attributes of all the constituent components: some parameters can be changed in place, in which case the id doesn't change and the deployment doesn't notice the change.

We could simply trigger a new deployment _every time_, e.g. using stage_description = "${timestamp()}", but in my case this caused an unrelated issue that I wanted to avoid. I've ended up using a cascading series of hashes to monitor _all_ attributes of the resources that need monitoring. This is working nicely.
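
As a rough illustration of hashing attribute values rather than ids, here is a minimal sketch. It assumes Terraform 0.12 or later and is not man8's actual configuration. The resource names follow the stage_description example above, the local name example_config_hash is hypothetical, and the choice of attributes is only an assumption about what might matter for a given API.

```hcl
locals {
  # Hash the attribute values themselves, so that in-place updates
  # (which leave the resource id unchanged) still alter the hash.
  # Which attributes to include is an assumption; list whatever your API uses.
  example_config_hash = sha1(jsonencode([
    aws_api_gateway_method.example.http_method,
    aws_api_gateway_method.example.authorization,
    aws_api_gateway_integration.example.uri,
    aws_api_gateway_integration.example.request_templates,
  ]))
}

resource "aws_api_gateway_deployment" "example" {
  rest_api_id = aws_api_gateway_rest_api.example.id
  stage_name  = "test"

  # A changed hash changes stage_description, which the stage_description
  # workaround above relies on to replace the deployment.
  stage_description = "config ${local.example_config_hash}"
}
```

Unlike timestamp(), the hash only changes when one of the listed attributes changes, so an apply with no API changes should leave the deployment alone.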

man8 on 24 Jan 2019

I'm encountering a similar issue. I've set up the stage variable based on an MD5 hash of the Terraform file, but the problem I'm currently running into is that deploying a private API Gateway with a resource policy fails the first time: the API Gateway gets created first, then the resource policy gets applied, but the API Gateway never gets redeployed after the resource policy is applied. This requires manually deploying the API Gateway the first time it's deployed, which is a lot of work for us as we are using branch-based deployments in our CI/CD pipeline.

marcato15 on 11 Mar 2019

@marcato15 I had some similar issues: create the API Gateway from a swagger file, the Terraform api_gateway resource sets the policy, then the second run removes the policy as it's not defined in the swagger file, then the third run creates the policy again, and so on. Fixed by adding x-amazon-apigateway-policy to the swagger file template, so both the swagger file and the Terraform resource have the same policy configuration. That way the policy no longer flips on and off, and the API Gateway will always be redeployed on policy updates.

aliusmiles on 12 Mar 2019

A workaround we use:

- Use a swagger template to create the API body
- Use a JSON template to create the API policy
- Concatenate the rendered values of the two templates and run the builtin base64sha256() function on the resultant string to create an "API hash"
- Set that hash as the value of an aws_api_gateway_deployment variables property

The effect is that when the API changes, the deployment is replaced and the stage updated. When the API doesn't change, the deployment doesn't change either, which avoids the disadvantage of using timestamp() for the deployment variable value: a new deployment on every apply.

One more tip: use

lifecycle {
    create_before_destroy = true
}

in the aws_api_gateway_deployment resource to avoid "Active stages" errors.
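
The steps above might look roughly like the following. This is a sketch under assumptions, not bassmanitram's actual configuration: it assumes Terraform 0.12 or later (for templatefile()), and the template paths, the lambda_invoke_arn variable, the API name and the api_hash key are all hypothetical. It also swaps base64sha256() for sha256(), so that the hash stays hexadecimal.

```hcl
# Hypothetical input that the swagger template interpolates.
variable "lambda_invoke_arn" {
  type = string
}

locals {
  # Render both templates; the file names are placeholders.
  api_body = templatefile("${path.module}/api/swagger.yaml.tpl", {
    lambda_invoke_arn = var.lambda_invoke_arn
  })
  api_policy = templatefile("${path.module}/api/policy.json.tpl", {})

  # One hash over the rendered body and policy, so a change to either
  # (including a changed variable value) produces a new hash.
  api_hash = sha256("${local.api_body}${local.api_policy}")
}

resource "aws_api_gateway_rest_api" "api" {
  name   = "demo"
  body   = local.api_body
  policy = local.api_policy
}

resource "aws_api_gateway_deployment" "api" {
  rest_api_id = aws_api_gateway_rest_api.api.id

  # Dummy stage variable: a new hash forces a new deployment.
  variables = {
    api_hash = local.api_hash
  }

  lifecycle {
    create_before_destroy = true
  }
}
```

Because the hash is taken over the rendered templates rather than the template files, changes to interpolated values are picked up as well.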

bassmanitram on 21 Mar 2019

👍7 🎉1

> A workaround we use:
>
> - Use a swagger template to create the API body
> - Use a JSON template to create the API policy
> - Concatenate the rendered values of the two templates and run the builtin base64sha256() function on the resultant string to create an "API hash"
> - Set that hash as the value of an aws_api_gateway_deployment variables property
>
> The effect is that when the API changes, the deployment is replaced and the stage updated. When the API doesn't change, the deployment doesn't change (a disadvantage of using timestamp() for the deployment variable value)
>
> One more tip: use
> lifecycle {
>     create_before_destroy = true
> }
> in the aws_api_gateway_deployment resource to avoid "Active stages" errors.

This workaround has serious disadvantages that I can't work around yet. I have a WAF association with each stage, and after the deployment is recreated the WAF association is lost. There is no way in Terraform to rerun the WAF association at the same time.
I have also split the Usage Plan creation and the API Gateway deployment into separate Terraform modules.
The Usage Plan has a stage association, but since there is no dependency between the modules, the API Gateway deployment and the Usage Plan updates run in parallel. This causes the Usage Plan creation to fail, since the stage might not be present at that time.

maxutlvl on 18 Jun 2019

@staffan-einarsson I tried your workaround and ran into two issues. Hope you can help out:

The first apply failed because it seemed like TF/AWS still thought the (old) deployment was in use (I had create_before_destroy = true):

Error: error deleting API Gateway Deployment (h62bbj): BadRequestException: Active stages pointing to this deployment must be moved or deleted
    status code: 400, request id: 0249b4d9-04ef-47fe-943f-056636e2899c

The second apply succeeded but there's my question, is there a way for TF to not delete deployment history, which is a major reason why we keep deployments and stages separate, in order for rollback, etc?

zihaoyu on 21 Oct 2019

Hi @zihaoyu! Thanks for trying it.

> The first apply failed because it seemed like TF/AWS still thought the (old) deployment was in use (I had create_before_destroy = true):

I'm not sure if this explains your issue but this might happen if you use the optional stage_name of the aws_api_gateway_deployment resource to set the current active deployment of the stage. We did not use that attribute, but instead used the full aws_api_gateway_stage resource which sets the active deployment.

I think it works like this:

1. If you do include the aws_api_gateway_stage resource, then you are required to give it an active deployment id. You must have create_before_destroy set to true on the deployment, or you'll get the error that the stage is still associated with the deployment when it tries to update. This happens because we need to create a new deployment, switch the stage over to it, and then delete the old one once it has been disassociated from the stage.
2. If you don't include the aws_api_gateway_stage resource, then you can use the stage_name attribute to make this deployment active in a stage. You must have create_before_destroy set to false (the default) on the deployment, or you'll get the error that the stage is still associated with the deployment when it tries to update. However, by destroying the deployment first and then creating the new one, you leave some time when your stage might not be associated with a deployment at all, which is not zero downtime. In fact, if your new deployment fails to create because of some other error, you might be left with an outage.

We opted for the first option to avoid the risk for downtime, but also because we had other attributes on the aws_api_gateway_stage resource that we wanted to configure. This is why the example has the create_before_destroy = true.
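
A minimal sketch of that first option, assuming Terraform 0.12 or later and an aws_api_gateway_rest_api.demo defined elsewhere with at least one method and integration. The stage name "live" is a hypothetical placeholder, and the trigger that actually forces a new deployment (for example the trigger_hash variable from the earlier example) is left out for brevity.

```hcl
resource "aws_api_gateway_deployment" "demo" {
  rest_api_id = aws_api_gateway_rest_api.demo.id

  # No stage_name here: the stage is managed by its own resource below.

  lifecycle {
    # Create the replacement deployment first, so the stage can be
    # switched over before the old deployment is deleted.
    create_before_destroy = true
  }
}

resource "aws_api_gateway_stage" "demo" {
  rest_api_id   = aws_api_gateway_rest_api.demo.id
  stage_name    = "live"
  deployment_id = aws_api_gateway_deployment.demo.id
}
```

Since the stage references the deployment id, Terraform should order the operations as create new deployment, update stage, delete old deployment.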

> The second apply succeeded but there's my question, is there a way for TF to _not_ delete deployment history, which is a major reason why we keep deployments and stages separate, in order for rollback, etc?

Hmm, I think this is expected behavior, given that aws_api_gateway_deployment is a resource and resources are destroyed at the end of their lifecycle. I know that AWS CloudFormation, for example, has the DeletionPolicy option to retain resources on delete (forget them), but I'm not aware of any such thing in Terraform.

But if you're already using Terraform to manage your infrastructure, I don't see why you would use the deployment switching feature for rollbacks, since you have much more powerful and broad configuration management at your disposal. Just make sure you version your Terraform files before applying, and if you decide that you want to roll back, re-apply from an earlier version.

staffan-einarsson on 22 Oct 2019

@bassmanitram's solution works just fine when using an OpenAPI Spec file.

However, I couldn't use base64sha256, as it outputs characters that are invalid in the stage's variable values. I used sha1(file("./api/spec.yaml")) instead, which gives a shorter value containing only hex characters.

falmar on 25 Dec 2019

👍1

I had some trouble using @staffan-einarsson's solution when encoding the resources into JSON:

aws_api_gateway_resource.example: resource variables must be three parts: TYPE.NAME.ATTR in:

${jsonencode("${aws_api_gateway_resource.projects}")}

Perhaps it was because we're running Terraform 0.11? I'm not sure.

Instead, I'm now hashing the Terraform file and the module's variable values:

locals {
  api_depends_on = [
    "${var.name}",
    "${var.environment}",
    # Etc...
    "${sha1(file("${path.module}/main.tf"))}"
  ]

  trigger_hash = "Terraform hash to trigger deploys: ${sha1(join(",", "${local.api_depends_on}"))}"
}

I'm hoping that this means we'll only need to update api_depends_on when adding module variables, which should be infrequent.

DMeechan on 23 Jan 2020

Hi folks 👋

Since it does not appear there will be functionality added anytime soon in Terraform core to support a form of resource configuration that will automatically trigger resource recreation when referenced resources are updated, the aws_api_gateway_deployment resource has been enhanced with a triggers map argument similar to those utilized by the null, random, and time providers. This can be used by operators to automatically force a new resource (redeployment) using key/value criteria of their choosing. Its usage is fairly advanced, so caveats are added to the documentation. This functionality will release with version 2.61.0 of the Terraform AWS Provider, later next week.
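
For readers arriving from the workarounds above, a configuration using the triggers argument might look roughly like this. It is a sketch assuming Terraform 0.12 or later and provider version 2.61.0 or later. The resource names are borrowed from the earlier demo example, and the key name redeployment is an arbitrary choice.

```hcl
resource "aws_api_gateway_deployment" "demo" {
  rest_api_id = aws_api_gateway_rest_api.demo.id

  triggers = {
    # The key name is arbitrary. Any change to the value forces a
    # new deployment, so hash everything that defines the API content.
    redeployment = sha1(jsonencode([
      aws_api_gateway_resource.demo,
      aws_api_gateway_method.demo_get,
      aws_api_gateway_integration.demo_get,
    ]))
  }

  lifecycle {
    create_before_destroy = true
  }
}
```

This replaces the dummy stage variable from the earlier workaround, but the list of hashed resources still has to be kept up to date by hand.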

If this type of enhancement does not fit your needs, we would encourage you to file a new issue (potentially upstream in Terraform core since there's not much else we can do at the provider level). Please also note that we do not intend to add this class of argument to all Terraform AWS Provider resources due to its complexity and potentially awkward configuration.