Terraform-provider-aws: Inconsistent order of environment variables in aws_ecs_task_definition

Created on 17 Jan 2018 · 25 Comments · Source: hashicorp/terraform-provider-aws

Folks,

It seems that the order of environment variables is not being preserved. Also, mountPoints and volumesFrom are being shown in the plan, even when they're not defined in TF.

A large number of moving parts, such as environment variables, makes working with big task definitions rather unpleasant :)

Terraform Version

Terraform v0.11.2

provider.aws v1.7.0

Affected Resource(s)

aws_ecs_task_definition

Steps to Reproduce

resource "aws_ecs_task_definition" "service2" {
  family                = "service"
  container_definitions = "${file("service.json")}"
}

service.json:

[{
    "cpu": 10,
    "environment": [
        {
          "name": "APP_VERSION",
          "value": "dev-26758"
        },
        {
          "name": "DATADOG_ENV",
          "value": "ee1-llll-pppppppp"
        },
        {
          "name": "DATADOG_PATCH_MODULES",
          "value": "celery:true,elasticsearch:true,flask:true,httplib:true,mysql:true,redis:true,requests:true"
        },
        {
          "name": "DATADOG_SERVICE_NAME",
          "value": "translation-service"
        },
        {
          "name": "DATADOG_TRACE_AGENT_HOSTNAME",
          "value": "172.17.0.1"
        },
        {
          "name": "DATADOG_TRACE_AGENT_PORT",
          "value": "8126"
        },
        {
          "name": "DATADOG_TRACE_ENABLED",
          "value": "false"
        }
    ],
    "essential": true,
    "image": "wordpress1",
    "memory": 500,
    "name": "wordpress",
    "portMappings": [{
            "containerPort": 80,
            "hostPort": 0,
            "protocol": "tcp"
        },
        {
            "containerPort": 81,
            "hostPort": 0,
            "protocol": "tcp"
        },
        {
            "containerPort": 82,
            "hostPort": 0,
            "protocol": "tcp"
        }
    ]
}]

If I bump APP_VERSION from dev-26758 to dev-26760 I get:

-/+ aws_ecs_task_definition.service2 (new resource required)
      id:                    "service" => <computed> (forces new resource)
      arn:                   "arn:aws:ecs:eu-west-1:000000000000:task-definition/service:14" => <computed>
      container_definitions: "[{\"cpu\":10,\"environment\":[{\"name\":\"DATADOG_ENV\",\"value\":\"ee1-llll-pppppppp\"},{\"name\":\"DATADOG_TRACE_AGENT_HOSTNAME\",\"value\":\"172.17.0.1\"},{\"name\":\"DATADOG_SERVICE_NAME\",\"value\":\"translation-service\"},{\"name\":\"DATADOG_PATCH_MODULES\",\"value\":\"celery:true,elasticsearch:true,flask:true,httplib:true,mysql:true,redis:true,requests:true\"},{\"name\":\"DATADOG_TRACE_AGENT_PORT\",\"value\":\"8126\"},{\"name\":\"DATADOG_TRACE_ENABLED\",\"value\":\"false\"},{\"name\":\"APP_VERSION\",\"value\":\"dev-26758\"}],\"essential\":true,\"image\":\"wordpress1\",\"memory\":500,\"mountPoints\":[],\"name\":\"wordpress\",\"portMappings\":[{\"containerPort\":80,\"hostPort\":0,\"protocol\":\"tcp\"},{\"containerPort\":81,\"hostPort\":0,\"protocol\":\"tcp\"},{\"containerPort\":82,\"hostPort\":0,\"protocol\":\"tcp\"}],\"volumesFrom\":[]}]" => "[{\"cpu\":10,\"environment\":[{\"name\":\"APP_VERSION\",\"value\":\"dev-26760\"},{\"name\":\"DATADOG_ENV\",\"value\":\"ee1-llll-pppppppp\"},{\"name\":\"DATADOG_PATCH_MODULES\",\"value\":\"celery:true,elasticsearch:true,flask:true,httplib:true,mysql:true,redis:true,requests:true\"},{\"name\":\"DATADOG_SERVICE_NAME\",\"value\":\"translation-service\"},{\"name\":\"DATADOG_TRACE_AGENT_HOSTNAME\",\"value\":\"172.17.0.1\"},{\"name\":\"DATADOG_TRACE_AGENT_PORT\",\"value\":\"8126\"},{\"name\":\"DATADOG_TRACE_ENABLED\",\"value\":\"false\"}],\"essential\":true,\"image\":\"wordpress1\",\"memory\":500,\"name\":\"wordpress\",\"portMappings\":[{\"containerPort\":80,\"hostPort\":0,\"protocol\":\"tcp\"},{\"containerPort\":81,\"hostPort\":0,\"protocol\":\"tcp\"},{\"containerPort\":82,\"hostPort\":0,\"protocol\":\"tcp\"}]}]" (forces new resource)
      family:                "service" => "service"
      network_mode:          "" => <computed>
      revision:              "14" => <computed>

or in more human friendly form:

-/+ aws_ecs_task_definition.service2 (new resource required)
    id:                      "service" => "<computed>" (forces new resource)
    arn:                     "arn:aws:ecs:eu-west-1:000000000000:task-definition/service:14" => "<computed>"
    container_definitions:   [
                                {
                                  "cpu": 10,
                                  "environment": [
                                    {
                             +        "name": "APP_VERSION",
                             +        "value": "dev-26760"
                             +      },
                             +      {
                                      "name": "DATADOG_ENV",
                                      "value": "ee1-llll-pppppppp"
                                    },
                                    {
                             -        "name": "DATADOG_TRACE_AGENT_HOSTNAME",
                             -        "value": "172.17.0.1"
                             +        "name": "DATADOG_PATCH_MODULES",
                             +        "value": "celery:true,elasticsearch:true,flask:true,httplib:true,mysql:true,redis:true,requests:true"
                                    },
                                    {
                                      "name": "DATADOG_SERVICE_NAME",
                                      "value": "translation-service"
                                    },
                                    {
                             -        "name": "DATADOG_PATCH_MODULES",
                             -        "value": "celery:true,elasticsearch:true,flask:true,httplib:true,mysql:true,redis:true,requests:true"
                             +        "name": "DATADOG_TRACE_AGENT_HOSTNAME",
                             +        "value": "172.17.0.1"
                                    },
                                    {
                                      "name": "DATADOG_TRACE_AGENT_PORT",
                                      "value": "8126"
                                    },
                                    {
                                      "name": "DATADOG_TRACE_ENABLED",
                                      "value": "false"
                             -      },
                             -      {
                             -        "name": "APP_VERSION",
                             -        "value": "dev-26758"
                                    }
                                  ],
                                  "essential": true,
                                  "image": "wordpress1",
                                  "memory": 500,
                             -    "mountPoints": [
                             -
                             -    ],
                                  "name": "wordpress",
                                  "portMappings": [
                                    {
                                      "containerPort": 80,
                                      "hostPort": 0,
                                    {
                                      "containerPort": 82,
                                      "hostPort": 0,
                                      "protocol": "tcp"
                                    }
                             -    ],
                             -    "volumesFrom": [
                             -
                                  ]
                                }
                              ] (forces new resource)
    network_mode:            "" => "<computed>"
    revision:                "14" => "<computed>"

Labels: bug, service/ecs

s-maj

👍39 😕3

All 25 comments

It's possible to work around this by making sure your local task definition has the same order as stored on Amazon, which can be queried using the AWS CLI (or just copy, paste, and sanitise the JSON that Terraform returns when planning). The order will remain the same until you add a new key/value pair. Annoying, but it's the only way to avoid your plans getting junked.
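
For illustration, a minimal Python sketch of this workaround, assuming the local and remote environment lists are already available as parsed JSON. The function name reorder_environment and the shortened sample lists are hypothetical; the sketch only rearranges the local entries to follow the name order returned by AWS.

```python
import json


def reorder_environment(local_env, remote_env):
    """Return local_env rearranged to follow the name order of remote_env.

    Entries whose name does not exist remotely (newly added variables)
    keep their relative order and are appended at the end.
    """
    position = {entry["name"]: index for index, entry in enumerate(remote_env)}
    known = [entry for entry in local_env if entry["name"] in position]
    new = [entry for entry in local_env if entry["name"] not in position]
    return sorted(known, key=lambda entry: position[entry["name"]]) + new


# Local order as written in service.json (shortened).
local = [
    {"name": "APP_VERSION", "value": "dev-26760"},
    {"name": "DATADOG_ENV", "value": "ee1-llll-pppppppp"},
    {"name": "DATADOG_TRACE_ENABLED", "value": "false"},
]
# Order as reported back by AWS (shortened).
remote = [
    {"name": "DATADOG_ENV", "value": "ee1-llll-pppppppp"},
    {"name": "DATADOG_TRACE_ENABLED", "value": "false"},
    {"name": "APP_VERSION", "value": "dev-26758"},
]

reordered = reorder_environment(local, remote)
print(json.dumps(reordered, indent=2))
```

The local values are kept as they are; only the positions change, so a bumped APP_VERSION still shows up as a real change.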

i-ghost on 19 Jan 2018

👀1 👎1

There is a nice function https://github.com/terraform-providers/terraform-provider-aws/blob/fae04dfedfbd653d6a0bdbcc5d7c04f3d54e3048/aws/ecs_task_definition_equivalency.go#L56 used to reorder the task definition for diff suppression, but it's used only there. All values go raw to the state:
https://github.com/terraform-providers/terraform-provider-aws/blob/fae04dfedfbd653d6a0bdbcc5d7c04f3d54e3048/aws/resource_aws_ecs_task_definition.go#L265
https://github.com/terraform-providers/terraform-provider-aws/blob/401dc017401659015fa0afdde5336b891c695fe6/aws/structure.go#L621

@radeksimko added task definition migration from sha to json (thanks a million for that) plus further fixes. I was wondering if you could help us again to get this fixed :)

s-maj on 19 Jan 2018

👍1

Hello,

Just wondering if there's any update on this? This is unfortunately causing our ECS Task Definition to be replaced every time despite zero changes.

The ordering once it gets to AWS certainly seems very random!

m13t on 23 May 2019

👍3

Can we please have an update? We are having lots of issues with this bug @s-maj

devopsinfoltd on 4 Jun 2019

🚀4

In the initial report, your example diff output shows:

- values that have changed ("APP_VERSION")
- values that have not changed, but appear to have changed due to the ordering

Is this bug report related to the display when a value does change, or idempotence of subsequent plan/apply operations?

ctd on 4 Jun 2019

@ctd, the original example isn’t great as some values have changed. However, I can confirm that with TF 0.12 and the latest AWS provider, AWS does not store the environment and secret variables in the order provided, and as a result Terraform wants to change the order back to the one in the Terraform scripts.

m13t on 4 Jun 2019

@m13t thanks for the clarification.

I'm new to this particular code, so please excuse any mistakes in my thinking.

From an initial skim, it looks like the local/remote JSON strings are unmarshaled, a few specific fields are normalised, then the data is marshaled to JSON for comparison. This happens inside the DiffSuppressFunc though, which might lead to some surprising diff output as the remarshalled JSON isn't used for display purposes.
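
To make the unmarshal, normalise, re-marshal idea concrete, here is a rough Python sketch. It is not the provider's Go implementation: the function names canonicalize and equivalent are hypothetical, and the only normalisations assumed are sorting environment entries by name and dropping empty lists such as mountPoints and volumesFrom.

```python
import json


def canonicalize(container_definitions_json):
    """Parse, normalise, and re-serialise a container definitions string."""
    definitions = json.loads(container_definitions_json)
    for definition in definitions:
        # Drop empty lists so that "mountPoints": [] equals an absent key.
        for key in [k for k, v in definition.items() if v == []]:
            del definition[key]
        # Sort environment entries by name so their order is irrelevant.
        if "environment" in definition:
            definition["environment"].sort(key=lambda entry: entry["name"])
    # sort_keys makes the object key order deterministic as well.
    return json.dumps(definitions, sort_keys=True)


def equivalent(local_json, remote_json):
    """Compare two definitions after canonicalisation."""
    return canonicalize(local_json) == canonicalize(remote_json)


local_json = (
    '[{"name": "wordpress", "environment": ['
    '{"name": "A", "value": "1"}, {"name": "B", "value": "2"}]}]'
)
remote_json = (
    '[{"environment": [{"name": "B", "value": "2"}, {"name": "A", "value": "1"}],'
    ' "mountPoints": [], "name": "wordpress", "volumesFrom": []}]'
)

print(equivalent(local_json, remote_json))
```

A comparison like this only decides whether a diff exists. If the strings shown to the user are still the raw ones, the displayed diff can look reordered even when the canonical forms match.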

Could you check your output for any other value changes between the local and remote task definition JSON values? I'll take a look at reproducing this issue on my end as well.

ctd on 4 Jun 2019

@ctd, that is more or less my suspicion too. I’ll dig out the most complex task definition we have to see if there are any other fields that have this problem.

I guess if this was a native Terraform resource rather than being serialised to JSON, it would make use of the array item hash to ascertain whether or not it had just moved index rather than thinking it’s been removed and added.

m13t on 4 Jun 2019

So it looks like there are issues with volumes too:

      - volume {
          - name = "gocd_data" -> null

          - docker_volume_configuration {
              - autoprovision = false -> null
              - driver        = "local" -> null
              - driver_opts   = {
                  - "device" = ":/"
                  - "o"      = "addr=fs-xxxxxxxx.efs.eu-west-2.amazonaws.com,nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport"
                  - "type"   = "nfs"
                } -> null
              - labels        = {} -> null
              - scope         = "task" -> null
            }
        }
      + volume {
          + name = "gocd_data"

          + docker_volume_configuration {
              + autoprovision = false
              + driver        = "local"
              + driver_opts   = {
                  + "device" = ":/"
                  + "o"      = "addr=fs-xxxxxxxx.efs.eu-west-2.amazonaws.com,nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport"
                  + "type"   = "nfs"
                }
              + scope         = "task"
            }
        }
      - volume {
          - name = "gocd_home" -> null

          - docker_volume_configuration {
              - autoprovision = false -> null
              - driver        = "local" -> null
              - driver_opts   = {
                  - "device" = ":/"
                  - "o"      = "addr=fs-xxxxxxxx.efs.eu-west-2.amazonaws.com,nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport"
                  - "type"   = "nfs"
                } -> null
              - labels        = {} -> null
              - scope         = "task" -> null
            }
        }
      + volume {
          + name = "gocd_home"

          + docker_volume_configuration {
              + autoprovision = false
              + driver        = "local"
              + driver_opts   = {
                  + "device" = ":/"
                  + "o"      = "addr=fs-xxxxxxxx.efs.eu-west-2.amazonaws.com,nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport"
                  + "type"   = "nfs"
                }
              + scope         = "task"
            }
        }

However, if there are no other changes, the volumes don't show changes either. It appears that if there are other changes, then the volumes get added to the diff as well, i.e. if the container_definitions have changes, then so do the volumes.

m13t on 6 Jun 2019

👍3

I haven't been able to reproduce the idempotency issue using a very minimal ECS stub service (https://gist.github.com/ctd/bfba7b6b9e6c50d4cba0b4fe930597b2).

Are you able to confirm under what circumstances you're seeing a resource update/replacement - e.g. does running two subsequent apply operations result in an update on both? A full output log would be useful as well if you can provide one (obfuscated resource name/IDs are fine, of course).

ctd on 10 Jun 2019

@m13t @s-maj

Just following up, is this still an issue for you? If so, can you provide any more details (such as in my last comment) to help me reproduce the problem?

ctd on 1 Jul 2019

Not saying this is the explanation for this case, but I have found similar problems arise if the terraform task definition contains two environment variables with the same name.
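
A quick way to check for that case is to count names per container. The following is a small Python sketch, assuming the container definitions are already parsed from JSON; duplicate_env_names is a hypothetical helper name and the sample data is made up.

```python
from collections import Counter


def duplicate_env_names(container_definitions):
    """Map each container name to the environment names it defines more than once."""
    duplicates = {}
    for definition in container_definitions:
        counts = Counter(
            entry["name"] for entry in definition.get("environment", [])
        )
        repeated = sorted(name for name, count in counts.items() if count > 1)
        if repeated:
            duplicates[definition["name"]] = repeated
    return duplicates


definitions = [
    {
        "name": "wordpress",
        "environment": [
            {"name": "APP_VERSION", "value": "dev-26758"},
            {"name": "DATADOG_ENV", "value": "staging"},
            {"name": "APP_VERSION", "value": "dev-26760"},
        ],
    },
    {"name": "sidecar", "environment": [{"name": "PORT", "value": "8126"}]},
]

print(duplicate_env_names(definitions))
```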

jonseymour on 9 Jul 2019

Suggestion:

Use the DEBUG output from the provider to determine the actual source of your diffs.

tl;dr summary:

Inconsistent order between environment objects is highly unlikely to be the cause of resource-level diffs.

diffs shown in plan output are not canonicalized.
Therefore, there can be visual differences in their element ordering.

my problem/fix summary:

I lacked default values for my healthcheck:s.
Adding the following values resolved my persistent diffs issue:

    interval: 30
    retries: 3
    timeout: 5

I'd suggest these missing values should not have caused diffs.
Perhaps there's an existing feature request?
If someone knows of one, please mention/link here.

Debug journal:

terraform: v0.11.14 with plugin.terraform-provider-aws_v2.20.0_x4:.

My first guess was that I had diffs because the value of this field is an opaque JSON string, and differences in the object order returned by the AWS API would be the cause of diffs.

Turns out, there's an entire go file dedicated to comparing ECS task definitions:

https://github.com/terraform-providers/terraform-provider-aws/blob/v2.20.0/aws/ecs_task_definition_equivalency.go#L79-L82

That looks good, and the tests look good; it appears to effectively test that un-ordered environment objects compare as "equal to" ordered environment objects.

The tested function is actually being called, so it's active code:

https://github.com/terraform-providers/terraform-provider-aws/blob/v2.20.0/aws/resource_aws_ecs_task_definition.go#L81

I'm now convinced object order is not my problem.

Now I want to see the canonicalized JSON and check it for diffs.

Checking TF_LOG=DEBUG output, I was able to see the canonicalized First: and Second: DEBUG values and was able to compare them for actual diffs.

No surprise, there were actual diffs.

My issue was due to not including any values for a few healthcheck: defaults which the AWS API happily inserts.
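
As a rough aid for spotting this class of problem, here is a Python sketch that lists which of the three values above are missing from a local definition. It assumes the parsed container definitions use the ECS key healthCheck; the helper name missing_healthcheck_defaults and the sample definition are hypothetical, and the default values are simply the ones quoted in this comment.

```python
# Values quoted above as the ones the AWS API fills in when they are omitted.
HEALTHCHECK_DEFAULTS = {"interval": 30, "retries": 3, "timeout": 5}


def missing_healthcheck_defaults(container_definitions):
    """Map each container name to the healthCheck keys it leaves unset."""
    missing = {}
    for definition in container_definitions:
        health_check = definition.get("healthCheck")
        if health_check is None:
            # No health check configured, so nothing gets filled in.
            continue
        absent = sorted(
            key for key in HEALTHCHECK_DEFAULTS if key not in health_check
        )
        if absent:
            missing[definition["name"]] = absent
    return missing


definitions = [
    {
        "name": "wordpress",
        "healthCheck": {
            "command": ["CMD-SHELL", "curl -f http://localhost/ || exit 1"],
            "timeout": 5,
        },
    },
    {"name": "sidecar"},
]

print(missing_healthcheck_defaults(definitions))
```

Setting the reported keys explicitly in the local JSON mirrors the fix described above.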

tamsky on 24 Jul 2019

👍1

Is there a plan to add a feature that lets the user say "Ignore the order of JSON lists for this particular JSON field value", so the AWS ENVIRONMENT JSON list doesn't get detected as a diff when AWS chooses a stupid, arbitrary order that is different from what was submitted by Terraform?

In my case, the ECS tasks are getting recreated every time because AWS re-orders JSON lists unpredictably, but the reordering is static.

It's hard to see what the course of action is from this ticket.

nhooey on 5 Sep 2019

👍1

@nhooey In my last comment, I tried to explain that the order of ENVIRONMENT elements in terraform plan output is misleading.

The aws_provider is 100% properly canonicalizing all ENVIRONMENT variable lists before calculating diffs. I even link to the code where that happens.

At the same time, terraform plan does not canonicalize that struct before emitting it, so the ordering appears to be the source of a diff. If only the order is different, it is not a diff.

It is my strong opinion that there is something else in your ECS task resource's definition that is different.

Have you checked the provider's output under TF_LOG=DEBUG ?

tamsky on 6 Sep 2019

👍3

@tamsky is correct. However, for clarity, this should still be considered a bug since it renders a confusing diff - as evidenced by this thread. :)

internetstaff on 18 Sep 2019

👍4

@tamsky, @internetstaff:

I haven't reviewed the debug output, but in my case I kept re-ordering the environment JSON list items in my resource "aws_ecs_task_definition" until running terraform plan said that the aws_ecs_task_definition didn't have to be replaced anymore.

All I changed was the ordering of the JSON list items. Once they were in some _magic_ order, Terraform no longer wanted to replace the ECS task.

That seems like proof that Terraform is detecting and applying a diff when there conceptually isn't one.

nhooey on 19 Sep 2019

@nhooey Do you have any steps to reliably reproduce the issue you've described?

I either need a reliable way to reproduce (I've tried without success), or debug output of the issue occurring to have a better idea of what's going on.

As mentioned, there _is_ a bug in the Terraform AWS Provider that may give deceptive output about the detected resource changes in a plan - at least until that is fixed we need to rely on the debugging output for determining if other issues exist.

ctd on 19 Sep 2019

Same issue here: the ordering of the environment variables is different every time.

e-moshaya on 2 Oct 2019

It's far from ideal but with the help of terraform 0.12 you can re-order the environment variables to match the order given by AWS.

Create a Terraform map with your env, in any order (because a map doesn't have an order):

locals {
  unordered_env  = {
      ENV_A = "${var.my_var}"
      ENV_1 = "value_for_1"
    }
}

Get the order from AWS: for a given task definition, run the following query:

aws ecs describe-task-definition --task-definition $TASK_DEF | jq '[ .taskDefinition.containerDefinitions[0].environment[] | .name]'

Copy the result into:

locals {
   aws_order = [ .... ]
}

And the magic line, which generates a list of maps in the right order:

locals {
    environments_variables = [for order in local.aws_order : map(order, local.unordered_env[order])]
}

Then use that list to generate your task definition. The list looks like this:

environments_variables = [
    {
      "ENV_A" = "................."
    },
    {
      "ENV_1" = "................."
    },
]

(Optional) It can be fed to the old-school 0.11 template_file like this to generate the JSON env list:

data "template_file" "environments_variables" {
  template = " { \"name\" : \"$${name}\", \"value\": \"$${value}\" }"
  count    = length(var.environments_variables)
  vars = {
    value = "${element(values(var.environments_variables[count.index]), 0)}"
    name  = "${element(keys(var.environments_variables[count.index]), 0)}"
  }
}

data "template_file" "environments_variables_json" {
  template = "[  $${value}  ]"
  vars = {
    value = "${join(",", data.template_file.environments_variables.*.rendered)}"
  }
}

It doesn't work with 0.11 because the for expression syntax is available from 0.12 only.

You need to re-run the AWS CLI command and update the aws_order list each time you add or remove an env.

remipichon on 22 Oct 2019

❤4

This seems to have generally improved in the latest versions of the provider and TF itself .. but some improvements would be great ..

It doesn't seem like the ordering causes the task definitions to need to be updated any more ..

But when there are changes, the diff is nonsensical since it seems to be diffing against the raw order.

Could the provider not manage the order of the fields as they're coming from AWS and when they're persisted to state? This would then just make the diff alphabetical and make the whole thing much more pleasant .. right now it's a pain to actually analyze the changes in the variables ..

                  ~ environment       = [
                      - {
                          - name  = "AWS_REGION"
                          - value = "..."
                        },
                      - {
                          - name  = "PORT"
                          - value = "8001"
                        },
                      - {
                          - name  = "POD_SERVICE"
                          - value = "..."
                        },
                      - {
                          - name  = "POD_NAME"
                          - value = "..."
                        },
                        {
                            name  = "APP_NAME"
                            value = "..."
                        },
                      ~ {
                          ~ name  = "POD_ENVIRONMENT" -> "AWS_REGION"
                          ~ value = "..." -> "..."
                        },
                        {
                            name  = "LANG"
                            value = "en_US.UTF-8"
                        },
                      + {
                          + name  = "POD_ENVIRONMENT"
                          + value = "..."
                        },
                      + {
                          + name  = "POD_NAME"
                          + value = "..."
                        },
                      + {
                          + name  = "POD_SERVICE"
                          + value = "..."
                        },
                      + {
                          + name  = "PORT"
                          + value = "..."
                        },
                        {
                            name  = "SSM_PATHS"
                            value = "..."
                        },
                    ]

What was my actual change here? NONE .. I didn't add any env .. I made other changes on the task definition ...
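
To illustrate the alphabetical idea, here is a small Python sketch, not provider code: both sides are sorted by name before diffing, so pure reordering produces no output. The helper names env_lines and readable_env_diff and the sample values are hypothetical.

```python
import difflib


def env_lines(environment):
    """Render environment entries as NAME=value lines, sorted by name."""
    ordered = sorted(environment, key=lambda entry: entry["name"])
    return [f'{entry["name"]}={entry["value"]}' for entry in ordered]


def readable_env_diff(old_environment, new_environment):
    """Return only the added and removed lines after sorting both sides."""
    diff = list(
        difflib.unified_diff(
            env_lines(old_environment),
            env_lines(new_environment),
            fromfile="state",
            tofile="config",
            n=0,
            lineterm="",
        )
    )
    # Skip the two file header lines and the @@ hunk markers.
    return [line for line in diff[2:] if not line.startswith("@@")]


old = [
    {"name": "PORT", "value": "8001"},
    {"name": "APP_NAME", "value": "api"},
    {"name": "LANG", "value": "en_US.UTF-8"},
]
new = [
    {"name": "APP_NAME", "value": "api"},
    {"name": "LANG", "value": "en_US.UTF-8"},
    {"name": "PORT", "value": "8002"},
]

print(readable_env_diff(old, new))
```

With the same entries in a different order, the function returns an empty list, which is the behaviour asked for here.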

piotrb on 21 Nov 2019

👍8

@piotrb yours is the only discussion I found of this problem. Just so you know, https://github.com/terraform-providers/terraform-provider-aws/pull/11463 should fix that.

jbergknoff-rival on 3 Jan 2020

👍7 ❤5 🎉4

Improvements to this difference handling have been merged and will release with version 2.68.0 of the Terraform AWS Provider, later this week. Thanks to @jbergknoff-rival for the implementation. 👍

bflad on 24 Jun 2020

❤1

This has been released in version 2.68.0 of the Terraform AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template for triage. Thanks!

hashibot[bot] on 26 Jun 2020

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!

hashibot[bot] on 24 Jul 2020
