terraform-plan doesn't properly detect deletion of ecs_service

Created on 21 Oct 2015 · 14 Comments · Source: hashicorp/terraform

I manually removed an ECS service definition via the AWS console.

I ran terraform plan and got:

~ aws_ecs_service.registry
    desired_count: "0" => "1"

What I expected was a new service to be created.

After running apply (just for kicks):

aws_ecs_service.registry: Modifying...
  desired_count: "0" => "1"
Error applying plan:

1 error(s) occurred:

* aws_ecs_service.registry: ServiceNotActiveException: Service was not ACTIVE.
    status code: 400, request id: aea58c2e-7811-11e5-b8d8-a5d3e2f4e7bb

bug provider/aws

davedash

👍1

Most helpful comment

I was able to solve the inactive task definition issue with the example in the ECS task definition data source. You set up the ECS service resource to use the max revision of either what your Terraform resource has created or what is in the AWS console, which the data source retrieves.

The one downside to this is if someone changes the task definition, Terraform will not realign that to what's defined in code.

dmikalova on 22 Oct 2017

👍2
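
For readers who want to see what that workaround looks like, here is a minimal sketch modeled on the example in the `aws_ecs_task_definition` data source documentation. The resource names ("sleep", "helloworld-del-test") are borrowed from the reproduction config posted in this thread and are only illustrative; the interpolation syntax matches the Terraform 0.x versions discussed here, and this is not necessarily the exact configuration the commenter used.

```hcl
resource "aws_ecs_cluster" "sleep" {
  name = "helloworld-del-test"
}

resource "aws_ecs_task_definition" "sleep" {
  family = "tf-helloworld-del-test"

  container_definitions = <<TASK_DEFINITION
[
  {
    "name": "sleep",
    "image": "busybox",
    "cpu": 10,
    "command": ["sleep", "360"],
    "memory": 10,
    "essential": true
  }
]
TASK_DEFINITION
}

# Looks up the task definition revision that currently exists in AWS
# for the same family that the resource above manages.
data "aws_ecs_task_definition" "sleep" {
  task_definition = "${aws_ecs_task_definition.sleep.family}"
}

resource "aws_ecs_service" "sleep" {
  name          = "sleep"
  cluster       = "${aws_ecs_cluster.sleep.id}"
  desired_count = 1

  # Use whichever revision is higher: the one Terraform created or the one
  # the data source found in AWS.
  task_definition = "${aws_ecs_task_definition.sleep.family}:${max("${aws_ecs_task_definition.sleep.revision}", "${data.aws_ecs_task_definition.sleep.revision}")}"
}
```

Because the service follows the highest revision rather than the one defined in code, a revision registered outside Terraform will win, which is the downside described above.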

All 14 comments

Hey @davedash – do you have any configuration that demonstrates this? Was it a Task Definition that you removed? Or the actual thing that aws_ecs_service.registry refers to (instead of references)? Apologies if I'm mixing terms, I'm not too fluent in ECS :smile:

catsby on 27 Oct 2015

Hi @catsby I really have to tag my repo when I file a bug next time ;)

I removed the actual ECS service in the AWS console, not the task definition.

davedash on 27 Oct 2015

I had the same problem (deleted the service, terraform got confused) - I created a stub service with the same name, did a terraform refresh, and then terraform apply and I think I'm operational again.

bwalding on 28 Oct 2015

I just reproduced this:

resource "aws_ecs_cluster" "sleep" {
  name = "helloworld-del-test"
}

resource "aws_ecs_task_definition" "sleep" {
  family = "tf-helloworld-del-test"
  container_definitions = <<TASK_DEFINITION
[
  {
    "name": "sleep",
    "image": "busybox",
    "cpu": 10,
    "command": ["sleep","360"],
    "memory": 10,
    "essential": true
  }
]
TASK_DEFINITION
}

resource "aws_ecs_service" "sleep" {
  name = "sleep"
  cluster = "${aws_ecs_cluster.sleep.id}"
  task_definition = "${aws_ecs_task_definition.sleep.arn}"
  desired_count = 1
}

$ aws ecs update-service --service arn:aws:ecs:us-west-2:12060895217:service/sleep --cluster helloworld-del-test --desired-count 0
$ aws ecs delete-service --service arn:aws:ecs:us-west-2:12060895217:service/sleep --cluster helloworld-del-test

The solution is to treat an existing ECS service in the INACTIVE state as a deleted (non-existent) service. I will send a patch for this.

Here's a recap of my chat session with AWS support which helped me understand better how this workflow works:

When you delete a service in ECS it will mark the service as INACTIVE and clean up the events.
Since ECS marks the service as INACTIVE instead of deleting it, you should treat services in the INACTIVE state as deleted.

radeksimko on 9 Nov 2015

See #3828

radeksimko on 9 Nov 2015

Hello,

I believe this isn't entirely fixed, or I'm missing something.

I have also manually "deleted" (via the AWS Console) a task definition, which switched it to the "inactive" state.

However, when I try to run Terraform, plan seems to correctly indicate it needs to be created, but apply fails to create it because it's already in "inactive" state. If I understand correctly, the above fix should have made Terraform treat "inactive" task definitions as deleted.

Is this a bug? Am I doing something wrong?

Output:

$ terraform plan

[...]

+ aws_ecs_service.myapp
    cluster:                            "arn:aws:ecs:eu-central-1:XXXXXXXXXX:cluster/default"
    deployment_maximum_percent:         "200"
    deployment_minimum_healthy_percent: "100"
    desired_count:                      "1"
    name:                               "myapp"
    task_definition:                    "arn:aws:ecs:eu-central-1:XXXXXXXXXX:task-definition/myapp:1"


Plan: 1 to add, 0 to change, 0 to destroy.
$ terraform apply

[...]

aws_ecs_service.myapp: Creating...
  cluster:                            "" => "arn:aws:ecs:eu-central-1:XXXXXXXXXX:cluster/default"
  deployment_maximum_percent:         "" => "200"
  deployment_minimum_healthy_percent: "" => "100"
  desired_count:                      "" => "1"
  name:                               "" => "myapp"
  task_definition:                    "" => "arn:aws:ecs:eu-central-1:XXXXXXXXXX:task-definition/myapp:1"
Error applying plan:

1 error(s) occurred:

* aws_ecs_service.myapp: ClientException: TaskDefinition is inactive
  status code: 400, request id: 4a59b26b-08b3-11e7-83cc-1bd0ec181350 "myapp"

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.
$

elad on 14 Mar 2017

same as @elad with terraform ver0.99

debu99 on 28 Jun 2017

does anybody know how to fix this?

sunilkumarmohanty on 4 Aug 2017

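I was able to solve the inactive task definition issue with the example in the ECS task definition data source. You set up the ECS service resource to use the max revision of either what your Terraform resource has created or what is in the AWS console, which the data source retrieves.

The one downside to this is if someone changes the task definition, Terraform will not realign that to what's defined in code.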

dmikalova on 22 Oct 2017

👍2

still happening
Terraform v0.11.0
provider.aws v1.4.0

FernandoMiguel on 29 Nov 2017

👍1

I've got the same issue using Terraform 0.11.3, AWS provider 1.11.0, and terragrunt 0.14.2. Having to make the ECS service configuration static is a shame, IMHO. Just using the ECS task definition data source is not working for me (I'm using the count pattern in this cluster too):

Error: Error refreshing state: 1 error(s) occurred:

* module.edge.data.aws_ecs_task_definition.task_definition: 1 error(s) occurred:

* module.edge.data.aws_ecs_task_definition.task_definition[0]: data.aws_ecs_task_definition.task_definition.0: Failed getting task definition ClientException: Unable to describe task definition.
    status code: 400, request id: 160a43b4-25e5-11e8-b560-29264574469c "api-gateway-td-edge-fon-hw-dev"

But taking a look at the state...

[terragrunt] 2018/03/12 12:06:53 Running command: terraform state show module.edge.aws_ecs_task_definition.task_definition[0]
id                         = api-gateway-td-edge-fon-hw-dev
arn                        = arn:aws:ecs:eu-west-1:312497795905:task-definition/api-gateway-td-edge-fon-hw-dev:2
container_definitions      = [{"cpu":0,"environment":[{"name":"JAVA_OPTS","value":"-Xss256k -Xms64m -Xmx256m -XX:+UseG1GC"},{"name":"SPRING_PROFILES_ACTIVE","value":"development"}],"essential":true,"image":"312497795905.dkr.ecr.eu-west-1.amazonaws.com/api-gateway:lastest","logConfiguration":{"logDriver":"awslogs","options":{"awslogs-group":"edge-fon-hw-dev/api-gateway","awslogs-region":"eu-west-1"}},"memory":256,"mountPoints":[],"name":"api-gateway","portMappings":[{"containerPort":9000,"hostPort":9000,"protocol":"tcp"}],"volumesFrom":[]}]
cpu                        =
execution_role_arn         =
family                     = api-gateway-td-edge-fon-hw-dev
memory                     =
network_mode               =
placement_constraints.#    = 0
requires_compatibilities.# = 0
revision                   = 2
task_role_arn              =

jjuarez on 12 Mar 2018

The inactive task definition problem was resolved for me by https://github.com/terraform-providers/terraform-provider-aws/pull/5565

ewilde on 16 Aug 2018

Still happening with Terraform v0.12.9 (installed with Homebrew on Mac OS).

When I manually delete, in the AWS interface, an ECS service that was created with aws_ecs_service and aws_ecs_task_definition resource definitions, I get this plan:

  # module.whatever.aws_ecs_service.api will be updated in-place

And then applying that plan renders this error:

module.whatever.aws_ecs_service.api: Modifying... [id=arn:aws:ecs:eu-west-1:774908103135:service/search-cluster/search-api-service]

Error: error updating ECS Service (arn:aws:ecs:eu-west-1:774908103135:service/search-cluster/search-api-service): ServiceNotActiveException: Service was not ACTIVE.
        status code: 400, request id: 34080a32-9955-483d-b99a-4a1413025468

The particular workaround in my case is to just make a plan and apply it again, without having to do anything else.

Someone let me know if you need more context.

nhooey on 24 Sep 2019

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.