_This issue was originally opened by @tuaris as hashicorp/terraform#18264. It was migrated here as a result of the provider split. The original body of the issue is below._
Terraform version: 0.11.5
resource "aws_service_discovery_private_dns_namespace" "app_service" {
name = "app.service"
vpc = "vcp-12345678"
}
resource "aws_service_discovery_service" "worker" {
name = "worker"
dns_config {
namespace_id = "${aws_service_discovery_private_dns_namespace.app_service.id}"
dns_records {
ttl = 10
type = "A"
}
routing_policy = "MULTIVALUE"
}
health_check_custom_config {
failure_threshold = 1
}
}
resource "aws_ecs_service" "worker" {
...
service_registries {
registry_arn = "${aws_service_discovery_service.worker.arn}"
container_name = "worker"
}
}
Error: Error applying plan:
1 error(s) occurred:
* aws_service_discovery_service.worker (destroy): 1 error(s) occurred:
* aws_service_discovery_service.worker: ResourceInUse: Service contains registered instances; delete the instances before deleting the service
Removing the resource aws_service_discovery_service.worker should first stop the service aws_ecs_service.worker, then proceed to delete the resource.
Instead, the apply fails with ResourceInUse: Service contains registered instances; delete the instances before deleting the service
To reproduce the issue, for example: run terraform apply, then remove aws_service_discovery_service.worker from the configuration (or change it in a way that forces it to be replaced), and run terraform apply again.
Still an issue. In my case it's a bit of a pain to add depends_on as this is abstracted away in a module...
This does seem to be a nasty bug: Terraform should be able to handle deleting resources in the correct order, but it doesn't in this case. I'm also having this issue with terraform destroy and aws_service_discovery. Currently, manual deletion of the AWS resources is required when this error happens.
+1
Any workaround to this issue?
I'm having the same issue pradeepbhadani cited. After a "terraform destroy" on an ECS fargate environment, I end up with orphaned DNS records in the service discovery namespace that cannot be deleted manually as they are managed by the service discovery service. Then, because those records are still there, the service discovery namespace cannot be deleted.
This issue is happening to me while running:
This is requiring AWS Support personnel to go in and delete the orphaned DNS records manually before the Service Discovery namespace can be deleted using AWS CLI.
I've run into this again in another scenario where the namespace wasn't being deleted. A service was being destroyed as part of updating it, and I ended up with an orphaned service discovery operation: an instance registration was attempted, but the underlying ECS service had already been destroyed. I'm again left needing AWS Support to go in and fix things behind the scenes.
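For anyone stuck in the same orphaned-operation state, the failed operation can at least be inspected from the CLI before escalating to AWS Support. A rough sketch (the operation ID is a placeholder, and the region/credentials come from your own AWS CLI configuration):
# Find recent failed servicediscovery operations (e.g. the stuck RegisterInstance)
aws servicediscovery list-operations --filters Name=STATUS,Values=FAIL
# Inspect a specific operation; the ID below is a placeholder
aws servicediscovery get-operation --operation-id op-xxxxxxxxxxxxxxxx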
I'm experiencing this, though with custom service instances (not ECS).
Some kind of force_delete attribute on the service might help, so that Terraform can cycle through and deregister any instances left in the service before attempting to delete it.
I was able to resolve this by running
aws servicediscovery list-services --region us-west-2
then selecting my service's ID from the list and running:
aws servicediscovery delete-service --id srv-oy************x
In my case there was a modification to the service discovery resource and Terraform was unable to destroy the old resource, so I had to do it manually.
module.fargate_staging.aws_service_discovery_service.services[75]: Destroying... (ID: srv-jXXXXXXXXXX)
Solution:
Based on the service ID, first I have to find the attached instance ID:
- aws servicediscovery list-instances --service-id=srv-jXXXXXXXXXX --region=eu-central-1 --profile=staging
{
    "Instances": [
        {
            "Attributes": {
                "AWS_INSTANCE_IPV4": "172.XX.XXX.XXX",
                "AWS_INIT_HEALTH_STATUS": "HEALTHY",
                "AVAILABILITY_ZONE": "eu-central-1c",
                "REGION": "eu-central-1",
                "ECS_SERVICE_NAME": "abcxyz",
                "ECS_CLUSTER_NAME": "staging",
                "ECS_TASK_DEFINITION_FAMILY": "staging-abcxyz"
            },
            "Id": "337cfbfd-bc9d-4b42-8a10-ABCXYZ913"
        }
    ]
}
Once I have the attached instance ID, I have to deregister it before I delete the service.
- aws servicediscovery deregister-instance --service-id=srv-jXXXXXXXXXX --instance-id=337cfbfd-bc9d-4b42-8a10-ABCXYZ913 --region=eu-central-1 --profile=staging
- aws servicediscovery delete-service --id srv-jXXXXXXXXXX --region=eu-central-1 --profile=staging
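The same steps can also be scripted so the IDs don't have to be copied around by hand. A rough sketch (the service ID, region, and profile are placeholders for your own values):
SERVICE_ID=srv-jXXXXXXXXXX
# Deregister every instance that is still attached to the service
for INSTANCE_ID in $(aws servicediscovery list-instances --service-id "$SERVICE_ID" \
    --query 'Instances[].Id' --output text --region eu-central-1 --profile staging); do
  aws servicediscovery deregister-instance --service-id "$SERVICE_ID" --instance-id "$INSTANCE_ID" \
    --region eu-central-1 --profile staging
done
# Give the deregistrations a moment to complete, then delete the service
sleep 5
aws servicediscovery delete-service --id "$SERVICE_ID" --region eu-central-1 --profile staging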
It looks like terraform needs to fix this bug :)
Removing the resource aws_service_discovery_service.worker should first stop the service aws_ecs_service.worker, then proceed to delete the resource.
This is true if the associated aws_ecs_service resource itself is being removed or replaced.
The issue also occurs more fundamentally when Terraform needs to remove or replace just the aws_service_discovery_service resource itself in isolation - for example, if the dns_records.type is subsequently changed. There are no other resource dependencies; however, Terraform fails with the same error, as it does not first remove the existing service discovery instance records:
aws_service_discovery_service.{name}: ResourceInUse: Service contains registered instances; delete the instances before deleting the service
Until the Terraform AWS provider removes existing service discovery instance records, our options seem limited to manual removal or a destroy time provisioner.
The latter really doesn't sit well with me, as it introduces risk and a dependency on the host machine running Terraform having the AWS CLI and appropriate privileges - however, in a heavily automated CI/CD environment it's perhaps a better interim workaround than random failures and manual intervention.
resource "aws_service_discovery_service" "core" {
[..]
/**
* Workaround to https://github.com/terraform-providers/terraform-provider-aws/issues/4853
* Terraform does not deregister existing service discovery instance records prior to removing
* the `aws_service_discovery_service` resource, causing AWS to error with:
* ResourceInUse: Service contains registered instances; delete the instances before deleting the service
*/
provisioner "local-exec" {
when = "destroy"
command = <<EOF_COMMAND
SERVICE_ID=$(aws servicediscovery list-services --filters '[{"Name":"NAMESPACE_ID","Values":["${var.service_discovery_namespace_id}"]}]' --region ${var.aws_region} \
--query 'Services[?Name == `${var.service_discovery_name}`].Id' --output text) && \
aws servicediscovery discover-instances --namespace-name ${var.service_discovery_domain} --service-name ${var.service_discovery_name} \
--query 'Instances[*].InstanceId | join(`"\n"`, @)' --output text \
| xargs -I {INSTANCE_ID} aws servicediscovery deregister-instance --service-id $SERVICE_ID --instance-id {INSTANCE_ID} && sleep 5
EOF_COMMAND
}
}
In my scenario, I have a direct dependency between the ECS service and the service discovery service (the ECS service references the service discovery service's ARN).
In my case, a seemingly simple change to the DNS TTL value in the service discovery caused me to encounter this problem.
Same happens here.
We are hitting the same scenario on services which are already running in production.
The solutions proposed in this issue involve restarting your service discovery instances, but I assume this means downtime?
+1
This is a bad bug, as it basically makes the provider feature incomplete and broken. IMO the original feature should never have been released if it doesn't handle this scenario.
+1
+1
+1
+1
Just stop the task first before deleting the service.
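If you want to do that outside of Terraform, something like this drains the registered instances before the destroy (a sketch; the cluster and service names are placeholders):
# Scale the ECS service to zero so its tasks stop and deregister from service discovery
aws ecs update-service --cluster my-cluster --service worker --desired-count 0
# Wait until the tasks have actually stopped
aws ecs wait services-stable --cluster my-cluster --services worker
# The service discovery service now has no registered instances, so the destroy can proceed
terraform destroy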
I was able to resolve this by running
aws servicediscovery list-services --region us-west-2
then selecting my service's ID from the list and running:
aws servicediscovery delete-service --id srv-oy************x
that fixed it for me
+1
Is there a workaround that actually works around?
This has been working great for me:
Add to aws_service_discovery_service resources:
# Remove after https://github.com/terraform-providers/terraform-provider-aws/issues/4853 is resolved
provisioner "local-exec" {
  when    = destroy
  command = "${path.module}/servicediscovery-drain.sh ${self.id}"
}
servicediscovery-drain.sh:
#!/bin/bash
[ $# -ne 1 ] && echo "Usage: $0 <service-id>" && exit 1
serviceId="--service-id=$1"
echo "Draining servicediscovery instances from $1 ..."
ids="$(aws servicediscovery list-instances $serviceId --query 'Instances[].Id' --output text | tr '\t' ' ')"
found=
for id in $ids; do
  if [ -n "$id" ]; then
    echo "Deregistering $1 / $id ..."
    aws servicediscovery deregister-instance $serviceId --instance-id "$id"
    found=1
  fi
done
# Yes, I'm being lazy here...
[ -n "$found" ] && sleep 5 || true
Having the same issue right now.
The same problem :(