Terraform-provider-aws: ResourceInUse: Service contains registered instances; delete the instances before deleting the service

Created on 16 Jun 2018 · 27 comments · Source: hashicorp/terraform-provider-aws

_This issue was originally opened by @tuaris as hashicorp/terraform#18264. It was migrated here as a result of the provider split. The original body of the issue is below._


Terraform Version

0.11.5

Terraform Configuration Files

resource "aws_service_discovery_private_dns_namespace" "app_service" {
    name = "app.service"
    vpc = "vcp-12345678"
}

resource "aws_service_discovery_service" "worker" {
    name = "worker"
    dns_config {
        namespace_id = "${aws_service_discovery_private_dns_namespace.app_service.id}"
        dns_records {
            ttl = 10
            type = "A"
        }
        routing_policy = "MULTIVALUE"
    }

    health_check_custom_config {
        failure_threshold = 1
    }
}

resource "aws_ecs_service" "worker" {
        ...
    service_registries {
        registry_arn = "${aws_service_discovery_service.worker.arn}"
                container_name = "worker"
    }
}

Debug Output

Error: Error applying plan:

1 error(s) occurred:

* aws_service_discovery_service.worker(destroy): 1 error(s) occurred:

* aws_service_discovery_service.worker: ResourceInUse: Service contains registered instances; delete the instances before deleting the service 

Expected Behavior

Removing the resource aws_service_discovery_service.worker should first stop the service aws_ecs_service.worker, then proceed to delete the resource.

Actual Behavior

Process fails with ResourceInUse: Service contains registered instances; delete the instances before deleting the service

Steps to Reproduce

To reproduce the issue:

  1. Use the configuration above
  2. terraform apply
  3. Remove the resource aws_service_discovery_service.worker
  4. terraform apply
enhancement service/servicediscovery


All 27 comments

Still an issue. In my case it's a bit of a pain to add depends_on, as this is abstracted away in a module...
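
For reference, here is a minimal sketch of the depends_on approach mentioned above, assuming the resource names from the original configuration. Terraform already infers this ordering from the registry_arn reference, so the explicit edge mainly matters when that reference is hidden inside a module; it does not deregister instances by itself.

resource "aws_ecs_service" "worker" {
  # ...

  service_registries {
    registry_arn = "${aws_service_discovery_service.worker.arn}"
  }

  # Explicit ordering hint (Terraform 0.11 list syntax): the ECS service is
  # destroyed before the service discovery service it registers into.
  depends_on = ["aws_service_discovery_service.worker"]
}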

This does seem to be a nasty bug, because terraform should be able to handle deleting resources in the correct order, but it doesn't in this case. I'm also having this issue with terraform destroy and aws_service_discovery. Currently, manual deletion of AWS resources is required when this error happens.

+1

any workaround to this issue?

I'm having the same issue pradeepbhadani cited. After a "terraform destroy" on an ECS fargate environment, I end up with orphaned DNS records in the service discovery namespace that cannot be deleted manually as they are managed by the service discovery service. Then, because those records are still there, the service discovery namespace cannot be deleted.

This issue is happening to me while running:

  • Terraform v0.11.9
  • provider.aws v1.32.0
  • provider.template v1.0.0

This is requiring AWS Support personnel to go in and delete the orphaned DNS records manually before the Service Discovery namespace can be deleted using AWS CLI.

I've run into this again in another scenario where the namespace wasn't being deleted. A service was being destroyed as part of updating it, and I ended up with an orphaned service discovery operation: an instance registration was attempted, but the underlying ECS service was already destroyed. I'm again left needing AWS Support to go in and fix things behind the scenes.

I'm experiencing this, though with custom service instances (not ECS).

Some kind of force_delete attribute on the service might help so that terraform can cycle through and deregister any instances left in the service before attempting to delete the service.
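
A purely hypothetical sketch of what such an attribute could look like follows; no force_delete argument existed on aws_service_discovery_service at the time of this thread.

resource "aws_service_discovery_service" "worker" {
  name = "worker"

  # Hypothetical flag, not implemented by the provider here: the idea is that
  # the provider would deregister any remaining instances before calling
  # DeleteService when this is set.
  force_delete = true

  # ...
}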

I was able to resolve this by running

aws servicediscovery list-services --region us-west-2

then selecting my service's ID from the list and running:

aws servicediscovery delete-service --id srv-oy************x

In my case there was a modification to the service discovery resource and Terraform was unable to destroy the old resource, so I had to do it manually.

module.fargate_staging.aws_service_discovery_service.services[75]: Destroying... (ID: srv-jXXXXXXXXXX)

Solution:

Based on the service ID, I first had to find the attached instance ID:

- aws servicediscovery list-instances --service-id=srv-jXXXXXXXXXX --region=eu-central-1 --profile=staging

{
    "Instances": [
        {
            "Attributes": {
                "AWS_INSTANCE_IPV4": "172.XX.XXX.XXX",
                "AWS_INIT_HEALTH_STATUS": "HEALTHY",
                "AVAILABILITY_ZONE": "eu-central-1c",
                "REGION": "eu-central-1",
                "ECS_SERVICE_NAME": "abcxyz",
                "ECS_CLUSTER_NAME": "staging",
                "ECS_TASK_DEFINITION_FAMILY": "staging-abcxyz"
            },
            "Id": "337cfbfd-bc9d-4b42-8a10-ABCXYZ913"
        }
    ]
}

Once I had the attached instance ID, I had to deregister it before deleting the service.

- aws servicediscovery deregister-instance --service-id=srv-jXXXXXXXXXX --instance-id=337cfbfd-bc9d-4b42-8a10-ABCXYZ913 --region=eu-central-1 --profile=staging

- aws servicediscovery delete-service --id srv-jXXXXXXXXXX --region=eu-central-1 --profile=staging

It looks like Terraform needs to fix this bug :)

Removing the resource aws_service_discovery_service.worker should first stop the service aws_ecs_service.worker, then proceed to delete the resource.

This is true if the associated aws_ecs_service resource itself is being removed or replaced.

The issue also occurs more fundamentally when Terraform needs to remove or replace just the aws_service_discovery_service resource itself in isolation - for example, if the dns_records.type is subsequently changed. There are no other resource dependencies, yet Terraform fails with the same error because it does not first remove the existing service discovery instance records:

aws_service_discovery_service.{name}: ResourceInUse: Service contains registered instances; delete the instances before deleting the service 

Until the Terraform AWS provider removes existing service discovery instance records, our options seem limited to manual removal or a destroy-time provisioner.

The latter really doesn't sit well with me, as it introduces risk and a dependency on the host machine running Terraform having the AWS CLI and appropriate privileges - however, in a heavily automated CI/CD environment it's perhaps a better interim workaround than random failures and manual intervention.

resource "aws_service_discovery_service" "core" {
  [..]

  /**
   * Workaround to https://github.com/terraform-providers/terraform-provider-aws/issues/4853
   * Terraform does not deregister existing service discovery instance records prior to removing
   * the `aws_service_discovery_service` resource, causing AWS to error with:
   *    ResourceInUse: Service contains registered instances; delete the instances before deleting the service
   */
  provisioner "local-exec" {
    when    = "destroy"
    command = <<EOF_COMMAND
      SERVICE_ID=$(aws servicediscovery list-services --filters '[{"Name":"NAMESPACE_ID","Values":["${var.service_discovery_namespace_id}"]}]' --region ${var.aws_region} \
        --query 'Services[?Name == `${var.service_discovery_name}`].Id' --output text) && \
      aws servicediscovery discover-instances --namespace-name ${var.service_discovery_domain} --service-name ${var.service_discovery_name} \
        --query 'Instances[*].InstanceId | join(`"\n"`, @)' --output text \
      | xargs -I {INSTANCE_ID} aws servicediscovery deregister-instance --service-id $SERVICE_ID --instance-id {INSTANCE_ID} && sleep 5
EOF_COMMAND
  }
}
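
One way to sanity-check the workaround above (assuming the AWS CLI is configured, with a placeholder service ID) is to confirm the service reports no instances before Terraform issues the delete:

# Prints nothing once all instances have been deregistered.
aws servicediscovery list-instances --service-id srv-XXXXXXXXXX --query 'Instances[].Id' --output text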

In my scenario, I have a direct dependency between the ECS service and service-discovery-service (the service references the service discovery service ARN).

In my case, a seemingly simple change to the DNS TTL value in the service discovery caused me to encounter this problem.

Same happens here.

We are hitting the same scenario on services which are already running in production.
The proposed solutions in this issue involve restarting your service discovery instances, but I assume this means downtime?

+1

This is a bad bug as it basically makes the provider feature incomplete and broken. IMO the original feature should have never been released if it doesn't handle this scenario.

+1

+1

+1

+1

Just stop the task before deleting the service.

I was able to resolve this by running

aws servicediscovery list-services --region us-west-2

then selecting my service's ID from the list and running:

aws servicediscovery delete-service --id srv-oy************x

that fixed it for me

+1

Is there a workaround that actually works?

This has been working great for me:

Add to aws_service_discovery_service resources:

  # Remove after https://github.com/terraform-providers/terraform-provider-aws/issues/4853 is resolved
  provisioner "local-exec" {
    when    = destroy
    command = "${path.module}/servicediscovery-drain.sh ${self.id}"
  }

servicediscovery-drain.sh:

#!/bin/bash

[ $# -ne 1 ] && echo "Usage: $0 <service-id>" && exit 1

serviceId="--service-id=$1"

echo "Draining servicediscovery instances from $1 ..."
ids="$(aws servicediscovery list-instances $serviceId --query 'Instances[].Id' --output text | tr '\t' ' ')"

found=
for id in $ids; do
  if [ -n "$id" ]; then
    echo "Deregistering $1 / $id ..."
    aws servicediscovery deregister-instance $serviceId --instance-id "$id"
    found=1
  fi
done

# Yes, I'm being lazy here...
[ -n "$found" ] && sleep 5 || true

Having the same issue right now.

The same problem :(
