_This issue was originally opened by @tuaris as hashicorp/terraform#18264. It was migrated here as a result of the provider split. The original body of the issue is below._
Terraform version: 0.11.5
resource "aws_service_discovery_private_dns_namespace" "app_service" {
name = "app.service"
vpc = "vcp-12345678"
}
resource "aws_service_discovery_service" "worker" {
name = "worker"
dns_config {
namespace_id = "${aws_service_discovery_private_dns_namespace.app_service.id}"
dns_records {
ttl = 10
type = "A"
}
routing_policy = "MULTIVALUE"
}
health_check_custom_config {
failure_threshold = 1
}
}
resource "aws_ecs_service" "worker" {
...
service_registries {
registry_arn = "${aws_service_discovery_service.worker.arn}"
container_name = "worker"
}
}
Error: Error applying plan:
1 error(s) occurred:
* aws_service_discovery_service.worker (destroy): 1 error(s) occurred:
* aws_service_discovery_service.worker: ResourceInUse: Service contains registered instances; delete the instances before deleting the service
Removing the resource aws_service_discovery_service.worker should first stop the service aws_ecs_service.worker, then proceed to delete the resource.
Instead, the apply fails with ResourceInUse: Service contains registered instances; delete the instances before deleting the service
To reproduce the issue, for example: run terraform apply, then remove aws_service_discovery_service.worker from the configuration (or change it in a way that forces it to be replaced), and run terraform apply again.
Still an issue. In my case it's a bit of a pain to add depends_on as this is abstracted away in a module...
This does seem to be a nasty bug: Terraform should be able to handle deleting resources in the correct order, but it doesn't in this case. I'm also having this issue with terraform destroy and aws_service_discovery. Currently, manual deletion of the AWS resources is required when this error happens.
+1
Any workaround to this issue?
I'm having the same issue pradeepbhadani cited. After a "terraform destroy" on an ECS fargate environment, I end up with orphaned DNS records in the service discovery namespace that cannot be deleted manually as they are managed by the service discovery service. Then, because those records are still there, the service discovery namespace cannot be deleted.
This issue is happening to me while running:
This is requiring AWS Support personnel to go in and delete the orphaned DNS records manually before the Service Discovery namespace can be deleted using AWS CLI.
I've run into this again in another scenario where the namespace wasn't being deleted. A service was being destroyed as part of updating it, and I ended up with an orphaned service discovery operation: an instance registration was attempted, but the underlying ECS service had already been destroyed. I'm again left needing AWS Support to go in and fix things behind the scenes.
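For anyone stuck in the same orphaned-operation state, the failed operation can at least be inspected from the CLI before escalating to AWS Support. A rough sketch (the operation ID is a placeholder, and the region/credentials come from your own AWS CLI configuration):
# Find recent failed servicediscovery operations (e.g. the stuck RegisterInstance)
aws servicediscovery list-operations --filters Name=STATUS,Values=FAIL
# Inspect a specific operation; the ID below is a placeholder
aws servicediscovery get-operation --operation-id op-xxxxxxxxxxxxxxxx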
I'm experiencing this, though with custom service instances (not ECS).
Some kind of force_delete attribute on the service might help, so that Terraform can cycle through and deregister any instances left in the service before attempting to delete it.
I was able to resolve this by running
aws servicediscovery list-services --region us-west-2
then selecting my service's ID from the list and running:
aws servicediscovery delete-service --id srv-oy************x
In my case there was a modification to the service discovery resource and Terraform was unable to destroy the old resource, so I had to do it manually.
module.fargate_staging.aws_service_discovery_service.services[75]: Destroying... (ID: srv-jXXXXXXXXXX)
Solution:
Based on the service ID, first I have to find the attached instance ID:
- aws servicediscovery list-instances --service-id=srv-jXXXXXXXXXX --region=eu-central-1 --profile=staging
{
    "Instances": [
        {
            "Attributes": {
                "AWS_INSTANCE_IPV4": "172.XX.XXX.XXX",
                "AWS_INIT_HEALTH_STATUS": "HEALTHY",
                "AVAILABILITY_ZONE": "eu-central-1c",
                "REGION": "eu-central-1",
                "ECS_SERVICE_NAME": "abcxyz",
                "ECS_CLUSTER_NAME": "staging",
                "ECS_TASK_DEFINITION_FAMILY": "staging-abcxyz"
            },
            "Id": "337cfbfd-bc9d-4b42-8a10-ABCXYZ913"
        }
    ]
}
Once I have the attached instance ID, I have to deregister it before I delete the service.
- aws servicediscovery deregister-instance --service-id=srv-jXXXXXXXXXX --instance-id=337cfbfd-bc9d-4b42-8a10-ABCXYZ913 --region=eu-central-1 --profile=staging
- aws servicediscovery delete-service --id srv-jXXXXXXXXXX --region=eu-central-1 --profile=staging
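The same steps can also be scripted so the IDs don't have to be copied around by hand. A rough sketch (the service ID, region, and profile are placeholders for your own values):
SERVICE_ID=srv-jXXXXXXXXXX
# Deregister every instance that is still attached to the service
for INSTANCE_ID in $(aws servicediscovery list-instances --service-id "$SERVICE_ID" \
    --query 'Instances[].Id' --output text --region eu-central-1 --profile staging); do
  aws servicediscovery deregister-instance --service-id "$SERVICE_ID" --instance-id "$INSTANCE_ID" \
    --region eu-central-1 --profile staging
done
# Give the deregistrations a moment to complete, then delete the service
sleep 5
aws servicediscovery delete-service --id "$SERVICE_ID" --region eu-central-1 --profile staging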
It looks like terraform needs to fix this bug :)
Removing the resource aws_service_discovery_service.worker should first stop the service aws_ecs_service.worker, then proceed to delete the resource.
This is true if the associated aws_ecs_service resource itself is being removed or replaced.
The issue also occurs more fundamentally when Terraform needs to remove or replace just the aws_service_discovery_service resource itself in isolation - for example, if the dns_records.type is subsequently changed. There are no other resource dependencies; however, Terraform fails with the same error, as it does not first remove the existing service discovery instance records:
aws_service_discovery_service.{name}: ResourceInUse: Service contains registered instances; delete the instances before deleting the service
Until the Terraform AWS provider removes existing service discovery instance records, our options seem limited to manual removal or a destroy time provisioner.
The latter really doesn't sit well with me, as it introduces risk and a dependency on the host machine running Terraform having the AWS CLI and appropriate privileges - however, in a heavily automated CI/CD environment it's perhaps a better interim workaround than random failures and manual intervention.
resource "aws_service_discovery_service" "core" {
[..]
/**
* Workaround to https://github.com/terraform-providers/terraform-provider-aws/issues/4853
* Terraform does not deregister existing service discovery instance records prior to removing
* the `aws_service_discovery_service` resource, causing AWS to error with:
* ResourceInUse: Service contains registered instances; delete the instances before deleting the service
*/
provisioner "local-exec" {
when = "destroy"
command = <<EOF_COMMAND
SERVICE_ID=$(aws servicediscovery list-services --filters '[{"Name":"NAMESPACE_ID","Values":["${var.service_discovery_namespace_id}"]}]' --region ${var.aws_region} \
--query 'Services[?Name == `${var.service_discovery_name}`].Id' --output text) && \
aws servicediscovery discover-instances --namespace-name ${var.service_discovery_domain} --service-name ${var.service_discovery_name} \
--query 'Instances[*].InstanceId | join(`"\n"`, @)' --output text \
| xargs -I {INSTANCE_ID} aws servicediscovery deregister-instance --service-id $SERVICE_ID --instance-id {INSTANCE_ID} && sleep 5
EOF_COMMAND
}
}
In my scenario, I have a direct dependency between the ECS service and the service discovery service (the ECS service references the service discovery service's ARN).
In my case, a seemingly simple change to the DNS TTL value in the service discovery caused me to encounter this problem.
Same happens here.
We are hitting the same scenario on services which are already running in production.
The solutions proposed in this issue involve restarting your service discovery instances, but I assume this means downtime?
+1
This is a bad bug, as it basically makes the provider feature incomplete and broken. IMO the original feature should never have been released if it doesn't handle this scenario.
+1
+1
+1
+1
Just stop the task first before deleting the service.
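If you want to do that outside of Terraform, something like this drains the registered instances before the destroy (a sketch; the cluster and service names are placeholders):
# Scale the ECS service to zero so its tasks stop and deregister from service discovery
aws ecs update-service --cluster my-cluster --service worker --desired-count 0
# Wait until the tasks have actually stopped
aws ecs wait services-stable --cluster my-cluster --services worker
# The service discovery service now has no registered instances, so the destroy can proceed
terraform destroy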
I was able to resolve this by running
aws servicediscovery list-services --region us-west-2
then selecting my service's ID from the list and running:
aws servicediscovery delete-service --id srv-oy************x
that fixed it for me
+1
Is there a workaround that actually works around?
This has been working great for me:
Add to aws_service_discovery_service resources:
# Remove after https://github.com/terraform-providers/terraform-provider-aws/issues/4853 is resolved
provisioner "local-exec" {
  when    = destroy
  command = "${path.module}/servicediscovery-drain.sh ${self.id}"
}
servicediscovery-drain.sh:
#!/bin/bash
[ $# -ne 1 ] && echo "Usage: $0 <service-id>" && exit 1
serviceId="--service-id=$1"
echo "Draining servicediscovery instances from $1 ..."
ids="$(aws servicediscovery list-instances $serviceId --query 'Instances[].Id' --output text | tr '\t' ' ')"
found=
for id in $ids; do
  if [ -n "$id" ]; then
    echo "Deregistering $1 / $id ..."
    aws servicediscovery deregister-instance $serviceId --instance-id "$id"
    found=1
  fi
done
# Yes, I'm being lazy here...
[ -n "$found" ] && sleep 5 || true
Having the same issue right now.
The same problem :(