Terraform 0.11.8 with AWS provider 1.30
I realize these aren't the latest versions, but I don't think this resource has changed recently and I haven't had a chance to test on newer releases yet.
Affected resources: aws_emr_cluster, aws_emr_security_configuration. If I have Terraform manage my aws_emr_cluster and pass a terraform-managed aws_emr_security_configuration into that cluster, terraform destroy consistently fails to destroy the security configuration.
Expected behavior: terraform destroy successfully cleans up all the resources it created. Instead, the destroy fails with:
```
Error: Error applying plan:
1 error(s) occurred:
* aws_emr_security_configuration.main (destroy): 1 error(s) occurred:
* aws_emr_security_configuration.main: InvalidRequestException: Security configuration 'tf-emr-sc-20181022205505705400000001' cannot be deleted because it is in use by active clusters.
    status code: 400, request id: e6338f23-d936-12e8-ad83-7bea3842861a
```
Steps to reproduce: create an aws_emr_cluster and give it an aws_emr_security_configuration, then run terraform apply followed by terraform destroy. I think Terraform is just not waiting long enough before attempting to destroy the security configuration; if I try the destroy again a minute or two later, it works fine.
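The wiring is essentially the following (a minimal sketch: the names, roles, subnet, instance types and security configuration JSON are placeholders, not my real configuration):

```
resource "aws_emr_security_configuration" "main" {
  name          = "tf-emr-sc-example"
  configuration = "${file("security-configuration.json")}"
}

resource "aws_emr_cluster" "main" {
  name          = "tf-emr-cluster-example"
  release_label = "emr-5.17.0"
  applications  = ["Spark"]
  service_role  = "EMR_DefaultRole"

  # Passing the terraform-managed security configuration into the cluster
  # is what creates the dependency that trips up terraform destroy.
  security_configuration = "${aws_emr_security_configuration.main.name}"

  master_instance_type = "m4.large"
  core_instance_type   = "m4.large"
  core_instance_count  = 1

  ec2_attributes {
    subnet_id        = "subnet-00000000"
    instance_profile = "EMR_EC2_DefaultRole"
  }
}
```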
It doesn't look like the situation has changed, from a quick look at: https://github.com/terraform-providers/terraform-provider-aws/blob/master/aws/resource_aws_emr_security_configuration.go#L103
To fix, we can probably just add a resource.Retry() handler to the Delete function there that retries for a minute or two:
```
input := &emr.DeleteSecurityConfigurationInput{
    Name: aws.String(d.Id()),
}

err := resource.Retry(1*time.Minute, func() *resource.RetryError {
    _, err := conn.DeleteSecurityConfiguration(input)
    // Already deleted: nothing left to do.
    if isAWSErr(err, "InvalidRequestException", "does not exist") {
        return nil
    }
    // The cluster is still terminating on the EMR side: retry until the timeout expires.
    if isAWSErr(err, "InvalidRequestException", "cannot be deleted because it is in use by active clusters") {
        return resource.RetryableError(err)
    }
    if err != nil {
        return resource.NonRetryableError(err)
    }
    return nil
})
if err != nil {
    return fmt.Errorf("error deleting EMR Security Configuration (%s): %s", d.Id(), err)
}
```
Then, for an acceptance test, just create a test configuration that creates a security configuration and a cluster that uses it. 👍
I'm wondering if the problem is actually pretty inconsistent, though: TestAccAWSEMRCluster_security_config already does something like the test configuration I mention, and looking through the last 6 months of our daily acceptance testing I can't find a failure with the above error.
Hmm, I can try to reduce my setup to a proper regression test then. I assumed it had nothing to do with the rest of my cluster configuration, but maybe it does if you don't see the issue; it's definitely happening 100% of the time here. Given how long EMR clusters take to spin up and down, it'll probably take me a bit to find what's going wrong, but I'll try to post back with some actual Terraform configuration.
We notice the same behaviour; these are the versions we are using:
```
Initializing provider plugins...
- Checking for available provider plugins on https://releases.hashicorp.com...
- Downloading plugin for provider "null" (1.0.0)...
- Downloading plugin for provider "aws" (1.51.0)...
- Downloading plugin for provider "template" (1.0.0)...
```
We ended up using a sleep 100 to mitigate the issue, which is not ideal, and we would also like to see this fixed. 👍
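For illustration, such a delay can be wired in with a destroy-time local-exec provisioner on the security configuration; the sketch below shows the general idea rather than our exact setup:

```
resource "aws_emr_security_configuration" "main" {
  # ... existing arguments ...

  # Crude mitigation: give EMR time to finish terminating the cluster
  # before Terraform tries to delete this security configuration.
  provisioner "local-exec" {
    when    = "destroy"
    command = "sleep 100"
  }
}
```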
Would love to see this get fixed. Here is a non-sleep workaround for anyone who is interested; it requires the aws CLI and jq.
resource "aws_emr_cluster" "my_cluster" {
...
provisioner "local-exec" {
when = "destroy"
command = "echo ${aws_emr_cluster.my_cluster.id} > cluster_id.txt"
}
}
resource "aws_emr_security_configuration" "my_security" {
...
provisioner "local-exec" {
when = "destroy"
command = "while [ ! `aws emr describe-cluster --cluster-id $(cat cluster_id.txt) | jq 'any(.Cluster.Status.State; contains(\"TERMINATED\"))' | grep true` ]; do sleep 5; done"
}
}
I don't know if this is related or not, but we're observing "terraform destroy" jobs involving EMR clusters returning as "completed" while the cluster is still in the "Terminating" (as opposed to "Terminated") state.
Poking into this some more, I wonder if this is because the EMR cluster delete method only waits for the cluster to have zero running instances, not for AWS to report the cluster as terminated; I think that behaviour was introduced in f7405d0773e9ba50b5ed1072b7e35501058ab786. This likely causes Terraform to think the cluster has been terminated and that the security configuration can be deleted, when from the AWS side the cluster still exists and the security configuration cannot yet be deleted.
Any thoughts on changing the cluster deletion wait to wait for EMR to report the state as terminated, rather than for it to have zero running instances?
> Any thoughts on changing the cluster deletion wait to wait for EMR to report the state as terminated, rather than for it to have zero running instances?
Sounds like a great idea. 👍
FYI, we've been using a provider patched with the code in #12578 and it seems like it has fixed this issue for us.
We actually had to get around this issue by adding a create_before_destroy lifecycle policy to the aws_emr_security_configuration resource.
```
resource "aws_emr_security_configuration" "my_config" {
  ...

  lifecycle {
    create_before_destroy = true
  }
}
```
> We actually had to get around this issue by adding a create_before_destroy lifecycle policy to the aws_emr_security_configuration resource.
@ashkan3 this didn't seem to work for me? Still getting the dependency error when destroying....