Terraform v0.9.2
...
resource "digitalocean_droplet" "kr_manager" {
name = "${var.do_name}"
image = "${var.do_image}"
region = "${var.do_region}"
size = "${var.do_size}"
ssh_keys = [XXX]
provisioner "local-exec" {
command = "echo ${digitalocean_droplet.kr_manager.ipv4_address} >> hosts"
}
provisioner "remote-exec" {
inline = ["dnf install -y python python-dnf"]
connection {
type = "ssh"
user = "${var.ssh_user}"
private_key = "${file(var.ssh_key)}"
timeout = "1m"
}
}
provisioner "local-exec" {
command = "ansible-playbook ${var.play}"
}
provisioner "local-exec" {
command = "docker-machine create --driver generic --generic-ip-address ${digitalocean_droplet.kr_manager.ipv4_address} --generic-ssh-key ${var.ssh_key} ${var.do_name}"
}
provisioner "local-exec" {
when = "destroy"
command = "rm hosts"
}
provisioner "local-exec" {
when = "destroy"
command = "docker-machine rm -f ${var.do_name}"
}
}
...
https://gist.github.com/IOAyman/3e86d9c06d03640786184c1429376328
It should have run the on-destroy provisioners
It did not run the on-destroy provisioners
terraform apply -var-file=infrasecrets.tfvars
terraform destroy -var-file=infrasecrets.tfvars
Are there any other GitHub issues (open or closed) or Pull Requests that should be linked here? For example:
Hi @IOAyman! Thanks for opening this.
This is indeed related to both #13097 and #13395, and seems to be another example of the same general problem. However, each of these would be solved in a different part of the codebase, so I'm going to leave all three open with these connectors between them, though ultimately we will probably attempt to fix them all with one PR in the end.
I'm having this issue in a local-exec provisioner when I am tainting aws_instance resources, but I can't tell from this issue text if I'm hitting this particular bug. Is the same mechanism used to -/+ after a taint as when an explicit terraform destroy is used?
@Gary-Armstrong I'm almost certain that you've found another variant of the same root cause there. Thanks for mentioning it!
Having slept on it a bit I've changed my mind and am going to fold all of these issues into this one as an umbrella issue for addressing the various places that destroy provisioners aren't working yet, since I strongly suspect that the fix will be a holistic one that covers all of these cases together.
Note for me or someone else who picks this up in future: the root challenge here is that the destroy provisioners currently live only in config, but yet most of our destroy use-cases don't have access to config for one reason or another. I think the holistic fix here is to change how destroy provisioners work so that during apply we include the fully-resolved destroy provisioner configs as part of the instance state, and then when we destroy those instances we use the stashed config in the state to destroy them, thus avoiding the direct dependency on config during destroy. This can then address all of the variants of the problem we've seen so far, and probably ones we've not yet seen:
create_before_destroy lifecycle
More info: I eliminated some instances by setting count = 0 and the local-exec ran properly.
I'd like to add some color here:
when = "destroy" provisioners are also not being run for aws_instances that are marked as tainted, which are then destroyed and re-created. This is on v0.9.3.
I see that this was #3 in the above comment, sorry for adding more traffic. A word on the mind of the maintainer here, though: I consider it my responsibility to design idempotency around the create-destroy provisioners. I expect TF to be a trigger, based on when specific events start, not at some point during their execution. I can design my provisioners to error when I want them to, and ignore errors that I don't consider constructive, thereby avoiding the whole discussion of "when should we run these?"
Maybe the destroyers should have an option or two allowing us to customize when/where they fire? Food for thought.
My current destruction provisioner (This might be helpful to see my position):
provisioner "local-exec" {
when = "destroy"
command = "knife node delete ${self.tags.Name} -y; knife client delete ${self.tags.Name} -y; true"
/* The reason we're ignoring the return values is either it works, or it
wasnt there to begin with. Or you don't have permissions, but it's going
to wind up desynchronized anyway. */
}
An alternative would be a way to gate the overall instance termination on the relative success or failure of other designated dependent destructors prior to outright instance destruction. I wonder if that can be accomplished by putting the local-exec destructor before the Chef provisioner, but I haven't tested to see whether that would work. Then I could avoid the desync by designing my destructor to stop terraform destroy in a better way.
In the case of a tainted resource, tracking the individual run results and configurations of provisioners against the resource as part of resource state should help Terraform in deciding whether a provisioner needs to run its destroy action on the resource, with additional guidance from the resource's configuration.
For provisioners that kick off config management tools, a successful run usually indicates there's something there that needs to be torn down at destroy time. There will probably be a common set of actions that each CM tool uses for decommissioning, as in @in4mer's example where API calls need to get made to remove the instance from a Chef server.
(I actually found this thread because I thought that's how Terraform's Chef provisioner _already_ worked!)
Remote or local exec are more open-ended, so they might need to have destroy-stage actions explicitly defined in the resource config, defaulting to noop.
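As a rough illustration of that last idea, here is a minimal, untested sketch of how a destroy-stage action could be made explicit and default to a no-op using the configuration syntax of that era (the variable name and the null_resource are assumptions; "true" is the shell no-op):
variable "teardown_command" {
  # Overridable destroy-stage action; "true" is a shell no-op, so by default
  # the destroy-time provisioner does nothing.
  default = "true"
}

resource "null_resource" "node" {
  provisioner "local-exec" {
    when    = "destroy"
    command = "${var.teardown_command}"
  }
}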
Edit: Sorry, it appears to run correctly now. Please ignore this whole comment
I have the same issue with a null_resource and a local-exec provisioner in it. Creation works fine, but terraform destroy completely skips over the when = "destroy" command.
I'm bumping this because it's still an issue.
Honestly, IDGAF that we're waiting for eleventeen planets to align so we can have a celestially perfect tea ceremony to please our ancestors in the afterlife, but we'll have to wait six more years for it to happen. This issue needs a fix; gate the destroy provisioner blocks off the resource they're attached to. If the resource was there, run the provisioners and let the admins sort it out.
I have seen a problem related to this, but I think I know what is happening. I am destroying an instance and inside it I have a provisioner set with when = "destroy". This is not running because the instance networking is taken down before the remote-exec can be run, and then when the provisioner runs it cannot SSH into the machine.
terraform destroy -target=aws_instance.biz_gocd -force
aws_vpc.vpc_biz_dev: Refreshing state... (ID: vpc-4ab68733)
aws_key_pair.auth: Refreshing state... (ID: biz-GoCD-Key)
aws_security_group.biz_dev: Refreshing state... (ID: sg-c5062db6)
aws_subnet.subnet_biz_dev: Refreshing state... (ID: subnet-e9c91e8d)
aws_instance.biz_gocd: Refreshing state... (ID: i-029333e7696ca72c9)
aws_eip_association.eip_assoc: Destroying... (ID: eipassoc-a6a11391)
aws_eip_association.eip_assoc: Destruction complete after 1s
aws_instance.biz_gocd: Destroying... (ID: i-029333e7696ca72c9)
aws_instance.biz_gocd: Provisioning with 'remote-exec'...
aws_instance.biz_gocd (remote-exec): Connecting to remote host via SSH...
aws_instance.biz_gocd (remote-exec): Host:
aws_instance.biz_gocd (remote-exec): User: ubuntu
aws_instance.biz_gocd (remote-exec): Password: false
aws_instance.biz_gocd (remote-exec): Private key: false
aws_instance.biz_gocd (remote-exec): SSH Agent: true
aws_instance.biz_gocd (remote-exec): Connecting to remote host via SSH...
....
timeout
I don't think this should be expected behaviour. Instead, the destroy provisioner should be queued before any other changes.
This issue bugs us with our Terraform-managed DC/OS cluster. Since we can't run destroy provisioners (for killing the dcos-mesos-slave service), the jobs on our destroyed nodes are not moved to other nodes in a timely manner...
@bizmate's comment is interesting. There could be an easy fix there for some (maybe the majority) use cases.
I see the same issue as bizmate. In my case I'm trying to run a destroy provisioner on EBS volume attachments, but it seems like we lose the network routes before the provisioner has run. My case is slightly different as I'm going through a bastion, and the code below is in a module (this could be the edge case).
resource "aws_volume_attachment" "volume_attachments" {
...
# Fix for https://github.com/terraform-providers/terraform-provider-aws/issues/2084
provisioner "remote-exec" {
inline = ["sudo poweroff"]
when = "destroy"
on_failure = "continue"
connection {
user = "centos"
host = "${element(aws_instance.cluster_nodes.*.private_ip, count.index % var.num_nodes)}"
private_key = "${file(var.private_key_path)}"
bastion_host = "${var.bastion_public_ip}"
agent = false
}
}
provisioner "local-exec" {
command = "sleep 30"
when = "destroy"
}
}
I found a workaround with a null_resource, which can be used for more fine-grained provisioning. The following works for me, with a successful execution of the destroy provisioner of an aws_instance by executing the teardown.sh script.
resource "aws_instance" "my_instance" {
...
}
resource "null_resource" "my_instance_provisioning" {
triggers {
uuid = "${aws_instance.my_instance.id}"
}
provisioner "remote-exec" {
inline = [
"bash setup.sh",
]
}
provisioner "remote-exec" {
when = "destroy"
inline = [
"bash teardown.sh",
]
}
The clue is that the null_resource will be destroyed before the aws_instance is destroyed, hence it can still establish a connection. Hope that helps you folks save some time and build better and cleaner infrastructures :)
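One caveat worth noting: a null_resource has no connection details of its own, so in practice the remote-exec provisioners above also need an explicit connection block. A hedged variant of the same workaround (the host attribute, user, and key-path variable below are assumptions) might look like:
resource "null_resource" "my_instance_provisioning" {
  triggers {
    uuid = "${aws_instance.my_instance.id}"
  }

  # Assumed connection details; adjust host/user/key to your environment.
  connection {
    type        = "ssh"
    host        = "${aws_instance.my_instance.public_ip}"
    user        = "ubuntu"
    private_key = "${file(var.ssh_key_path)}"
  }

  provisioner "remote-exec" {
    inline = ["bash setup.sh"]
  }

  provisioner "remote-exec" {
    when   = "destroy"
    inline = ["bash teardown.sh"]
  }
}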
I am getting this error:
aws_instance.ec2_instance: aws_instance.ec2_instance: self reference not allowed: "aws_instance.ec2_instance.id"
Here is my code ....
resource "aws_instance" "ec2_instance" {
ami = "${var.AMI_ID}"
instance_type = "${var.ec2_type}"
key_name = "${var.ec2_keyname}"
vpc_security_group_ids = ["${var.ec2_security_group_id}"]
subnet_id = "${var.ec2_subnet_id}"
iam_instance_profile = "${var.ec2_role_name}"
........
resource "null_resource" "my_instance_provisioning" {
triggers {
uuid = "${aws_instance.ec2_instance.id}"
}
#provisioner "local-exec" {
# command = "sleep 30"
# when = "destroy"
#}
provisioner "file" {
source = "scripts/teardown.sh"
destination = "/tmp/teardown.sh"
connection {
type = "ssh"
user = "${var.ec2_runuser}"
}
}
provisioner "remote-exec" {
inline = [
"sudo chmod +x /tmp/teardown.sh",
"sudo /tmp/teardown.sh",
]
when = "destroy"
connection {
type = "ssh"
user = "${var.ec2_runuser}"
}
}
}
hi @matikumra69: could you put your code in a code block? Then it's more readable and I can help you with that :) Use the insert_code button.
Update: @matikumra69 can you provide the whole code for aws_instance? Your error probably has nothing to do with my proposal, but rather that within the aws_instance you refer to itself, for example by calling ${aws_instance.ec2_instance.private_ip}. You should use ${self.private_ip} in this case.
Hi @mavogel, I am getting this error:
module root:
module instance_module.root: 1 error(s) occurred:
This is what I am doing .....
resource "null_resource" "my_instance_provisioning" {
triggers {
uuid = "${self.private_ip}"
}
provisioner "remote-exec" {
inline = [
"sudo touch /tmp/abc",
]
connection {
type = "ssh"
user = "${var.ec2_runuser}"
}
}
}
hi @matikumra69, first: please use a code block (a guide is here); then your code is more readable. Btw: a null_resource has no property private_ip, you should pass it in as follows:
resource "null_resource" "my_instance_provisioning" {
triggers {
uuid = "${aws_instance.ec2_instance.private_ip}"
}
provisioner "remote-exec" {
inline = [
"sudo touch /tmp/abc",
]
connection {
type = "ssh"
user = "${var.ec2_runuser}"
}
}
It would also help if you could provide your whole code with all definitions as a gist and link it here. With such small code snippets it's hard to solve your issue.
@apparentlymart Just touching on your comment regarding the 'tainted' instance flow: would supporting that just be a case of removing this EvalIf? https://github.com/hashicorp/terraform/blob/master/terraform/node_resource_destroy.go#L215
Not running the destroy provisioners when tainting a resource is causing us issues...
Any update on this one? This is causing us issues, as we'd love to be able to use a destroy provisioner to do some cleanup on AWS instances before they are destroyed. It's currently not working for us when using create_before_destroy (not sure if this is the same issue or not).
On Terraform v0.11.7, with create_before_destroy = true and tainting the resource, the destroy-time provisioners (local and remote exec) are not being run on that resource. However, if I run a destroy of the entire infrastructure, the destroy provisioners are run.
It seems to me that just preventing it from running on tainted resources is too heavy-handed. I understand your reasons @apparentlymart, but perhaps it would be better to leave handling it to the user code. Once I have the provisioner running I can do all kinds of checks and conditional execution. I'd much rather have to hack something like that than just not being able to hook into the destruction cycle at all.
So maybe you can at least fix that. If you are worried about backward compatibility, then maybe there should be an option, like when = "tainted" or when = ["tainted", "destroy"] or something like that.
Also, the current behavior should be clearly documented. It isn't obvious, and there is nothing about it here: https://www.terraform.io/docs/provisioners/index.html#destroy-time-provisioners
Ran into this issue today. After tainting an instance and then running apply, on-destroy provisioners are not run.
Did anybody come up with a workaround?
For me the only thing that "works" is being super specific in my destroy order. Not ideal, and on occasion I still run into race conditions with RDS / Postgres databases. Note I am putting my provisioners into modules to wrangle control over the order:
terraform destroy \
-target=module.one \
-target=module.two \
-target=module.three \
-var-file=varfile.tfvars \
-auto-approve=true \
-input=false
terraform destroy \
-target=module.four \
-target=module.five \
-target=module.six \
-var-file=varfile.tfvars \
-auto-approve=true \
-input=false
For some reason, on-destroy provisioners do get called when setting the count to 0, but not when actually destroying the resource.
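For anyone trying to reproduce that observation, a minimal sketch (assuming a null_resource as a stand-in for the real resource, and a hypothetical instance_count variable): changing the count from 1 to 0 and re-applying reportedly runs the destroy-time provisioner, while destroying the resource directly does not.
variable "instance_count" {
  default = 1 # set to 0 and re-apply to trigger the destroy-time provisioner
}

resource "null_resource" "example" {
  count = "${var.instance_count}"

  provisioner "local-exec" {
    when    = "destroy"
    command = "echo 'cleaning up'"
  }
}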
When changing a resource sufficiently to cause a new one to be created, and with lifecycle set to create_before_destroy, the destroy provisioners are not run on the deposed entity.
When those entities are running containers, it's quite important they are drained onto the new resources before being deleted.
I believe this issue is still present in v0.11.8. Is it still being worked on?
The use case is an important one for enterprises, specifically:
Note: This scenario applies to the case where local-exec is part of a resource module, and the on-destroy of the module is intended to trigger the local-exec.
Note: Related issue which leads to this being a desirable alternative: https://github.com/terraform-providers/terraform-provider-azurerm/issues/1143
@archmangler, regardless, this ticket is also about on-destroy provisioners not being run for tainted resources, which is a fundamental abdication of responsibility. When a resource is created, its provisioners are run. When a resource is tainted and re-created, EVEN THOUGH there is a destruction of a given resource, that resource's on-destroy provisioners do not run.
This is logically identical to tainted resources not having their on-creation provisioners run after an apply, and if that were occurring people would be screaming bloody murder because that's bugged AF. Which this is too, but we can't get anybody to pay attention to it. @apparentlymart, any idea if this fundamental bug will ever be addressed?
I can confirm that it doesn't run for the google_container_cluster resource as well:
resource "google_container_cluster" "gke-0" {
......
provisioner "local-exec" {
command = "gcloud beta container clusters get-credentials ${self.name} --region ${self.zone} --project ${self.project}"
}
provisioner "local-exec" {
when = "destroy"
command = <<EOF
kubectl config unset users.gke_${self.project}_${self.zone}_${self.name} && \
kubectl config unset contexts.gke_${self.project}_${self.zone}_${self.name} && \
kubectl config unset clusters.gke_${self.project}_${self.zone}_${self.name}
EOF
}
}
Output:
terraform apply
......
Enter a value: yes
google_container_cluster.gke-1: Destroying... (ID: ops-gke-1)
google_container_cluster.gke-1: Still destroying... (ID: ops-gke-1, 10s elapsed)
google_container_cluster.gke-1: Still destroying... (ID: ops-gke-1, 20s elapsed)
google_container_cluster.gke-1: Still destroying... (ID: ops-gke-1, 30s elapsed)
google_container_cluster.gke-1: Still destroying... (ID: ops-gke-1, 40s elapsed)
google_container_cluster.gke-1: Still destroying... (ID: ops-gke-1, 50s elapsed)
google_container_cluster.gke-1: Still destroying... (ID: ops-gke-1, 1m0s elapsed)
google_container_cluster.gke-1: Still destroying... (ID: ops-gke-1, 1m10s elapsed)
google_container_cluster.gke-1: Still destroying... (ID: ops-gke-1, 1m20s elapsed)
google_container_cluster.gke-1: Still destroying... (ID: ops-gke-1, 1m30s elapsed)
google_container_cluster.gke-1: Still destroying... (ID: ops-gke-1, 1m40s elapsed)
google_container_cluster.gke-1: Still destroying... (ID: ops-gke-1, 1m50s elapsed)
google_container_cluster.gke-1: Still destroying... (ID: ops-gke-1, 2m0s elapsed)
google_container_cluster.gke-1: Still destroying... (ID: ops-gke-1, 2m10s elapsed)
google_container_cluster.gke-1: Still destroying... (ID: ops-gke-1, 2m20s elapsed)
google_container_cluster.gke-1: Still destroying... (ID: ops-gke-1, 2m30s elapsed)
google_container_cluster.gke-1: Destruction complete after 2m36s
Apply complete! Resources: 0 added, 0 changed, 1 destroyed.
@apparentlymart we're running into the recreation issue, too.
GKE node pool, create before destroy, on-destroy not called when a new pool replaces the old one.
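For context, a hedged sketch of that kind of setup (the resource names, node count, cluster reference, and drain command below are assumptions, not the poster's actual configuration); the destroy-time provisioner here is the step that never fires when create_before_destroy replaces the pool:
resource "google_container_node_pool" "workers" {
  name       = "workers"
  cluster    = "${google_container_cluster.gke.name}"
  node_count = 3

  lifecycle {
    create_before_destroy = true
  }

  provisioner "local-exec" {
    when    = "destroy"
    # Drain the nodes of the old pool before it is torn down.
    command = "kubectl drain --ignore-daemonsets --selector cloud.google.com/gke-nodepool=${self.name}"
  }
}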
Doesn't work for any azurerm resources either
Could it be that this is only the case when on-destroy is called on a provisioner in a module, and that this is the correct behaviour? In which case the question becomes: "is there a way to trigger a provisioner on-destroy of the containing module?"
It does not work outside modules during a taint operation either.
+1. I cannot create an instance with create_before_destroy as well as a destroy-time provisioner (the provisioner never runs). The key for me appears to be when create_before_destroy = true is set.
@apparentlymart @radeksimko @jbardin Any news about tackling this? Thanks
There is not a single word about the when argument in the official documentation.
@damdo: This is something that has to wait until at least the 0.12 release, since it's going to require some fundamental changes to the provisioner system.
@dev-e: That link is only for the local-exec provisioner; see: https://www.terraform.io/docs/provisioners/index.html#destroy-time-provisioners
Thank you @jbardin I really hope this can be included in 0.12, since it is such an important release
cc. @apparentlymart @radeksimko
I can also confirm, like @jstewart612, that putting the clause when = "destroy" doesn't work on the azurerm provider. A VM is getting deleted before I can execute remote commands. This is crucial, as my servers must notify other servers that they are down, and they must do it from inside that VM due to the certificates being present only there!
provisioner "remote-exec" {
when = "destroy"
EDIT: Well, it happens to be a different issue here: the permission to the key vault which hosts the certificate used to connect to the VM gets removed before Terraform connects to the VM, so it just can't connect, because the access gets deleted before Terraform uses it to connect to the VM.
EDIT: OK, I kind of answered myself: it needs an explicit destroy.
Still not quite the behavior I'd expect here: https://www.terraform.io/docs/provisioners/index.html#destroy-time-provisioners
It still does not work with a null_resource in the following scenario (not running an explicit destroy).
Terraform configuration, main.tf:
resource "null_resource" "control" {
triggers {
uuid = "${random_string.uid.result}"
}
provisioner "local-exec" {
command = "echo hello"
}
provisioner "local-exec" {
when = "destroy"
command = "echo bye"
}
}
resource "random_string" "uid" {}
Commands executed in order, in the directory where the main.tf file resides:
terraform init
terraform apply [yes]
mv main.tf main.tf-ignore
terraform apply [yes]
I assume this is also a bug?
Since Terraform 0.12 is released, are there any updates/workarounds for this issue?
Very much disagree that this is an enhancement rather than a bug. Specifically, the behavior around deposed resources and create_before_destroy lifecycle configs. I would expect from reading the docs that a configuration like this:
resource "aws_instance" "worker" {
count = var.worker_count
ami = var.ami_id
// ...
lifecycle {
create_before_destroy = true
}
provisioner "local-exec" {
command = "${path.module}/scripts/wait_for_nomad.py ${self.private_ip}"
}
provisioner "local-exec" {
when = "destroy"
command = "${path.module}/scripts/drain_node.py ${self.private_ip}"
}
}
would let me have more-or-less seamless upgrades of my Nomad workers by changing the AMI ID and triggering a replacement. This is not the case, as the destroy provisioner doesn't run and thus the nodes do not drain.
If we're going to add a when option to provisioners, can we also let them run before/after a resource is modified? I just ran into an issue while migrating a bunch of ECS tasks to Fargate where I could have wallpapered over a lot of taints / manual modifications if I'd been able to do this...
@hashibot Relabeling this as an "enhancement" doesn't make it any less of a bug; it just makes HashiCorp look tone-deaf and out of touch with users and user issues.
This is still either a design or implementation bug borne out of an egregious lack of logic. We can try to paper over this and pretend everything is hunky dory, or we can fix it and actually make things hunky dory! Is there some kind of [political?] internal struggle within Hashi that makes it impossible to fix fundamental issues like this?
It's been A YEAR AND A HALF since this bug was reported. Let's kill this bug dead!
Hi everyone,
This issue is labelled as an enhancement because the initial design of the destroy-time provisioners feature intentionally limited the scope to run only when the configuration is still present, to allow this feature to be shipped without waiting for a more complicated design that would otherwise have prevented the feature from shipping at all.
We're often forced to make compromises between shipping a partial feature that solves a subset of problems vs. deferring the whole feature until a perfect solution is reached, and in this case we decided that having this partial functionality was better than having no destroy-time provisioner functionality at all.
The limitations are mentioned explicitly in the documentation for destroy-time provisioners, and because provisioners are a last resort we are not prioritizing any development for bugs or features relating to provisioners at this time. We are keeping this issue open to acknowledge and track the use-case, but do not expect to work on it for the foreseeable future.
Please note also that our community guidelines call for kindness, respect, and patience. We understand that it is frustrating for an issue to not see any updates for a long time, and we hope this comment helps to clarify the situation and allow you all to look for alternative approaches to address the use-cases you were hoping to meet with destroy-time provisioners.
The docs do not cover the use I posted above, where a changed resource that is deposing an existing resource does not cause the deposed resource's provisioner to trigger. I believe this is a bug based on the limitations in the link you shared, which are:
If this is not going to be addressed, the docs should be updated to reflect this additional limitation of this feature.
The fact that when = "destroy" is not triggered when a resource is removed from the tf code and applied with terraform apply is a big issue for me, as I am starting to use a CI/CD environment.
The only way I've been able to trigger the provisioner is to run terraform destroy -target RESOURCE_TYPE.NAME. This needs to be done manually before removing the resource from the tf code and running terraform apply.
I need the on-destroy provisioner to do some tasks that only happen on destroy. For example, cleaning up the Puppet certificate and removing the server from monitoring on VM deletion.
This is currently preventing me from achieving full CI/CD.
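As a rough illustration of that use case (the resource name, AMI variable, cleanup commands, and script path below are assumptions, not the poster's actual code), the intent is for something like this to fire even when the resource block is simply deleted from the code and terraform apply is run:
resource "aws_instance" "puppet_node" {
  ami           = var.ami_id
  instance_type = "t3.medium"

  provisioner "local-exec" {
    when = destroy
    # Deactivate the node's Puppet record and de-register it from monitoring
    # when the VM goes away (script path is hypothetical).
    command = "puppet node deactivate ${self.private_dns} && ./scripts/remove_from_monitoring.sh ${self.private_dns}"
  }
}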
Any news about that? The issue is still here with Terraform 0.12.24, and in a CI environment the on-destroy is needed. Nobody wants to add manual tasks when they want to delete some instances ;)
Any solutions or workarounds?
I also came across this problem: a local-exec "on-destroy" won't execute if it's defined within a module.
Background: I use the postgresql_database resource to create databases (the definition is within a module). My problem is that as long as somebody is connected to the database, TF can't destroy it. Solution: a local-exec "on-destroy" executing a script which terminates connections. However, it's not triggered...
Same story here. The destroy-time provisioner is inside a module, and it is used to remove a k8s service holding an IP address. Without it being executed there's always an error while removing the IP address resource.
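For the PostgreSQL case above, a hedged sketch of what that connection-terminating destroy provisioner could look like (the database name is an assumption, and psql is assumed to pick up its connection settings from the usual PG* environment variables):
resource "postgresql_database" "app" {
  name = "app"

  provisioner "local-exec" {
    when = destroy
    # Kick any connected clients off the database so the provider can drop it.
    command = "psql -c \"SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = '${self.name}';\""
  }
}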
I have this issue too inside a module.
lifecycle {
  create_before_destroy = true
}

provisioner "local-exec" {
  when    = destroy
  command = "triton-docker exec -i ${self.name} consul leave"
}
This works when run with --destroy, but not with --apply when the resources are destroyed and replaced with new versions.
If I remove the create_before_destroy = true, then the --apply works as expected and executes the local-exec, destroys the resource and creates a new resource, but I don't want to set this to false.