Terraform: Destroy 'provisioner' for instance resources

Created on 10 Oct 2014  ·  86 Comments  ·  Source: hashicorp/terraform

It would be great to have some sort of 'provisioner' for destroying an instance resource.

Example:
When creating an instance, I bootstrap it with Chef and the node is registered with the Chef server. Now I need a way to automatically delete the node from the Chef server after Terraform destroys the instance.
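
For illustration, a destroy-time Chef cleanup along these lines could look roughly like the following, shown in the syntax that eventually shipped in Terraform 0.9 (a minimal sketch; the AMI ID, Name tag, and local knife configuration are placeholders, not part of the original request):

```hcl
resource "aws_instance" "web" {
  ami           = "ami-123456"   # placeholder AMI
  instance_type = "t2.micro"

  tags {
    Name = "web-01"
  }

  # Deregister the node from the Chef server before Terraform destroys the
  # instance. Assumes the Chef node name matches the instance's Name tag and
  # that knife is configured on the machine running Terraform.
  provisioner "local-exec" {
    when    = "destroy"
    command = "knife node delete ${self.tags.Name} -y && knife client delete ${self.tags.Name} -y"
  }
}
```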

core enhancement

All 86 comments

Yes, a provisioner that would be called on_destroy.
Would be useful for Puppet as well, to clean up certificates so the same certname can be reused (getting into the puppetmaster and issuing `puppet cert clean ${aws_route53_record.foobar.*.cname}`); see the sketch after the list below.
But more importantly, such an on_destroy event has some critical cases:

  • in some cases you need to do general cleanup before destroying an instance (or push logs to storage)
  • gracefully deregistering it from the continuous delivery stack
  • shutting down services gracefully
  • notifying monitoring tools that it is _ok_ that the instance is dying, etc.
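
A minimal sketch of the Puppet case mentioned above, again in the destroy-provisioner syntax that later shipped; the hosted zone, record values, puppetmaster host, and SSH user are all placeholders:

```hcl
resource "aws_route53_record" "foobar" {
  zone_id = "Z123456EXAMPLE"                  # placeholder hosted zone
  name    = "node01.example.com"
  type    = "CNAME"
  ttl     = 300
  records = ["node01.internal.example.com"]

  # Hypothetical destroy-time hook: clean the Puppet certificate for this name
  # on the puppetmaster before the record (and the node behind it) is removed.
  provisioner "remote-exec" {
    when   = "destroy"
    inline = ["sudo puppet cert clean ${self.name}"]

    connection {
      host = "puppetmaster.example.com"       # placeholder puppetmaster
      user = "admin"
    }
  }
}
```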

What is the behavior if the provisioner fails? Does the destroy fail, and is it run again until it succeeds?

I'm open to the idea, but I'm not sure I see how Terraform could ever safely do this.

With provisioning on create, it is much simpler (for reference): The resource is created, the provisioners are run. If any provisioner fails, the resource is "tainted", and the entire thing is destroyed/created and tried again on a subsequent run.

Destroy provisioners don't have the same "taint" concept, which results in them going into an uncertain state that Terraform can't reason about. If Terraform can't reason about it, we can't safely change infrastructure (one of the core tenets of the project). This might imply that this feature is not fit for this project.

I totally understand your point and honestly I don't have a smart answer. But I can offer two options which would work for me and hopefully for others as well.

I see the destroy provisioner as a post-destroy hook. So the instance should be destroyed by terraform no matter what. After this, terraform should try to execute the destroy provisioner and in case it fails:

1) just print out a warning and let me handle the fallout manually, or

2) remember the state and re-execute only the destroy provisioner on the next run. Maybe until it succeeds.

Does this make sense? I really think terraform is awesome and I hope to use it together w/ chef.

I think we can run the destroy provisioner before the destroy step itself. This way if the provisioner fails, we abort the destroy, and the user can re-attempt on another Terraform run.

@armon What if there are multiple provisioners though? It would force us to keep track of provisioner state. Maybe that is part of the feature, but just worth pointing out.

I think they have to be idempotent or just plan to run multiple times in some cases. We don't have to track their state, we just abort the Apply() on that node. Then we just retry it later. I think for most cases (de-registering servers) this should be fine.

I am with @armon on this one.

Three rules:

  • provisioners' actions must be idempotent
  • if a provisioner fails, the destroy of the resource is aborted.
  • a force option should taint/destroy the resource even if the on_destroy provisioner fails.

Agree that this would be a useful feature, and that idempotency should keep it safe. The options from @pmoust seem reasonable.

This would be an excellent feature - we've found a need to be able to clean up before a resource is destroyed.

+1

+1

With the addition of the 'chef' provisioner, IMHO this is a must-have.

Right now terraform can provision chef instances, but not actually destroy them. This leaves stale clients and nodes on the chef server.

@mitchellh I see this being implemented as a 'destroy' block within the provisioner block itself. If it's there, then on_destroy is called before destroy is called.

Without this feature, terraform is going to keep causing garbage to pile up on our chef server. We'd be happy to spend some time on a POC PR.

cc @thegedge

:+1: +1

I'm surprised packer does this, but terraform doesn't: https://www.packer.io/docs/provisioners/chef-client.html#skip_clean_client

@mitchellh do what puppet does? have a test to see if it needs to execute the on_destroy?

:+1: +1

:+1: +1

So what's the next step here? Create a proof-of-concept no-op destroy provisioner and add the hooks? I like @pmoust's three rules, which can become these test cases:

  • provisioner succeeds, destroy the resource
  • provisioner fails, destroy of resource is aborted
  • force option provided, provisioner fails, still destroy the resource

Should there be an option to skip the on_destroy provisioner altogether?

The "junk in Chef Server" problem (which I also have!) makes me think that we should consider letting a single provisioner hook in to both the create and destroy parts of the lifecycle. Possibly to update, too.

It'd be nice if you could just add provisioner "chef" and it would integrate with both the create and destroy actions and tidy up during destroy.

Of course at that point the provisioner starts to look quite a lot like a resource. In #3084 I included a chef_node resource that creates the chef server object but isn't able to actually get the server set up. I spent some time trying to figure out a suitable workflow there and didn't hit one; a provisioner being able to hook into destroying could be the missing link to make that work well.

Another use case for this feature is to allow an unmount command to run before an aws_volume_attachment gets destroyed on a running instance.
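
A sketch of that volume case, again in the syntax that later shipped; the device name, mount point, connection details, and the referenced aws_ebs_volume/aws_instance resources are assumptions defined elsewhere:

```hcl
resource "aws_volume_attachment" "data" {
  device_name = "/dev/xvdh"
  volume_id   = "${aws_ebs_volume.data.id}"   # assumed to be defined elsewhere
  instance_id = "${aws_instance.app.id}"      # assumed to be defined elsewhere

  # Unmount the filesystem before Terraform detaches the volume, so the
  # detach does not hang or interrupt in-flight writes.
  provisioner "remote-exec" {
    when   = "destroy"
    inline = ["sudo umount /data"]

    connection {
      host = "${aws_instance.app.public_ip}"
      user = "ubuntu"
    }
  }
}
```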

+1

+1

+1 - found this Googling for "terraform destroy hook", was looking for options to clean up old chef nodes/clients

+1

+1 my use case is unmounting an EBS volume before destroying

+1 I need this for my use cases: unjoining Windows AD, removing the node from the Chef server, and then for Elasticsearch nodes, Consul clusters, and other cases where I need to control what happens to the EBS volume when the resource gets destroyed.

Right now the alternative is basically building some high level logic above and outside of terraform that can transition my nodes into various states of a lifecycle.

The concern is that all it takes is one TF run that deletes or taints or recreates a resource to have this lifecycle management completely bypassed. NSFP... not safe for prod.

+1

+1

I really need this functionality, and would like to start working on it. Has anyone started a branch to implement this functionality?

If not would anyone like to start working on one with me?

+1

So I put together a feature spec that attempts to address this issue. I hope it raises a few eyebrows, if anything.

Destroy Provisioners in Terraform

It would be great if @mitchellh @pmoust @kforsthoevel @armon @phinze could review the proposal and give some feedback on the matter.

Just looking to get this feature implemented, in any guise.

+1

This is almost a requirement for a terraform+chef infra

+1

I haven't seen any documentation around how to have Consul agents leave the cluster as part of a Terraform update (if I missed it, let me know).

When we rotate base images it leaves nodes in consul forever (as far as I've seen/waited) unless we manually run a consul force-leave on another server. The destroy proposal from @kris-nova (and others) sounds like it would help solve that problem by allowing us to define a destroy script on the server to issue a consul leave command before the destroy takes place.
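
A sketch of that Consul case, shown in the syntax that later shipped; the AMI, connection details, and the assumption that a Consul agent runs locally on the instance are all placeholders:

```hcl
resource "aws_instance" "consul_client" {
  ami           = "ami-123456"   # placeholder AMI
  instance_type = "t2.small"

  # Gracefully leave the cluster before the instance is destroyed, so the node
  # does not linger in the Consul catalog after a base-image rotation.
  provisioner "remote-exec" {
    when   = "destroy"
    inline = ["consul leave"]

    connection {
      host = "${self.private_ip}"
      user = "ubuntu"
    }
  }
}
```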

Well, I think this has had enough time to marinate, so I decided to be proactive and just send a PR with what I described earlier. Hopefully that will be enough to raise some eyebrows and get this issue brought back to life. I will start working on it now; if anyone cares to get involved, let me know :)

Thanks everyone

A before_destroy provisioner would also be helpful. I have some EC2 instances with EBS volumes configured as LVM volumes to accommodate future growth more easily. Before you detach these volumes, you have to disable the volume group first, so every time I want to recreate or destroy these instances Terraform fails, unless I connect to each instance and disable the VGs first.

@liviudm,

My thought when writing the Destroy Provisioners in Terraform RFC was to implement support for all existing provisioners: simply call them before the Apply(), and only during a Terraform destroy operation. This way we can take advantage of already-working Terraform features and support, and just change when they are executed.

I think having one-off provisioners for a destroy or before_destroy seems very junk-drawer-y to me. What if you want to run a chef command, or a local-exec or remote-exec command, before destroying a resource? Globbing them all into one provisioner seems messy and could lead to having a lot of arguments for the provisioner that will only be used in very specific cases. Also, that feels like duplicate code and duplicate feature work. The features are already there and supported, let's just call them when we need them.

But again.. this was just my thought on the matter. Feedback? Bueller?

@kris-nova I went through your RFC, looks good from my point of view and makes sense. The only note I want to add, you said:

The provisioner will run before the destroy action takes place, and never during an apply operation.

I guess you're talking about a resource destroy, not terraform destroy. The destroy provisioners should also run if the resource has changes that require it to be redeployed, such as an EC2 AMI change.

@kris-nova I don't have much experience with using terraform at scale, so I don't have much intuition for the specific implications of using this across a diverse set of resources, but I believe your RFC would provide enough functionality to address my own needs. Thanks for carrying it forward!

@liviudm,

Yes, your assumption is correct regarding resource destroy. I also agree this destroy provisioner should be used in all management of state deltas. As you mentioned, they should be run on redeploy as well.

@ajbouh,

Thanks for backing me on this. We have been waiting for this functionality for some time. Now that I know that I have some support I think I am going to make this feature a priority for me to work on.

Thanks for the feedback everyone!

@kris-nova I have a number of projects and tools waiting to be ported over to terraform that are blocking on this. Can't wait for the new world this'll open up. 🌎

@kris-nova Two questions on the RFC

  • Does terraform taint execute the on_destroy action, or is it only terraform destroy?
  • Can there be only 1 on_destroy action type per resource?

We are a Windows shop that uses centralized Chef, Active Directory, Microsoft DNS, and Octopus Deploy. A taint or destroy action on a single VM would mandate that 4 PowerShell scripts be executed. Some of those would be considered 'attempt' and others would be considered 'require'.

@kris-nova that RFC looks pretty good. Really keen to have Terraform handle a Chef client & node as part of the terraform destroy, as well as have some mechanisms for defining what should happen if a node becomes tainted & will be re-created.

@spuder @kris-nova
IMHO on_destroy MUST ONLY be run when an action takes place (i.e. when a resource's unique identifier is removed/replaced in state). terraform taint just marks the resource as tainted; it does not, and _should not_, perform any action on the actual infrastructure. The natural step after tainting is to create a plan and apply it. on_destroy will be included in said plan for tainted resources.

I haven't followed up on this in a while - just want to make sure this is still a concern - I can focus on a PR this weekend if we still need this bit of functionality.

Thoughts?

It would be useful to me. I'm still experiencing problems related to #2957, and a destroy provisioner would make it possible to automatically unmount volumes, allowing for detachment.

Still +1 from me... being able to handle full creation and destruction lifecycle from terraform is a huge boon. I feel like anyone using chef server needs this.

Absolutely necessary to properly manage a Chef-driven infra using Terraform. Otherwise, lots of manually deleting nodes & clients on the Chef Server has to happen, which is easy to forget.

Thanks for the feedback @jeffbyrnes @eedwardsdisco @cbarbour!

I started working on the PR this weekend - I have it working with file provisioners (you can now execute arbitrary logic before a destroy 😍)

Need to polish this up, and port the changes to the other provisioners.. I will then send out the PR as a WIP so we can all start discussing the changes.. and work on getting the site docs, etc updated then.

Getting excited :)

Currently looking at how my changes will work with @mitchellh's new graph for terraform apply https://github.com/hashicorp/terraform/pull/9388 ... checking checking checking

@kris-nova Feel free to let me know if you have any questions, but the changes shouldn't be _so_ radical.

+1 Needed for Puppet Unprovision here too.

Hey @kris-nova, do you happen to have your branch available anywhere? This is a feature we'd like to prioritize for 0.8 and I don't want to negate the effort you've put in!

@mitchellh - I would love to get this in for 0.8. I have done a lot of work on this and will get a branch up soon (after I merge with the new core changes) - if this is a priority I can certainly bump it up in my TODO list... Can you point me to a place where I can contribute / communicate / get involved with the goings-on of 0.8? I have just had this on the back burner, but can get a valid PR out in a few days if necessary - cheers!

@kris-nova Thanks! Ohhhhh, to be clear: I'm not asking YOU to do the work; I was planning on starting it and realized (remembered) that you started it, and just wanted not to insult you by working over what you already spent effort on. If you push your branch I'll polish it up, make sure it works with the new graph, and get it into the beta. The timelines are pretty short so I can't ask you to get it going. :)

@mitchellh - To be clear myself - I would be willing to get the work done on whatever timeline the community needs.. and it would be great work..

I just don't know what the goals/expectations are around the PR.. but am happy to adhere to them..

You won't insult me if you code it yourself, I will just drop off the thread and let you handle it.. But there is no reason to try to drag my work along with a feature you plan on implementing without me..

It's your project man, if you want to code it great.. If you don't want me involved I am very happy to stay out of the way :)

@kris-nova Perfect! If you want to give it a stab, beta 2 is coming out next week and I'd love it in by then. beta 1 is coming out tomorrow and I have no expectations of it being in then. :)

I'd love for you to be involved, but I understand this is free, community effort and I don't feel comfortable imposing any time requirements on the community. If you feel comfortable in those timelines then I'd love to see it. If not, let me know and I'll pick it up. Thanks!

Sorry for intruding, but to me this conversation looks a bit strange. It looks like the HashiCorp team is planning to roll out a beta for the next version ASAP and they are willing to include this feature, that so many of us need, in the release. What can be better?
I think it should be pretty clear that, regardless of the professionalism level, an OSS contributor cannot compete with the perspective and understanding of goals of the main team members. Therefore, if it's something that must be done really fast, I believe the best candidate would be a team member. OTOH, if there is a piece of work done by an OSS community/contributor, it's kinda ridiculous to just ignore it and do it from scratch (unless there are some very good reasons to do so).

After all, it is an OSS (even though it's a Hashicorp product as well) and I think that one of the main things about OSS is to not be attached to your code or develop some feelings but rather collaborate on making a product great.

Not sure that it's an appropriate comment - please feel free to delete it if you think it's not.

@ashald - I guess my point was more in line with the fact that this issue has been open for over 2 years and hasn't seen much movement other than what I have been driving..

Apparently now it's a pressing issue that needs to be completed by next week ¯\_(ツ)_/¯

_Which I am happy to do for the record.._

If we need a branch with my work I can push it up as a WIP.. but this doesn't look like a WIP conversation/learning opportunity to be honest.. but more in line with not stepping on anyone's toes..

I just wanted to be clear that I am willing to contribute/share/collaborate in the spirit of making the project better.. but am not necessarily interested in letting someone "polish it up" because they finally decided to get around to coding a feature that the community has been asking for since 2014 and are impatient with the velocity that the community offers..

Imposing a time limit is NBD in my eyes, and it's about time this feature is finally getting the attention it deserves. I think it's great that we are finally trying to put this to bed. As I mentioned earlier.. I am happy to help..

It just doesn't seem like any help is being requested is all ;)

@Ashald I agree with you. I think we're saying the same thing. I _don't_ want to impose any time limits on an OSS contribution, which is why I said to @kris-nova that I'm not asking her to do the work. If she is happy to do it within the time conditions, then I'd love to see it, but I _did not_ ask that. I originally asked her to push up her WIP so that I could take it over so that she didn't have to feel the pressure of the time conditions. And, I'm not looking to mooch or freeload off her work: if she feels like I don't deserve -- for any reason -- to build off her work, then she doesn't need to do anything and I'll do it myself.

As for being quiet for a couple years and then suddenly prioritizing an issue... that is what happened. Basically: we have a few weeks of 0.8 beta ahead of us (beta, not RC, where we can introduce a few new features) and I found some extra bandwidth to fit something new in. I sorted issues "old to new" so I'd find the older issues we haven't touched in a while, and this one seemed like a great one to add. The timeline of "one week" can be a lot of pressure for people, but for me it's just another day on the job, which is why I'm not asking anyone other than myself to get it done in that time period.

@kris-nova I have zero expectations! I just didn't want to insult you in any way by ignoring the work you've already done. I'm not looking for free work here. If you don't want me to "polish it up" then just tell me you won't push it up and this is completely okay. I just want to know what would make you happy here so that we can move forward with this feature request.

Would I love help? Of course! But if anything makes you feel uncomfortable about it (timelines -- which you said are nbd --, the process, the feature itself, anything else in life, etc.), then I'm also happy to do it myself. I'm just offering optionality.

EDIT: I do want to also note that upon reflection, I realize how I could've been dismissive and "take it or leave it" in my initial comment. I'm sorry, that is not what I meant and that wasn't the tone I had in my head. Text-based communication is hard that way. I hope that above is clear though that I just want to find a way to get this going for the project and have no expectations of any scenario.

In hindsight I removed my comment, sorry @kris-nova, honestly no offense meant, appreciate your effort.

You are right @denisbr and @mitchellh - I am starting to regret my reaction as well.. Let's just do what we do best and approach this as excellent engineers.

@mitchellh - after looking through the new core changes, all I would be able to offer would be a few functions in the apply lifecycle that may or may not be what we are looking for - here are my super rough field notes and hacky code that demonstrate where I was going with this https://github.com/kris-nova/terraform/tree/destroy-provisioners

@mitchellh - If you wouldn't mind getting the feature out - I would be interested in your take on how to solve this

The RFC is still up - and seems to have a good following for it - so at least the scope and story is defined.

Thanks for understanding everyone - and more importantly working as a team on this - we need it more than ever.

Cheers

@kris-nova Looks good! My thoughts (that may overlap with yours, just trying to be as detailed as possible):

  • Config changes, looks like you have those on lockdown.
  • Change EvalTree for apply operation: ignore destroy provisioners
  • Change EvalTree for destroy operation: run destroy provisioners prior to resource apply
  • Probably need to modify the EvalApplyProvisioner EvalNode to understand the on_destroy semantics to return an error or not (hence halting the evaluation process, hence halting the destruction of the resource)
  • Copious tests in context_destroy_test.go and context_apply_test.go to be able to test this stuff in-memory.

Those together would probably do it!

Let me know @kris-nova where your comfort level is and when you want me or don't want me to jump in, and I'm happy to provide input. I think your RFC is solid (I actually missed it before you mentioned it!); the only reaction I'd have is maybe splitting up the on_destroy into two fields later to give us a _when_ field (when to run) and a field to specify behavior (attempt/require); this would allow us to build future provisioner lifecycle stuff (only on update, for example). HOWEVER, that is a nitpick and bikeshedding in the sense that those changes are easy relative to the core feature itself, so I'll ignore those. :)

Woohoo. Reviewing this thread after a similar discussion in #649.

Is this still targeted for v0.8?

It didn't make it, sorry! This was my fault for not starting this up earlier. But that's okay, we'll get it into 0.9 for sure. It may even sneak into a 0.8.x release depending on the type of changes necessary.

Thanks for the update @mitchellh. We are starting to work through the issues with Chef, AD, and Consul and how we remove old, stale servers.

Would love to hear what you end up with @techcadia

We ended up just having a job that runs every X minutes to detect stale objects.

@Techcadia / @imduffy15 I usually baked a script into the instance that ran a removal from our chef server on box termination.

@stack72 How are you telling the difference between shutdown, reboot and terminate?

@mitchellh we're missing this sooo much for managing our Consul instances in AWS. Would be very happy to see the feature! Hope it will make its way into the next release.

@Ashald have you tried using the consul leave_on_terminate feature?

@eedwardsdisco that's a dangerous option for server nodes.

It's Christmas time, let's hope a miracle will happen and it will sneak into one of the 0.8.X releases! :)

Any chance of getting provision-on-destroy?

Aside from the branch on @kris-nova's fork - which is currently over two thousand commits behind - is there anywhere work on this is being done? Is there an open PR?

I've long since given up hope on this ever being worked on.

No open PR at the moment, but it is on the roadmap for 0.9. :)

@mitchellh Is there an official roadmap published somewhere else?

https://github.com/hashicorp/terraform/milestone/2

There is not. Years and years ago (pre-Terraform) I used to publish one, but I've been burned too many times by people being _very_ upset when something doesn't make it in. I've learned since not to make such commitments.

That being said, it's on the roadmap. We hope to ship it.

Howdy folks, I've continued the fantastic RFC from this issue and this is the final RFC that I'm going to run with to build this: https://docs.google.com/document/d/15nEcV7fxskDgYrXoNMl6RYIo10PCiZGle7TP8xitrFE/edit#

The work is in f-destroy-prov and is just about complete now. It is just missing some polish work around validation and docs. So I can confirm this will 100% make it into TF 0.9. Thanks so much to @kris-nova for the original RFC work, that helped save copious time during the design phase so I could just start going with implementation!

Wow! This is fantastic work @mitchellh :1st_place_medal:

Your RFC 2.0 looks like exactly what the community needs, and I am so glad to see this coming to life. Again, can't thank you enough for all that you do. Looking forward to the PR - and to one day having a long awaited closure on the issue.

Go Terraform!

If on_failure = "continue" is set (not the default) then we'll just continue and it'll only be in the output. The output may be noisy but we don't have a better way to surface that information at the moment (bonus: it'll be in red, so that helps if you have colors on). If this becomes a real issue then we can consider some other options, but I'd rather get the working feature out into the hands of the community and see where pain points end up being. 🤢

Ah thanks - I nuked my original question (as I read on I found more information in the implementation section..) which was

How will Terraform handle a failed destroy provisioner? Will this be tracked anywhere?

+1 to iterating on this.. I think you are nailing the feature exactly as we want, and I am pumped to see where this evolves to!

I am curious though.. what if we have an on_failure = "continue" and the provisioner itself breaks the program? (Might be a question for another forum, so feel free to defer me) Will terraform recover from the potential fatal, and continue destroying?

I think the discussion here is fine. 👍

If a provisioner returns any error (connection error, execution error, etc.) it will continue destruction. If the provisioner causes _Terraform_ to crash, destruction will not continue, but I think that's a reasonable tradeoff (provisioners should not crash Terraform 😛).

The broadness on the "failure" type is necessary due to the current core <=> provisioner API interface. Changing that would be pretty messy/annoying/deeper. Again, possible if it ends up adding high value, but not something I'd casually do unless there was enough of a proven use case.

Hope I answered your Q right... I didn't fully understand it (late Friday night reading perhaps an excuse. I dunno, but prob my own fault).

Great explanation! Thanks! You're right.. maybe I should get back to Friday night. Cheers

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
