Reopening https://github.com/hashicorp/terraform/issues/5425 since its closing comment was shown to be inaccurate, and the fact that it is closed seems to have prevented it from getting any attention.
@bcokert summarized the issue fairly well along with providing reproduction steps in this comment: https://github.com/hashicorp/terraform/issues/5425#issuecomment-320530179
It seems to me that @apparentlymart was correct when he said here that the only information required to destroy a terraform stack should be the information required by the provider blocks and the variables they depend on. Since this isn't the case, this seems to be a bug and the issue should not have been closed.
Hi @kurtwheeler,
Things have changed quite a bit since that issue, so there may be parts that are no longer relevant.
It is not expected that you should be able to destroy infrastructure with only the state data. You will need the provider configuration, especially when there are multiple aliased providers or providers in modules.
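For example, with aliased provider configurations along these lines (regions and names here are purely illustrative), the state records which provider configuration each resource belongs to, but not the region or credentials behind it, so the provider blocks are still needed to destroy:

```hcl
provider "aws" {
  region = "us-east-1"
}

# A second, aliased configuration of the same provider
provider "aws" {
  alias  = "west"
  region = "us-west-2"
}

resource "aws_s3_bucket" "replica" {
  # Managed by the aliased provider; destroying it later still needs
  # this provider configuration, not just the state entry.
  provider = "aws.west"
  bucket   = "example-replica-bucket"
}
```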
Hi @jbardin! I'm not actually sure how what you said is different from what I said, specifically:
the only information required to destroy a terraform stack should be the information required by the provider blocks and variables they depend on.
However, this doesn't seem to be the case. Instead, as @bcokert mentioned in the issue I linked, you have to provide all of the variables in variables.tf regardless of whether the provider configuration depends on them or not.

In fact, I've even had issues with putting an output block in variables.tf when a destroy doesn't completely succeed. If the attribute of a resource that has already been destroyed by a failed terraform destroy is supposed to be output, terraform will not run a subsequent destroy because it cannot provide that value. This often comes up when I forget to empty an S3 bucket, see that it couldn't be destroyed, empty it out, and then rerun terraform destroy. Instead of destroying the S3 buckets, which are now empty, it complains that it cannot get an attribute of an unrelated resource that has already been destroyed.
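As a concrete (made-up) example of the first point, with a config along these lines, terraform destroy still demands a value for app_image_tag even though only the provider block is needed to delete anything:

```hcl
variable "aws_region" {}

# Not referenced by any provider block, but destroy still requires a value
variable "app_image_tag" {}

provider "aws" {
  region = "${var.aws_region}"
}

resource "aws_instance" "app" {
  ami           = "ami-12345678"
  instance_type = "t2.micro"

  tags {
    ImageTag = "${var.app_image_tag}"
  }
}
```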
Hi @kurtwheeler,
Thanks for the clarification, I wasn't quite sure what part you were highlighting.
It's going to be expected that the config is also valid, and since variables are required, they need values to pass validation. I'll have to verify how this works with HCL2, as I can't recall off the top of my head where that variable validation will happen.
Hmm, okay, I guess I can understand that terraform will validate the config before running destroy. However, I question whether that is fully necessary. It seems to me that destroy is a special case where the only configuration that needs to be valid is the provider configuration. Does it truly matter if the full config isn't valid, as long as the provider configuration is valid and the state file contains all the information about the resources that need to be destroyed?
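One workaround, assuming the unused variables genuinely don't affect the destroy, is to pass placeholder values just to satisfy validation (the variable name below is illustrative):

```shell
# The value only exists to get past validation; it isn't used by the destroy
$ terraform destroy -var 'app_image_tag=placeholder'
```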
I've dealt with this issue in a slightly different manner, but wanted to chime in to provide some more context to @jbardin et al.
At my current job (and also the previous one) I've worked on Terraform code for a platform that we implemented multiple times. The 'platform code' consisted of a few Terraform modules, and we would then create a ton of platforms by instantiating those modules repeatedly. Part of that platform was an immutable container cluster, and every time we improved on that container cluster, we would replace the clusters in every platform with fresh ones and remove the old ones.
Consider the following workflow:
```shell
# Update local git repo with new Terraform code
$ cd platform-modules
$ git pull

# Move to implementation plans directory and run custom-built CLI tool
# that renders a Terraform plan for a fresh cluster for the 'prd' environment
$ cd ../platform-plans
$ platform-cli create-plan -plan=platform/prd-2

# Run Terraform to create fresh cluster
$ cd platform/prd-2
$ terraform init && terraform apply

# Destroy the old 'prd-1' cluster
$ cd ../prd-1
$ terraform destroy
```
In quite a few cases this last step would fail because the new platform code didn't match the state of the existing cluster. This especially happened when removing resources from the codebase. I addressed this by having our platform-cli store a config file as part of the implementation plan that contained the Git hash of the platform-modules repo, so I could quickly check out the appropriate commit for an existing cluster and make the destroy work.
In my opinion that's quite strange. Terraform has all the state, so it should know which resources to deal with and how they relate. It should be able to determine from the state which providers are necessary to run a successful 'destroy'. For instance, the original code may use the 'random', 'null', 'template' and 'aws' providers, but since the resulting resources are all on AWS, it would only need configuration for the 'aws' provider. It could then take provider settings from environment variables, existing Terraform configuration files, a Terraform vars file, or from user input when nothing is found.
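As a rough illustration (the exact layout varies between Terraform and state format versions), each resource entry in the state already carries a reference to the provider that manages it, which is the information a state-only destroy would need:

```json
{
  "resources": [
    {
      "mode": "managed",
      "type": "aws_s3_bucket",
      "name": "artifacts",
      "provider": "provider.aws",
      "instances": [
        { "attributes": { "bucket": "example-artifacts" } }
      ]
    }
  ]
}
```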
In my opinion, it shouldn't need the code that resulted in the state to successfully run a 'destroy' action. It shouldn't even look at it (or at least not by default?). Although, it's entirely possible I'm overlooking cases where this is very much necessary.
I think a lot of the comments underestimate the complexity of a destroy command. For example, there is the case of destroy provisioners: you can set a provisioner to run commands on destroy. To know about these, Terraform has to process the config and determine whether any resources have this kind of requirement. Last I checked, destroy provisioners are stored in config and not in state.
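For reference, a destroy-time provisioner looks roughly like this (the resource and command are illustrative, and the exact `when` syntax differs slightly between Terraform versions); this block lives only in the configuration:

```hcl
resource "aws_instance" "example" {
  ami           = "ami-12345678"
  instance_type = "t2.micro"

  # Runs only when this resource is destroyed; Terraform reads this
  # from the configuration, not from the state file.
  provisioner "local-exec" {
    when    = "destroy"
    command = "echo 'deregistering instance ${self.id}'"
  }
}
```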
@nbering thanks for your insight. I hadn't considered destroy provisioners myself. I do wonder what the downsides would be to storing destroy provisioner config in state. Perhaps they can contain sensitive data in some cases.
It looks like we both prove the point that this is a fairly complex issue 😉
I'm sure there are other edge cases as well, but that one stands out as obvious if you know about it.
I'm not saying Terraform shouldn't be able to destroy infrastructure without having the config, given a tfstate file... I'm just pointing out there are cases where you _do_ need the config file, so it's not exactly black and white.
Having the ability to remove the stack directly from the .tfstate would help a lot during development and testing, especially when you have already modified the source .tf files before running destroy.

The .tfstate in the backend has all the details created during the apply phase, so the destroy command could simply refer to the tfstate file from the backend without depending on the source .tf files.

In my view it's a basic capability that would come in very handy for non-production environments.
Any update on this? This issue is especially practical with the AWS provider. AWS releases new features regularly, so our templates keep changing. We don't want a new feature to be applied to an existing resource and affect users, so we either create a new resource after destroying the old one, or run both in parallel and destroy the old one once everything is OK. This feature would help us do that.