Terraform: Taint multiple resource instances at once

Created on 1 May 2015 · 53 Comments · Source: hashicorp/terraform

Hey guys,

I've got an AWS instance resource for my workers in a module (mymodule):

resource "aws_instance" "worker" {
  count = "12"
 ...
}

Is it possible to taint all 12 resources? I've tried the following:

❯ terraform taint -module=mymodule "aws_instance.worker.*"
The resource aws_instance.worker.* couldn't be found in the module root.mymodule.

Rather than having to do this:

❯ terraform taint -module=mymodule "aws_instance.worker.0"
The resource aws_instance.worker.0 in the module root.mymodule has been marked as tainted!

❯ terraform taint -module=mymodule "aws_instance.worker.1"
The resource aws_instance.worker.1 in the module root.mymodule has been marked as tainted!

...

Cheers,
Alex

cli enhancement

All 53 comments

+1

+1

+1

+1

+1

This seems like it wouldn't be too hard to implement.

The code that finds the resource to taint is here:
https://github.com/hashicorp/terraform/blob/6f9a358cc432e1b29da00ff364f741293743ff75/command/taint.go#L94

If there were a rule that the wildcard can only be a * and it may only be the entirety of the last part of the resource path (so e.g. no aws_instance.foo* or *.baz) then this would just entail iterating over the mod.Resources map looking for keys that have the right prefix.
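
A minimal sketch of that rule, written as a shell filter rather than as a change to the Go code linked above. It reads resource keys one per line and keeps those that start with the prefix in front of a trailing ".*"; any other placement of "*" is rejected. The function name match_last_part is hypothetical, and the keys are assumed to be in the type.name.index form used earlier in this thread.

```shell
# Hypothetical helper: filter resource keys (one per line on stdin) by a
# pattern whose only wildcard is a "*" forming the whole last path part.
match_last_part() {
  pattern=$1
  case $pattern in
    *.\*) prefix=${pattern%\*} ;;   # "aws_instance.worker.*" -> "aws_instance.worker."
    *)    prefix= ;;                # no trailing wildcard: exact match only
  esac
  # Reject any other use of "*", e.g. "aws_instance.foo*" or "*.baz".
  case ${prefix:-$pattern} in
    *\**)
      echo "wildcard must be the entire last part of the path" >&2
      return 2
      ;;
  esac
  while IFS= read -r key; do
    if [ -n "$prefix" ]; then
      # Prefix match, as described for iterating over the resource keys.
      case $key in
        "$prefix"*) printf '%s\n' "$key" ;;
      esac
    elif [ "$key" = "$pattern" ]; then
      printf '%s\n' "$key"
    fi
  done
}
```

Fed a list of keys, match_last_part 'aws_instance.worker.*' would print only the worker instances, while a pattern such as 'aws_instance.foo*' fails with exit status 2.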

+1

+1

+1

#2444 is similar to this and would be awesome!

Has there been any progress on this? This would be very helpful for us at Box.

+1

+1

+1

+1

+1

+1

+1

This is really needed for resources created using count :)

+1

Very useful, especially with count.

+1

+1

+1

+1

+1

+1

+1

If a wildcard is difficult to implement, then an intermediate improvement would be to allow taint on multiple resources at a time. This would let us take advantage of shell expansion and do stuff like:

terraform taint aws_instance.worker.{0..11} <- does not work
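
Until taint accepts several addresses, a small wrapper can do the looping so that shell expansion still applies. This is only a sketch: taint_each and the TERRAFORM override variable are hypothetical names, not part of Terraform, and the -module flag from the original post is not handled.

```shell
# Hypothetical wrapper: run "terraform taint" once per address given.
# TERRAFORM may be overridden (for example TERRAFORM=echo) for a dry run.
taint_each() {
  failed=0
  for addr in "$@"; do
    # Keep going after a failure, but remember it for the exit status.
    "${TERRAFORM:-terraform}" taint "$addr" || failed=1
  done
  return "$failed"
}

# Usage with bash or zsh brace expansion (count = 12 gives indexes 0..11):
#   taint_each aws_instance.worker.{0..11}
```

Each address still costs one taint run, so this saves typing rather than time.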

+1, this would be very useful

Any kind of wildcard would be good!

+1

Hi all! Thanks for the interest here.

This still seems like a good idea, though it's not an immediate plan for the Terraform team at HashiCorp, due to other work taking priority. Since this issue seems to just be attracting +1 upvotes now, I'm going to lock the conversation to reduce the notification noise for the many subscribers to this issue. In the meantime, if anyone in the community has the time and motivation to work on this we'd be happy to review a PR!

In hashicorp/terraform#18404, @Aeolun shared the following PHP script to scrape the terraform show output and run terraform taint for each address found:

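#!/usr/bin/php
<?php

// Collect the lines printed by "terraform show".
exec("terraform show", $output);

if (!isset($argv[1])) exit("Need an argument to taint\n");
$pattern = $argv[1];

foreach ($output as $line) {
        // Resource headers end in ":"; skip the "Outputs:" header.
        if (substr($line, -1, 1) == ":" && $line != 'Outputs:' && fnmatch($pattern, $line)) {
                // Drop the trailing ":" and quote the address for the shell.
                passthru("terraform taint " . escapeshellarg(substr($line, 0, -1)));
        }
}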

I've created a commit with a proposal for how it could look. While it works, @apparentlymart pointed out that #12289 could fix it, and that is waiting for v0.12, so until then all we can do is discuss the desired behaviour. I personally could really use wildcards for module names.

@blckct would it be possible for you to share some more detail about what your configuration looks like, what terraform taint commands you're running regularly in your workflow, and what higher-level goal you're trying to achieve by doing that?

I just want to get a better sense of what is motivating the use of terraform taint, since so far it's been designed as a very ancillary command that most users should never need, but it's clear from this thread that it's being used for some things that weren't intended and I'd like to understand better what those needs are so we can determine whether there's some other missing feature in Terraform that would be a better answer to those use-cases.

Thanks!

I personally use it while developing a script. If something goes wrong, the
system often still marks some steps completed (e.g. scripts).

It's difficult to think of examples right now, but to get the script part of my
provisioning to re-run, I taint all hosts that require it.

Especially when working with a large number of hosts, tainting every one
individually takes a while.

Thanks for sharing that, @Aeolun.

Just want to repeat back what I understood there to make sure: during development of a configuration, you may be writing a script to be run as a provisioner and that script may have bugs that cause it to produce an incorrect result, but it doesn't fail and so Terraform thinks provisioning was successful. You then use terraform taint to let Terraform know that it actually failed, even though it seemed to succeed.

That _is_ the intended use for terraform taint, but I'm surprised to hear that you'd be working with "a large number of hosts" during such development; that sounds expensive! Is that just a matter of convenience for you, or is there something specifically blocking you from doing your development on a smaller set of hosts (ideally just one) and then deploying it "for real" once you've verified that it's working correctly in your development environment?

Thanks again for sharing that use case!

Hi Martin,

Not really a 'large' number of hosts. But calling taint any number of times
over 1 multiple times per hour gets old fast ;)

In this case, I'm setting up a cluster that depends on 3 hosts to do its
job correctly (so count is 3). The scripts connect them to each other, so I
need at least 3 to confirm everything is working as expected (presumably I can
crank it up to any amount afterwards).

However, there's 2 or 3 dependent provisioning steps sometimes, so I end up
having to taint 3 x 3 items.

Originally I expected taint to work on the full name of a step and then
taint all hosts in that step, but I had to specify every host separately.

All the steps are actually using the same provisioner too, so I could
presumably just taint the provisioner if that was possible, and get
everything done with one call.

I now made a script that does so by first retrieving a list of all
steps/hosts and then glob-ing to see what to taint.

Hope that helps!

Otherwise your understanding of my situation is pretty much spot on.

Scripts finish correctly, but it turns out my cluster clients or hosts do
not correctly connect to each other, so it's a config change, taint, and
executing again.

Sometimes I break things badly enough that I just restart all steps
(including provisioning hosts), but generally that's not necessary.

Great, thanks for confirming!

@apparentlymart Well, for me it was because the provisioning scripts weren't perfect, so destroying everything was sometimes required during development. Tainting infra is rarely needed once everything is already running, but it's needed quite a lot while you're trying to get it running.

We have a need for wildcard taint. We have several local files, heredoc-generated config files, and documentation that we don't check in. Although these are in state, each developer tends to taint them to get them regenerated on disk. We have a utility script that loops and taints. This takes some time, as it retrieves and stores the state on every taint (and leaves a state backup file on disk). A wildcard taint would be nice to have.

So how many more years will we be waiting for this feature? One of my nodes has failing scripts and I need to destroy 4 nodes here, so why am I writing bash scripts for this? C'mon, this is a very important feature.

@apparentlymart I disagree. In production you normally need to destroy at least 3 nodes, because of the automation you'd run against master nodes, which always come in threes. When one of them fails, recreating all of them makes sense in order to get a proper outcome from the Ansible playbook from scratch.

I was looking for this kind of feature but couldn't find one.
I finally ended up putting it in a for loop:
for i in {1..20}; do terraform taint "aws_instance.worker.$i"; done

:)

You can use terraform apply -parallelism=5 to provision 5 at a time.

@apparentlymart
My use case for taint is part of a migration of many resources from one state file to another (splitting a large pipeline). Because certain resources do not support terraform import, I'm running into conflicts when the destination pipeline runs (separate issue). My workaround (besides opening feature requests to add import support :) ) is to taint the "parent" resources to flush those "child" resources which cannot be imported. Specific example is aws_lambda_permission, so I need to taint the associated Lambda function.

Our code runs in multiple AWS accounts/regions with differing quantities of Lambda functions in each, so need to dynamically determine how many to taint. Scripting for consistency of course.

+1 for the ability to taint a whole module or multiple resources

I just got absolutely crushed by not being able to taint multiple resources. I was tainting an instance without realising that the downstream null resource file provisioners weren't re-running because they weren't tainted. Facepalm.

Is there a workaround for this in the meantime?

edit: in https://github.com/hashicorp/terraform/issues/23023#issuecomment-570077235 I found a suggestion to do a 'state rm' on all nodes. We can change that to do a taint on all elements.
terraform state list | cut -f 1 -d '[' | xargs -L 1 terraform taint

I needed to taint a whole module, and this function can help here too:

terraform-taint-all () {
  resource_prefix=$1
  # Taint every address in the state that matches the given prefix.
  for resource in $(terraform state list | grep -oP "$resource_prefix.*"); do
    terraform taint "$resource"
  done
}

So if you want to taint all resources created with count, let's say a security_group_rule inside a module, just do:

terraform-taint-all module.mymodule.security_group_rule

@apparentlymart, in 2016 I implemented this feature. My PR was closed because, although it worked, it added tech debt to Terraform: there was secret state work going on behind the scenes and my PR wouldn't blend well with it. Given that it's 4 years later, has the situation changed? If I were to implement this today, would my PR be closed again?

https://github.com/hashicorp/terraform/pull/10256

My use case is to taint all the elements of the array, given its base name.
Could the regex approach be enhanced by adding something like "\[\d+\]" to the regex?

Also, the "state rm" sub-command supports it, so the code to do this would be somewhere in core already, me-thinks! :)

$ /opt/terraform-0.12/bin/terraform state list
aws_launch_configuration.worker[0]
aws_launch_configuration.worker[1]
aws_launch_configuration.worker[2]

Try to taint it ... no joy

$ /opt/terraform-0.12/bin/terraform taint aws_launch_configuration.worker

Error: No such resource instance

There is no resource instance in the state with the address
aws_launch_configuration.worker. If the resource configuration has just been
added, you must run "terraform apply" once to create the corresponding
instance(s) before they can be tainted.

But state rm works fine ...

$ /opt/terraform-0.12/bin/terraform state rm aws_launch_configuration.worker
Removed aws_launch_configuration.worker[0]
Removed aws_launch_configuration.worker[1]
Removed aws_launch_configuration.worker[2]
Successfully removed 3 resource instance(s).
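
As a stopgap, the "\[\d+\]" idea can be applied to the output of terraform state list. The following is a rough sketch and not part of Terraform: the helper name instances_of is made up. It reads addresses on standard input and prints only those that consist of the given base name followed by a numeric index.

```shell
# Hypothetical helper: keep only addresses of the form BASE[N], where N is
# a number, from a list of addresses read on stdin (one per line).
instances_of() {
  base=$1
  while IFS= read -r addr; do
    rest=${addr#"$base"}                 # strip the base name if present
    [ "$rest" != "$addr" ] || continue   # address does not start with base
    # What remains must be exactly an index such as "[0]" or "[12]".
    if printf '%s\n' "$rest" | grep -Eq '^\[[0-9]+\]$'; then
      printf '%s\n' "$addr"
    fi
  done
}

# Usage:
#   terraform state list \
#     | instances_of aws_launch_configuration.worker \
#     | xargs -n 1 terraform taint
```

Matching the index explicitly keeps similarly named resources, such as a hypothetical aws_launch_configuration.worker_b, out of the result.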

Could you possibly grep and filter the output of terraform state list?

I've updated the title of the issue here to clarify the goal of tainting multiple resource instances at once, without specifying a particular solution (wildcard, or multiple addresses as proposed in https://github.com/hashicorp/terraform/issues/22117). This is not any declaration of intent on the part of the Core team, but some issue consolidation on our side.

Taint by tag?
