Here's the scenario (using the latest 0.6.11):
I have a cluster of aws_instance resources created with a count.
Each instance has an aws_ebs_volume and a corresponding aws_volume_attachment, each using the same count.
All is well for the initial plan/apply.
Now increase the count by 1. I expect Terraform to simply add another instance with its EBS volume and attachment.
Instead, Terraform wants to force a new resource for ALL of the volume attachments. Not good!
Here's an example (I've removed some of the irrelevant detail, so this might not work as-is):
resource "aws_instance" "kube_worker" {
count = "5"
ami = "ami-something"
instance_type = "t2.micro"
availability_zone = "us-west-2a"
subnet_id = "sn-something"
}
resource "aws_ebs_volume" "docker" {
count = "5"
availability_zone = "us-west-2a"
type = "gp2"
size = "10"
}
resource "aws_volume_attachment" "docker" {
count = "5"
device_name = "/dev/xvdd"
volume_id = "${element(aws_ebs_volume.docker.*.id, count.index)}"
instance_id = "${element(aws_instance.kube_worker.*.id, count.index)}"
}
If you plan/apply this, then change the 5's to 6's and re-plan, you get a plan that wants to force a new resource for the first 5 volume attachments, because Terraform thinks the instance_id and volume_id have changed (which they obviously have not).
(I unfortunately did not save the actual log.)
This of course fails on apply, because the volumes are still there and attached, and Terraform cannot re-attach them.
My only recourse was to taint the existing instances and rebuild them all. This is bad: I would like to be able to non-disruptively add a new node to my Kubernetes cluster using Terraform, which I could do before I added these volume attachments to each node.
Another error case:
Build the instances as above. Then taint aws_instance.kube_worker.0 and plan. The plan shows this output:
-/+ aws_volume_attachment.docker.0
    device_name:  "/dev/xvdd" => "/dev/xvdd"
    force_detach: "" => "<computed>"
    instance_id:  "i-b316bf6b" => "${element(aws_instance.kube_worker.*.id, count.index)}" (forces new resource)
    volume_id:    "vol-77d8fcb7" => "vol-77d8fcb7"

-/+ aws_volume_attachment.docker.1
    device_name:  "/dev/xvdd" => "/dev/xvdd"
    force_detach: "" => "<computed>"
    instance_id:  "i-b216bf6a" => "${element(aws_instance.kube_worker.*.id, count.index)}" (forces new resource)
    volume_id:    "vol-49d8fc89" => "vol-49d8fc89"
Notice that it also wants to rebuild the aws_volume_attachment for aws_instance.kube_worker.1, which it should not do.
Running apply then causes:
Error applying plan:
1 error(s) occurred:
* aws_volume_attachment.docker.1: Error waiting for Volume (vol-49d8fc89) to detach from Instance: i-b216bf6a
because the instance at count 1 is still running and has the volume attached.
To make this work, I have to terminate _all_ of the aws_instance.kube_worker.* instances from the AWS console. Running terraform taint or terraform destroy on the instances does not work.
I think this issue is caused by the problem reported in #2957.
I'm seeing the same issue while using the cloudstack provider. Every time I increase the count, Terraform wants to update all resources (not just the new one).
I've had the same issue with EBS volumes, which I could work around by moving from a separate EBS resource definition to incorporating it into the aws_instance resource. Then it started to happen with EIPs, which can't be defined within aws_instance. The workaround that seems to work so far is adding ignore_changes for the attribute that appears in the "Mismatch reason". For me it was adding this block to the aws_eip definition:
lifecycle {
  ignore_changes = ["instance"]
}
@hsergei thank you for posting your workaround (p.s. it works equally well for aws_volume_attachment if you do ignore_changes = ["volume", "instance"])
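For anyone following along, here is a rough, untested sketch of how that workaround would look applied to the aws_volume_attachment from the example at the top of this issue (the resource names, count, and ignore_changes values are simply carried over from the snippets above):

resource "aws_volume_attachment" "docker" {
  count       = "6"
  device_name = "/dev/xvdd"
  volume_id   = "${element(aws_ebs_volume.docker.*.id, count.index)}"
  instance_id = "${element(aws_instance.kube_worker.*.id, count.index)}"

  # Suppress the spurious diffs on these attributes so that changing the
  # count no longer force-replaces the existing attachments.
  lifecycle {
    ignore_changes = ["volume", "instance"]
  }
}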
I also used user_data, but added a check for whether the device actually needs to be formatted, in order to avoid accidental data loss:
function format_if_necessary() {
  echo "$(date '+%Y-%m-%d %H:%M:%S') format_if_necessary ${1}" >> ~/user-data.log 2>&1
  # return if $1 is not a block device path
  [ -b "${1}" ] || return 1
  # format the block device with ext4 only if blkid finds no existing filesystem on it
  sudo blkid "${1}" > /dev/null 2>&1 || sudo mkfs -t ext4 "${1}" >> ~/user-data.log 2>&1
}
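The function might then be called from the rest of the user_data script along these lines (the /dev/xvdd device comes from the example config above, while the /var/lib/docker mount point and the retry loop are my own assumptions):

# Wait for the attachment to show up, then format (only if needed) and mount.
for i in $(seq 1 30); do
  [ -b /dev/xvdd ] && break
  sleep 2
done
format_if_necessary /dev/xvdd && {
  sudo mkdir -p /var/lib/docker
  sudo mount /dev/xvdd /var/lib/docker
}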
@PaulCapestany Your solution of using ignore_changes = ["volume", "instance"] does not work when I destroy an instance. Terraform has the correct plan (destroy the instance and its volume_attachment), but terraform apply fails with: "aws_volume_attachment.attach.2: Error waiting for Volume (vol-xyz) to detach from Instance".
Did you try to destroy an instance and see the expected result (instance and attachment destroyed, volume detached)?
No, I did not test destroy. I guess you can always remove ignore_changes and let Terraform use its default behavior.
@hsergei @PaulCapestany: I succeeded only by stopping the instance I want to destroy via the console.
terraform plan => destroy instance[3] and attach[3] (good plan!)
terraform apply => error: attach[3] waiting for volume to detach
result: instance is unchanged; volume is in "busy" state; attach[3] ??
via console: stop instance
terraform plan => destroy instance[3] (looks like attach[3] was destroyed)
terraform apply => success: instance[3] destroyed
volume[3] is in "available" state
It looks like Terraform does not stop the instance, which leaves the volume stuck going from "in-use" to "busy".
The attachment itself was evidently destroyed during the first apply, since the second plan no longer needed to destroy it.
Verified Terraform's order of destroying resources: the attachment is destroyed before the instance.
It would be better to destroy the instance before destroying the attachment, so that if destroying the instance failed, the attachment would still exist.
The bug with aws_volume_attachment is that its destroy does not unmount the volume from the running instance, so the detach never completes.
Apparently stopping the instance lets the volume detach, and the instance can then be destroyed.
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-detaching-volume.html
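For clarity, here is the manual sequence described above as shell commands (this assumes the AWS CLI is available; i-xxxxxxxx is a placeholder for the instance being removed, not an ID from the reports above):

# 1. Terraform plans to destroy instance[3] and attach[3], but apply fails
terraform plan
terraform apply   # fails: "Error waiting for Volume ... to detach from Instance"

# 2. Stop the instance out-of-band so the volume can actually detach
aws ec2 stop-instances --instance-ids i-xxxxxxxx
aws ec2 wait instance-stopped --instance-ids i-xxxxxxxx

# 3. Re-plan and apply: the attachment is already gone and the instance is destroyed
terraform plan
terraform apply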
Use destroy provisioners to unmount volumes.
I'm hitting this too. I would like Terraform to just destroy the instances (causing the volumes to be detached "naturally"), but apparently this is not possible: Terraform never gets to actually destroying the instance, because it wants to destroy the attachment first, which will never succeed.
Stopping the instance first helps, but I don't see why it should be necessary.
@c4milo: please give an example with a code snippet; I don't know which provisioner in the docs you might be talking about. Thanks.
https://www.terraform.io/docs/provisioners/index.html#destroy-time-provisioners
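For example, something along these lines might work on the attachment resource (an untested sketch: the /dev/xvdd device comes from the example above, while the SSH user, the connection over the private IP, and a Terraform version new enough to support destroy-time provisioners are all assumptions):

resource "aws_volume_attachment" "docker" {
  count       = "6"
  device_name = "/dev/xvdd"
  volume_id   = "${element(aws_ebs_volume.docker.*.id, count.index)}"
  instance_id = "${element(aws_instance.kube_worker.*.id, count.index)}"

  # Unmount the filesystem on the node before Terraform tries to detach the
  # volume, so the detach does not hang on an in-use volume.
  provisioner "remote-exec" {
    when   = "destroy"
    inline = ["sudo umount /dev/xvdd"]

    connection {
      host = "${element(aws_instance.kube_worker.*.private_ip, count.index)}"
      user = "ec2-user"
    }
  }
}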