Here's the scenario (using the latest 0.6.11):
I have a cluster of aws_instance resources created with a count.
Each instance has an aws_ebs_volume and a corresponding aws_volume_attachment, each using the same count.
All is well for the initial plan/apply.
Now increase the count by 1. I expect Terraform to simply add another instance with its EBS volume and attachment.
Instead, Terraform wants to force a new resource for ALL of the volume attachments. Not good!
Here's an example (I've removed some of the irrelevant detail, so this might not work as-is):
resource "aws_instance" "kube_worker" {
count = "5"
ami = "ami-something"
instance_type = "t2.micro"
availability_zone = "us-west-2a"
subnet_id = "sn-something"
}
resource "aws_ebs_volume" "docker" {
count = "5"
availability_zone = "us-west-2a"
type = "gp2"
size = "10"
}
resource "aws_volume_attachment" "docker" {
count = "5"
device_name = "/dev/xvdd"
volume_id = "${element(aws_ebs_volume.docker.*.id, count.index)}"
instance_id = "${element(aws_instance.kube_worker.*.id, count.index)}"
}
If you plan/apply this, then change the 5's to 6's and re-plan, you get a plan that wants to force a new resource for the first 5 volume attachments, because Terraform thinks the instance_id and volume_id have changed (which they obviously have not).
(I unfortunately did not save the actual log.)
This of course fails on apply, because the volumes are still there and attached, and Terraform cannot re-attach them.
My only recourse was to taint the existing instances and rebuild them all. This is bad: I would like to be able to non-disruptively add a new node to my Kubernetes cluster using Terraform, which I could do before I added these volume attachments to each node.
Another error case:
Build the instances as above. Then taint aws_instance.kube_worker.0 and plan. The plan shows this output:
-/+ aws_volume_attachment.docker.0
    device_name:  "/dev/xvdd" => "/dev/xvdd"
    force_detach: "" => "<computed>"
    instance_id:  "i-b316bf6b" => "${element(aws_instance.kube_worker.*.id, count.index)}" (forces new resource)
    volume_id:    "vol-77d8fcb7" => "vol-77d8fcb7"

-/+ aws_volume_attachment.docker.1
    device_name:  "/dev/xvdd" => "/dev/xvdd"
    force_detach: "" => "<computed>"
    instance_id:  "i-b216bf6a" => "${element(aws_instance.kube_worker.*.id, count.index)}" (forces new resource)
    volume_id:    "vol-49d8fc89" => "vol-49d8fc89"
Notice that it also wants to rebuild the aws_volume_attachment for aws_instance.kube_worker.1, which it should not do.
Running apply then causes:
Error applying plan:
1 error(s) occurred:
* aws_volume_attachment.docker.1: Error waiting for Volume (vol-49d8fc89) to detach from Instance: i-b216bf6a
because the instance at count 1 is still running and has the volume attached.
To make this work, I have to terminate _all_ of the aws_instance.kube_worker.* instances from the AWS console. Running terraform taint or terraform destroy on the instances does not work.
I think this issue is caused by the problem reported in #2957.
I'm seeing the same issue while using the cloudstack provider. Every time I increase the count, Terraform wants to update all resources (not just the new one).
I've had the same issue with EBS volumes, which I could work around by moving from a separate EBS resource definition to incorporating it into the aws_instance resource. Then it started to happen with EIPs, which can't be defined within aws_instance. The workaround that seems to work so far is adding ignore_changes for the attribute that appears in the "Mismatch reason". For me it was adding this block to the aws_eip definition:
lifecycle {
  ignore_changes = ["instance"]
}
@hsergei thank you for posting your workaround (p.s. it works equally well for aws_volume_attachment if you do ignore_changes = ["volume", "instance"])
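For anyone following along, here is a rough, untested sketch of how that workaround would look applied to the aws_volume_attachment from the example at the top of this issue (the resource names, count, and ignore_changes values are simply carried over from the snippets above):

resource "aws_volume_attachment" "docker" {
  count       = "6"
  device_name = "/dev/xvdd"
  volume_id   = "${element(aws_ebs_volume.docker.*.id, count.index)}"
  instance_id = "${element(aws_instance.kube_worker.*.id, count.index)}"

  # Suppress the spurious diffs on these attributes so that changing the
  # count no longer force-replaces the existing attachments.
  lifecycle {
    ignore_changes = ["volume", "instance"]
  }
}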
I also used user_data, but added a check for whether the device actually needs to be formatted, in order to avoid accidental data loss:
function format_if_necessary() {
  echo "$(date '+%Y-%m-%d %H:%M:%S') format_if_necessary ${1}" >> ~/user-data.log 2>&1
  # return if $1 is not a block device path
  [ -b "${1}" ] || return 1
  # format the block device with ext4 only if blkid finds no existing filesystem on it
  sudo blkid "${1}" > /dev/null 2>&1 || sudo mkfs -t ext4 "${1}" >> ~/user-data.log 2>&1
}
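The function might then be called from the rest of the user_data script along these lines (the /dev/xvdd device comes from the example config above, while the /var/lib/docker mount point and the retry loop are my own assumptions):

# Wait for the attachment to show up, then format (only if needed) and mount.
for i in $(seq 1 30); do
  [ -b /dev/xvdd ] && break
  sleep 2
done
format_if_necessary /dev/xvdd && {
  sudo mkdir -p /var/lib/docker
  sudo mount /dev/xvdd /var/lib/docker
}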
@PaulCapestany Your solution of using ignore_changes = ["volume", "instance"] does not work when I destroy an instance. Terraform has the correct plan (destroy the instance and its volume_attachment), but terraform apply fails with: "aws_volume_attachment.attach.2: Error waiting for Volume (vol-xyz) to detach from Instance".
Did you try to destroy an instance and see the expected result (instance and attachment destroyed, volume detached)?
No, I did not test destroy. I guess you can always remove ignore_changes and let Terraform use its default behavior.
@hsergei @PaulCapestany: I succeeded only by stopping the instance I want to destroy via the console.
terraform plan => destroy instance[3] and attach[3] (good plan!)
terraform apply => error: attach[3] waiting for volume to detach
result: instance is unchanged; volume is in "busy" state; attach[3] ??
via console: stop instance
terraform plan => destroy instance[3] (looks like attach[3] was destroyed)
terraform apply => success: instance[3] destroyed
volume[3] is in "available" state
It looks like Terraform does not stop the instance, which leaves the volume stuck going from "in-use" to "busy".
The attachment itself was evidently destroyed during the first apply, since the second plan no longer needed to destroy it.
Verified Terraform's order of destroying resources: the attachment is destroyed before the instance.
It would be better to destroy the instance before destroying the attachment, so that if destroying the instance failed, the attachment would still exist.
The bug with aws_volume_attachment is that its destroy does not unmount the volume from the running instance, so the detach never completes.
Apparently stopping the instance lets the volume detach, and the instance can then be destroyed.
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-detaching-volume.html
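For clarity, here is the manual sequence described above as shell commands (this assumes the AWS CLI is available; i-xxxxxxxx is a placeholder for the instance being removed, not an ID from the reports above):

# 1. Terraform plans to destroy instance[3] and attach[3], but apply fails
terraform plan
terraform apply   # fails: "Error waiting for Volume ... to detach from Instance"

# 2. Stop the instance out-of-band so the volume can actually detach
aws ec2 stop-instances --instance-ids i-xxxxxxxx
aws ec2 wait instance-stopped --instance-ids i-xxxxxxxx

# 3. Re-plan and apply: the attachment is already gone and the instance is destroyed
terraform plan
terraform apply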
Use destroy provisioners to unmount volumes.
I'm hitting this too. I would like Terraform to just destroy the instances (causing the volumes to be detached "naturally"), but apparently this is not possible: Terraform never gets to actually destroying the instance, because it wants to destroy the attachment first, which will never succeed.
Stopping the instance first helps, but I don't see why it should be necessary.
@c4milo: please give an example with a code snippet; I don't know which provisioner in the docs you might be talking about. Thanks.
https://www.terraform.io/docs/provisioners/index.html#destroy-time-provisioners
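For example, something along these lines might work on the attachment resource (an untested sketch: the /dev/xvdd device comes from the example above, while the SSH user, the connection over the private IP, and a Terraform version new enough to support destroy-time provisioners are all assumptions):

resource "aws_volume_attachment" "docker" {
  count       = "6"
  device_name = "/dev/xvdd"
  volume_id   = "${element(aws_ebs_volume.docker.*.id, count.index)}"
  instance_id = "${element(aws_instance.kube_worker.*.id, count.index)}"

  # Unmount the filesystem on the node before Terraform tries to detach the
  # volume, so the detach does not hang on an in-use volume.
  provisioner "remote-exec" {
    when   = "destroy"
    inline = ["sudo umount /dev/xvdd"]

    connection {
      host = "${element(aws_instance.kube_worker.*.private_ip, count.index)}"
      user = "ec2-user"
    }
  }
}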