Hey guys,
I have a scenario where, if I bring down an EC2 instance in order to update my Elasticsearch version (this setup is for my Elasticsearch cluster on AWS), I would like to detach my EBS volume and re-attach it to the new EC2 instance running the updated Elasticsearch AMI.
I believe I can already detach an EBS volume on EC2 destruction this way:
ebs_block_device {
  device_name           = "/dev/sdh"
  volume_type           = "io1"
  iops                  = "4000"
  volume_size           = "500"
  delete_on_termination = "false"
}
However, I'm still trying to find out how to re-attach this existing EBS volume to a new EC2 instance in Terraform prior to bringing that EC2 instance/node back into my cluster.
Can you guys please help me with this?
Thanks,
Ben.
Same issue here.
I also tried to create an aws_ebs_volume and attach it with aws_volume_attachment. But here I don't see how we can bootstrap (mkfs + mount) the disk with cloudinit or a provisioning script.
I've managed to reattach a volume (including mounting) using the user_data field:
resource "aws_instance" "web" {
ami = "ami-b7f0f987"
instance_type = "t2.micro"
availability_zone = "${var.aws_region}a"
vpc_security_group_ids = ["${aws_security_group.default.id}"]
iam_instance_profile = "ecsInstanceRole"
key_name = "us-west-ecs"
user_data = "#!/bin/bash\nmkdir /data; mount /dev/xvdh /data; service docker restart; echo 'ECS_CLUSTER=${aws_ecs_cluster.cms.name}\nECS_ENGINE_AUTH_TYPE=dockercfg\nECS_ENGINE_AUTH_DATA={\"${var.registry}\": {\"auth\": \"${var.auth}\",\"email\": \"${var.email}\"}}' >> /etc/ecs/ecs.config;"
}
# Attach DBMS Volume
resource "aws_volume_attachment" "ebs_att" {
  device_name = "/dev/xvdh"
  volume_id   = "<volume-id to reattach>"
  instance_id = "${aws_instance.web.id}"
}
If one wants to use an EBS volume attached to a Docker container, as in the example above, make sure service docker restart is included. One _has to_ restart the Docker daemon; otherwise the attached volume is not used, and unfortunately the AWS init process won't tell you that.
No update on this?
Should we use aws_ebs_volume instead?
Regarding your question
But here I don't see how we can bootstrap (mkfs + mount) the disk with cloudinit or a provisioning script
I'm pretty sure you could bootstrap it via the user_data field, doing something like:
#!/bin/bash
mkfs -t ext4 /dev/xvdh  # bootstrapping: create the filesystem
mkdir /data             # create mount point
mount /dev/xvdh /data   # mount it
# service docker restart (if you want to use it as a docker volume)
Should we use aws_ebs_volume instead?
Well, I guess for the creation it would be correct.
Does this help you?
Well, you can do that if you are using an "aws_ebs_volume" resource, but not an "ebs_block_device" inside an "aws_instance" resource.
I was hoping to bootstrap the EBS volume outside of cloud-init, since I have a lot of different instances and most of the time the only differences between them are the EBS volumes.
O.k., now I see your point. It's an interesting question.
Using the proposition of @phinze for now. See : https://github.com/hashicorp/terraform/pull/2050
Hey all – I believe this issue has been resolved with aws_volume_attachment. Mounting needs to be done in the user_data section as mentioned. Thanks!
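For reference, a minimal sketch of that pattern might look like the following (untested; the AMI, availability zone, size, and the assumption that /dev/sdh shows up as /dev/xvdh on the instance are placeholders, not values from this thread):

# Sketch only: create the volume, attach it, and mount it from user_data.
resource "aws_ebs_volume" "data" {
  availability_zone = "us-west-2a"   # placeholder
  size              = 500
}

resource "aws_instance" "web" {
  ami               = "ami-xxxxxxxx" # placeholder
  instance_type     = "t2.micro"
  availability_zone = "us-west-2a"   # must match the volume's AZ

  # Mounting still happens on the instance itself, e.g. via user_data;
  # /dev/sdh typically appears as /dev/xvdh on recent Linux AMIs.
  user_data = "#!/bin/bash\nmkdir -p /data; mount /dev/xvdh /data;"
}

resource "aws_volume_attachment" "data" {
  device_name = "/dev/sdh"
  volume_id   = "${aws_ebs_volume.data.id}"
  instance_id = "${aws_instance.web.id}"
}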
I have an existing volume that I want to attach to an Amazon Linux instance and ensure that upon reboot it will be reattached.
riak-user-data.sh:
#!/bin/bash
mkdir /data
echo '/dev/sdh /data ext4 defaults,nofail,noatime,nodiratime,barrier=0,data=writeback 0 2' >> /etc/fstab
mount -a
resource "aws_instance" "riak" {
...
user_data = "${file("riak-user-data.sh")}"
provisioner "remote-exec" {
inline = [
"sudo mkdir -m 0755 -p /etc/ansible/facts.d",
]
connection {
user = "ec2-user"
bastion_host = "${data.terraform_remote_state.vpc.jumphost_eip_public_ip}"
bastion_user = "ubuntu"
}
}
}
resource "aws_ebs_volume" "riak" { ... }
resource "aws_volume_attachment" "riak" {
device_name = "/dev/sdh"
volume_id = "${aws_ebs_volume.riak.id}"
instance_id = "${aws_instance.riak.id}"
}
The process runs as follows, eventually timing out because the remote-exec provisioner is never able to connect:
aws_instance.riak: Creating...
aws_instance.riak: Still creating... (10s elapsed)
aws_instance.riak (remote-exec): Using configured bastion host...
aws_instance.riak (remote-exec): Host: 34.XXX.XXX.XXX
aws_instance.riak (remote-exec): User: ubuntu
aws_instance.riak (remote-exec): Password: false
aws_instance.riak (remote-exec): Private key: false
aws_instance.riak (remote-exec): SSH Agent: true
...
aws_instance.riak: Still creating... (5m30s elapsed)
Error applying plan:
1 error(s) occurred:
* aws_instance.riak: 1 error(s) occurred:
* timeout
It seems that the user_data is leading to a system that fails to start SSH; likely the system is failing to completely boot (there is a clear warning about fstab in the AWS docs). When I remove the specified user_data, the machine boots successfully and then we see:
aws_volume_attachment.riak: Creating...
device_name: "" => "/dev/sdh"
force_detach: "" => "<computed>"
instance_id: "" => "i-01c2d24bbeecb4586"
skip_destroy: "" => "true"
volume_id: "" => "vol-0d4d48ac7cdef77ec"
aws_volume_attachment.riak: Still creating... (10s elapsed)
aws_volume_attachment.riak: Still creating... (20s elapsed)
aws_volume_attachment.riak: Creation complete (ID: vai-3330874248)
Apply complete! Resources: 2 added, 0 changed, 0 destroyed.
Of course, the volume is attached but not mounted.
My question is: how can any cloud-init user_data that attempts to mount the device ever execute reliably, considering that the attachment cannot occur until after the instance ID is obtained? I suppose that if I have no remote-exec, then the API call to create the instance will return an instance ID fast enough that the attachment API call can be made before the cloud-init user_data is executed?
I would appreciate knowing if I am thinking of this all wrong. Thanks!
After much testing, the most reliable solution has turned out to be using a provisioner on the aws_volume_attachment. No matter how long it takes to bring up the aws_instance, the attachment will not be mounted until the host is booted and SSH is available for provisioning.
resource "aws_volume_attachment" "riak" {
skip_destroy = true
provisioner "remote-exec" {
script = "attach-data-volume.sh"
connection {
host = "${aws_instance.riak.public_ip}"
}
}
}
#!/bin/bash
# Resolve the real block device (e.g. /dev/sdh -> /dev/xvdh)
devpath=$(readlink -f /dev/sdh)

# Create a filesystem only if the device exists and doesn't already contain ext4
sudo file -s "$devpath" | grep -q ext4
if [[ 1 == $? && -b "$devpath" ]]; then
  sudo mkfs -t ext4 "$devpath"
fi

sudo mkdir /data
sudo chown riak:riak /data
sudo chmod 0775 /data
echo "$devpath /data ext4 defaults,nofail,noatime,nodiratime,barrier=0,data=writeback 0 2" | sudo tee -a /etc/fstab > /dev/null
sudo mount /data

# TODO: /etc/rc3.d/S99local to maintain on reboot
echo deadline | sudo tee /sys/block/$(basename "$devpath")/queue/scheduler
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
Ran with @aiwilliams's example a bit:
Is @aiwilliams approach still considered the best way to do this? Notably, this doesn't work well (at all?) if your aws_instance is in a private subnet.
@mwakerman I suppose you can set a bastion_host in your provisioner:
provisioner "remote-exec" {
  script = "attach-data-volume.sh"

  connection {
    host         = "${aws_instance.riak.public_ip}"
    bastion_host = "xxxxx"
  }
}
I have the same issue and just discovered that provisioners can be attached to any resource, not just the instance; that's going to fix a big issue for me.
Thanks @earzur, that should work for us.
So it looks like Terraform doesn't allow you to use a passphrase-encrypted private_key in the connection block of the remote-exec provisioner. I may be able to temporarily add and then revoke unencrypted keys, but I might also try doing the mkfs and mount in user-data after polling until the EBS volume has been attached.
Ended up doing it all in user-data by giving the instance an IAM role that included an ec2:Describe* policy and waiting until the EBS volume attaches with (credit):
while [ ! -e /dev/xvdh ]; do sleep 1; done
# die is assumed to be a simple error helper; defined here so the script is self-contained
die() { echo "$1" >&2; exit 1; }

EC2_INSTANCE_ID=$(wget -q -O - http://169.254.169.254/latest/meta-data/instance-id || die "wget instance-id has failed: $?")
EC2_AVAIL_ZONE=$(wget -q -O - http://169.254.169.254/latest/meta-data/placement/availability-zone || die "wget availability-zone has failed: $?")
EC2_REGION=$(echo "$EC2_AVAIL_ZONE" | sed -e 's:\([0-9][0-9]*\)[a-z]*$:\1:')
#############
# EBS VOLUME
#
# note: /dev/sdh => /dev/xvdh
# see: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/device_naming.html
#############
# wait for EBS volume to attach
DATA_STATE="unknown"
until [ $DATA_STATE == "attached" ]; do
DATA_STATE=$(aws ec2 describe-volumes \
--region $${EC2_REGION} \
--filters \
Name=attachment.instance-id,Values=$${EC2_INSTANCE_ID} \
Name=attachment.device,Values=/dev/sdh \
--query Volumes[].Attachments[].State \
--output text)
echo 'waiting for volume...'
sleep 5
done
echo 'EBS volume attached!'
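If it helps anyone, the IAM side of that approach (an instance role granting ec2:Describe* so the describe-volumes poll can run) could be sketched in Terraform roughly as follows; the resource and profile names are illustrative, not from this thread:

# Illustrative sketch only: role, inline policy and instance profile for the
# ec2:Describe* permission the user-data poll needs.
resource "aws_iam_role" "volume_poller" {
  name = "volume-poller"

  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

resource "aws_iam_role_policy" "describe" {
  name = "ec2-describe"
  role = "${aws_iam_role.volume_poller.id}"

  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ec2:Describe*",
      "Resource": "*"
    }
  ]
}
EOF
}

resource "aws_iam_instance_profile" "volume_poller" {
  name = "volume-poller"
  role = "${aws_iam_role.volume_poller.name}"
}

# Then reference it from the instance with:
#   iam_instance_profile = "${aws_iam_instance_profile.volume_poller.name}"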
I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.