Terraform-provider-aws: Creation of aws_instance with ebs_block_device disks out of order

Created on 18 Jun 2018 · 19 comments · Source: hashicorp/terraform-provider-aws

_This issue was originally opened by @davivcgarcia as hashicorp/terraform#18271. It was migrated here as a result of the provider split. The original body of the issue is below._


Terraform Version

$ terraform -v
Terraform v0.11.7
+ provider.aws v1.22.0

Terraform Configuration Files


resource "aws_instance" "k8s_node" {
  ami           = "${data.aws_ami.default.id}"
  instance_type = "m5.xlarge"
  key_name      = "${aws_key_pair.default.key_name}"

  subnet_id              = "${aws_subnet.main_us-east-1a.id}"
  vpc_security_group_ids = ["${aws_security_group.default.id}"]

  root_block_device {
    volume_size = "40"
    volume_type = "standard"
  }

  ebs_block_device {
    device_name = "/dev/sdb"
    volume_size = "80"
    volume_type = "standard"
  }

  ebs_block_device {
    device_name = "/dev/sdc"
    volume_size = "250"
    volume_type = "standard"
  }

  tags {
    Name = "k8s-node"
  }
}

Expected Behavior

The instance should have a primary/boot disk (nvme0n1) of 40GB, a secondary disk (nvme1n1) of 80GB, and a tertiary disk (nvme2n1) of 250GB.

Actual Behavior

Terraform creates the instance with the disks in the wrong order: the secondary disk (nvme1n1) is 250GB and the tertiary disk (nvme2n1) is 80GB.

Steps to Reproduce

  1. terraform init
  2. terraform apply

Output

aws_instance.k8s_node: Creating...
  ami:                                               "" => "ami-950e95ea"
  associate_public_ip_address:                       "" => "<computed>"
  availability_zone:                                 "" => "<computed>"
  ebs_block_device.#:                                "" => "2"
  ebs_block_device.2554893574.delete_on_termination: "" => "true"
  ebs_block_device.2554893574.device_name:           "" => "/dev/sdc"
  ebs_block_device.2554893574.encrypted:             "" => "<computed>"
  ebs_block_device.2554893574.snapshot_id:           "" => "<computed>"
  ebs_block_device.2554893574.volume_id:             "" => "<computed>"
  ebs_block_device.2554893574.volume_size:           "" => "250"
  ebs_block_device.2554893574.volume_type:           "" => "standard"
  ebs_block_device.2576023345.delete_on_termination: "" => "true"
  ebs_block_device.2576023345.device_name:           "" => "/dev/sdb"
  ebs_block_device.2576023345.encrypted:             "" => "<computed>"
  ebs_block_device.2576023345.snapshot_id:           "" => "<computed>"
  ebs_block_device.2576023345.volume_id:             "" => "<computed>"
  ebs_block_device.2576023345.volume_size:           "" => "80"
  ebs_block_device.2576023345.volume_type:           "" => "standard"
  ephemeral_block_device.#:                          "" => "<computed>"
  get_password_data:                                 "" => "false"
  instance_state:                                    "" => "<computed>"
  instance_type:                                     "" => "m5.xlarge"
  ipv6_address_count:                                "" => "<computed>"
  ipv6_addresses.#:                                  "" => "<computed>"
  key_name:                                          "" => "default"
  network_interface.#:                               "" => "<computed>"
  network_interface_id:                              "" => "<computed>"
  password_data:                                     "" => "<computed>"
  placement_group:                                   "" => "<computed>"
  primary_network_interface_id:                      "" => "<computed>"
  private_dns:                                       "" => "<computed>"
  private_ip:                                        "" => "<computed>"
  public_dns:                                        "" => "<computed>"
  public_ip:                                         "" => "<computed>"
  root_block_device.#:                               "" => "1"
  root_block_device.0.delete_on_termination:         "" => "true"
  root_block_device.0.volume_id:                     "" => "<computed>"
  root_block_device.0.volume_size:                   "" => "40"
  root_block_device.0.volume_type:                   "" => "standard"
  security_groups.#:                                 "" => "<computed>"
  source_dest_check:                                 "" => "true"
  subnet_id:                                         "" => "subnet-036d839562552db17"
  tags.%:                                            "" => "2"
  tags.Name:                                         "" => "k8s_node"
  tenancy:                                           "" => "<computed>"
  volume_tags.%:                                     "" => "<computed>"
  vpc_security_group_ids.#:                          "" => "1"
  vpc_security_group_ids.2684253548:                 "" => "sg-0a12ea76c68402986"

$ lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
nvme0n1     259:2    0   40G  0 disk 
├─nvme0n1p1 259:3    0    1M  0 part 
└─nvme0n1p2 259:4    0   40G  0 part /
nvme1n1     259:0    0  250G  0 disk 
nvme2n1     259:1    0   80G  0 disk 
Labels: question, service/ec2

All 19 comments

@davivcgarcia in some cases you might need to reference non-root device names by /dev/xvd_ instead of /dev/sd_, e.g. /dev/xvdb instead of /dev/sdb. It depends on the AMI. The ordering of the ebs_block_device configurations in the Terraform configuration does not determine any ordering of the instance disks. If the AMI has the information baked in, you can see it with the AWS CLI's ec2 describe-images command.
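
For example, a quick way to inspect the block device mappings baked into an AMI (the AMI ID below is a placeholder):

# Hypothetical AMI ID; substitute your own
aws ec2 describe-images \
  --image-ids ami-0123456789abcdef0 \
  --query 'Images[].BlockDeviceMappings'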

@bflad I'm trying to use an m5.xlarge instance and its naming is /dev/nvme[0-26]n1 (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/device_naming.html), but Terraform says it's invalid. I already tried /dev/sd[b-y] and /dev/xvd[b-y] as device_name and the behavior is the same.

I have the same problem trying to create an m5.xlarge instance with 4 devices plus root. With /dev/sdX and /dev/xvdX the order is not as declared; with /dev/nvmeXn1 I get a Terraform error:

* aws_instance.new-instance: Error launching source instance: InvalidBlockDeviceMapping: Invalid device name /dev/nvme2n1
    status code: 400, request id: XXXXXXXXXXXXXXXXX

I replaced the ebs_block_device blocks in the aws_instance resource with standalone aws_ebs_volume resources using the recommended device names (/dev/sd[f-p]), but the problem is still the same. I also tried to create these instances manually in AWS, and the problem persists.

resource "aws_volume_attachment" "infra_docker_ebs_attach" {
  device_name = "/dev/sdf"
  volume_id   = "${aws_ebs_volume.infra_docker_ebs.*.id["${count.index}"]}"
  instance_id = "${aws_instance.ocp_infra_node.*.id["${count.index}"]}"
  count       = "${aws_instance.ocp_infra_node.count}"
}

resource "aws_ebs_volume" "infra_docker_ebs" {
  availability_zone = "us-east-1a"
  size              = "80"
  type              = "standard"
  count             = "${aws_instance.ocp_infra_node.count}"
}

resource "aws_volume_attachment" "infra_gluster_ebs_attach" {
  device_name = "/dev/sdp"
  volume_id   = "${aws_ebs_volume.infra_gluster_ebs.*.id["${count.index}"]}"
  instance_id = "${aws_instance.ocp_infra_node.*.id["${count.index}"]}"
  count       = "${aws_instance.ocp_infra_node.count}"
}

resource "aws_ebs_volume" "infra_gluster_ebs" {
  availability_zone = "us-east-1a"
  size              = "300"
  type              = "standard"
  count             = "${aws_instance.ocp_infra_node.count}"
}

I changed my instance type from m5.xlarge to t2.xlarge and the disk order came out as declared. In summary, I think this issue is on the AWS side, not in the Terraform AWS provider.

Same issue here. I'm creating a couple of volumes from snapshots using separate aws_ebs_volume resources, so my root device isn't an issue but the two additional volumes still cause problems. As others have done, I tried device naming in Terraform with /dev/sd_ and /dev/xvd_ but the resulting order seems to be random.

Also tried using depends_on to always create and mount one resource before the other (slowing my provisioning down, but that was an acceptable sacrifice), but this didn't work either.

As I'm creating from snapshots, I've just ended up labelling the devices first (e.g. xfs_admin -L vol1 /dev/nvme1n1), then when creating volumes from new snapshots I can mount them based on the label.
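
A rough sketch of that workaround (the label, device, and mount point here are illustrative):

# Label the filesystem once, on whatever NVMe device it happens to be today
xfs_admin -L vol1 /dev/nvme1n1

# From then on, mount by label rather than by the unstable NVMe device name
mount LABEL=vol1 /mnt/vol1
# or, in /etc/fstab:
# LABEL=vol1  /mnt/vol1  xfs  defaults,nofail  0  2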

I ran into this same issue, and discussions with AWS have uncovered that the ordering of disk device naming is not guaranteed to remain the same as defined at build time. This has to do with device discovery by the AMI: the order in which devices are discovered determines the device name assigned.

This is definitely new behavior starting with the nvme* disks. I have had to implement some custom scripting that runs from user data to map the devices as defined in Terraform to the actual mount points on the host. It means you can't use /dev/nvme1n1 or similar in fstab anymore either; you must use a UUID to ensure proper mounting.
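
A minimal sketch of the UUID approach in user data, assuming the filesystem already exists on /dev/nvme1n1 and /data is the desired mount point:

# Look up the filesystem UUID and register an fstab entry keyed on it, so the
# mount survives the NVMe devices being enumerated in a different order
DATA_UUID=$(blkid -s UUID -o value /dev/nvme1n1)
echo "UUID=$DATA_UUID  /data  xfs  defaults,nofail  0  2" >> /etc/fstab
mount -a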

I'll add my voice here - it's the same for aws_launch_configuration too. It doesn't matter whether one uses the sdX or xvdX nomenclature, or what the ebs_block_device ordering is in the resource. Block-device ordering on the actual machine is consistent but out of order.

This seems to have been the case for a while. I reverted to a 1.21.0 binary I had and it still created the disks out of order. The difference is that the older instance types that still use SCSI emulation (e.g., t2.large) respected the device names Terraform provides. The new instance types that default to /dev/nvmeXn1 do not, however - they're strictly named in the order presented to the OS.

Hence if I have /dev/xvdf, /dev/xvdg, and /dev/xvdh on one of the new NVMe systems but the provider creates them in the order g-f-h (which it does consistently), they will be 2-1-3 in the OS.

This may represent a bug in both the Terraform provider and AWS - that the disks are created out of order, and that the hypervisor does not respect the requested name order.

To add some data to this, here's the EBS devices in an ASG I have configured:

  ebs_block_device {
    device_name           = "/dev/xvdf"
    volume_type           = "gp2"
    volume_size           = 16
    delete_on_termination = true
    encrypted             = true
    iops                  = 0
    snapshot_id           = ""
    no_device             = false
  }

  ebs_block_device {
    device_name           = "/dev/xvdg"
    volume_type           = "gp2"
    volume_size           = 500
    delete_on_termination = true
    encrypted             = true
    iops                  = 0
    snapshot_id           = ""
    no_device             = false
  }

  ebs_block_device {
    device_name           = "/dev/xvdh"
    volume_type           = "gp2"
    volume_size           = 1000
    delete_on_termination = true
    encrypted             = true
    iops                  = 0
    snapshot_id           = ""
    no_device             = false
  }

Here's the output for that section from aws autoscaling describe-launch-configurations; note that it's an array, and note the order it's in:

            "BlockDeviceMappings": [
                {
                    "DeviceName": "/dev/xvdh",
                    "Ebs": {
                        "VolumeSize": 1000,
                        "VolumeType": "gp2",
                        "DeleteOnTermination": true,
                        "Encrypted": true
                    }
                },
                {
                    "DeviceName": "/dev/xvdf",
                    "Ebs": {
                        "VolumeSize": 16,
                        "VolumeType": "gp2",
                        "DeleteOnTermination": true,
                        "Encrypted": true
                    }
                },
                {
                    "DeviceName": "/dev/xvdg",
                    "Ebs": {
                        "VolumeSize": 500,
                        "VolumeType": "gp2",
                        "DeleteOnTermination": true,
                        "Encrypted": true
                    }
                },
                {
                    "DeviceName": "/dev/sda1",
                    "Ebs": {
                        "VolumeSize": 8,
                        "VolumeType": "gp2",
                        "DeleteOnTermination": true
                    }
                }
            ],

Here's the output of lsblk from a c5.large system launched using that LaunchConfig:

NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
nvme0n1     259:3    0    8G  0 disk
└─nvme0n1p1 259:4    0    8G  0 part /
nvme1n1     259:0    0 1000G  0 disk
nvme2n1     259:1    0   16G  0 disk
nvme3n1     259:2    0  500G  0 disk

As you can see, the in-OS ordering reflects the ordering of the BlockDeviceMappings array, which is out of order with respect to the desired arrangement expressed in the Terraform resource. This does not happen on older instance types (e.g., c4.large) because those still adopt the naming (if not the ordering) given in the launch configuration or instance definition.

Since AWS has stopped honoring that naming convention, I would hope that Terraform could perhaps start sorting that array according to device_name so that we users could have at least somewhat predictable naming schemes.

@bflad I think the issue on the provider side is that ebs_block_device is declared as a schema.TypeSet in both aws/resource_aws_launch_configuration.go and aws/resource_aws_instance.go. That means a list of its members is sorted by their _hashes_, which produces the predictably misordered BlockDeviceMappings in AWS. Since the newest AWS instances order their block devices by this array's order and not by the naming scheme, we get what's happening above.

I'm not tooled up to test different versions of this code and don't know whether switching to schema.TypeList is possible or trivial, but AFAICT that's why this is happening.

I always thought it was weird that my block devices looked out of order in the console and in terraform plan output. Now I know why.

Someone has tried to solve this issue like this: https://github.com/leboncoin/terraform-aws-nvme-example

I have not tried it, but I have solved it that way manually. I used the script here to upgrade c4 to c5 and t2 to t3:
https://aws.amazon.com/premiumsupport/knowledge-center/boot-error-linux-m5-c5/

Here is some more info on what the example uses to identify the right volume by AWS volume ID and attach the right device via UUID:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/nvme-ebs-volumes.html

This should be built into the Terraform AWS provider somehow.

Bad news. I wrote a patch to switch ebs_block_devices from a set to an array on both launch configurations and instances, and found out that one's client-side ordering seems to not matter at all.

It's entirely possible that I made the wrong changes, but Terraform and its internal tests seemed happy, and both the output of running terraform apply and terraform show seemed to show the block devices in written order. However, in checking the BlockDeviceMappings section from the AWS API (e.g. aws ec2 describe-instances) I found that they were not arranged in the order I'd created them - in fact, create/destroy produced different results several times.

I went back to the upstream provider code (1.40) and observed similar behavior - terraform apply happened to hash my 3 devices in reverse order (3-2-1), but the order in the AWS API afterwards was 1-3-2.

I'm going to attempt to submit a bug to AWS, but would suggest those of you affected do the same. Specifically, the new NVMe instances do not follow the bus order implied by device naming, but rather order devices by their appearance in BlockDeviceMappings. This is exacerbated when attaching multiple devices simultaneously (as with Terraform), since they seem to be created asynchronously and attached to BlockDeviceMappings in order of completion.
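
If you want to check what ordering the API ended up with for a given instance (the instance ID below is a placeholder), something like this shows the mappings as AWS reports them:

aws ec2 describe-instances \
  --instance-ids i-0123456789abcdef0 \
  --query 'Reservations[].Instances[].BlockDeviceMappings[].[DeviceName,Ebs.VolumeId]' \
  --output table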

I can confirm I'm running into this as well and I don't even use Terraform.

I'm experiencing out-of-order device names when upgrading from Ubuntu 14.04 -> 18.04 (images based off the official AMI).

For me I only have 2 EBS block devices, a boot and a data volume, and even then the devices are out of order.

My provisioning system expects that /dev/nvme0n1 be root and /dev/nvme1n1 be data.

Disk /dev/nvme0n1: 120 GiB, 128849018880 bytes, 251658240 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/nvme1n1: 8 GiB, 8589934592 bytes, 16777216 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x34a452b2

So what's the deal with this? I had to go back to the t2 family (which doesn't use this newfangled /dev/nvme* business) because dealing with the volumes-out-of-order thing makes it impossible to manage this in a sane way.

The upshot is that Amazon somehow considers this working as-designed. I've spoken with one of the Nitro engineers and, while he acknowledged that it makes life harder for users, I didn't get the impression that they ever intend to correct this.

Their primary suggested "solution" was to use udev to order devices the way you expect. A secondary solution I started but abandoned was using snapshots of empty filesystems. The net of it is that I've just stopped buying as much EBS storage.

[edit]
For completeness' sake, I should point out that this "only" happens when you attach devices simultaneously, as with a Launch Config or Template. If you incrementally add devices to an instance, they attach in expected order.

Hey all, I came up with a solid solution which I've had in production for the last couple of months. I finally had a chance to document it on my blog today; have a look and see if this helps you.

https://russell.ballestrini.net/aws-nvme-to-block-mapping/

Because "I fixed this, read my blog" posts are information-free and prone to link-rot, the above user found a Python script called ebsnvme-id on AWS Linux that apparently has the ability to extract (among other things) the bdev field from Nitro NVMe devices, which correlates to the name you gave a device (e.g., /dev/sda1) at allocation. It does this by sending Nitro-specific ioctl requests to the device.

He then wrote a bash wrapper to walk the first 26 NVMe devices on a system and symlink them by name to the contents of the bdev field.

This isn't a solution from the hardware/terraform side, but integrating the ioctl() code and a more robust symlink management in one's userdata would help paper over this Nitro defect. It doesn't help that most userdata executables are probably shell scripts, but it's a start.

See also: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/nvme-ebs-volumes.html
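
A rough bash sketch of that wrapper, assuming Amazon Linux's /sbin/ebsnvme-id helper is present (the exact flag for printing the block device name may differ by AMI version; check ebsnvme-id --help):

#!/bin/bash
# Walk the EBS NVMe disks and symlink each one to the device name that was
# requested at attach time (e.g. /dev/sdf), as reported by ebsnvme-id.
for dev in /dev/nvme*n1; do
  [ -e "$dev" ] || continue
  # --block-dev is an assumption here; on some AMI versions the option differs
  name=$(/sbin/ebsnvme-id --block-dev "$dev" 2>/dev/null) || continue
  [ -n "$name" ] && ln -sf "$dev" "/dev/$name"
done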

@rbcrwd Thanks for posting a summary of my post. In addition, I'm hosting all the files needed for this solution, and a video explaining how it works, on my 12-year-old blog.

If you know the size of the disk, you can filter in the user-data script using lsblk and jq.

This works:

DISKNAME=`lsblk -dJo NAME,SIZE,MOUNTPOINT | jq -r '..|.?|select(.size|startswith("${storageSize}")).name'`
sudo zpool create datadrive $DISKNAME -f

Passing in the size of the EBS drive as the storageSize template variable.

Similar solution to @ChrisMcKee's, but without jq. This assumes your device has a unique size.

DISK_NAME=`lsblk -do NAME,SIZE | grep ${ebs_device_size}G | cut -d ' ' -f 1`
DEVICE_NAME=/dev/$${DISK_NAME}

# do things with $${DEVICE_NAME}

_(the doubled $$ escapes shell variables in the cloud-init config template so Terraform doesn't interpolate them)_
