_This issue was originally opened by @davivcgarcia as hashicorp/terraform#18271. It was migrated here as a result of the provider split. The original body of the issue is below._
$ terraform -v
Terraform v0.11.7
+ provider.aws v1.22.0
resource "aws_instance" "k8s_node" {
ami = "${data.aws_ami.default.id}"
instance_type = "m5.xlarge"
key_name = "${aws_key_pair.default.key_name}"
subnet_id = "${aws_subnet.main_us-east-1a.id}"
vpc_security_group_ids = ["${aws_security_group.default.id}"]
root_block_device {
volume_size = "40"
volume_type = "standard"
}
ebs_block_device {
device_name = "/dev/sdb"
volume_size = "80"
volume_type = "standard"
}
ebs_block_device {
device_name = "/dev/sdc"
volume_size = "250"
volume_type = "standard"
}
tags {
Name = "k8s-node"
}
}
The instance should have a primary/boot disk (nvme0n1) of 40 GB, a secondary disk (nvme1n1) of 80 GB, and a tertiary disk (nvme2n1) of 250 GB.
Terraform creates the instance with the disks in the wrong order: the secondary disk (nvme1n1) ends up being the 250 GB volume and the tertiary disk (nvme2n1) the 80 GB one.
terraform init
terraform apply
aws_instance.k8s_node: Creating...
ami: "" => "ami-950e95ea"
associate_public_ip_address: "" => "<computed>"
availability_zone: "" => "<computed>"
ebs_block_device.#: "" => "2"
ebs_block_device.2554893574.delete_on_termination: "" => "true"
ebs_block_device.2554893574.device_name: "" => "/dev/sdc"
ebs_block_device.2554893574.encrypted: "" => "<computed>"
ebs_block_device.2554893574.snapshot_id: "" => "<computed>"
ebs_block_device.2554893574.volume_id: "" => "<computed>"
ebs_block_device.2554893574.volume_size: "" => "250"
ebs_block_device.2554893574.volume_type: "" => "standard"
ebs_block_device.2576023345.delete_on_termination: "" => "true"
ebs_block_device.2576023345.device_name: "" => "/dev/sdb"
ebs_block_device.2576023345.encrypted: "" => "<computed>"
ebs_block_device.2576023345.snapshot_id: "" => "<computed>"
ebs_block_device.2576023345.volume_id: "" => "<computed>"
ebs_block_device.2576023345.volume_size: "" => "80"
ebs_block_device.2576023345.volume_type: "" => "standard"
ephemeral_block_device.#: "" => "<computed>"
get_password_data: "" => "false"
instance_state: "" => "<computed>"
instance_type: "" => "m5.xlarge"
ipv6_address_count: "" => "<computed>"
ipv6_addresses.#: "" => "<computed>"
key_name: "" => "default"
network_interface.#: "" => "<computed>"
network_interface_id: "" => "<computed>"
password_data: "" => "<computed>"
placement_group: "" => "<computed>"
primary_network_interface_id: "" => "<computed>"
private_dns: "" => "<computed>"
private_ip: "" => "<computed>"
public_dns: "" => "<computed>"
public_ip: "" => "<computed>"
root_block_device.#: "" => "1"
root_block_device.0.delete_on_termination: "" => "true"
root_block_device.0.volume_id: "" => "<computed>"
root_block_device.0.volume_size: "" => "40"
root_block_device.0.volume_type: "" => "standard"
security_groups.#: "" => "<computed>"
source_dest_check: "" => "true"
subnet_id: "" => "subnet-036d839562552db17"
tags.%: "" => "2"
tags.Name: "" => "k8s_node"
tenancy: "" => "<computed>"
volume_tags.%: "" => "<computed>"
vpc_security_group_ids.#: "" => "1"
vpc_security_group_ids.2684253548: "" => "sg-0a12ea76c68402986"
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme0n1 259:2 0 40G 0 disk
├─nvme0n1p1 259:3 0 1M 0 part
└─nvme0n1p2 259:4 0 40G 0 part /
nvme1n1 259:0 0 250G 0 disk
nvme2n1 259:1 0 80G 0 disk
@davivcgarcia in some cases you might need to reference non-root device names by /dev/xvd_ instead of /dev/sd_, e.g. /dev/xvdb instead of /dev/sdb. It depends on the AMI. The ordering of the ebs_block_device configurations in the Terraform configuration does not determine any sort of ordering of the instance disks. If the AMI has the information baked in, you can see it with the AWS CLI (ec2 describe-images).
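For example, something like this should show what the AMI defines (the AMI ID below is just a placeholder):

aws ec2 describe-images --image-ids ami-0123456789abcdef0 \
  --query 'Images[].BlockDeviceMappings'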
@bflad I'm trying to use an m5.xlarge instance and its naming is /dev/nvme[0-26]n1 (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/device_naming.html), but Terraform says it's invalid. I already tried /dev/sd[b-y] and /dev/xvd[b-y] as device_name and the behavior is the same.
I have the same problem trying to create an m5.xlarge instance with 4 devices plus the root volume. With /dev/sdX and /dev/xvdX the order is not as declared; with /dev/nvmeXn1 I get a Terraform error:
* aws_instance.new-instance: Error launching source instance: InvalidBlockDeviceMapping: Invalid device name /dev/nvme2n1
status code: 400, request id: XXXXXXXXXXXXXXXXX
I replaced the ebs_block_device blocks in the aws_instance resource with standalone aws_ebs_volume resources using the recommended device names (/dev/sd[f-p]), but the problem is still the same. I also tried to create these instances manually in AWS, and the problem is still the same.
resource "aws_volume_attachment" "infra_docker_ebs_attach" {
device_name = "/dev/sdf"
volume_id = "${aws_ebs_volume.infra_docker_ebs.*.id["${count.index}"]}"
instance_id = "${aws_instance.ocp_infra_node.*.id["${count.index}"]}"
count = "${aws_instance.ocp_infra_node.count}"
}
resource "aws_ebs_volume" "infra_docker_ebs" {
availability_zone = "us-east-1a"
size = "80"
type = "standard"
count = "${aws_instance.ocp_infra_node.count}"
}
resource "aws_volume_attachment" "infra_gluster_ebs_attach" {
device_name = "/dev/sdp"
volume_id = "${aws_ebs_volume.infra_gluster_ebs.*.id["${count.index}"]}"
instance_id = "${aws_instance.ocp_infra_node.*.id["${count.index}"]}"
count = "${aws_instance.ocp_infra_node.count}"
}
resource "aws_ebs_volume" "infra_gluster_ebs" {
availability_zone = "us-east-1a"
size = "300"
type = "standard"
count = "${aws_instance.ocp_infra_node.count}"
}
I changed my instance type from m5.xlarge to t2.xlarge and the ordering was honored properly by Terraform. In summary, I think this issue is on the AWS side, not in the Terraform AWS provider.
Same issue here. I'm creating a couple of volumes from snapshots using separate aws_ebs_volume resources, so my root device isn't an issue, but the two additional volumes still cause problems. As others have done, I tried device naming in Terraform with /dev/sd_ and /dev/xvd_, but the resulting order seems to be random.
I also tried using depends_on to always create and mount one resource before the other (slowing my provisioning down, but that was an acceptable sacrifice), but this didn't work either.
As I'm creating from snapshots, I've just ended up labelling the devices first (e.g. xfs_admin -L vol1 /dev/nvme1n1), then when creating volumes from new snapshots I can mount them based on the label.
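For anyone following the same route, the label-based approach looks roughly like this (the label name and mount point are just examples):

# label the filesystem once, on whichever nvme device it happened to land on
sudo xfs_admin -L vol1 /dev/nvme1n1
# afterwards mount by label, so the enumeration order no longer matters
sudo mount -L vol1 /mnt/vol1
# or in /etc/fstab:
# LABEL=vol1  /mnt/vol1  xfs  defaults,nofail  0 2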
I ran into this same issue and discussions with AWS have uncovered that the ordering of disk device naming is not guaranteed to remain the same as defined at build time. This has to do with device discovery by the AMI, the order they are discovered determines the device name assigned.
This is definitely new behavior starting with the nvme* disks. I have had to implement some custom scripting that runs from user-data to map the devices as defined in Terraform to the actual mount points on the host. It also means you can't use /dev/nvme1n1 or similar in fstab anymore; you must use the UUID to ensure proper mounting.
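For anyone doing the same, the UUID lookup is the easy part (the device path and mount point below are placeholders):

# find the filesystem UUID of the volume, wherever it enumerated
sudo blkid /dev/nvme1n1
# then reference the UUID instead of the device name in /etc/fstab, e.g.:
# UUID=<uuid-from-blkid>  /data  xfs  defaults,nofail  0 2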
I'll add my voice here - it's the same for aws_launch_configuration too. It doesn't matter whether one uses the sdX or xvdX nomenclature, or what the ebs_block_device ordering is in the resource. Block-device ordering on the actual machine is consistent but out of order.
This seems to have been the case for a while. I back-revisioned to a 1.21.0 binary I had and it still creates the disks out of order. The difference is that the older instance types that still use SCSI emulation (e.g., t2.large) respected the device names Terraform provides. The new instance types that default to /dev/nvmeXn1 do not, however - they're strictly named in the order presented to the OS.
Hence if I have /dev/xvdf, /dev/xvdg, and /dev/xvdh on one of the new NVMe systems but the provider creates them in the order g-f-h (which it does consistently), they will be 2-1-3 in the OS.
This may represent a bug in both the Terraform provider and AWS - that the disks are created out of order, and that the hypervisor does not respect the requested name order.
To add some data to this, here's the EBS devices in an ASG I have configured:
ebs_block_device {
  device_name           = "/dev/xvdf"
  volume_type           = "gp2"
  volume_size           = 16
  delete_on_termination = true
  encrypted             = true
  iops                  = 0
  snapshot_id           = ""
  no_device             = false
}

ebs_block_device {
  device_name           = "/dev/xvdg"
  volume_type           = "gp2"
  volume_size           = 500
  delete_on_termination = true
  encrypted             = true
  iops                  = 0
  snapshot_id           = ""
  no_device             = false
}

ebs_block_device {
  device_name           = "/dev/xvdh"
  volume_type           = "gp2"
  volume_size           = 1000
  delete_on_termination = true
  encrypted             = true
  iops                  = 0
  snapshot_id           = ""
  no_device             = false
}
Here's the output for that section from aws autoscaling describe-launch-configurations; note that it's an array, and note the order it's in:
"BlockDeviceMappings": [
{
"DeviceName": "/dev/xvdh",
"Ebs": {
"VolumeSize": 1000,
"VolumeType": "gp2",
"DeleteOnTermination": true,
"Encrypted": true
}
},
{
"DeviceName": "/dev/xvdf",
"Ebs": {
"VolumeSize": 16,
"VolumeType": "gp2",
"DeleteOnTermination": true,
"Encrypted": true
}
},
{
"DeviceName": "/dev/xvdg",
"Ebs": {
"VolumeSize": 500,
"VolumeType": "gp2",
"DeleteOnTermination": true,
"Encrypted": true
}
},
{
"DeviceName": "/dev/sda1",
"Ebs": {
"VolumeSize": 8,
"VolumeType": "gp2",
"DeleteOnTermination": true
}
}
],
Here's the output of lsblk from a c5.large system launched using that LaunchConfig:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme0n1 259:3 0 8G 0 disk
└─nvme0n1p1 259:4 0 8G 0 part /
nvme1n1 259:0 0 1000G 0 disk
nvme2n1 259:1 0 16G 0 disk
nvme3n1 259:2 0 500G 0 disk
As you can see, the in-OS ordering reflects the ordering of the BlockDeviceMappings array, which is out of order with respect to the desired arrangement expressed in the Terraform resource. This does not happen on older instance types (e.g., c4.large) because they still adopt the naming (if not the ordering) given in the launch configuration or instance definition.
Since AWS has stopped honoring that naming convention, I would hope that terraform could perhaps start sorting that array according to device_name so we users could have at least somewhat predictable naming schemes.
@bflad I think the issue on the provider side is that ebs_block_device is declared as a schema.TypeSet in both aws/resource_aws_launch_configuration.go and aws/resource_aws_instance.go. That means the list of its members is sorted by their _hashes_, which produces the predictably mis-ordered BlockDeviceMappings in AWS. Since the newest AWS instances order their block devices by this array's order and not by the naming scheme, we get what's happening above.
I'm not tooled up to test different versions of this code and don't know whether switching to schema.TypeList is possible or trivial, but AFAICT that's why this is happening.
I always thought it was weird that my block devices looked out of order in the console and in terraform plan output. Now I know.
Someone has tried to solve this issue like this: https://github.com/leboncoin/terraform-aws-nvme-example
I have not tried it, but I have solved it that way manually. I used the script here to upgrade c4 to c5 and t2 to t3:
https://aws.amazon.com/premiumsupport/knowledge-center/boot-error-linux-m5-c5/
Here is some more info that the example uses to identify the right volume by AWS volume ID and attach the right device with the UUID:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/nvme-ebs-volumes.html
This should be built into the Terraform AWS provider somehow.
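The gist of the second doc is that the EBS volume ID is exposed as the NVMe serial number, so you can match a device back to the volume Terraform created. Assuming the nvme-cli package is installed (and /dev/nvme1n1 stands in for whichever device you're checking):

# the serial number reported here is the EBS volume id (vol...)
sudo nvme id-ctrl -v /dev/nvme1n1 | grep '^sn'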
Bad news. I wrote a patch to switch ebs_block_devices from a set to an array on both launch configurations and instances, and found out that client-side ordering seems not to matter at all.
It's entirely possible that I made the wrong changes, but Terraform and its internal tests seemed happy, and both the output of running terraform apply and terraform show seemed to show the block devices in written order. However, in checking the BlockDeviceMappings section from the AWS API (e.g. aws ec2 describe-instances) I found that they were not arranged in the order I'd created them - in fact, create/destroy produced different results several times.
I went back to the upstream provider code (1.40) and observed similar behavior - terraform apply happened to hash my 3 devices in reverse order (3-2-1), but the order in the AWS API afterwards was 1-3-2.
I'm going to attempt to submit a bug to AWS, but would suggest those of you affected do the same. Specifically, the new NVMe instances do not follow the bus order implied by device naming, but rather order by their appearance in BlockDeviceMappings. This is exacerbated when attaching multiple devices simultaneously (as with Terraform), since they seem to be created asynchronously and attached to BlockDeviceMappings in order of completion.
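If you want to check this on your own instances, the ordering the API reports can be pulled with something like the following (the instance ID is a placeholder):

aws ec2 describe-instances --instance-ids i-0123456789abcdef0 \
  --query 'Reservations[].Instances[].BlockDeviceMappings[].[DeviceName,Ebs.VolumeId]' \
  --output text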
I can confirm I'm running into this as well and I don't even use Terraform.
I'm experiencing out-of-order device names when upgrading from Ubuntu 14.04 -> 18.04 (images based off the official AMI).
For me I only have 2 EBS block devices, a boot and a data and even then the devices are out of order.
My provisioning system expects that /dev/nvme0n1 be root and /dev/nvme1n1 be data.
Disk /dev/nvme0n1: 120 GiB, 128849018880 bytes, 251658240 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/nvme1n1: 8 GiB, 8589934592 bytes, 16777216 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x34a452b2
So what's the deal with this? I had to go back to the t2 family (which doesn't use this newfangled /dev/nvme* business) because dealing with the volumes-out-of-order thing makes it impossible to manage this in a sane way.
The upshot is that Amazon somehow considers this working as-designed. I've spoken with one of the Nitro engineers and, while he acknowledged that it makes life harder for users, I didn't get the impression that they ever intend to correct this.
Their primary suggested "solution" was to use udev to order devices the way you expect. A secondary solution I started but abandoned was using snapshots of empty filesystems. The net of it is that I've just stopped buying as much EBS storage.
[edit]
For completeness' sake, I should point out that this "only" happens when you attach devices simultaneously, as with a Launch Config or Template. If you incrementally add devices to an instance, they attach in expected order.
Hey all, I came up with a solid solution which I've had in production for the last couple months. I finally had a chance to document it on my blog today, have a look and see if this helps you.
Because "I fixed this, read my blog" posts are information-free and prone to link-rot, the above user found a Python script called ebsnvme-id
on AWS Linux that apparently has the ability to extract (among other things) the bdev
field from Nitro NVMe devices, which correlates to the name you gave a device (e.g., /dev/sda1) at allocation. It does this by sending Nitro-specific ioctl requests to the device.
He then wrote a bash wrapper to walk the first 26 NVMe devices on a system and symlink them by name to the contents of the bdev
field.
This isn't a solution from the hardware/terraform side, but integrating the ioctl() code and more robust symlink management in one's userdata would help paper over this Nitro defect. It doesn't help that most userdata executables are probably shell scripts, but it's a start.
See also: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/nvme-ebs-volumes.html
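As a rough illustration of the wrapper idea (the /sbin/ebsnvme-id path and its -b flag are assumptions here; check whichever helper your AMI actually ships):

#!/bin/bash
# Walk the first 26 NVMe namespaces and symlink each one to the block-device
# name it was given at attach time, as reported by the ebsnvme-id helper.
for i in $(seq 0 25); do
  dev="/dev/nvme${i}n1"
  [ -e "$dev" ] || continue
  # -b is assumed to print the requested block-device name for this device
  name="$(/sbin/ebsnvme-id -b "$dev" 2>/dev/null)" || continue
  name="${name#/dev/}"   # tolerate either "sdf" or "/dev/sdf"
  [ -n "$name" ] && ln -sf "$dev" "/dev/$name"
done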
@rbcrwd Thanks for posting a summary of my post. In addition I'm hosting all the files needed for this solution, and a video explaining how it works on my 12 year old blog.
If you know the size of the disk, you can filter for it in the user-data script using lsblk and jq, passing in the size of the EBS volume. This works:
DISKNAME=`lsblk -dJo NAME,SIZE,MOUNTPOINT | jq -r '..|.?|select(.size|startswith("${storageSize}")).name'`
sudo zpool create datadrive $DISKNAME -f
Similar solution to @ChrisMcKee, but without jq. This assumes your device has a unique size.
DISK_NAME=`lsblk -do NAME,SIZE | grep ${ebs_device_size}G | cut -d ' ' -f 1`
DEVICE_NAME=/dev/$${DISK_NAME}
# do things with $${DEVICE_NAME}
_(the doubled dollar sign escapes the shell variable in the cloud-init config template)_