Packer: AMI BUILDER (CHROOT) Attachment point /dev/sdf is already in use (parallel builds)

Created on 12 Jan 2016 · 10 comments · Source: hashicorp/packer

Packer 0.8.6

This issue is a little bit tricky to reproduce. You need to run multiple builds on the same host. I think it happens when a build launches at the same time as another build frees its EBS volume.

To reproduce this I run 6 terminals with the same build. I start one build and, around the time it logs "==> amazon-chroot: Unmounting the root device...", I start the other 5 builds. The builds just use a shell provisioner that runs:

apt-get update
apt-get install --force-yes -y vim
apt-get install --force-yes -y telnet

The base AMI is Ubuntu Trusty and the packages are already installed; this is just a simple build to reproduce the bug.

In the build log of one of the builds I started earlier, I can see that /dev/sdf is used there and unmounted at about the same time as the failing build starts up. This is the error message I get:

amazon-chroot output will be in this color.

==> amazon-chroot: Prevalidating AMI Name...
==> amazon-chroot: Gathering information about this EC2 instance...
==> amazon-chroot: Inspecting the source AMI...
==> amazon-chroot: Checking the root device on source AMI...
==> amazon-chroot: Creating the root volume...
==> amazon-chroot: Attaching the root volume to /dev/sdf
==> amazon-chroot: Error attaching volume: InvalidParameterValue: Invalid value '/dev/sdf' for unixDevice. Attachment point /dev/sdf is already in use
==> amazon-chroot:  status code: 400, request id: []
==> amazon-chroot: Deleting the created EBS volume...
Build 'amazon-chroot' errored: Error attaching volume: InvalidParameterValue: Invalid value '/dev/sdf' for unixDevice. Attachment point /dev/sdf is already in use
    status code: 400, request id: []

==> Some builds didn't complete successfully and had errors:
--> amazon-chroot: Error attaching volume: InvalidParameterValue: Invalid value '/dev/sdf' for unixDevice. Attachment point /dev/sdf is already in use
    status code: 400, request id: []

==> Builds finished but no artifacts were created.

https://gist.github.com/gardleopard/597378585a698c09f3f5

bug builder/amazon

All 10 comments

@gardleopard Can you share a gist of your Packer template? Please be sure to remove any secrets.

@gardleopard Thanks for the bug report. Can you clarify whether this _always_ happens when running concurrent chroot builds or only at certain times during the build? In the latter case this might be a race condition with the EC2 API. Maybe we can try to use a different device for each build.

@cbednarski I documented this in the first comment. It does not always happen; when it does, I believe it is because an EBS volume was just detached. But I have not proven that this is what happens; it's just what I think is happening.

I agree that using a different device for each build could be a solution. From what I have seen, only devices from f to p are available, so it probably needs some logic to decide which devices to use.

Any updates regarding this issue? Have you experimented with the different-devices approach?

Hmm, looked into this a bit. Packer already tries its best to pick a device path that's not already in use. It loops through /dev/<prefix>[fghijklmnop], statting each file, and then returns the first one that doesn't exist.

I found a few AWS forum threads that seem to indicate others have had trouble re-using attachment points: https://forums.aws.amazon.com/message.jspa?messageID=598669

I wonder if you could provide a little more debugging info.

If this happens again, could you collect all of the volume-ids that were created for all of the packer runs (i.e. if running packer 6 times, get the chroot root volume id), and run

 aws ec2 describe-volumes --volume-ids <all volume ids>

and paste the output?

We experience this problem as well. Our use case is building multiple AMIs in parallel from Jenkins pipelines (either the same job or different jobs).

We see this almost every time we run more than a single Packer invocation at the same time. Our template simply untars a tarball; it doesn't install any services or otherwise start processes that might get left running. The documentation wants you to run lsof and friends: the only thing I ever found was jbd2, which I have no control over (I tried having Packer mount the chroot volume with no journal, but it didn't help).

Here's a gist showing what happens after a system reboot and after it's run several jobs: https://gist.github.com/stormsilver/4c60aba51d2b2e5e11bee09e1e9f73ff

Hi @mwhooker, I'm seeing this problem too. Here's the info you requested from @gardleopard, captured after I see a failure:
https://gist.github.com/benmullin333/ad959531b97f18f033c26a39e23e147d

My guess is there is a transitional state for these devices that's tripping up packer.

I don't understand how or when Packer decides to create new block devices. I'm inclined to guess that a 3x exponential backoff retry could solve or at least improve the problem; I'm not sure where it would go, though.

Bump. Still a problem in 1.3.2.

I just ran into this issue on the system @benmullin333 was asking about above, and I was able to work around it by detaching volumes from the EC2 instance Packer was running on: it already had /dev/sd[fghijklmn] attached and was trying to attach /dev/sdn, which was failing. My guess is all the other block devices were attached due to previous Packer runs that were interrupted and left the block devices behind.

In the past I think we've cleaned up the situation by rebooting the server, which also causes the block devices to be cleared out.
