It appears that sometimes Packer will report that it has enabled ENA support for an amazon-ebs AMI, but Amazon does not list ENA as enabled on the resulting image (see screenshots below). This prevents us from launching the AMI on gen 5 (Nitro-based) instance types.
Our best guess is that we're running into some kind of race condition: perhaps ENA takes a few seconds to enable and the build terminates the instance before that completes. It's possible this is a bug on Amazon's side, but we have no way to confirm that hypothesis.
This only appears to have started happening to us in the last two weeks. We have been using Packer with ENA for well over a year with no issues.
Any insights on this or what we can do to better isolate the issue would be helpful.
The problem seems to happen randomly even when we control for other variables such as the environment it runs in, the Packer version, and the AWS credentials used. We are not sure how to reliably reproduce the issue, which makes this particularly difficult to troubleshoot.
We have an Ansible playbook that provisions the image with the ENA configuration required for CentOS 7 as described in this doc: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html#enabling_enhanced_networking and this code has not changed recently. As far as we can tell, AWS has also not published any changes to how ENA is enabled on the OS side.
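For reference, the OS-side piece boils down to a couple of checks. This is a rough sketch of what that doc asks for, not our exact Ansible tasks, and the interface name is an assumption:

# Confirm the ena kernel module is available for the installed kernel
modinfo ena

# On an instance launched on an ENA-capable instance type, confirm the
# primary interface is actually bound to the ena driver (eth0 is a
# placeholder; substitute the instance's real interface name)
ethtool -i eth0 | grep '^driver'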
Packer version: 1.6.2
packer.json
{
  "variables": {
    "aws_profile": null
  },
  "builders": [
    {
      "type": "amazon-ebs",
      "source_ami_filter": {
        "filters": {
          "name": "REDACTED (This is a centos 7 ami)",
          "image-type": "machine",
          "virtualization-type": "hvm",
          "root-device-type": "ebs",
          "architecture": "x86_64"
        },
        "owners": ["REDACTED"],
        "most_recent": true
      },
      "ssh_username": "centos",
      "region": "us-east-1",
      "run_tags": {
        "Environment": "dev",
        "Platform": "infrastructure",
        "Function": "packer-builder",
        "Name": "Packer Builder"
      },
      "tags": {
        "Family": "REDACTED",
        "SourceAMI": "{{ .SourceAMI }}"
      },
      "instance_type": "m3.medium",
      "ena_support": true,
      "profile": "{{user `aws_profile`}}",
      "ami_name": "REDACTED-{{timestamp}}",
      "security_group_ids": ["REDACTED"],
      "ssh_timeout": "8m"
    }
  ],
  "provisioners": [
    {
      "type": "file",
      "source": "./ansible.cfg",
      "destination": "/tmp/ansible.cfg"
    },
    {
      "type": "shell",
      "inline": [
        "sudo mv /tmp/ansible.cfg /etc/ansible/ansible.cfg",
        "sudo chown root:root /etc/ansible/ansible.cfg"
      ]
    },
    {
      "type": "ansible-local",
      "playbook_dir": "./ansible",
      "playbook_file": "./ansible/playbook.yaml"
    }
  ]
}
We have run this on various versions of macOS, on CentOS 7, and on Alpine Linux running on AWS Fargate.
The image is created and Packer reports that it received the ENA support flag:
Using ec2 describe-images shows ENA is not enabled for this image:
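In CLI terms, the check we run is roughly the following (the AMI ID here is a placeholder for the actual image ID):

# Expect true for an image built with ena_support; we see null/false instead
aws ec2 describe-images \
  --image-ids ami-0123456789abcdef0 \
  --region us-east-1 \
  --query 'Images[0].EnaSupport'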
Hi, thanks for reaching out.
Was this a regression from a previous version of Packer, or did it just start happening even though you didn't upgrade?
A race condition makes the most sense to me... the code for this is here: https://github.com/hashicorp/packer/blob/master/builder/amazon/common/step_modify_ebs_instance.go#L57-L66 and it's pretty straightforward. But it's also basically the last thing we do before creating the AMI, so it's possible that, due to eventual consistency, the AMI gets created before the setting has actually taken effect.
Was this a regression from a previous version of Packer, or did it just start happening even though you didn't upgrade?
We saw this problem with multiple versions of Packer.
It seems the issue has gone away for us; we're guessing it was some kind of race condition on Amazon's end that they fixed silently.
Is there a way Packer could check to make sure it's enabled before continuing? If this comes up for other people, that check seems like it would fix it, assuming it is indeed a race condition.
We can probably do a describe-images call before creating the AMI, inside a retry loop.
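One way to sketch that idea, using the CLI as an approximation of the builder code (and checking the instance attribute before the CreateImage call; the retry count and sleep interval are just placeholders):

# Poll the builder instance's enaSupport attribute after ModifyInstanceAttribute
# and before CreateImage, so the AMI is only registered once the flag has stuck.
INSTANCE_ID="i-0123456789abcdef0"   # placeholder builder instance ID
for attempt in $(seq 1 10); do
  ena=$(aws ec2 describe-instance-attribute \
    --instance-id "$INSTANCE_ID" \
    --attribute enaSupport \
    --query 'EnaSupport.Value' \
    --output text)
  if [ "$ena" = "True" ]; then
    echo "enaSupport confirmed on $INSTANCE_ID"
    break
  fi
  sleep 5
done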
Hi,
Sorry for the late reply, but I hope this helps you as well. We also started having this issue (only about a third of our Packer builds succeeded) with a very old (over six months old) Packer version, and all out of nowhere. We raised an issue with AWS (case ID 7336402521 if someone wants to reference it with them), and their latest update (from Monday) is that they have started rolling out a fix globally, region by region, and it should take a few days to finish. We'll try again at the beginning of next week to check whether everything is back to normal.
Our workaround was to move off Nitro-based instances for the time being (since this is an ENA issue). We'll go back to Nitro once AWS confirms the fix is in.
Thanks for following up with us -- sounds to me like this is an upstream bug with Amazon, so I'm going to close this. If it continues to happen after this week, make a note and we can reopen the issue.
I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.