After updating from 1.4.0 to 1.4.1, the amazon-ebs builder fails reproducibly when it tries to copy/encrypt the resulting AMI:
amazon-ebs: Creating encrypted copy in build region: us-east-1
amazon-ebs: Waiting for all copies to complete...
amazon-ebs: 1 error(s) occurred:
==> amazon-ebs:
==> amazon-ebs: * Error waiting for AMI (ami-078d087247324ef86) in region (us-east-1): ResourceNotReady: failed waiting for successful resource state
Downgrading back to 1.4.0 solves the problem.
Can you please share the output of PACKER_LOG=1 packer build template.json?
Template: https://gist.github.com/dhs-rec/7084aa659a3fb77572299d02a209b5ff
Log: https://gist.github.com/dhs-rec/34f393ed7ec1fe0d25531cfe446e6007
It might be related to the kms_key_id setting. If I omit it so that the default key is used, it works. However, this worked fine with 1.4.0.
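For reference, the builder block in question boils down to something like this (the values here are placeholders for illustration only, not the actual template from the gist above):
{
"type": "amazon-ebs",
"region": "us-east-1",
"source_ami": "ami-xxxxxxxx",
"instance_type": "t2.micro",
"ssh_username": "ec2-user",
"ami_name": "encrypted-test {{timestamp}}",
"encrypt_boot": true,
"kms_key_id": "arn:aws:kms:us-east-1:111122223333:key/REPLACE-WITH-KEY-ID"
}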
Hey @SwampDragons / @dhs-rec,
same issue here with Packer 1.4.2.
Looks like the kms_key_id field has not been working since Packer versions > 1.4.0. You will need to provide the region_kms_key_ids + ami_regions + region options to make it work (see the last example below).
Works with Packer <= 1.4.0:
AWS_DEFAULT_REGION=eu-central-1
"encrypt_boot": true,
"kms_key_id": "{{ user `kms_key` }}"
Same error for:
"encrypt_boot": true,
"region": "eu-central-1",
"ami_regions": [ "eu-central-1" ],
"kms_key_id": "{{ user `kms_key` }}",
All of these will fail. Also, the State Reason of the AMI will be "Copy image failed with an internal error".
==> amazon-ebs: Waiting for the instance to stop...
==> amazon-ebs: Creating AMI dI0wyoo from instance i-0e6626ac4c2dd7464
amazon-ebs: AMI: ami-0412ca842b62553fa
==> amazon-ebs: Waiting for AMI to become ready...
==> amazon-ebs: Copying/Encrypting AMI (ami-0412ca842b62553fa) to other regions...
amazon-ebs: Copying to: eu-central-1
amazon-ebs: Waiting for all copies to complete...
==> amazon-ebs: 1 error(s) occurred:
==> amazon-ebs:
==> amazon-ebs: * Error waiting for AMI (ami-0cb2b7d3600f4e652) in region (eu-central-1): ResourceNotReady: failed waiting for successful resource state
AWS_DEFAULT_REGION=eu-central-1
"encrypt_boot": true,
"kms_key_id": "{{ user `kms_key` }}",
"ami_regions": [ "eu-central-1" ]
Here you will end up with 2 broken images.
==> amazon-ebs: Waiting for the instance to stop...
==> amazon-ebs: Creating AMI eSPnMS3 from instance i-0d9b3a802d9d71ce0
amazon-ebs: AMI: ami-0841f5532986aaad7
==> amazon-ebs: Waiting for AMI to become ready...
==> amazon-ebs: Copying/Encrypting AMI (ami-0841f5532986aaad7) to other regions...
amazon-ebs: Copying to: eu-central-1
amazon-ebs: Copying to: eu-central-1
amazon-ebs: Waiting for all copies to complete...
==> amazon-ebs: 2 error(s) occurred:
==> amazon-ebs:
==> amazon-ebs: * Error waiting for AMI (ami-0a4bcac577bdd2204) in region (eu-central-1): ResourceNotReady: failed waiting for successful resource state
==> amazon-ebs: * Error waiting for AMI (ami-0a8513872d4bdfeb8) in region (eu-central-1): ResourceNotReady: failed waiting for successful resource state
AWS_DEFAULT_REGION=eu-central-1
"encrypt_boot": true,
"ami_regions": [ "eu-central-1" ],
"region_kms_key_ids":
{
"eu-central-1": "{{ user `kms_key` }}"
}
The AMI is copied to the region twice. Only one of the AMIs will be tagged.
==> amazon-ebs: Waiting for the instance to stop...
==> amazon-ebs: Creating AMI H2K6CvC from instance i-02adcfbd74204ac86
amazon-ebs: AMI: ami-0fa8c2bc9e5b1aece
==> amazon-ebs: Waiting for AMI to become ready...
==> amazon-ebs: Copying/Encrypting AMI (ami-0fa8c2bc9e5b1aece) to other regions...
amazon-ebs: Copying to: eu-central-1
amazon-ebs: Copying to: eu-central-1
amazon-ebs: Waiting for all copies to complete...
==> amazon-ebs: Modifying attributes on AMI (ami-078a0c880621cb546)...
AWS_DEFAULT_REGION=eu-central-1
"encrypt_boot": true,
"region": "eu-central-1",
"ami_regions": [ "eu-central-1" ],
"region_kms_key_ids":
{
"eu-central-1": "{{ user `kms_key` }}"
}
Everything is fine, AMIs are tagged, works as expected.
==> amazon-ebs: Waiting for the instance to stop...
==> amazon-ebs: Creating AMI 5uzeVLD from instance i-0495201b131b230ce
amazon-ebs: AMI: ami-0235001035f9a07d0
==> amazon-ebs: Waiting for AMI to become ready...
==> amazon-ebs: Copying/Encrypting AMI (ami-0235001035f9a07d0) to other regions...
amazon-ebs: Copying to: eu-central-1
amazon-ebs: Waiting for all copies to complete...
==> amazon-ebs: Modifying attributes on AMI (ami-0c88880007d74e1e5)...
Thanks for the detailed report and output on this. I'll look into it before the next release.
The fix timmjd recommended did not work for me. Even though I supplied a custom KMS key, Packer used the default key.
When I use the following template snippet (the original fix referenced in issue #7673), Packer ignores the custom region KMS key mapping and uses the region's default EBS KMS key. I would consider this a bug.
"encrypt_boot": true,
"region": "eu-central-1",
"ami_regions": [ "eu-central-1" ],
"region_kms_key_ids":
{
"eu-central-1": "{{ userkms_key
}}"
}
From PR #7870:
When I used the provided artifact, I changed my template back to the original format (which worked in 1.4.0) and Packer failed while waiting for the AMI:
"region": "{{user aws_region}}",
"encrypt_boot": "true",
"kms_key_id": "{{user shared_services_account_kms_key_alias}}"
@AndrewCi I can't reproduce. :( When I provide my key id, the region copy succeeds, no duplicates, using the non-default key provided. I don't know what's going on for you here. Is there any more information you can provide? Full template/full debug logs, for example?
Clarification: I can't reproduce based on the config I copied above. I think I know what's up with the validation for the workaround you were using when the above wasn't working.
@SwampDragons Not sure if this is relevant, but the build region and the copy region are the same in my case. Also, I'm using the ARN in the alias format.
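To be clear about the format: the same key can be referenced in several forms (the account number and IDs below are placeholders, not my real values), and I'm using the alias ARN form, i.e. the last one:
"kms_key_id": "1234abcd-12ab-34cd-56ef-1234567890ab"
"kms_key_id": "arn:aws:kms:us-east-2:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab"
"kms_key_id": "alias/my-packer-key"
"kms_key_id": "arn:aws:kms:us-east-2:111122223333:alias/my-packer-key"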
Even when setting the "ami_regions" list, this works for me:
{
"builders": [
{
"type": "amazon-ebs",
"force_deregister": true,
"ssh_username": "ubuntu",
"ami_name": "Test AMI",
"instance_type": "t2.micro",
"source_ami_filter": {
"filters": {
"virtualization-type": "hvm",
"name": "ubuntu/images/*ubuntu-xenial-16.04-amd64-server-*",
"root-device-type": "ebs"
},
"owners": ["099720109477"],
"most_recent": true
},
"region": "us-east-1",
"encrypt_boot": true,
"kms_key_id": "arn:aws:kms:us-east-1:{{ aws_user }}:key/{{ key_UUID}}",
"ami_regions": ["us-east-1"]
}
]
}
Hmm. I'll try using the ID instead of the alias. Can you provide the exact link to the binary as well? I'll test again later tonight.
Make sure the version you see in your logs is 1.4.3-dev, too. The number of times I've loaded up the wrong binary... 🤦♀
I'm working on Ubuntu 18.04. I verified I was using the correct binary. See below for logs:
[cleaned] Stopping instance
==> [cleaned] Waiting for the instance to stop...
==> [cleaned] Enabling Enhanced Networking (ENA)...
==> [cleaned] Creating AMI OZ76Squ from instance [instance_id]
[cleaned] AMI: [ami_id]
==> [cleaned] Waiting for AMI to become ready...
==> [cleaned] Copying/Encrypting AMI ([ami_id]) to other regions...
[cleaned] Copying to: us-east-2
[cleaned] Waiting for all copies to complete...
==> [cleaned] 1 error(s) occurred:
==> [cleaned]
==> [cleaned] * Error waiting for AMI ([ami_id]) in region (us-east-2): ResourceNotReady: failed waiting for successful resource state
==> [cleaned] Deregistering the AMI and deleting unencrypted temporary AMIs and snapshots
==> [cleaned] Deregistered AMI id: [ami_id]
==> [cleaned] Deleted snapshot: snap-06b88b1957ac2241d
==> [cleaned] Deregistering the AMI and deleting associated snapshots because of cancellation, or error...
==> [cleaned] Terminating the source AWS instance...
==> [cleaned] Cleaning up any extra volumes...
==> [cleaned] No volumes to clean up, skipping
Build [cleaned] errored: 1 error(s) occurred:
* Error waiting for AMI ([ami_id]) in region (us-east-2): ResourceNotReady: failed waiting for successful resource state
==> Some builds didn't complete successfully and had errors:
--> [cleaned] 1 error(s) occurred:
- Error waiting for AMI ([ami_id]) in region (us-east-2): ResourceNotReady: failed waiting for successful resource state
==> Builds finished but no artifacts were created.
And the template (note - the same template works in 1.4.0):
"region": "{{user aws_region
}}",
"encrypt_boot": "true",
"kms_key_id": "{{user kms_key_id
}}",
UPDATE:
Strange behavior. The template below fails in 1.4.3 but works (with the default KMS key bug) in 1.4.2.
"encrypt_boot": "true", "ami_regions": [ "{{user `aws_region`}}" ], "region_kms_key_ids": { "us-east-2": "{{ user `kms_key_id` }}" }
Something is definitely up with the encryption portion of the EBS builder.
Thanks for your patience on this back and forth. I've tried everything I can think of, here. I've tested with the exact format you're giving me in your template samples, making sure I'm interpolating the same variables as you. I've also tested with those vars hardcoded. I've tried with the kms Id, ARN, and alias. I cannot reproduce this. It works for me every time.
The only time I've managed to produce the ResourceNotReady: failed waiting for successful resource state error was when I provided a kms_key_id that was for a different region than the one I said to use it for. E.g.
"region_kms_key_ids": {
"us-east-2": "{{ user `kms_key_id` }}"
}
where the value set in kms_key_id was actually a valid key, but one that lives in us-east-1.
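For instance, a mapping like this (the ARN is purely illustrative) reproduces it, because the key lives in us-east-1 while the copy target is us-east-2:
"region_kms_key_ids":
{
"us-east-2": "arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab"
}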
I was also able to produce the error by adding a typo to the kms key variable.
So... Is this possibly the issue? I'm not trying to cop out and say "user error" here but I'm well and truly stumped, and this is literally the only way I've been able to produce this error in a good five hours of testing. Another thought that I haven't yet tested is that maybe something about IAM user permissions changed since 1.4.0 and your role is more restrictive than mine? I've been casting back through my memory and I don't _think_ this is the case unless you're using spot fleets, which shouldn't affect encryption at all. As you can see, I'm at the "grasping at straws" phase of debugging.
Last question: You're using amazon-ebs and not the amazon-ebssurrogate builder or a different one? I want to make sure I've not been barking up the wrong tree this whole time.
Do you have any potential explanation of what may cause the default key to be used instead of the user-supplied key in my last example?
I'm not sure I understand. There was a logic error in 1.4.2, which I resolved in the patch you tested, where the default key was always being used in the build region instead of the provided kms key.
Are you saying that in the 1.4.3-dev patch you're still seeing the default key being used? I had understood that you were seeing the same failed waiting for successful resource state error for both of your configurations with the 1.4.3-dev build.
Yes. Apologies for the confusion. When I run the 1.4.3-dev build with the key mapping, it runs and does not give me an error. But it uses the default key (which is a bug, since I supplied a custom key). See below for the template I used on 1.4.3-dev.
"encrypt_boot": "true",
"ami_regions": [ "{{useraws_region
}}" ],
"region_kms_key_ids":
{
"us-east-2": "{{ userkms_key_id
}}"
}
I have no idea how that's happening. I can't reproduce that behavior.
Hi SwampDragons -
Thank you so much for spending time on this. I finally figured it out: it was a combination of user error and a potential "bug". Long story short, I had a hard-coded value in one of my Terraform deployment scripts, and the EC2 IAM role I was using had an old KMS key ID in its policy. Therefore, the EC2 instance I was running Packer from did not have the proper IAM permissions for the KMS key I was referencing in my Packer template.
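For illustration, the offending statement in the role policy looked roughly like this, still pointing at the old key (the ARN and action list below are placeholders from memory, not the exact policy):
{
"Effect": "Allow",
"Action": [
"kms:Encrypt",
"kms:Decrypt",
"kms:ReEncrypt*",
"kms:GenerateDataKey*",
"kms:DescribeKey",
"kms:CreateGrant"
],
"Resource": "arn:aws:kms:us-east-2:111122223333:key/OLD-KEY-ID-NO-LONGER-IN-USE"
}
Since the template referenced a different key than the one in Resource, every KMS call against the new key was denied.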
While this was definitely my error, I do believe there is something to be gained from this exercise. The behavior of Packer silently falling back to the region's default EBS KMS key when it cannot access the specified custom key seems improper. I believe Packer should error out if it can't access the key referenced in the template rather than automatically switching to the default key. Additionally, it would be helpful if Packer could exit with an error that points at IAM permission issues with the provided key instead of ResourceNotReady: failed waiting for successful resource state.
Would it be possible to add a validation check before the template executes that verifies whether the provided custom KMS key can be accessed?
Curious on your thoughts. Again, apologies for the user error here.
-AC
Hmm. I'll investigate and see if we can do some kind of quick query that checks the validity of the KMS key so we at least fail early. The problem is, I don't think we'll ever get a useful error message from Amazon. My gut says they specifically don't say the key is invalid for some kind of security-through-obscurity reason, since it seems like they should be failing with a useful message in these situations.
I can't find anything online or in the AWS docs that suggests there's a way to validate kms keys before using them. As far as I can tell, you just have to use them, wait, and check the error when it eventually fails. I don't think this is something we can catch in the prevalidate stage.
From the SDK docs:
// AWS parses KmsKeyId asynchronously, meaning that the action you call may
// appear to complete even though you provided an invalid identifier. This action
// will eventually report failure.
//
// The specified CMK must exist in the region that the snapshot is being copied
// to.
I also checked the SDK code to see whether there was a better error message that we could bubble up to make life easier on you if it does fail with this message. No dice. There's nothing getting returned to make it clear what's going on here. I think we're out of luck on making this a user-friendly experience. :(
If I come across something that I think could improve things, I'll reopen.
I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.