In 1.6.2, retry logic was added (https://github.com/hashicorp/packer/pull/9810). It appears to incorrectly retry when multiple instance types are requested but one of them is unavailable in the selected availability zone. When this occurs, the spot fleet request is created successfully, but Packer treats it as an error and retries. This results in 11+ orphaned spot instances being created and Packer returning an error.
This example is for ap-southeast-2.
Specify the following (multiple instance types are key):
"spot_price": "auto",
"spot_instance_types": "m5.xlarge, m5.large",
Try to build an instance in an AZ where one instance type is unavailable. In this case, m5.large is unavailable in apse2-az3 (ap-southeast-2a for this account).
Packer version: 1.6.2
{
  "builders": [
    {
      "type": "amazon-ebs",
      "iam_instance_profile": "my-instance-profile",
      "region": "ap-southeast-2",
      "source_ami_filter": {
        "filters": {
          "name": "my-ami-name"
        },
        "owners": ["xxxxxx"],
        "most_recent": true
      },
      "vpc_filter": {
        "filters": {
          "tag:Name": "My VPC"
        }
      },
      "subnet_filter": {
        "filters": {
          "tag:Name": "AZ A"
        }
      },
      "spot_price": "auto",
      "spot_instance_types": "m5.xlarge, m5.large"
    }
  ]
}
Host OS: RHEL7 x86
13:10:52 ==> amazon-ebs: Prevalidating any provided VPC information
13:10:52 ==> amazon-ebs: Prevalidating AMI Name: aminamegoeshere
13:10:54 amazon-ebs: Found Image ID: amiidgoeshere
13:10:54 amazon-ebs: Found VPC ID: vpcidgoeshere
13:10:54 amazon-ebs: Found Subnet ID: subnetgoeshere
13:10:55 ==> amazon-ebs: Creating temporary keypair: packerkeypairgoeshere
13:10:55 ==> amazon-ebs: Creating temporary security group for this instance: packerkeypairgoeshere
13:10:56 ==> amazon-ebs: Authorizing access to port 22 from [0.0.0.0/0] in the temporary security groups...
13:10:58 ==> amazon-ebs: Launching a spot AWS instance...
13:10:58 ==> amazon-ebs: Interpolating tags for spot instance...
13:10:58 amazon-ebs: Adding tag: "OptOutRoleReplacement": "True"
13:10:58 amazon-ebs: Loading User Data File...
13:10:58 amazon-ebs: Creating Spot Fleet launch template...
13:14:35 ==> amazon-ebs: Error waiting for fleet request (fleetrequestgoeshere) to become ready:Your requested instance type ( m5.large) is not supported in your requested Availability Zone (ap-southeast-2a).
13:14:35 ==> amazon-ebs: No volumes to clean up, skipping
13:14:35 ==> amazon-ebs: Deleting temporary security group...
13:14:54 ==> amazon-ebs: Error cleaning up security group. Please delete the group manually: err: DependencyViolation: resource sgidgoeshere has a dependent object
[2020-09-10T03:14:54.076Z] ==> amazon-ebs: status code: 400, request id: requestidgoeshere security group ID: sgidgoeshere
13:14:54 ==> amazon-ebs: Deleting temporary keypair...
13:14:54 Build 'amazon-ebs' errored after 4 minutes 766 milliseconds: Error waiting for fleet request (fleetidgoeshere) to become ready:Your requested instance type ( m5.large) is not supported in your requested Availability Zone (ap-southeast-2a).
Hello, thanks for reaching out! We'll try to take a look for the next release.
Hey there, I made some changes to the retry mechanics for fleet creation. Could you try out the solution and let me know if it fixes the problem?
Here are the binaries: https://app.circleci.com/pipelines/github/hashicorp/packer/7216/workflows/b19c85d6-6c25-47f5-a94f-313394c573ad/jobs/81883/artifacts
Thanks for that! It's an improvement in that it now only launches one instance, but Packer seems to ignore this instance and considers the whole build a failure. By the look of the code, the cause for this one might be the removal of this guard: https://github.com/hashicorp/packer/blob/788dc3259804cedc5d8861e7391440e90ef5c518/builder/amazon/common/step_run_spot_instance.go#L297
Looks like the old code ignored errors if an instance was actually launched, whereas the new code just decides to give up and die on any error. Very 2020.
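A rough sketch of the kind of guard being described, assuming a simplified response shape. `fleetResult` and `pickInstance` are hypothetical names for illustration and are not taken from `step_run_spot_instance.go`. The idea is that per-type errors are only fatal when no instance was launched at all:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// fleetResult is a hypothetical, simplified view of a fleet response:
// the instances that did launch plus any per-instance-type errors.
type fleetResult struct {
	InstanceIDs []string
	Errors      []string
}

// pickInstance returns the first launched instance and ignores per-type
// errors as long as something launched. It fails only when nothing did.
func pickInstance(res fleetResult) (string, error) {
	if len(res.InstanceIDs) > 0 {
		return res.InstanceIDs[0], nil
	}
	if len(res.Errors) > 0 {
		return "", errors.New(strings.Join(res.Errors, "; "))
	}
	return "", errors.New("fleet request returned no instances and no errors")
}

func main() {
	// One type was unavailable, but another launched: the build can continue.
	partial := fleetResult{
		InstanceIDs: []string{"i-0001"},
		Errors:      []string{"m5.large is not supported in ap-southeast-2a"},
	}
	id, err := pickInstance(partial)
	fmt.Println(id, err) // i-0001 <nil>

	// Nothing launched: the errors are surfaced to the caller.
	none := fleetResult{Errors: []string{"m5.large is not supported in ap-southeast-2a"}}
	_, err = pickInstance(none)
	fmt.Println(err)
}
```

With a check like this, the log above would correspond to the first case: the unsupported-type messages are present, but an instance exists, so the build should proceed with it.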
Log:
09:39:24 amazon-ebs: Creating Spot Fleet launch template...
09:39:27 ==> amazon-ebs: Error waiting for fleet request (fleet-241c33e6-9edb-8ba2-a6b2-8120a1c43af4) to become ready:Your requested instance type ( c5.xlarge) is not supported in your requested Availability Zone (ap-southeast-2a).Your requested instance type ( m5.4xlarge) is not supported in your requested Availability Zone (ap-southeast-2a).Your requested instance type ( m5a.xlarge) is not supported in your requested Availability Zone (ap-southeast-2a).Your requested instance type ( m5a.large) is not supported in your requested Availability Zone (ap-southeast-2a).Your requested instance type ( m5.large) is not supported in your requested Availability Zone (ap-southeast-2a).Your requested instance type ( r5a.xlarge) is not supported in your requested Availability Zone (ap-southeast-2a).Your requested instance type ( c5.4xlarge) is not supported in your requested Availability Zone (ap-southeast-2a).Your requested instance type ( m5a.4xlarge) is not supported in your requested Availability Zone (ap-southeast-2a).Your requested instance type ( r5.xlarge) is not supported in your requested Availability Zone (ap-southeast-2a).Your requested instance type ( m5.xlarge) is not supported in your requested Availability Zone (ap-southeast-2a).Your requested instance type ( r5.4xlarge) is not supported in your requested Availability Zone (ap-southeast-2a).Your requested instance type ( r5a.4xlarge) is not supported in your requested Availability Zone (ap-southeast-2a).Your requested instance type ( c5a.xlarge) is not supported in your requested Availability Zone (ap-southeast-2a).
09:39:27 ==> amazon-ebs: No volumes to clean up, skipping
09:39:27 ==> amazon-ebs: Deleting temporary security group...
09:40:00 ==> amazon-ebs: Error cleaning up security group. Please delete the group manually: err: DependencyViolation: resource sg-0994dc33556761afc has a dependent object
[2020-09-16T23:40:00.021Z] ==> amazon-ebs: status code: 400, request id: c74841a7-460f-4671-8bb3-c51090799755; security group ID: sg-0994dc33556761afc
09:40:00 ==> amazon-ebs: Deleting temporary keypair...
09:40:00 Build 'amazon-ebs' errored after 34 seconds 831 milliseconds: Error waiting for fleet request (fleet-241c33e6-9edb-8ba2-a6b2-8120a1c43af4) to become ready:Your requested instance type ( c5.xlarge) is not supported in your requested Availability Zone (ap-southeast-2a).Your requested instance type ( m5.4xlarge) is not supported in your requested Availability Zone (ap-southeast-2a).Your requested instance type ( m5a.xlarge) is not supported in your requested Availability Zone (ap-southeast-2a).Your requested instance type ( m5a.large) is not supported in your requested Availability Zone (ap-southeast-2a).Your requested instance type ( m5.large) is not supported in your requested Availability Zone (ap-southeast-2a).Your requested instance type ( r5a.xlarge) is not supported in your requested Availability Zone (ap-southeast-2a).Your requested instance type ( c5.4xlarge) is not supported in your requested Availability Zone (ap-southeast-2a).Your requested instance type ( m5a.4xlarge) is not supported in your requested Availability Zone (ap-southeast-2a).Your requested instance type ( r5.xlarge) is not supported in your requested Availability Zone (ap-southeast-2a).Your requested instance type ( m5.xlarge) is not supported in your requested Availability Zone (ap-southeast-2a).Your requested instance type ( r5.4xlarge) is not supported in your requested Availability Zone (ap-southeast-2a).Your requested instance type ( r5a.4xlarge) is not supported in your requested Availability Zone (ap-southeast-2a).Your requested instance type ( c5a.xlarge) is not supported in your requested Availability Zone (ap-southeast-2a).
09:40:00
09:40:00 ==> Wait completed after 34 seconds 832 milliseconds
Here you go: https://app.circleci.com/pipelines/github/hashicorp/packer/7229/workflows/d8853b25-14ff-4480-89a5-f10b755af560/jobs/82065/artifacts
I added back the check on the created output instances to make sure Packer keeps going if at least one instance was successfully created. Could you try the binaries and let me know if the build works as expected now?
Looks good to me this time, thanks! :+1:
I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.