Packer: Possible race condition creating security groups with amazon-ebs builder

Created on 15 Aug 2015 · 11 Comments · Source: hashicorp/packer

I'm occasionally getting the following error message when using the amazon-ebs builder with an unchanged configuration to provision an EC2 box (i.e., I do the same thing repeatedly and only get an error randomly). It looks like sometimes the instance starts launching before the security group is done being created.

==> amazon-ebs: Creating temporary security group for this instance...
==> amazon-ebs: Authorizing access to port 22 the temporary security group...
==> amazon-ebs: Launching a source AWS instance...
==> amazon-ebs: Error launching source instance: InvalidGroup.NotFound: The security group 'sg-56862632' does not exist in VPC 'vpc-65a13300'

This was also seen in #1322. The packer configuration that's occasionally showing this error is trying to spin up the default Amazon Linux AMI in us-west-2. I'm using Packer 0.8.5 on Amazon Linux.

I'm trying to reproduce it again to get some full PACKER_LOG=1 logs. I'll post those when I catch it.

bug builder/amazon

All 11 comments

Thanks for the report! The race detector from Go 1.5 identifies a lot of things in Packer that I have not had a chance to clean up yet, so this is a pretty good theory. Also, IIRC some Amazon resources are created asynchronously, and the API reports they are ready when they actually are not. I don't recall whether security groups fall into this category, but there's a chance we can improve the wait-for-ready / retry logic around this to make sure we block for dependent resources.

+1, can reproduce here as well. You'll have to poll the API until it is confirmed to exist.

I've encountered this today as well on Atlas.

Packer v0.10.1

amazon-ebs output will be in this color.

==> amazon-ebs: Prevalidating AMI Name...
==> amazon-ebs: Inspecting the source AMI...
==> amazon-ebs: Creating temporary keypair: packer XXX-YYY-XXX-YYYY-XXXYYY
==> amazon-ebs: Creating temporary security group for this instance...
==> amazon-ebs: Authorizing access to port 22 the temporary security group...
==> amazon-ebs: Launching a source AWS instance...
==> amazon-ebs: Error launching source instance: InvalidGroup.NotFound: The security group 'sg-2557845e' does not exist in VPC 'vpc-6882c00c'
==> amazon-ebs:     status code: 400, request id:
==> amazon-ebs: No AMIs to cleanup
==> amazon-ebs: Deleting temporary security group...
==> amazon-ebs: Deleting temporary keypair...
Build 'amazon-ebs' errored: Error launching source instance: InvalidGroup.NotFound: The security group 'sg-2557845e' does not exist in VPC 'vpc-6882c00c'
    status code: 400, request id: 

==> Some builds didn't complete successfully and had errors:
--> amazon-ebs: Error launching source instance: InvalidGroup.NotFound: The security group 'sg-2557845e' does not exist in VPC 'vpc-6882c00c'
    status code: 400, request id: 

I can build just fine when running Packer locally.

$ packer build template.json 
amazon-ebs output will be in this color.

==> amazon-ebs: Prevalidating AMI Name...
==> amazon-ebs: Inspecting the source AMI...
==> amazon-ebs: Creating temporary keypair: packer XXX-YYY-XXX-YYYY-XXXYYY
==> amazon-ebs: Creating temporary security group for this instance...
==> amazon-ebs: Authorizing access to port 22 the temporary security group...
==> amazon-ebs: Launching a source AWS instance...
    amazon-ebs: Instance ID: i-0244132392e943e9a
==> amazon-ebs: Waiting for instance (i-0244132392e943e9a) to become ready...
==> amazon-ebs: Waiting for SSH to become available..

This has been happening occasionally for me for a while and I've dealt with it. Unfortunately now it's consistent. I can no longer build images. I was on Packer 0.10, upgraded to Packer 0.10.1, same issue.

Since it's continually happening, please let me know if there's anything I can do to help gather data.

Seconding @ateoto: this issue also occurs consistently for me after upgrading to 0.10.1.

Also seeing this fairly consistently in the last ~24 hours. I'm thinking perhaps AWS is taking longer to create security groups, which would expose this issue more frequently.

I am getting this issue on identical builds that worked before 0.10.1.

I am using the brew install of Packer on OSX 10.11.5

Pretty sure it's a degradation in AWS; they actually just updated the service page with:

We are investigating increased API error rates in the US-EAST-1 Region.

Which is where these instances are attempting to spin up for me.

I checked again and I think @ateoto is correct. My builds have resumed as normal.

I am experiencing this more often. I think AWS has been more error prone lately, but I think Packer could work around the higher error rates.

This is on Linux using 0.10.1.

Yeah I kept hitting this too. The submitted fix should address it.
