Terraform: provider/aws: iam_instance_profile not yet ready when ec2 instance is launched

Created on 9 May 2015 · 46 comments · Source: hashicorp/terraform

When launching an EC2 instance with a new IAM instance profile, Terraform returns an error:

aws_instance.worker.0: Error: 1 error(s) occurred:

* Error launching source instance: InvalidParameterValue: Value (worker) for parameter 
  iamInstanceProfile.name is invalid. Invalid IAM Instance Profile name

I believe this is due to a problem mentioned here in the AWS docs:

After you create an IAM role, it may take several seconds for the permissions to propagate. If your first attempt to launch an instance with a role fails, wait a few seconds before trying again. For more information, see Troubleshooting Working with Roles in the Using IAM guide.

I think Terraform should catch this error and retry when it occurs.

Right now, simply re-running plan/apply solves the issue for me.

bug provider/aws

Most helpful comment

Likewise, hitting this in v0.9.3. It's still resulting in a stray instance, too.

All 46 comments

I am unsure how to solve this properly. How can we be sure the instance profile is propagated? Simply checking if it exists is not enough, because the problem here is that it simply isn't visible to the created instance yet.

At the same time, the error simply returns HTTP code 500, so we can't know specifically why the creation of the instance failed; simply retrying on that error also doesn't seem like the way to go.

Furthermore, I suspect this may actually need to be solved upstream by the Go AWS SDK.

I "solved" it for now by adding time.Sleep(7 * time.Second) to the end of aws.instanceProfileSetRoles, but this obviously doesn't scale. Also, 5 seconds still gave me errors from time to time; only at 7 did the errors completely go away.

+1 I just ran into this issue as well.

+1

Aaaand I just hit this as well. I wonder if a new Terraform option for retry or (ugh) sleep should be added as a generic resource option for advanced usage. I hate it, but...

We've solved these sorts of issues with the AWS API by doing a smart retry with a back-off: try again, then back off, try again, back off a little more, try again, back off a little more, then error.
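The pattern described above can be sketched as follows. This is illustrative Python, not Terraform's actual implementation; the attempt count, base delay, and the flaky_launch stand-in for the eventually consistent AWS call are all assumptions:

```python
import time

def retry_with_backoff(op, attempts=5, base_delay=0.01, factor=2.0):
    """Try op; on failure sleep, increase the delay, and try again."""
    delay = base_delay
    for attempt in range(attempts):
        try:
            return op()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(delay)
            delay *= factor  # back off a little more each time

calls = 0
def flaky_launch():
    # Stand-in for the eventually consistent IAM/EC2 call: fails
    # twice, then succeeds once the profile has "propagated".
    global calls
    calls += 1
    if calls < 3:
        raise RuntimeError("Invalid IAM Instance Profile name")
    return "launched"

print(retry_with_backoff(flaky_launch))  # launched
```

With an exponential factor the delays grow quickly, so a handful of attempts covers propagation lags from milliseconds to tens of seconds without hard-coding a single sleep.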

@adamhjk Yep! We do the same thing, but the error message here is one we don't catch yet. We'll add that.
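Catching "the error message here" presumably means classifying which errors are worth retrying. A minimal sketch of such a check; the marker substrings come from the error messages quoted in this thread, and the function name is hypothetical, not Terraform's actual error-handling code:

```python
# Substrings of the eventually consistent IAM errors seen in this thread.
RETRYABLE_MARKERS = (
    "Invalid IAM Instance Profile",
    "InvalidParameterValue",
)

def is_retryable(message):
    """Return True if the error message looks like IAM propagation lag."""
    return any(marker in message for marker in RETRYABLE_MARKERS)

print(is_retryable("InvalidParameterValue: Value (worker) for parameter "
                   "iamInstanceProfile.name is invalid."))  # True
```

Matching on message text is fragile; a real implementation would match the structured AWS error code instead, but the idea is the same.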

We solved this on Thursday with a local-exec provisioner on the iam_instance_profile w/ an inline command = "sleep 5". Experimentation may be needed to find the right sleep. It sucks to put artificial sleeps in local-execs, but it currently avoids having to re-run apply, so that's a win imo. I agree that the backoff patch is the correct solution; just sharing our bandaid.

Glad to find this. Thought I was going crazy.

Proposing #2037 to fix this

Aaand I've finally hit this issue as well !
I'm on

terraform version
Terraform v0.5.3-dev (a6b8b65e6e0b5cf1f0b4fcf3f6abde3b7db21a97)

which was built 5 days after the PR was merged.
The above PR #2037 does not fix the issue for me.

Am I missing something obvious ?

When I apply all resources on the first run (from scratch, no resources present), TF fails because of the slow IAM propagation (see error output below).
When I apply the plan a second time, TF naturally sees the resources that weren't created (3 ASGs + LCs) and creates them, since the IAM instance profile has propagated by the end of the first TF run.

Error output when creating all VPC resources in one run:

...
aws_launch_configuration.jenkinsmaster-cps-LC: Creating...
  associate_public_ip_address:               "" => "0"
  ebs_block_device.#:                        "" => "<computed>"
  ebs_optimized:                             "" => "<computed>"
  iam_instance_profile:                      "" => "ec2describe_profile"
  image_id:                                  "" => "ami-fd4fda8a"
  instance_type:                             "" => "t2.medium"
  key_name:                                  "" => "ops-ec2-provision"
  name:                                      "" => "jenkinsmaster-cps-LC"
  root_block_device.#:                       "" => "1"
  root_block_device.0.delete_on_termination: "" => "1"
  root_block_device.0.iops:                  "" => "<computed>"
  root_block_device.0.volume_size:           "" => "64"
  root_block_device.0.volume_type:           "" => "gp2"
  security_groups.#:                         "" => "2"
  security_groups.14195040:                  "" => "sg-84ac82e1"
  security_groups.4045803087:                "" => "sg-bbac82de"
  user_data:                                 "" => "545c0308b7b339c0a68c4b902777506379e370bb"
aws_launch_configuration.jenkinsmaster-cps-LC: Error: 1 error(s) occurred:

* Error creating launch configuration: ValidationError: Invalid IamInstanceProfile: ec2describe_profile
        status code: 400, request id: [96bbced4-03c1-11e5-99a5-b17026acf417]
aws_launch_configuration.jenkinsmaster-ops-LC: Error: 1 error(s) occurred:

* Error creating launch configuration: ValidationError: Invalid IamInstanceProfile: ec2describe_profile
        status code: 400, request id: [96bf5089-03c1-11e5-a281-51c7f0bf4f0e]
aws_elb.jmaster: Creation complete
...
Error applying plan:

3 error(s) occurred:

* 1 error(s) occurred:

* 1 error(s) occurred:

* Error creating launch configuration: ValidationError: Invalid IamInstanceProfile: ec2describe_profile
        status code: 400, request id: [96292ba9-03c1-11e5-ba98-45c0c04e3ac5]
* 1 error(s) occurred:

* 1 error(s) occurred:

* Error creating launch configuration: ValidationError: Invalid IamInstanceProfile: ec2describe_profile
        status code: 400, request id: [96bbced4-03c1-11e5-99a5-b17026acf417]
* 1 error(s) occurred:

* 1 error(s) occurred:

* Error creating launch configuration: ValidationError: Invalid IamInstanceProfile: ec2describe_profile
        status code: 400, request id: [96bf5089-03c1-11e5-a281-51c7f0bf4f0e]

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.

This has always been a problem with the AWS API: the results of some methods, when invoked, are only eventually consistent. I believe there are more cases than just IAM; I've definitely run into others over the years. So maybe a more generic approach is a good idea? Incremental back-off seems like a decent idea, perhaps parameterised at the resource level by adding backoff_step (time in seconds) and backoff_attempts properties, then calculating the cumulative wait time from there. This would at least allow people to sidestep the issue rather than it being a blocker, as there will inevitably be more cases like this; such is the nature of the AWS API. I'd definitely like to be able to change the behaviour via config rather than changing the source and recompiling Terraform.
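The proposed backoff_step/backoff_attempts properties would determine how long a resource waits in total before giving up. A quick sketch of the cumulative wait, assuming a linear back-off (the property names are the hypothetical ones from the comment above, not real Terraform options):

```python
def total_backoff_time(backoff_step, backoff_attempts):
    """Cumulative wait for a linear back-off: waits of step, 2*step,
    3*step, ... seconds between successive attempts."""
    return sum(backoff_step * i for i in range(1, backoff_attempts))

# Four attempts with a 5-second step wait 5 + 10 + 15 = 30 seconds total.
print(total_backoff_time(5, 4))  # 30
```

Knowing the total up front matters: users would want the cumulative wait to comfortably exceed the propagation delays people report in this thread (up to ~90 seconds).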

@stefancocora do you have a sample config that reproduces this error for you consistently? #2037 addressed the simple case I was able to reproduce, but maybe it was too simple.

If you do have such a config, please share, being sure to omit any secrets!
Thanks!

As a note, I do not get the same issue if the instances are booted by an autoscaling group. This race condition occurs when launching an instance directly with the aws_instance resource depending on an IAM instance profile. So if you can use autoscaling, it sidesteps the issue: AWS must either make sure the instance profile is available, or enough time passes before it tries to create the minimum instances in the launch config.

PR #2037 does not fix the issue for me, either...

$ terraform --version
Terraform v0.5.3

$ terraform apply
<snip>
aws_instance.container_instance.2: Creating...
  ami:                        "" => "ami-5f59ac34"
  availability_zone:          "" => "<computed>"
  ebs_block_device.#:         "" => "<computed>"
  ephemeral_block_device.#:   "" => "<computed>"
  iam_instance_profile:       "" => "ecs-iam-instance-profile"
  instance_type:              "" => "t2.micro"
  key_name:                   "" => "my-aws-key"
  placement_group:            "" => "<computed>"
  private_dns:                "" => "<computed>"
  private_ip:                 "" => "<computed>"
  public_dns:                 "" => "<computed>"
  public_ip:                  "" => "<computed>"
  root_block_device.#:        "" => "<computed>"
  security_groups.#:          "" => "1"
  security_groups.1869473479: "" => "sg-37d13558"
  subnet_id:                  "" => "subnet-219d9867"
  tenancy:                    "" => "<computed>"
  user_data:                  "" => "cc1021014e8c036ecbe0f8a2abd196b003dac173"
  vpc_security_group_ids.#:   "" => "<computed>"
2015/06/22 16:55:51 [DEBUG] root: eval: *terraform.EvalWriteState
2015/06/22 16:55:51 [DEBUG] root: eval: *terraform.EvalApplyProvisioners
2015/06/22 16:55:51 [DEBUG] root: eval: *terraform.EvalIf
2015/06/22 16:55:51 [DEBUG] root: eval: *terraform.EvalWriteDiff
2015/06/22 16:55:51 [DEBUG] root: eval: *terraform.EvalIf
2015/06/22 16:55:51 [DEBUG] root: eval: *terraform.EvalWriteState
2015/06/22 16:55:51 [DEBUG] root: eval: *terraform.EvalApplyPost
2015/06/22 16:55:51 [DEBUG] root: eval: *terraform.EvalUpdateStateHook
aws_iam_role_policy.elb_access: Creation complete
2015/06/22 16:55:52 terraform-provider-aws: 2015/06/22 16:55:52 [DEBUG] Invalid IAM Instance Profile referenced, retrying...
2015/06/22 16:55:52 terraform-provider-aws: 2015/06/22 16:55:52 [DEBUG] Invalid IAM Instance Profile referenced, retrying...
2015/06/22 16:55:52 terraform-provider-aws: 2015/06/22 16:55:52 [DEBUG] Invalid IAM Instance Profile referenced, retrying...
2015/06/22 16:55:52 terraform-provider-aws: 2015/06/22 16:55:52 [DEBUG] Invalid IAM Instance Profile referenced, retrying...
2015/06/22 16:55:54 terraform-provider-aws: 2015/06/22 16:55:54 [DEBUG] Invalid IAM Instance Profile referenced, retrying...
2015/06/22 16:55:54 terraform-provider-aws: 2015/06/22 16:55:54 [DEBUG] Invalid IAM Instance Profile referenced, retrying...
2015/06/22 16:55:54 terraform-provider-aws: 2015/06/22 16:55:54 [DEBUG] Invalid IAM Instance Profile referenced, retrying...
2015/06/22 16:55:54 terraform-provider-aws: 2015/06/22 16:55:54 [DEBUG] Invalid IAM Instance Profile referenced, retrying...
2015/06/22 16:55:56 terraform-provider-aws: 2015/06/22 16:55:56 [DEBUG] Invalid IAM Instance Profile referenced, retrying...
2015/06/22 16:55:56 terraform-provider-aws: 2015/06/22 16:55:56 [DEBUG] Invalid IAM Instance Profile referenced, retrying...
2015/06/22 16:55:56 terraform-provider-aws: 2015/06/22 16:55:56 [DEBUG] Invalid IAM Instance Profile referenced, retrying...
2015/06/22 16:55:57 terraform-provider-aws: 2015/06/22 16:55:57 [DEBUG] Invalid IAM Instance Profile referenced, retrying...
2015/06/22 16:55:58 terraform-provider-aws: 2015/06/22 16:55:58 [DEBUG] Invalid IAM Instance Profile referenced, retrying...
2015/06/22 16:55:58 terraform-provider-aws: 2015/06/22 16:55:58 [DEBUG] Invalid IAM Instance Profile referenced, retrying...
2015/06/22 16:55:58 terraform-provider-aws: 2015/06/22 16:55:58 [DEBUG] Invalid IAM Instance Profile referenced, retrying...
2015/06/22 16:55:59 terraform-provider-aws: 2015/06/22 16:55:59 [DEBUG] Invalid IAM Instance Profile referenced, retrying...
2015/06/22 16:56:00 terraform-provider-aws: 2015/06/22 16:56:00 [DEBUG] Invalid IAM Instance Profile referenced, retrying...
2015/06/22 16:56:00 terraform-provider-aws: 2015/06/22 16:56:00 [DEBUG] Invalid IAM Instance Profile referenced, retrying...
2015/06/22 16:56:00 terraform-provider-aws: 2015/06/22 16:56:00 [DEBUG] Invalid IAM Instance Profile referenced, retrying...
2015/06/22 16:56:01 terraform-provider-aws: 2015/06/22 16:56:01 [DEBUG] Invalid IAM Instance Profile referenced, retrying...
2015/06/22 16:56:02 [DEBUG] root: eval: *terraform.EvalWriteState
2015/06/22 16:56:02 [DEBUG] root: eval: *terraform.EvalApplyProvisioners
2015/06/22 16:56:02 [DEBUG] root: eval: *terraform.EvalIf
2015/06/22 16:56:02 [DEBUG] root: eval: *terraform.EvalWriteDiff
2015/06/22 16:56:02 [DEBUG] root: eval: *terraform.EvalIf
2015/06/22 16:56:02 [DEBUG] root: eval: *terraform.EvalWriteState
2015/06/22 16:56:02 [DEBUG] root: eval: *terraform.EvalApplyPost
2015/06/22 16:56:02 [ERROR] root: eval: *terraform.EvalApplyPost, err: 1 error(s) occurred:

* Error launching source instance: InvalidParameterValue: Value (ecs-iam-instance-profile) for parameter iamInstanceProfile.name is invalid. Invalid IAM Instance Profile name
    status code: 400, request id: []
2015/06/22 16:56:02 [ERROR] root: eval: *terraform.EvalSequence, err: 1 error(s) occurred:

* Error launching source instance: InvalidParameterValue: Value (ecs-iam-instance-profile) for parameter iamInstanceProfile.name is invalid. Invalid IAM Instance Profile name
    status code: 400, request id: []
2015/06/22 16:56:02 [ERROR] root: eval: *terraform.EvalOpFilter, err: 1 error(s) occurred:

* Error launching source instance: InvalidParameterValue: Value (ecs-iam-instance-profile) for parameter iamInstanceProfile.name is invalid. Invalid IAM Instance Profile name
    status code: 400, request id: []
2015/06/22 16:56:02 [ERROR] root: eval: *terraform.EvalSequence, err: 1 error(s) occurred:

* Error launching source instance: InvalidParameterValue: Value (ecs-iam-instance-profile) for parameter iamInstanceProfile.name is invalid. Invalid IAM Instance Profile name
    status code: 400, request id: []
aws_instance.container_instance.2: Error: 1 error(s) occurred:
<snip>

@knuckolls' local-exec provisioner workaround has gotten us past the issue for now.

@ljohnston What's the command you're local-exec-ing?

@jkodroff I would assume it's something like "sleep 10"

@jkodroff ...

    provisioner "local-exec" {
        command = "sleep 10"
    }

Just hit this same issue on terraform v0.6.4.

Lots of errors like this:

* aws_instance.nat: Error launching source instance: InvalidParameterValue: Value (iam-nat-profile) for parameter iamInstanceProfile.name is invalid. Invalid IAM Instance Profile name

Adding sleep 10 in local-exec provisioner did not help.

I'm hitting this error with 0.6.12

I am having the same issue with 0.6.15 and it's very intermittent.

+1

+1

+1, I seem to be hitting this with the latest Terraform v0.7.11

+1

+1

@pidah @bsomogyi what versions are you having this issue with?

0.7.11 having issues here

kubernetes v1.4.5

kube-up.sh
KUBE_AWS_ZONE us-east-1c
NUM_NODES 1
AWS_S3_REGION us-east-1
AWS_S3_BUCKET medssenger-kubernetes-artifacts-095076444211
INSTANCE_PREFIX
KUBERNETES_PROVIDER aws
KUBERNETES_SKIP_DOWNLOAD
PATH /usr/local/bin/kubernetes/platforms/linux/amd64:/usr/local/bin/kubernetes/cluster:/usr/lib/android-sdk-linux/tools:/usr/lib/android-sdk-linux/platform-tools:/usr/local/bin/kubernetes/platforms/linux/amd64:/usr/local/bin/kubernetes/cluster:/usr/lib/android-sdk-linux/tools:/usr/lib/android-sdk-linux/platform-tools:/usr/local/bin/kubernetes/platforms/linux/amd64:/usr/local/bin/kubernetes/cluster:/usr/lib/android-sdk-linux/tools:/usr/lib/android-sdk-linux/platform-tools:/home/stens/bin:/home/stens/.local/bin:/usr/local/go/bin:/opt/code/gopath/bin:/home/stens/nodejs/node-v4.6.0-linux-x64/bin:/home/stens/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:.:/home/stens/bin:/usr/local/java/jdk/bin
... Starting cluster in us-east-1c using provider aws
... calling verify-prereqs
... calling kube-up
Starting cluster using os distro: jessie
Uploading to Amazon S3
+++ Staging server tars to S3 Storage: medssenger-kubernetes-artifacts-095076444211/devel
upload: ../../tmp/kubernetes.Ogv43m/s3/bootstrap-script to s3://medssenger-kubernetes-artifacts-095076444211/devel/bootstrap-script
Uploaded server tars:
  SERVER_BINARY_TAR_URL: https://s3.amazonaws.com/medssenger-kubernetes-artifacts-095076444211/devel/kubernetes-server-linux-amd64.tar.gz
  SALT_TAR_URL: https://s3.amazonaws.com/medssenger-kubernetes-artifacts-095076444211/devel/kubernetes-salt.tar.gz
  BOOTSTRAP_SCRIPT_URL: https://s3.amazonaws.com/medssenger-kubernetes-artifacts-095076444211/devel/bootstrap-script
INSTANCEPROFILE arn:aws:iam::095076444211:instance-profile/kubernetes-master    2016-11-16T20:00:23Z    AIPAIYKOIB57IQONVU4HM   kubernetes-master   /
ROLES   arn:aws:iam::095076444211:role/kubernetes-master    2016-11-16T20:00:22Z    /   AROAI2JWOJIEUSRUVYMFI   kubernetes-master
ASSUMEROLEPOLICYDOCUMENT    2012-10-17
STATEMENT   sts:AssumeRole  Allow
PRINCIPAL   ec2.amazonaws.com
INSTANCEPROFILE arn:aws:iam::095076444211:instance-profile/kubernetes-minion    2016-11-16T20:00:27Z    AIPAJIP5SVUO2P2GNREOM   kubernetes-minion   /
ROLES   arn:aws:iam::095076444211:role/kubernetes-minion    2016-11-16T20:00:25Z    /   AROAICX267IM5SRP3724W   kubernetes-minion
ASSUMEROLEPOLICYDOCUMENT    2012-10-17
STATEMENT   sts:AssumeRole  Allow
PRINCIPAL   ec2.amazonaws.com
Using SSH key with (AWS) fingerprint: SHA256:jKVPPixVlF+Ro5aj3Sa8SmhIX/dihAXiZOr00J5FAao
Using VPC vpc-c08471a6
Adding tag to dopt-a9e504ce: Name=kubernetes-dhcp-option-set
Adding tag to dopt-a9e504ce: KubernetesCluster=kubernetes
Using DHCP option set dopt-a9e504ce
Using existing subnet with CIDR 172.20.0.0/24
Using subnet subnet-7079c85d
Using Internet Gateway igw-3ab5085d
Associating route table.
Associating route table rtb-b7c49cd1 to subnet subnet-7079c85d
Adding route to route table rtb-b7c49cd1
Using Route Table rtb-b7c49cd1
Generating certs for alternate-names: IP:34.192.2.212,IP:172.20.0.9,IP:10.0.0.1,DNS:kubernetes,DNS:kubernetes.default,DNS:kubernetes.default.svc,DNS:kubernetes.default.svc.cluster.local,DNS:kubernetes-master
Starting Master

An error occurred (InvalidParameterValue) when calling the RunInstances operation: Value (kubernetes-master) for parameter iamInstanceProfile.name is invalid. Invalid IAM Instance Profile name

SOLUTION: tear down the cluster and retry. The error fixes itself internally, shielded from the end user, so there is no need to manually create any IAM gadgetry:

    kube-down.sh
    kube-up.sh

We've been using 0.7.5 and have been hitting the issue. So, we tried 0.7.11 today and it had the same problem.

I finally got past it using the sleep hack; I started at 5 seconds and had to bump it all the way to 90s before the deployment would succeed:

resource "aws_iam_instance_profile" "ec2_instance_profile" {
  name  = "instance_profile"
  roles = ["${aws_iam_role.ec2_instance_role.name}"]

  provisioner "local-exec" {
    command = "sleep 90"
  }
}

Same issue here

Just hit this with Terraform v0.8.6.

Perhaps worth mentioning is that this, at least in my case, usually leads to a stray EC2 instance running on EC2 that isn't picked up by Terraform and requires manual termination/cleanup.

I haven't hit this in terraform but when using the API directly. I often have to put sleeps in for things that often take a little bit of time to propagate across aws services.

I've noticed that instance profiles take quite a while (relative to other resources) to be seen as a valid profile to attach to an instance.

I just wanted to note that CloudFormation appears to wait (sleep) a full 2 minutes (!) after an instance profile is created! (Assuming this is still accurate, over a year later) https://forums.aws.amazon.com/thread.jspa?messageID=593651

Found in v0.9.1

Likewise, hitting this in v0.9.3. It's still resulting in a stray instance, too.

Still hitting this bug also on 0.8.x so I am not sure why it's closed

Found in v0.9.4 also.

Also hitting this in v0.9.0.4. Running a second time does solve the issue, but would be great to have a longer term fix.

Yep the second run worked for me as well.

I do have similar issue

 aws_instance.kub_master: Error launching source instance: InvalidParameterValue: Value (instance_profile_master) for parameter iamInstanceProfile.name is invalid. Invalid IAM Instance Profile name

Putting sleep for 2 min works for me.

  provisioner "local-exec" {
    command = "sleep 120"
  }

Ref: https://forums.aws.amazon.com/thread.jspa?messageID=593651

Still present in 0.9.10

I think this issue occurs because when Terraform tries to find aws_iam_instance_profile.example_profile.name, the name is already available locally from the config we provide, so Terraform need not wait for the resource to actually exist before it tries to spawn the EC2 instance with that profile name. This can be avoided by explicitly using a computed value of aws_iam_instance_profile.example_profile, such as arn. That way Terraform has to fetch it before it can use it, and it is only available once creation is complete.
Adding this to my aws_instance worked for me. Not sure if it will work for everybody.

provisioner "local-exec" {
  command = "echo ${aws_iam_instance_profile.example_profile.arn}"
}

Let me know if this works for anyone else too. Better than waiting for some random amount of time.

I think this issue occurs because when terraform tries to find aws_iam_instance_profile.example_profile.name it is already available to it locally from the config

It's worse than that. aws iam wait instance-profile-exists will poll AWS and return successfully. However, the poll will prematurely return true before permissions have propagated, so an immediate aws ec2 run-instances still fails, despite taking care to poll for it >_<
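In other words, the existence check is necessary but not sufficient; the robust approach is to retry the launch itself after the profile becomes visible. A sketch with stand-in launch and profile_exists callables (assumptions for illustration, not real AWS calls):

```python
import time

def launch_with_profile(launch, profile_exists, attempts=10, delay=0.01):
    """Wait until the profile is visible, then retry the launch itself,
    since visibility does not guarantee permissions have propagated."""
    while not profile_exists():
        time.sleep(delay)
    for attempt in range(attempts):
        try:
            return launch()
        except Exception:
            if attempt == attempts - 1:
                raise  # give up after the last attempt
            time.sleep(delay)

# Simulated propagation lag: the profile "exists" immediately, but the
# launch only succeeds on the second try.
state = {"launches": 0}
def fake_launch():
    state["launches"] += 1
    if state["launches"] < 2:
        raise RuntimeError("Invalid IAM Instance Profile name")
    return "i-0123"

print(launch_with_profile(fake_launch, lambda: True))  # i-0123
```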

Why is this issue closed? It still exists in v0.11.7.

So far I'm using this workaround, which is weird:

resource "aws_iam_instance_profile" "k8s-master" {
  name = "k8s-master"
  role = "${aws_iam_role.k8s-master.name}"

  provisioner "local-exec" {
    command = "aws iam wait instance-profile-exists --instance-profile-name k8s-master && sleep 5"
  }
}

Hi all,

Issues with the terraform aws provider should be opened in the aws provider repository.

Because this closed issue is generating notifications for subscribers, I am going to lock it and encourage anyone experiencing issues with the aws provider to open tickets there.

Please continue to open issues here for any other terraform issues you encounter, and thanks!
