Terraform 0.11.11
AWS provider 1.59.0
Terraform should create the defined Service Linked Role, and then create the AWS KMS key which has a policy referencing that role, and the AutoScaling Group that uses that role.
Terraform creates the role, but often produces an error on the subsequent steps.
For example:
aws_kms_key.ami_key: MalformedPolicyDocumentException: Policy contains a statement with one or more invalid principals.
or an equivalent error while trying to create the AutoScaling Group, stating that the role does not exist.
Running Terraform again works correctly, as the role has already been created.
This appears to be an eventual-consistency issue, similar to ones with IAM Roles that have already been solved. I suspect the solution is adding retries to the aws_kms_key and aws_autoscaling_group resources when they receive this error.
Can you please provide example configuration(s) so we can write covering acceptance tests? Thanks.
Attached is a file that should replicate the problem. Note that it doesn't occur 100% of the time, but most of the time the first `terraform apply` produces one or both of the following errors:
```
aws_kms_key.kms: 1 error(s) occurred:
aws_kms_key.kms: MalformedPolicyDocumentException: Policy contains a statement with one or more invalid principals.
status code: 400, request id: edb01b4f-1fc0-43fe-9b4d-d57c2fbdab48
```
tf.txt
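The attached file isn't reproduced here, but a configuration along these lines exercises the same pattern. This is a hypothetical sketch (Terraform 0.11 syntax, made-up names and placeholder values), not the contents of `tf.txt`:

```
# Hypothetical reproduction sketch: a service-linked role, a KMS key whose
# policy references that role as a principal, and an ASG that uses the role.

data "aws_caller_identity" "current" {}

resource "aws_iam_service_linked_role" "autoscaling" {
  aws_service_name = "autoscaling.amazonaws.com"
}

resource "aws_kms_key" "kms" {
  description = "Key whose policy references the freshly created role"

  # Referencing the new role's ARN as a principal before IAM has propagated
  # the role is what triggers MalformedPolicyDocumentException.
  policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "EnableRootAccountAccess",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"
      },
      "Action": "kms:*",
      "Resource": "*"
    },
    {
      "Sid": "AllowServiceLinkedRoleUseOfTheKey",
      "Effect": "Allow",
      "Principal": {
        "AWS": "${aws_iam_service_linked_role.autoscaling.arn}"
      },
      "Action": [
        "kms:Decrypt",
        "kms:GenerateDataKey*"
      ],
      "Resource": "*"
    }
  ]
}
POLICY
}

resource "aws_launch_configuration" "example" {
  image_id      = "ami-12345678" # placeholder AMI
  instance_type = "t2.micro"
}

resource "aws_autoscaling_group" "example" {
  # Using the brand-new role here can fail with a similar "role does not
  # exist" style error until IAM propagation completes.
  service_linked_role_arn = "${aws_iam_service_linked_role.autoscaling.arn}"

  launch_configuration = "${aws_launch_configuration.example.name}"
  availability_zones   = ["us-east-1a"]
  min_size             = 0
  max_size             = 1
}
```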
Hi @ineffyble! Thank you so much for the minimal reproduction configuration. That should work great for reproducing this against both resources and ensuring the fixes cover them.
The maintainers will be heads down with version 2.0.0 development and testing work for the next week or so, but hopefully we can address this shortly thereafter (unless someone from the community picks this up).
Came here looking for the migrated copy of this issue; this ticket seems closest.
This is a royal PITA.
Potential workarounds:

- A `null_resource` that triggers on changes to the role's `unique_id` attribute and executes a provisioner that runs `sleep 10` or similar. Depend on this resource in anything that needs the role ARN to exist (see the sketch below).

The fix for this resource to wait up to 2 minutes for IAM change propagation (fairly standard across the provider) has been merged and will release with version 2.60.0 of the Terraform AWS Provider, later this week.
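A rough sketch of that `null_resource` workaround, using Terraform 0.11-style syntax and hypothetical names (`aws_iam_service_linked_role.autoscaling`, `aws_kms_key.ami_key`), might look like this:

```
# Hypothetical sketch of the null_resource workaround described above.

resource "aws_iam_service_linked_role" "autoscaling" {
  aws_service_name = "autoscaling.amazonaws.com"
}

# Re-runs its provisioner whenever the role is (re)created, giving IAM a few
# seconds to propagate the new role before anything references it.
resource "null_resource" "wait_for_role_propagation" {
  triggers = {
    role_unique_id = "${aws_iam_service_linked_role.autoscaling.unique_id}"
  }

  provisioner "local-exec" {
    command = "sleep 10"
  }
}

resource "aws_kms_key" "ami_key" {
  description = "Key whose policy would reference the service-linked role"

  # Anything that references the role ARN should depend on the sleep, so the
  # role has had time to propagate before the AWS API call is made.
  depends_on = ["null_resource.wait_for_role_propagation"]
}
```

The same `depends_on` can be added to the `aws_autoscaling_group` that uses the role.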
This has been released in version 2.60.0 of the Terraform AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.
For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template for triage. Thanks!
Hi folks, sorry for a tangential question, but we're having this issue for regular IAM roles. @ineffyble mentioned this was already solved for those. Could you provide details how? Thanks!
Looks like version 2.60.0 of the provider now unifies the timeout period for IAM waits. The implication is that each dependent resource has its own code to wait on this. What might be more elegant is if the waiting lived in the resource that is the source of the attribute itself, so that dependent resources could simply wait for it to deliver; that would work across the framework for all resources that involve a wait.
@awilkins thanks - I see the release notes mention a few dependent resources that support waiting. This is nice; however, in our case I tried to use `depends_on` from unrelated resources (i.e. non-IAM resources whose management depends on the role being created first), and the built-in wait doesn't help there. I understand this is probably an edge case, but it would be nice to get some resolution to it.
It sounds like a great idea to move the wait into the `aws_iam_role_policy` etc. resources themselves like you mentioned (at least as an option, like `wait_for_propagation = true`). This way, _any_ resources that depend on them (either by reference or via `depends_on`) can trust that the policy has been propagated.
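For concreteness, the proposal might be used roughly as follows. Note that `wait_for_propagation` is only the option suggested in the comment above, not an argument the provider actually supports, so it is left commented out; everything else is a hypothetical example.

```
# Hypothetical example. The wait_for_propagation argument is the commenter's
# proposal, not an existing aws_iam_role_policy argument, so it is commented out.

resource "aws_iam_role" "example" {
  name = "example"

  assume_role_policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
POLICY
}

resource "aws_iam_role_policy" "example" {
  name = "example"
  role = "${aws_iam_role.example.id}"

  # Proposed behaviour: block until IAM reports the policy as propagated, so
  # that any resource depending on this one (by reference or via depends_on)
  # can trust that the policy is live.
  # wait_for_propagation = true

  policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:ListAllMyBuckets",
      "Resource": "*"
    }
  ]
}
POLICY
}
```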
I'm going to lock this issue because it has been closed for _30 days_. This helps our maintainers find and focus on the active issues.
If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!