Terraform 0.11.11
AWS provider 1.59.0
Terraform should create the defined Service Linked Role, and then create the AWS KMS key which has a policy referencing that role, and the AutoScaling Group that uses that role.
Terraform creates the role, but often produces an error on the subsequent steps.
For example:
aws_kms_key.ami_key: MalformedPolicyDocumentException: Policy contains a statement with one or more invalid principals.
or an equivalent error while trying to create the AutoScaling Group, stating that the role does not exist.
Running Terraform again works correctly, as the role has already been created.
This appears to be an eventual-consistency issue, similar to ones with IAM Roles that have already been solved. I suspect the solution is adding retries to the aws_kms_key and aws_autoscaling_group resources when they receive this error.
Can you please provide example configuration(s) so we can write covering acceptance tests? Thanks.
Attached is a file that should replicate the problem. Note that it doesn't occur 100% of the time, but most of the time the first `terraform apply` produces one or both of the following errors:
```
aws_kms_key.kms: 1 error(s) occurred:
aws_kms_key.kms: MalformedPolicyDocumentException: Policy contains a statement with one or more invalid principals.
status code: 400, request id: edb01b4f-1fc0-43fe-9b4d-d57c2fbdab48
```
tf.txt
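The attached file isn't reproduced here, but a configuration along these lines exercises the same pattern. This is a hypothetical sketch (Terraform 0.11 syntax, made-up names and placeholder values), not the contents of `tf.txt`:

```
# Hypothetical reproduction sketch: a service-linked role, a KMS key whose
# policy references that role as a principal, and an ASG that uses the role.

data "aws_caller_identity" "current" {}

resource "aws_iam_service_linked_role" "autoscaling" {
  aws_service_name = "autoscaling.amazonaws.com"
}

resource "aws_kms_key" "kms" {
  description = "Key whose policy references the freshly created role"

  # Referencing the new role's ARN as a principal before IAM has propagated
  # the role is what triggers MalformedPolicyDocumentException.
  policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "EnableRootAccountAccess",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"
      },
      "Action": "kms:*",
      "Resource": "*"
    },
    {
      "Sid": "AllowServiceLinkedRoleUseOfTheKey",
      "Effect": "Allow",
      "Principal": {
        "AWS": "${aws_iam_service_linked_role.autoscaling.arn}"
      },
      "Action": [
        "kms:Decrypt",
        "kms:GenerateDataKey*"
      ],
      "Resource": "*"
    }
  ]
}
POLICY
}

resource "aws_launch_configuration" "example" {
  image_id      = "ami-12345678" # placeholder AMI
  instance_type = "t2.micro"
}

resource "aws_autoscaling_group" "example" {
  # Using the brand-new role here can fail with a similar "role does not
  # exist" style error until IAM propagation completes.
  service_linked_role_arn = "${aws_iam_service_linked_role.autoscaling.arn}"

  launch_configuration = "${aws_launch_configuration.example.name}"
  availability_zones   = ["us-east-1a"]
  min_size             = 0
  max_size             = 1
}
```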
Hi @ineffyble! Thank you so much for the minimal reproduction configuration. That should work great for reproducing this against both resources and ensuring the fixes cover them.
The maintainers will be heads down with version 2.0.0 development and testing work for the next week or so, but hopefully we can address this shortly thereafter (unless someone from the community picks this up).
Came here looking for the migrated copy of this issue; this ticket seems closest.
This is a royal PITA.
Potential workarounds:

- A `null_resource` that triggers on changes to the role's `unique_id` attribute and executes a provisioner that runs `sleep 10` or similar. Depend on this resource in anything that needs the role ARN to exist (see the sketch below).

The fix for this resource to wait up to 2 minutes for IAM change propagation (fairly standard across the provider) has been merged and will release with version 2.60.0 of the Terraform AWS Provider, later this week.
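A rough sketch of that `null_resource` workaround, using Terraform 0.11-style syntax and hypothetical names (`aws_iam_service_linked_role.autoscaling`, `aws_kms_key.ami_key`), might look like this:

```
# Hypothetical sketch of the null_resource workaround described above.

resource "aws_iam_service_linked_role" "autoscaling" {
  aws_service_name = "autoscaling.amazonaws.com"
}

# Re-runs its provisioner whenever the role is (re)created, giving IAM a few
# seconds to propagate the new role before anything references it.
resource "null_resource" "wait_for_role_propagation" {
  triggers = {
    role_unique_id = "${aws_iam_service_linked_role.autoscaling.unique_id}"
  }

  provisioner "local-exec" {
    command = "sleep 10"
  }
}

resource "aws_kms_key" "ami_key" {
  description = "Key whose policy would reference the service-linked role"

  # Anything that references the role ARN should depend on the sleep, so the
  # role has had time to propagate before the AWS API call is made.
  depends_on = ["null_resource.wait_for_role_propagation"]
}
```

The same `depends_on` can be added to the `aws_autoscaling_group` that uses the role.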
This has been released in version 2.60.0 of the Terraform AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.
For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template for triage. Thanks!
Hi folks, sorry for a tangential question, but we're having this issue for regular IAM roles. @ineffyble mentioned this was already solved for those. Could you provide details how? Thanks!
Looks like version 2.60.0 of the provider now unifies the timeout period for IAM waits. The implication is that each dependent resource has its own code to wait on this. What might be more elegant is if the waiting lived in the resource that is the source of the attribute itself, so that dependent resources could simply wait for it to deliver; that would work across the framework for all resources that involve a wait.
@awilkins thanks - I see the release notes mention a few dependent resources that support waiting. This is nice; however, in our case I tried to use `depends_on` from unrelated resources (i.e. non-IAM resources whose management depends on the role being created first), and the built-in wait doesn't help there. I understand this is probably an edge case, but it would be nice to get some resolution to it.
It sounds like a great idea to move the wait into the `aws_iam_role_policy` etc. resources themselves like you mentioned (at least as an option, like `wait_for_propagation = true`). This way, _any_ resources that depend on them (either by reference or via `depends_on`) can trust that the policy has been propagated.
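For concreteness, the proposal might be used roughly as follows. Note that `wait_for_propagation` is only the option suggested in the comment above, not an argument the provider actually supports, so it is left commented out; everything else is a hypothetical example.

```
# Hypothetical example. The wait_for_propagation argument is the commenter's
# proposal, not an existing aws_iam_role_policy argument, so it is commented out.

resource "aws_iam_role" "example" {
  name = "example"

  assume_role_policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
POLICY
}

resource "aws_iam_role_policy" "example" {
  name = "example"
  role = "${aws_iam_role.example.id}"

  # Proposed behaviour: block until IAM reports the policy as propagated, so
  # that any resource depending on this one (by reference or via depends_on)
  # can trust that the policy is live.
  # wait_for_propagation = true

  policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:ListAllMyBuckets",
      "Resource": "*"
    }
  ]
}
POLICY
}
```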
I'm going to lock this issue because it has been closed for _30 days_. This helps our maintainers find and focus on the active issues.
If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!