There are a number of reports of errors of the form "timeout while waiting for state to become [some state name]" when using Terraform 0.6.12 which likely have the same root cause. This is a meta-issue to group them all together along with the work to fix them.
When fixing data races in Terraform as part of #4700, a bug was either introduced or exposed which causes genuine errors in the resource.Retry mechanism to be elided and replaced by a generic timeout error. In some cases, operations that may actually have succeeded return the same timeout error. A more complete description is in #5460, which introduces one potential fix.
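To illustrate the failure mode, here is a minimal sketch of a retry loop that does not hide the underlying error. It is not Terraform's actual implementation (which is written in Go); `retry`, `NonRetryableError`, and `RetryTimeout` are hypothetical names made up for this example. The idea is that a non-retryable error is surfaced immediately, and a timeout carries the last error seen instead of discarding it.

```python
import time


class NonRetryableError(Exception):
    """Hypothetical marker for errors that retrying cannot fix."""


class RetryTimeout(Exception):
    """Raised when the deadline passes; keeps the last underlying error."""

    def __init__(self, timeout, last_error):
        self.last_error = last_error
        detail = f": last error: {last_error}" if last_error else ""
        super().__init__(f"timeout after {timeout}s{detail}")


def retry(timeout, operation, interval=0.01):
    """Call operation() until it succeeds or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    last_error = None
    while True:
        try:
            return operation()
        except NonRetryableError:
            # Surface the genuine cause right away instead of retrying.
            raise
        except Exception as exc:
            # Remember the error so a later timeout can report it.
            last_error = exc
        if time.monotonic() >= deadline:
            raise RetryTimeout(timeout, last_error)
        time.sleep(interval)
```

With a loop shaped like this, an access-denied failure would show up as itself on the first attempt, and a genuine timeout would still say what the last attempt failed with.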
More work is currently being done by @phinze to understand the root cause of this bug and the extent of the impact. A fix will be in the next release of Terraform.
Hey @jen20, over at https://github.com/hashicorp/terraform/issues/5534 I reported that ASG creation always timed out for me. Discovered that specifying the supposedly optional vpc_zone_identifier argument fixed the timeout issues, using 0.6.12. I also built a newer terraform from 14ca7e31561d1c81d769399cfac3145ae85d9b1f—that version still timed out without vpc_zone_identifier.
(By chance I happened upon an issue to update documentation regarding the optional aws_autoscaling_group [vpc_zone_identifier] argument while researching my timeout issue—would have never thought to specify it otherwise!)
This is a pretty big deal in my opinion. I just spent three full days (as in 24 hours) trying to debug this (and a whole string of problems in our build tooling that made that difficult, like terraform being run via wrappers that crashed when TF_LOG was set to DEBUG or higher...). I have a configuration that otherwise works, except for
* aws_launch_configuration.vault: Error creating launch configuration: timeout while waiting for state to become '[success]'
After three days and at my wits' end, I took the CreateLaunchConfiguration parameters from the DEBUG-level output and pasted them verbatim into a Python script using boto3, the AWS SDK for Python, only adjusting for the formatting differences between the DEBUG output and proper Python syntax. The result?
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the CreateLaunchConfiguration operation: User: arn:aws:sts::<account>:assumed-role/myrole/i-f7e1126c is not authorized to perform: autoscaling:CreateLaunchConfiguration
That's a pretty important, and easily-corrected, error to be hidden. It's also an error that likely isn't going to go away with any sane number of retries.
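For anyone wanting to repeat this kind of check, here is a rough sketch of a helper that makes a single SDK call and reports the error code instead of retrying. `explain_failure`, `is_permanent`, and `PERMANENT_CODES` are hypothetical names for this example, and the list of codes is illustrative, not exhaustive. The sketch relies on botocore's `ClientError` exposing a `response` dict with an `Error` entry; it does not import boto3 itself, so the callable is passed in.

```python
# Illustrative only: error codes that retrying is unlikely to fix.
PERMANENT_CODES = {"AccessDenied", "UnauthorizedOperation", "ValidationError"}


def explain_failure(call, **params):
    """Invoke one SDK call once; return (code, message), or None on success."""
    try:
        call(**params)
    except Exception as exc:
        # botocore's ClientError carries {"Error": {"Code": ..., "Message": ...}}.
        response = getattr(exc, "response", None)
        error = response.get("Error", {}) if isinstance(response, dict) else {}
        code = error.get("Code", type(exc).__name__)
        message = error.get("Message", str(exc))
        return code, message
    return None


def is_permanent(code):
    """True if the code is in the illustrative do-not-retry set."""
    return code in PERMANENT_CODES
```

With boto3 installed and credentials configured, `call` could be `boto3.client("autoscaling").create_launch_configuration`, with the keyword arguments copied from the DEBUG output.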
Hey folks, sorry for the lack of update on this issue and for the trouble that these confusing error messages have caused!
Terraform v0.6.13 did include several core improvements that prevent timeout errors from masking other error messages in the majority of configs. I'm fairly certain that on the latest Terraform (v0.6.14) you should see access-denied and validation-type error messages displayed right away rather than being retried and masked by a timeout error.
Please do report back if you're still seeing any trouble on the latest version, and I'll be sure to follow up. :+1:
I am still seeing timeouts with 0.6.14 while waiting for a Redshift cluster state to become available:
* aws_redshift_cluster.data_events_redshift_read_cluster: [WARN] Error waiting for Redshift Cluster state to be "available": timeout while waiting for state to become '[available]'
When I check the cluster state it is in "creating" and it is eventually created successfully.
v0.6.16 here and I'm seeing the same issue but with RDS clusters.
terraform --version
Terraform v0.6.16
* aws_rds_cluster.main: [WARN] Error waiting for RDS Cluster state to be "available": timeout while waiting for state to become '[available]'
@phinze
I'm still seeing this as of 0.7.3. Trying to spin up an EC2 instance, my run fails with "aws_instance.vault-ssh: Error launching source instance: timeout while waiting for state to become 'success'", but the CloudTrail logs show the underlying error:
u'errorCode': u'Server.InsufficientInstanceCapacity',
u'errorMessage': u'We currently do not have sufficient t2.medium capacity in the Availability Zone you requested (us-east-1c). Our system will be working on provisioning additional capacity. You can currently get t2.medium capacity by not specifying an Availability Zone in your request or choosing us-east-1e, us-east-1a, us-east-1d.',
I just got the same behaviour as jantman: a timeout, while the CloudTrail logs show that there's no capacity for the instance size we wanted.
I see the same on 0.7.13 when running the first example from the documentation (https://www.terraform.io/intro/getting-started/build.html). I tried different AMIs and instance types, and still no luck :(