We are getting this error from time to time; it seems to be a race condition that shows up when AWS is slower than usual:
```text
19:29:48 [security-groups] aws_security_group.marker: Creating...
19:29:48 [security-groups] description: "" => "Tiger security group"
19:29:48 [security-groups] egress.#: "" => "<computed>"
19:29:48 [security-groups] ingress.#: "" => "<computed>"
19:29:48 [security-groups] name: "" => "pse-integration-marker"
19:29:48 [security-groups] owner_id: "" => "<computed>"
19:29:48 [security-groups] tags.#: "" => "3"
19:29:48 [security-groups] tags.Name: "" => "pse-integration-marker"
19:29:48 [security-groups] tags.cloudbees:pse:cluster: "" => "pse-integration"
19:29:48 [security-groups] tags.tiger:cluster: "" => "pse-integration"
19:29:48 [security-groups] vpc_id: "" => "vpc-9a974bfd"
19:29:49 [security-groups] aws_security_group.marker: Creation complete
19:29:49 [security-groups] Error applying plan:
19:29:49 [security-groups]
19:29:49 [security-groups] 1 error(s) occurred:
19:29:49 [security-groups]
19:29:49 [security-groups] * Resource 'aws_security_group.marker' does not have attribute 'id' for variable 'aws_security_group.marker.id'
```
The terraform.tfstate file seems corrupt, with no info about the security group that was created:
```json
{
  "version": 1,
  "serial": 0,
  "modules": [
    {
      "path": [
        "root"
      ],
      "outputs": {},
      "resources": {}
    }
  ]
}
```
Terraform version: 0.6.15
```hcl
resource "aws_security_group" "marker" {
  name        = "pse-integration-marker"
  description = "Tiger security group"
  tags = {
    Name                    = "pse-integration-marker"
    "tiger:cluster"         = "pse-integration"
    "cloudbees:pse:cluster" = "pse-integration"
  }
  vpc_id = "vpc-9a974bfd"
}

output "marker_security_group" {
  value = "${aws_security_group.marker.id}"
}
```
And here are the terraform.tfstate and terraform.tfstate.backup that match that log (I have the rest of the files):
```json
{
  "version": 1,
  "serial": 2,
  "modules": [
    {
      "path": [
        "root"
      ],
      "outputs": {
        "controller_security_group": "",
        "elb_security_group": "",
        "elbi_security_group": "",
        "marker_security_group": "",
        "worker_security_group": ""
      },
      "resources": {}
    }
  ]
}
```
I think we are having the same issue, except it's in 0.7.2, so it seems unresolved still.
Confirming this is still an issue in 0.7.4 as well.
And still in 0.7.7.
Thanks for that debug output, @carlossg. Here's what I think is the most relevant subset of it:
```text
2016/06/15 14:44:41 [DEBUG] apply: aws_security_group.elb: executing Apply
2016/06/15 14:44:41 [DEBUG] terraform-provider-aws: 2016/06/15 14:44:41 [DEBUG] Security Group create configuration: {
2016/06/15 14:44:41 [DEBUG] terraform-provider-aws: Description: "PSE security group",
2016/06/15 14:44:41 [DEBUG] terraform-provider-aws: GroupName: "pse-integration-elb",
2016/06/15 14:44:41 [DEBUG] terraform-provider-aws: VpcId: "vpc-cb07aaac"
2016/06/15 14:44:41 [DEBUG] terraform-provider-aws: }
2016/06/15 14:44:41 [DEBUG] terraform-provider-aws: 2016/06/15 14:44:41 [DEBUG] Security Group create configuration: {
2016/06/15 14:44:41 [DEBUG] terraform-provider-aws: Description: "Tiger security group",
2016/06/15 14:44:41 [DEBUG] terraform-provider-aws: GroupName: "pse-integration-marker",
2016/06/15 14:44:41 [DEBUG] terraform-provider-aws: VpcId: "vpc-cb07aaac"
2016/06/15 14:44:41 [DEBUG] terraform-provider-aws: }
2016/06/15 14:44:41 [DEBUG] terraform-provider-aws: 2016/06/15 14:44:41 [INFO] Security Group ID: sg-f2caad89
2016/06/15 14:44:41 [DEBUG] terraform-provider-aws: 2016/06/15 14:44:41 [DEBUG] Waiting for Security Group (sg-f2caad89) to exist
2016/06/15 14:44:41 [DEBUG] terraform-provider-aws: 2016/06/15 14:44:41 [DEBUG] Waiting for state to become: [exists]
2016/06/15 14:44:41 [DEBUG] terraform-provider-aws: 2016/06/15 14:44:41 [TRACE] Waiting 100ms before next try
2016/06/15 14:44:41 [DEBUG] terraform-provider-aws: 2016/06/15 14:44:41 [INFO] Security Group ID: sg-f1caad8a
2016/06/15 14:44:41 [DEBUG] terraform-provider-aws: 2016/06/15 14:44:41 [DEBUG] Waiting for Security Group (sg-f1caad8a) to exist
2016/06/15 14:44:41 [DEBUG] terraform-provider-aws: 2016/06/15 14:44:41 [DEBUG] Waiting for state to become: [exists]
2016/06/15 14:44:41 [DEBUG] terraform-provider-aws: 2016/06/15 14:44:41 [TRACE] Waiting 100ms before next try
2016/06/15 14:44:41 [DEBUG] terraform-provider-aws: 2016/06/15 14:44:41 [TRACE] Waiting 200ms before next try
2016/06/15 14:44:41 [DEBUG] terraform-provider-aws: 2016/06/15 14:44:41 [TRACE] Waiting 200ms before next try
2016/06/15 14:44:42 [DEBUG] terraform-provider-aws: 2016/06/15 14:44:42 [TRACE] Waiting 400ms before next try
2016/06/15 14:44:42 [DEBUG] terraform-provider-aws: 2016/06/15 14:44:42 [DEBUG] Revoking default egress rule for Security Group for sg-f1caad8a
2016/06/15 14:44:42 [DEBUG] terraform-provider-aws: 2016/06/15 14:44:42 [DEBUG] Revoking default egress rule for Security Group for sg-f2caad89
2016/06/15 14:44:42 [DEBUG] terraform-provider-aws: 2016/06/15 14:44:42 [DEBUG] Waiting for state to become: [success]
2016/06/15 14:44:42 [DEBUG] terraform-provider-aws: 2016/06/15 14:44:42 [TRACE] Waiting 500ms before next try
2016/06/15 14:44:43 [DEBUG] terraform-provider-aws: 2016/06/15 14:44:43 [DEBUG] Creating tags: [{
2016/06/15 14:44:43 [DEBUG] terraform-provider-aws: Key: "cloudbees:pse:cluster",
2016/06/15 14:44:43 [DEBUG] terraform-provider-aws: Value: "pse-integration"
2016/06/15 14:44:43 [DEBUG] terraform-provider-aws: } {
2016/06/15 14:44:43 [DEBUG] terraform-provider-aws: Key: "tiger:cluster",
2016/06/15 14:44:43 [DEBUG] terraform-provider-aws: Value: "pse-integration"
2016/06/15 14:44:43 [DEBUG] terraform-provider-aws: } {
2016/06/15 14:44:43 [DEBUG] terraform-provider-aws: Key: "Name",
2016/06/15 14:44:43 [DEBUG] terraform-provider-aws: Value: "pse-integration-marker"
2016/06/15 14:44:43 [DEBUG] terraform-provider-aws: }] for sg-f2caad89
2016/06/15 14:44:43 [DEBUG] terraform-provider-aws: 2016/06/15 14:44:43 [DEBUG] Found a remote Rule that wasn't empty: (map[string]interface {}{"from_port":0, "to_port":0, "protocol":"-1", "cidr_blocks":[]string{"0.0.0.0/0"}})
aws_security_group.marker: Creation complete
2016/06/15 14:44:43 [DEBUG] terraform-provider-aws: 2016/06/15 14:44:43 [DEBUG] Security Group create configuration: {
2016/06/15 14:44:43 [DEBUG] terraform-provider-aws: Description: "Tiger security group",
2016/06/15 14:44:43 [DEBUG] terraform-provider-aws: GroupName: "pse-integration-worker",
2016/06/15 14:44:43 [DEBUG] terraform-provider-aws: VpcId: "vpc-cb07aaac"
2016/06/15 14:44:43 [DEBUG] terraform-provider-aws: }
2016/06/15 14:44:43 [DEBUG] terraform-provider-aws: 2016/06/15 14:44:43 [INFO] Security Group ID: sg-ebcaad90
2016/06/15 14:44:43 [DEBUG] terraform-provider-aws: 2016/06/15 14:44:43 [DEBUG] Waiting for Security Group (sg-ebcaad90) to exist
2016/06/15 14:44:43 [DEBUG] terraform-provider-aws: 2016/06/15 14:44:43 [DEBUG] Waiting for state to become: [exists]
2016/06/15 14:44:43 [DEBUG] terraform-provider-aws: 2016/06/15 14:44:43 [TRACE] Waiting 100ms before next try
2016/06/15 14:44:44 [DEBUG] terraform-provider-aws: 2016/06/15 14:44:44 [TRACE] Waiting 200ms before next try
2016/06/15 14:44:44 [DEBUG] terraform-provider-aws: 2016/06/15 14:44:44 [TRACE] Waiting 400ms before next try
2016/06/15 14:44:44 [DEBUG] terraform-provider-aws: 2016/06/15 14:44:44 [DEBUG] Revoking default egress rule for Security Group for sg-ebcaad90
```
Since Terraform doesn't always include the resource id in the log output it's kinda hard to be sure which log lines belong to the processing of which security group here, but I noticed a few things that seem like plausible leads:
- The marker security group (`sg-f2caad89`) got far enough to set its tags.
- `resourceAwsSecurityGroupUpdate`, which gets called at the end of `resourceAwsSecurityGroupCreate` to complete the creation of the secondary objects in AWS (including the tags), contains a suspect codepath with a `d.SetId("")` in it. Setting the id to empty would cause the described symptom of a resource being dropped from the state. No specific logging is generated in that codepath, so it's plausible but not proven that we're exiting without error there.
- That codepath would be reached if `SGStateRefreshFunc` were to get either a NotFound error from the AWS API or a nil return value from `DescribeSecurityGroups`. The latter seems unlikely, so I'm going to assume that for some reason we're getting that NotFound error.

So with all of this said, eventual consistency issues on the AWS end do seem to be a likely cause here; in an earlier step we verified that the security group had indeed been created, but perhaps it takes a while before the API will consistently report its creation.
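A minimal sketch of the suspect pattern described above (not the actual provider source; the types and function bodies are simplified stand-ins): a state-refresh function that swallows a NotFound error by returning a nil group, and an update path that reacts to the nil by clearing the ID, which silently drops the resource from state.

```go
package main

import "fmt"

// resourceData is a stand-in for Terraform's schema.ResourceData.
type resourceData struct{ id string }

func (d *resourceData) SetId(id string) { d.id = id }
func (d *resourceData) Id() string      { return d.id }

// sgStateRefresh stands in for SGStateRefreshFunc: on a NotFound error from
// DescribeSecurityGroups it returns a nil group rather than an error.
func sgStateRefresh(apiSaysNotFound bool) (interface{}, error) {
	if apiSaysNotFound {
		return nil, nil // NotFound swallowed; the caller just sees "no group"
	}
	return "sg-f1caad8a", nil
}

// update mirrors the suspect codepath: a nil refresh result is treated as
// "the group was deleted", so the ID is cleared and no error is returned.
func update(d *resourceData, apiSaysNotFound bool) error {
	group, err := sgStateRefresh(apiSaysNotFound)
	if err != nil {
		return err
	}
	if group == nil {
		d.SetId("") // resource silently dropped from state, no log line
		return nil
	}
	return nil
}

func main() {
	d := &resourceData{id: "sg-f1caad8a"}
	// An eventually consistent API briefly reports NotFound for a group that exists.
	_ = update(d, true)
	fmt.Printf("id after update: %q\n", d.Id()) // the state entry is now gone
}
```

This matches the observed symptom: the apply step reports success, yet the state file ends up with an empty `resources` map.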
Assuming all of this is the correct explanation (which I wasn't able to verify, due to not being able to repro :frowning_face:), it feels to me like the best fix here would be for the Update function to treat a missing security group as an error rather than implicitly dropping the object from the state. This would not entirely fix the problem without also adding in some retry behavior, but it would at least stop the Update function from overstepping its bounds here (it's doing a task here that is normally reserved for the Read function) and cause Terraform to not lose track of the existing security group.
Over in #9719 I made some changes to make Terraform fail in a different way when this situation arises: rather than quietly dropping the resource from the state, it will instead halt with an error and write the partial resource to the state, at least allowing the operation to be retried in a subsequent run of Terraform.
I also added some logging for the case where we find during Read that the security group doesn't exist.
Neither of these things are going to actually address the problem described here, but they will hopefully confirm the theory that the EC2 API is giving us inconsistent results and we can then figure out the right way to be more resilient to that inconsistency.
Just want to add another data point here. I was mysteriously getting the following error consistently (i.e. _not_ an AWS eventual consistency issue):
```text
Resource 'aws_iam_role.ecs_service_autoscaling_role' does not have attribute 'id' for variable 'aws_iam_role.ecs_service_autoscaling_role.id'
```
I finally discovered that the real issue was that aws_iam_role.ecs_service_autoscaling_role wasn't actually getting created. In fact, Terraform was failing with this error:
```text
* aws_iam_role.ecs_service_autoscaling_role: "name" cannot be longer than 64 characters
```
But because the execution kept running, the error message I saw wasn't helpful.
I just saw this as well. One of my route tables wasn't created, so the dependent resources errored out. Running `terraform apply` again fixed it. The output during the first apply looked fine; both route tables said Creating..., with the values I'd normally expect.
I've got a similar error for aws_rds_cluster on destroy with Terraform v0.11.11, e.g.:
```text
Releasing state lock. This may take a few moments...
Error: Error applying plan:
1 error(s) occurred:
* local.environment_json: local.environment_json: Resource 'aws_rds_cluster.myproject_database' does not have attribute 'database_name' for variable 'aws_rds_cluster.myproject_database.database_name'
```
even though my resource does have database_name set. In another run, it complains about master_password instead:
```text
* local.environment_json: local.environment_json: Resource 'aws_rds_cluster.myproject_database' does not have attribute 'master_password' for variable 'aws_rds_cluster.myproject_database.master_password'
```
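For context, a local along these lines is the kind of thing that produces such references (the resource and attribute names are taken from the error messages above; the exact structure is an assumption, not the reporter's actual config):

```hcl
locals {
  environment_json = jsonencode({
    database_name   = "${aws_rds_cluster.myproject_database.database_name}"
    master_password = "${aws_rds_cluster.myproject_database.master_password}"
  })
}
```

Once the cluster has been destroyed or dropped from state, the interpolation has nothing to resolve against, which would produce exactly these "does not have attribute" errors.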
Somebody else ran into a similar issue at: https://docs.cloudposse.com/troubleshooting/error-applying-terraform-plan/
I got the same issue with an aws_rds_instance. The problem was that I passed in an AWS KMS alias rather than a valid ARN. It seems like the provider validates this, but the error isn't caught and surfaced correctly, or something similar.
I found out about the alias-vs-ARN issue by running `TF_LOG=debug terraform plan`.
Hi all,
We had a few different root causes leading to errors like this in Terraform 0.11 and earlier. Eventual consistency was one such problem, but the general concern was that in earlier versions Terraform would not perform thorough checks on the consistency of what is returned by a provider, and thus a provider behaving oddly would usually lead to a confusing downstream error with insufficient context.
Terraform 0.12 includes some fixes for known issues in this area, and it also includes improved safety checks so that provider inconsistencies can be caught earlier and reported with more context. The specific codepath that generated the errors discussed in this issue doesn't exist anymore in Terraform 0.12, so we're going to close this one out under the assumption that all of the reports here were caused by issues that we found and fixed in the Terraform 0.12 cycle.
If you are using Terraform 0.12 and are still running into weird errors that feel similar to those here (although the exact text will be different, due to the rewrite of this portion), please do open a new issue for it so we can capture some updated reproduction information against the new codepaths. Thanks for reporting this!
I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.