Terraform 0.9.8
An aws_db_instance (SQL Server) restored from a snapshot gets stuck in "Still creating..." even though the instance was fully created and available in the RDS console after 20 minutes.
Resource:
resource "aws_db_instance" "foo" {
count = "${var.flags["enable_foo"]}"
identifier = "foo-${var.environment}-${count.index}"
allocated_storage = "4000"
allow_major_version_upgrade = "false"
apply_immediately = "true"
auto_minor_version_upgrade = "false"
availability_zone = "eu-west-1a"
backup_retention_period = "7"
backup_window = "15:15-15:45"
copy_tags_to_snapshot = "true"
db_subnet_group_name = "${aws_db_subnet_group.db.id}"
engine = "sqlserver-ee"
engine_version = "13.00.4422.0.v1"
final_snapshot_identifier = "foo-${replace(var.environment, "_", "-")}-${count.index}-snapshot-tf"
instance_class = "${var.instance_sizes["db_foo"]}"
iops = "12000"
license_model = "license-included"
maintenance_window = "sun:11:00-sun:11:30"
multi_az = "false"
option_group_name = "${replace(var.environment, "_", "-")}-sqlserver-og"
password = "supersecretpassword"
publicly_accessible = "false"
skip_final_snapshot = "false"
snapshot_identifier = "arn:aws:rds:eu-west-1:111111111111:snapshot:foo-snapshot"
storage_type = "io1"
timezone = "UTC"
username = "sa"
vpc_security_group_ids = ["${aws_security_group.foo.id}"]
tags {
Name = "foo-${replace(var.environment, "_", "-")}-${count.index}"
environment = "${var.environment}"
component = "database"
service = "foo"
}
timeouts {
create = "6h"
}
}
Output:
aws_db_instance.foo: Still creating... (10s elapsed)
aws_db_instance.foo: Still creating... (20s elapsed)
....
aws_db_instance.foo: Still creating... (1h59m53s elapsed)
aws_db_instance.foo: Still creating... (2h0m3s elapsed)
...
etc...
However, the RDS restore actually completed in 20 minutes and the instance was marked as available in the AWS console, but Terraform doesn't see it :(
Further information: the timeout error eventually happened and read as follows:
Failed to save state: Failed to upload state: NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors
Failed to persist state to backend.
The error shown above has prevented Terraform from writing the updated state
to the configured backend. To allow for recovery, the state has been written
to the file "errored.tfstate" in the current working directory.
Running "terraform apply" again at this point will create a forked state,
making it harder to recover.
To retry writing this state, use the following command:
terraform state push errored.tfstate
Also, there was no errored.tfstate file written anyway - that is another bug, which I have raised separately as https://github.com/hashicorp/terraform/issues/15688
I believe the above bug is due to Terraform not refreshing the assumed role it uses to store state in an S3 backend. My terraform init command was:
terraform init --backend-config=bucket=my-terraform-bucket --backend-config=key=terraform/terraform.tfstate --backend-config=role_arn=arn:aws:iam::111111111111:role/state_storing_role
when combined with:
terraform {
  backend "s3" {
    region  = "eu-west-1"
    encrypt = "true"
  }
}
And a terraform apply that takes >1h, I believe Terraform can then no longer push the state because the assumed-role session token has expired. Terraform should refresh the session token; details of how to do this are in the AWS docs.
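For reference, here is a sketch of what the backend ends up configured with once the --backend-config values from the init command above are merged in (values copied from that command; shown only to make the role assumption explicit):
terraform {
  backend "s3" {
    bucket   = "my-terraform-bucket"
    key      = "terraform/terraform.tfstate"
    region   = "eu-west-1"
    encrypt  = "true"
    role_arn = "arn:aws:iam::111111111111:role/state_storing_role"
  }
}
The role_arn is what makes the backend use short-lived STS credentials, which, per the hypothesis above, is why the state push fails once the apply outlives the session.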
I can confirm the same behaviour is happening on terraform 0.10. I'm specifying role_arn to the s3 backend, and it fails with:
Failed to save state: Failed to upload state: NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors
One approach to fixing this would be to implement a "keepalive" API call to AWS (e.g. fetching the MD5 checksum of the tfstate object in S3 every minute), which would trigger the Go aws-sdk logic to refresh the STS token and thus prevent the issue from happening.
I've also experienced this issue, especially when creating a high number of resources in one apply - for example an RDS database, CloudFront and ElastiCache, each of which takes about 10 minutes to provision.
I'm using aws-vault for credentials management, so I'm working with temporary credentials that assume a selected role - I think this is pretty much the same setup as yours.
Did somebody figure out how to solve this issue?
I think this is the same issue as #1351. One workaround is to use something like aws-vault (unrelated to HashiCorp's Vault), which serves tokens via the metadata API and refreshes them in the background for you; see the sketch below. (FWIW, that's what I'm telling @Latacora customers the answer is, together with "consider not having humans near terraform" :-) Keep in mind that there are some details, like binding on a privileged port; you probably still want to encapsulate that in a VM or container or whatever -- something with a separate networking namespace :))
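A minimal sketch of that workaround, assuming an aws-vault profile named my-profile (the --server flag starts aws-vault's local metadata-style credential server so the SDK inside Terraform can keep fetching fresh credentials during a long apply):
aws-vault exec --server my-profile -- terraform apply
Because the credentials come from the metadata endpoint rather than being handed to Terraform once at startup, the aws-sdk refreshes them automatically instead of holding a single expiring STS token.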
I came across the same issue. I refreshed the token under .aws/credentials and still hit the error.
Marking this issue as stale due to inactivity. This helps our maintainers find and focus on the active issues. If this issue receives no comments in the next 30 days it will automatically be closed. Maintainers can also remove the stale label.
If this issue was automatically closed and you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thank you!
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!