Terraform: Intermittent remote S3 state failure

Created on 16 Dec 2016  ·  38 Comments  ·  Source: hashicorp/terraform

Terraform Version

0.8

Affected Resource(s)

remote state on s3

Debug Output

When running terraform plan, apply, or destroy:

Error reloading remote state: RequestError: send request failed
caused by: Get https://exxxxxxx.s3.amazonaws.com/development/consul/terraform.tfstate: x509: certificate signed by unknown authority

Expected Behavior

Should get remote state.

Labels: bug, core, waiting-response

All 38 comments

I'm using 0.8.1 and I have the same problem without using an S3 remote state file. I get this error when running get, plan, apply and destroy, but it happens randomly. Some examples:

$ terraform get
Get: s3::https://s3.amazonaws.com/mybucket/my-custom-module.zip (update)
Error loading Terraform: Error downloading modules: RequestError: send request failed
caused by: Get https://mybucket.s3.amazonaws.com/my-custom-module.zip: x509: certificate signed by unknown authority

and

$ terraform destroy
Do you really want to destroy?
  Terraform will delete all your managed infrastructure.
  There is no undo. Only 'yes' will be accepted to confirm.

  Enter a value: yes

Error refreshing state: 1 error(s) occurred:

* RequestError: send request failed
caused by: Post https://sts.amazonaws.com/: x509: certificate signed by unknown authority

Same here.

It only happens in the ca-central-1 region for us but, to be fair, it's the only region we've been working on in the last few days so it may just be happenstance.

All regions for me - primarily us-east-1 but also seeing in us-west-* too.

Hey James, so I ran this in a loop for the past ~60 minutes (of configure, reset state, configure) on Mac and Linux and I was never able to see an issue. It has probably configured and synced remote state about 300 times during that time (sleep 12 seconds, 5 times per minute).

I've also heard of other people getting issues recently, though, so I'm not discounting your claim. I just don't know what causes it. I still doubt it's any change we made, since we haven't touched any of the remote state code or the HTTP client initialization code.

Any ideas?

I think it is new since 0.8. I've never seen it with 0.7.13. If it was just me I'd put it down to AWS bucket weirdness, but the fact that a few people see it too makes me suspect there's a wider issue here; again, perhaps not TF, but still an issue.

We changed to Go 1.7.4 which had _very_ few changes, the only one of which I can imagine affecting this being: https://github.com/golang/go/issues/18141

I'm not saying that's the issue at fault, but that's the _only_ change between 0.7.13 and current that has anything to do with TLS in our code. We probably did update the AWS SDK during that time too, so it's possible the issue is in the AWS SDK.

At any rate, we're not doing any special TLS configuration for the AWS SDK or Go directly so the issue is likely in one of those two. I'd lean towards the former just because I find it unlikely that something like this is broken in Go itself.

I just ran ten minutes of terraform plan in a loop and saw it about 10% of the time. Here's a snippet of debug:

2016/12/16 13:09:59 [DEBUG] vertex "root", got dep: "module.web.plan-destroy"
2016/12/16 13:09:59 [DEBUG] vertex "root", got dep: "module.web.output.asg_name"
2016/12/16 13:09:59 [DEBUG] vertex "root", got dep: "module.web.aws_elb.web"
2016/12/16 13:09:59 [DEBUG] vertex "root", got dep: "module.web.aws_route53_record.web"
2016/12/16 13:09:59 [DEBUG] vertex "root", got dep: "var.instance_type"
2016/12/16 13:09:59 [DEBUG] vertex "root", got dep: "module.web.aws_autoscaling_group.web"
2016/12/16 13:09:59 [ERROR] Shadow graph error: 1 error(s) occurred:

* RequestError: send request failed
caused by: Post https://sts.amazonaws.com/: x509: certificate signed by unknown authority
2016/12/16 13:09:59 [DEBUG] plugin: waiting for all plugin processes to complete...
Error refreshing state: 1 error(s) occurred:

* RequestError: send request failed
caused by: Post https://sts.amazonaws.com/: x509: certificate signed by unknown authority
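
The loop test described above can be sketched in shell roughly as follows. This is a minimal sketch, not the script anyone in the thread actually ran: the function name `count_failures` is hypothetical, and it assumes the command under test prints the error text to stdout or stderr.

```shell
#!/bin/sh
# count_failures RUNS PATTERN COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND RUNS times and reports how many runs
# produced output containing the fixed string PATTERN.
count_failures() {
    runs=$1
    pattern=$2
    shift 2
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Capture stdout and stderr together; ignore the exit status,
        # since only the error text is being counted here.
        output=$("$@" 2>&1) || true
        if printf '%s\n' "$output" | grep -qF -- "$pattern"; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures/$runs runs matched '$pattern'"
}

# Example (assumes terraform is on PATH and the directory is already configured):
#   count_failures 30 'x509: certificate signed by unknown authority' terraform plan
```

Matching on the error string rather than the exit code keeps unrelated failures out of the count.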

@jamtur01 I just compiled TF 0.8.1 with Go 1.7.3. Do you mind giving this a shot?

Since you have a reliable repro I just want to eliminate the "wtf" that Go might be causing this.

https://dl.dropboxusercontent.com/u/46819/terraform_081_go173.zip
SHA256 is 4f3a039d4ffae4a3bdc0390c14f258e44a22bc32a77894c74bb120b3f285293e

(Note for the future: I probably deleted the file since it was just in my dropbox)
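
For anyone checking a downloaded build against a published SHA256 like the one above, a minimal shell sketch follows. The function name `verify_sha256` is hypothetical; it assumes either `sha256sum` (common on Linux) or `shasum` (shipped with macOS) is available.

```shell
#!/bin/sh
# verify_sha256 FILE EXPECTED_HEX
# Hypothetical helper: succeeds only if FILE's SHA-256 digest equals EXPECTED_HEX.
verify_sha256() {
    file=$1
    expected=$2
    if command -v sha256sum >/dev/null 2>&1; then
        actual=$(sha256sum "$file" | awk '{print $1}')
    else
        # Fall back to shasum, which macOS provides instead of sha256sum.
        actual=$(shasum -a 256 "$file" | awk '{print $1}')
    fi
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (got $actual)" >&2
        return 1
    fi
}

# Example with the checksum quoted in this comment:
#   verify_sha256 terraform_081_go173.zip \
#     4f3a039d4ffae4a3bdc0390c14f258e44a22bc32a77894c74bb120b3f285293e
```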

Was having the same issue with certificates consistently on 0.8.1. Tried the build with go 1.7.3 linked above and was able to successfully work with remote state again.

I can confirm that the issue manifests itself in a custom build of Terraform 0.7.13 compiled with Go 1.7.4.

@mitchellh Tried that build with the ten minute test. No errors!

@jamtur01 Yep, okay, so it is Go 1.7.4 causing this. Bradfitz also offered up a solution that is already a CL for Go (not merged yet though). Ouch! We'll try to resolve this one way or another for 0.8.2, either dropping back to Go 1.7.3 or finding a way to have cgo-enabled builds for Darwin.

The same thing applies to Illumos builds of Terraform by the look of it - both 0.8 and 0.8.1 exhibit the issue running on SmartOS.

@mitchellh I can add a bit more confirmation.

Installed Terraform 0.8.1 via brew and got the x509 issue on STS and S3. It's compiled with Go 1.7.4.
Installed the TF binary from the official source, and it worked fine.

0.8.2 will be released today built with Go 1.7.3. That reverts the "security fixes" made in Go 1.7.4 unfortunately but hopefully 0.8.3 will be built with Go 1.8 which will bring all this back with a longer term fix from the Go team.

@mitchellh unfortunately it still happens on 0.8.2. I just executed terraform plan in a loop and 5 out of 30 attempts ended with either

Error refreshing state: 1 error(s) occurred:

* RequestError: send request failed
caused by: Post https://sts.amazonaws.com/: x509: certificate signed by unknown authority

or

Error reloading remote state: RequestError: send request failed
caused by: Get https://xxxxx-yyyyyy.s3-eu-west-1.amazonaws.com/commons/terraform.tfstate: x509: certificate signed by unknown authority

I had exactly the same issues on 0.8.1, but I have never seen this on 0.7.x.

$ terraform -v
Terraform v0.8.2

@mitchellh I can confirm that it's still a problem with 0.8.2.
Also, 0.8.2 was released/tagged on the GitHub releases page but isn't available as a binary on the downloads page of terraform.io. Does that confirm my suspicion that it's still a known problem?

Hi @myoung34! Could you try force refreshing the download page? I see the download for 0.8.2 there.

Weird. It's there; I never thought I'd fall for the cache.

Compiled master Terraform v0.8.3-dev (e2f2f9c78e9784eb125beb64c1fe938f9d14183c) against Go 1.7.3 manually and all is good for now

Same here:
```
data.terraform_remote_state.vpc: Refreshing state...
Error refreshing state: 1 error(s) occurred:

* RequestError: send request failed
caused by: Post https://sts.amazonaws.com/: x509: certificate signed by unknown authority
[terragrunt] 2017/01/06 14:08:38 Attempting to release lock for state file dev-rancher-server-db in DynamoDB
[terragrunt] 2017/01/06 14:08:39 Lock released!
[terragrunt] 2017/01/06 14:08:39 exit status 1
exit status 1
rancher-server-db ❯ terraform -v
Terraform v0.8.2
rancher-server-db ❯ sw_vers
ProductName: Mac OS X
ProductVersion: 10.12.2
BuildVersion: 16C67
```

I have been seeing this in 0.8.1. I have not seen it in 0.7.11, and I run both versions for different environments. Just saying, it seems to be an issue with Terraform's latest releases.

Same issue with Terraform 0.9.5 and Go 1.8. Has anyone found a reproducible solution?

➜ terraform --version
Terraform v0.9.5
➜ go version
go version go1.8 darwin/amd64
➜ terraform plan
Failed to load backend:
Error configuring the backend "s3": RequestError: send request failed
caused by: Post https://sts.amazonaws.com/: x509: certificate signed by unknown authority

Please update the configuration in your Terraform files to fix this error.
If you'd like to update the configuration interactively without storing
the values in your configuration, run "terraform init".

In my case it was an issue with my SSL certs that curl was using. I fixed it by setting CURL_CA_BUNDLE to a copy of this file, locally.

I'm seeing the same issue with Terraform 0.10.8. Every now and then (1 out of 20 or 30 times, perhaps?) I get a net/http: TLS handshake timeout error when talking to S3. Given that other people are seeing it, this bug should probably be re-opened, as I doubt S3 is that flaky :)

Also intermittently experiencing this issue using Terraform 0.10.4.

I'm seeing it intermittently on 0.11.1/OSX

Same issue with 0.11.1

Same issue with Go go1.8.3 darwin/amd64 and Terraform v0.11.2.

As a temporary bandaid you can add skip_credentials_validation = true to your backend configuration block.
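
One way to try this workaround without editing the backend block itself is a partial backend configuration file passed to `terraform init` via `-backend-config`. The sketch below only writes such a file; the function name and the file name are hypothetical, and whether skipping credential validation is acceptable depends on your setup.

```shell
#!/bin/sh
# write_backend_override FILE
# Hypothetical helper: writes a one-line partial backend configuration
# containing the workaround setting mentioned in this comment.
write_backend_override() {
    cat > "$1" <<'EOF'
skip_credentials_validation = true
EOF
}

# Example (file name is an assumption; terraform must already be installed):
#   write_backend_override backend-override.tfvars
#   terraform init -backend-config=backend-override.tfvars
```

Keeping the setting in a separate file makes it easy to drop the workaround again once the underlying problem is resolved.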

@hgallo0 Can you elaborate on your AWS credentials setup? Are you using access/secret keys, using a profile, assumed role, STS with MFA?

Hi @denniswebb, thanks for your quick reply. I am using an access/secret key pair currently stored in my ~/.aws/credentials. No STS or MFA.

I'm seeing this exact issue as well. I'm setting profile in my backend config. The profile is in my ~/.aws/credentials file and the credentials work. Setting skip_credentials_validation = true fixed the issue. My version info is below.

❯ terraform --version
Terraform v0.11.3

just started happening for me too.

osx 10.12.6.
s3 us-west-2 backend
terraform v0.11.3 (installed via brew)

the only recent local update i can think of was installing a specific version of golang to use some new kubernetes incubator packages (external-dns). the terraform issue is intermittent and i can't seem to figure out why. i did notice that if i switch networks it seems to clear up, if only temporarily: get on a vpn and try from there, or hop back off the vpn and try again. no idea if that's just a coincidence or not. maybe something to do with golang and stale dns/cache something something. i'm grasping for answers.

Same here

$ terraform init .
Initializing modules...
Initializing the backend...

Error configuring the backend "s3": RequestError: send request failed
caused by: Post https://sts.amazonaws.com/: dial tcp: i/o timeout

Please update the configuration in your Terraform files to fix this error
then run this command again.

Terraform v0.11.7
OS X.

Has there been a fix for this? I am also seeing it with 0.11.7.

Same issue with Terraform v0.11.7 on Alpine. I fixed it by installing the following package:
apk --update add ca-certificates
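
Before or after installing the package, it can help to check whether a CA bundle is actually present in the image. The sketch below is a rough check under assumptions: the function name `find_ca_bundle` is hypothetical, and the path list covers commonly used Linux bundle locations rather than an exhaustive or authoritative set.

```shell
#!/bin/sh
# find_ca_bundle [ROOT]
# Hypothetical helper: prints the first non-empty CA bundle found under ROOT
# (default: the real filesystem root) and fails if none exists.
find_ca_bundle() {
    root=${1:-}
    for path in \
        /etc/ssl/certs/ca-certificates.crt \
        /etc/pki/tls/certs/ca-bundle.crt \
        /etc/ssl/ca-bundle.pem \
        /etc/pki/tls/cacert.pem \
        /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem \
        /etc/ssl/cert.pem
    do
        # -s is true only for files that exist and are not empty.
        if [ -s "$root$path" ]; then
            echo "$root$path"
            return 0
        fi
    done
    echo "no CA bundle found" >&2
    return 1
}

# Example: run inside the container image that executes terraform
#   find_ca_bundle || apk --update add ca-certificates
```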

I'm seeing these issues quite often on OS X with 0.11.7 too. Should this issue be reopened?

@brikis98 I think this is the same issue being discussed here: https://github.com/terraform-providers/terraform-provider-aws/issues/4709. If so, add your comment / upvote to that issue since it's still open. I believe this needs to be solved in the provider, not in terraform core.

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
