Terraform-provider-aws: Since provider version 1.34.0 `terraform plan` is really slow

Created on 10 Sep 2018  ·  28Comments  ·  Source: hashicorp/terraform-provider-aws

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform Version

0.11.8

Problem

Not sure if it's me or something under the hood. I was running 0.11.8/1.27.0 and upgraded provider to 1.35.0. Immediate after upgrade I ran terraform plan and it took very long time to finish. I blamed network at first, but after some time I ran it again with same result. I've downgraded plugin back to 1.27.0 and it is fast again ... Not sure if it is isolated issue or something was introduced in one of the versions following 1.27.0 :/

I run it on the same subset of resources and state plan (remote S3). Also not sure if it's network-slow or locally-slow

  • 1.27.0 time terraform plan
real    0m43.978s
user    0m6.544s
sys 0m2.043s
  • 1.35.0 time terraform plan
real    14m30.323s
user    0m8.626s
sys 0m3.522s

I'll try to upgrade version-by-version and see when it starts happening for me.

provider upstream

Most helpful comment

The upstream Go issues mentioned above seem to indicate it was a Go 1.11 regression with DNS handling so it should be machine specific based on DNS configuration. The machine's Go runtime shouldn't matter unless you are manually building the provider and running that custom built provider.

It seems Go 1.11.1 is slated to include a fix, so we'll keep this issue open to track the officially released version of the AWS provider that is built via that updated Go version.

All 28 comments

I have upgraded terraform to 0.11.8 and did provider upgrade version-by-version starting from 1.27.0. And the "slowness" problem start occurring with 1.34.0. Which according to CHANGELOG.md was build agains new/recent golang release.

I've checked on a different machine, with fresh install of both terraform (0.11.8) and provider (1.35.0, I did test also 1.27.0 and 1.34.0) and to my surprise it works as expected. Only difference is an OS: macOS El Capitan (slow one) vs macOS High Sierra (where it works as expected).

I'll try to check if there is anything I can do on the El Capitan machine to fix it. Any clues why it might be an issue on that platform?

Have you tried running Terraform with debugging enabled? Maybe the debug log will offer some clues. Please feel free to Gist it and post them here. If you are worried about any sensitive data, it can be encrypted with the HashiCorp GPG Key or redacted as necessary.

FYI, I am exactly in the same situation. Terraform 0.11.8 and provider.aws 1.35 is really slow. Any plan or apply action takes 10 times longer than before.
Just downgraded to 1.33 as suggested by @rmldsky and execution time is now as fast as before.

@bflad ah, did not though about this (duh!), will do tomorrow.

@supasteev0 are you also on macOS El Capitan?

@rmldsky No, I'm on Linux Fedora 25.

@bflad I did run terraform with DEBUG with both 1.33.0 and 1.34.0. I still try to figure out if I could post it (redacted or encrypted). On the first glance on both files it seems there are a lot more waiting fors when using 1.34.0:

$ cat debug-1.34.0.log | grep waiting | grep -ve ^2 | wc -l
      19
[...]
$ cat debug.log | grep waiting | grep -ve ^2 | wc -l
       2

Sample occurrence looks something like below and usually lasts 20sec:

2018-09-12T08:53:35.353+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4: ---[ REQUEST POST-SIGN ]-----------------------------
2018-09-12T08:53:35.353+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4: POST / HTTP/1.1
2018-09-12T08:53:35.353+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4: Host: iam.amazonaws.com
2018-09-12T08:53:35.353+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4: User-Agent: aws-sdk-go/1.15.21 (go1.11; darwin; amd64) APN/1.0 HashiCorp/1.0 Terraform/0.11.8
2018-09-12T08:53:35.353+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4: Content-Length: 66
2018-09-12T08:53:35.353+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4: Authorization: AWS4-HMAC-SHA256 Credential=REDACTED/20180912/us-east-1/iam/aws4_request, SignedHeaders=content-length;content-type;host;x-amz-date, Signature=56f0063b9785f3b282e07dc
f6cc8c2a54213f2629f386d70fef4ccee05b524a2
2018-09-12T08:53:35.353+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4: Content-Type: application/x-www-form-urlencoded; charset=utf-8
2018-09-12T08:53:35.353+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4: X-Amz-Date: 20180912T065335Z
2018-09-12T08:53:35.353+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4: Accept-Encoding: gzip
2018-09-12T08:53:35.353+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4:
2018-09-12T08:53:35.353+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4: Action=GetUser&UserName=REDACTED&Version=2010-05-08
2018-09-12T08:53:35.353+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4: -----------------------------------------------------
default_80"
", waiting for: "aws_sns_topic.REDACTED"
alarms.aws_cloudwatch_metric_alarm.high_memory_usage_alarm", waiting for: "module.REDACTED.var.instance_id"
2018-09-12T08:53:55.722+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4: 2018/09/12 08:53:55 [DEBUG] [aws-sdk-go] DEBUG: Response dynamodb/DescribeTable Details:
2018-09-12T08:53:55.722+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4: ---[ RESPONSE ]--------------------------------------
2018-09-12T08:53:55.722+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4: HTTP/1.1 200 OK
2018-09-12T08:53:55.722+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4: Connection: close
2018-09-12T08:53:55.722+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4: Content-Length: 490
2018-09-12T08:53:55.723+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4: Content-Type: application/x-amz-json-1.0
2018-09-12T08:53:55.723+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4: Date: Wed, 12 Sep 2018 06:53:52 GMT
2018-09-12T08:53:55.723+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4: Server: Server
2018-09-12T08:53:55.723+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4: X-Amz-Crc32: 1392297593
2018-09-12T08:53:55.723+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4: X-Amzn-Requestid: GOPTVLS03C4KAS05BVB2QP58JNVV4KQNSO5AEMVJF66Q9ASUAAJG
2018-09-12T08:53:55.723+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4:
2018-09-12T08:53:55.723+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4:
2018-09-12T08:53:55.723+0200 [DEBUG] plugin.terraform-provider-aws_v1.34.0_x4: -----------------------------------------------------

What is strange(?) is that those lines with waiting look like they appear in randomly, a bit like out of context, injected there between those "timestamped" lines, but maybe it's OK.

Admittedly, its a little hard to discern what is occurring with only the provided snippet of logs, but hopefully we can narrow this down! 😅

What is strange(?) is that those lines with waiting look like they appear in randomly, a bit like out of context, injected there between those "timestamped" lines, but maybe it's OK.

The interleaved output is occurring since by default Terraform runs with a parallelism of 10. Maybe tuning the -parallelism flag during plan might help narrow down the log output to what is happening?

I would also check in the debug logs to see if the AWS Go SDK is returning [aws-sdk-go] DEBUG: Retrying Request, which would indicate its retrying API requests. We do tune the max retries very high by default, which can be lowered via the max_retries provider configuration and could return error conditions faster -- although I'm a little doubtful that this is occurring in this scenario unless the cause is some sort of throttling response from an API.

@bflad just FYI not to leave you hanging, I'll get back on this issue, this week.

@rmldsky any update?

@bflad sorry, took longer then expected for me to get back on it. Please see this encrypted (keybase/hashicorp) log output:
https://gist.githubusercontent.com/rmldsky/6398a435bcb4722569751d3b4eccc16f/raw/6d5ab449698965caad6805b8c262c78624fecf6f/encrypted-debug-1.37.0.log

Let me know if you can access this file and decrypt it.

Also I did it with 1.37.0 and it's still the same case, even though I used different subset of resources than before:

1.33.0

real    0m25.638s
user    0m1.586s
sys 0m0.624s

vs
1.37.0

real    8m12.766s
user    0m2.187s
sys 0m1.195s

If you'd like me to run it with some other flags please let me know.

@rmldsky hmm something isn't quite right with that file or maybe I'm doing something wrong? GPG is only decrypting the first 7077 lines and reporting some errors:

$ wget https://gist.githubusercontent.com/rmldsky/6398a435bcb4722569751d3b4eccc16f/raw/6d5ab449698965caad6805b8c262c78624fecf6f/encrypted-debug-1.37.0.log
--2018-09-26 08:18:08--  https://gist.githubusercontent.com/rmldsky/6398a435bcb4722569751d3b4eccc16f/raw/6d5ab449698965caad6805b8c262c78624fecf6f/encrypted-debug-1.37.0.log
Resolving gist.githubusercontent.com (gist.githubusercontent.com)... 151.101.32.133
Connecting to gist.githubusercontent.com (gist.githubusercontent.com)|151.101.32.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 921600 (900K) [text/plain]
Saving to: ‘encrypted-debug-1.37.0.log’

encrypted-debug-1.37.0.log           100%[======================================================================>] 900.00K  3.76MB/s    in 0.2s

2018-09-26 08:18:08 (3.76 MB/s) - ‘encrypted-debug-1.37.0.log’ saved [921600/921600]

$ gpg -d -o decrypted-debug-1.37.0.log encrypted-debug-1.37.0.log
gpg: encrypted with 2048-bit RSA key, ID D5F43B6432EA77F3, created 2014-02-26
      "HashiCorp Security <[email protected]>"
gpg: no valid OpenPGP data found.
gpg: block_filter 0x00007fac2f500b90: read error (size=8384,a->size=8454)
gpg: block_filter 0x00007fac302002c0: read error (size=16286,a->size=24484)
gpg: WARNING: encrypted message has been manipulated!
gpg: block_filter: pending bytes!
gpg: block_filter: pending bytes!
$ wc -l decrypted-debug-1.37.0.log
    7076 decrypted-debug-1.37.0.log

Got it! So its interesting that there seems to be 20 seconds of delay between every request and the response, e.g.

2018-09-26T11:34:39.358+0200 [DEBUG] plugin.terraform-provider-aws_v1.37.0_x4: 2018/09/26 11:34:39 [DEBUG] [aws-sdk-go] DEBUG: Request ec2/DescribeInstanceAttribute Details:
...
2018-09-26T11:35:00.537+0200 [DEBUG] plugin.terraform-provider-aws_v1.37.0_x4: 2018/09/26 11:35:00 [DEBUG] [aws-sdk-go] DEBUG: Response ec2/DescribeInstanceAttribute Details:

Version 1.34.0 of the AWS provider was the first release that started using Go 1.11 (instead of Go 1.10) for builds, so maybe something changed in connection handling? Is there anything potentially strange about your networking environment (dual IPv4+IPv6 enabled, proxies, etc)?

@bflad no, nothing special. I also tried this on the same machine on different network and it was same story. Also not really sure what to look for in the network setup. Same network and machine on 1.33.0 and it is all fine.

I'll try to find someone in the office to test it out on different OS.

These upstream Go bug reports seem interesting:

Can you verify if your local DNS configuration has multiple name servers or if IPv6 is enabled locally?

Great catch. I had few DNS entries and left only 8.8.8.8:
1.37.0

real    0m26.757s
user    0m1.687s
sys 0m0.745s

Also I've check on a friend's machine (1.37.0, macOS Mojave BUT with go runtime 1.10.?) and it was as fast as 1.33.0

The upstream Go issues mentioned above seem to indicate it was a Go 1.11 regression with DNS handling so it should be machine specific based on DNS configuration. The machine's Go runtime shouldn't matter unless you are manually building the provider and running that custom built provider.

It seems Go 1.11.1 is slated to include a fix, so we'll keep this issue open to track the officially released version of the AWS provider that is built via that updated Go version.

Hello guys.
Just wanted to share my feedback, after @rmldsky answer about DNS entries.

I also had multiple DNS entries (thank you wifi networks ...).
I deleted them all and left only 8.8.8.8 : Terraform now behaves much quicker with latest aws provider plugin (1.38.0, how often are you guys releasing a new version ??).

Anyway, thank you for the feedback, problem solved on my side, but might be worth looking into this Go version issue indeed.

Go 1.11.1 has been released, which should include this fix. I'll switch our build processes to use the updated Go version and hopefully this will be part of the version 1.39.0 release.

The DNS issue starting in version 1.34.0 (built with Go 1.11.0) should now be resolved in version 1.39.0 (built with Go 1.11.1), released just now. 🎉

This has been released in version 1.39.0 of the provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. As an example:

provider "aws" {
    version = "~> 1.39.0"
}
# ... other configuration ...

I confirm version 1.39.0 has solved the issue on my side. Terraform is now working flawlessly, even with multiple DNS entries in /etc/resolv.conf.

This does not seem fixed for me. With multiple DNS entries, here's what I'm seeing on a dirt simple Terraform example:

provider "aws" {
  region = "eu-west-1"
}

data "aws_region" "current" {}

data "template_file" "foo" {
  template = "foo"
}

output "foo" {
  value = "${data.template_file.foo.rendered}"
}

output "region" {
  value = "${data.aws_region.current.name}"
}

Time:

$ time terraform apply
data.template_file.foo: Refreshing state...
data.aws_region.current: Refreshing state...

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

Outputs:

foo = foo
region = eu-west-1

real    0m13.912s
user    0m0.568s
sys    0m0.165s

And the exact same code with one DNS entry:

$ time terraform apply
data.template_file.foo: Refreshing state...
data.aws_region.current: Refreshing state...

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

Outputs:

foo = foo
region = eu-west-1

real    0m3.698s
user    0m0.595s
sys 0m0.163s

Terraform version:

$ terraform --version
Terraform v0.11.8
+ provider.aws v1.43.0
+ provider.template v1.0.0

@brikis98 was it a case for you with previous (< 1.43.0) provider versions? Could you check with ie. 1.39.0?

I had this same problem. My issue was one of the entries in my DNS config wasn't resolving. Emptying it out to be solely 1.1.1.1 has fixed the issue. terraform plan went from 3 minutes and 30 seconds to 20 seconds total! Thank you, folks!

@nathanielks response fixed my probleme

Locking this issue to encourage fresh bug reports, if necessary. 👍 As noted above, please double check your DNS resolvers!

Was this page helpful?
0 / 5 - 0 ratings