Terraform-provider-helm: terraform stuck at refresh state during plan

Created on 14 May 2020 · 7 Comments · Source: hashicorp/terraform-provider-helm

I recently added the Istio charts, and I already have Prometheus, the cluster autoscaler, external secrets, Flux and Fluentd CloudWatch installed, so roughly 30 pods in total.

Community Note

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform Version and Provider Version

provider.aws v2.61.0
provider.helm v1.1.1
provider.kubectl (unversioned)
provider.kubernetes v1.11.2
provider.null v2.1.2
provider.random v2.2.1

Provider Version

provider.helm v1.1.1
provider.kubectl (unversioned)
provider.kubernetes v1.11.2
provider.null v2.1.2
provider.random v2.2.1

Affected Resource(s)

helm_release
helm_repository
"https://kubernetes-charts.storage.googleapis.com/"
"https://charts.fluxcd.io"

Terraform Configuration Files

# Copy-paste your Terraform configurations here - for large Terraform configs,
# please use a service like Dropbox and share a link to the ZIP file. For
# security, you can also encrypt the files using our GPG public key.

Debug Output

2020/05/14 12:15:14 [TRACE] dag/walk: vertex "provider.helm (close)" is waiting for "module.eks_addons.helm_release.cluster_autoscaler"
2020/05/14 12:15:19 [TRACE] dag/walk: vertex "root" is waiting for "provider.helm (close)"
2020/05/14 12:15:19 [TRACE] dag/walk: vertex "provider.helm (close)" is waiting for "module.eks_addons.helm_release.cluster_autoscaler"
2020/05/14 12:15:24 [TRACE] dag/walk: vertex "root" is waiting for "provider.helm (close)"
2020/05/14 12:15:24 [TRACE] dag/walk: vertex "provider.helm (close)" is waiting for "module.eks_addons.helm_release.cluster_autoscaler"
2020/05/14 12:15:29 [TRACE] dag/walk: vertex "root" is waiting for "provider.helm (close)"
2020/05/14 12:15:29 [TRACE] dag/walk: vertex "provider.helm (close)" is waiting for "module.eks_addons.helm_release.cluster_autoscaler"
2020/05/14 12:15:34 [TRACE] dag/walk: vertex "root" is waiting for "provider.helm (close)"
2020/05/14 12:15:34 [TRACE] dag/walk: vertex "provider.helm (close)" is waiting for "module.eks_addons.helm_release.cluster_autoscaler"
2020/05/14 12:15:39 [TRACE] dag/walk: vertex "root" is waiting for "provider.helm (close)"
2020/05/14 12:15:39 [TRACE] dag/walk: vertex "provider.helm (close)" is waiting for "module.eks_addons.helm_release.cluster_autoscaler"
2020/05/14 12:15:44 [TRACE] dag/walk: vertex "root" is waiting for "provider.helm (close)"
2020/05/14 12:15:44 [TRACE] dag/walk: vertex "provider.helm (close)" is waiting for "module.eks_addons.helm_release.cluster_autoscaler"

Panic Output

Expected Behavior

Actual Behavior

Steps to Reproduce

terraform apply

Important Factoids

References

GH-1234

acknowledged bug needs-investigation

Source

amitsehgal

👍13

Most helpful comment

I can confirm what @krzysztof-miemiec reported here. The blocking goes away if the klog call is removed or if I manually disable the stderr/stdout redirection in the plugin:
https://github.com/hashicorp/go-plugin/blob/master/server.go#L461

I tried to understand how the stderr/stdout pipes are consumed over the gRPC connection, but so far I have not been able to work it out from the code.

pepov on 25 May 2020

👍4

All 7 comments

Terraform v0.12.25 was released recently and contains a fix for a concurrency bug (https://github.com/hashicorp/terraform/blob/v0.12.25/CHANGELOG.md). If you happen to use that version, can you try downgrading to v0.12.24 or lower?
I experienced random crashes before (https://github.com/terraform-providers/terraform-provider-helm/issues/494) and also ran into the issue you describe after upgrading to the new Terraform version.

krzysztof-miemiec on 16 May 2020

... can you try downgrading to v0.12.24...

@krzysztof-miemiec I can consistently reproduce with 0.12.24 and 0.12.25.

Also, @amitsehgal, is this a duplicate of #458?

eyablonowitz on 17 May 2020

Having zero experience with Go, I began to try to debug this issue. I installed GoLand, learned how to use fmt.Printf and built a dumb, yet working "test" pipeline that overrides helm provider used by my TF module & rewrites checksum in lockfile 🙈.

And I found out that it hangs in this place (sorry for no line numbers or specific stack trace):

if err := g.cfg.KubeClient.IsReachable(); err != nil {
    @ vendor/helm.sh/helm/v3/pkg/action/get.go: func (g *Get) Run(name string)
res, err := get.Run(name)
    @ helm/resource_release.go: func getRelease(cfg *action.Configuration, name string)

I noticed in this comment (https://github.com/terraform-providers/terraform-provider-helm/issues/458#issuecomment-622148508) that there's a problem with the IAM Authenticator (I also use it with my EKS setup). I will try to find out more.

Edit:

I traced it down to this specific line (os.Stderr.Write() hangs inside klog):

klog.Warningf("constructing many client instances from the same exec auth config can cause performance problems during cert rotation and can exhaust available network connections; %d clients constructed calling %q", onRotateListLength, a.cmd)

when loading client for schedulingv1 (somewhere in kubernetes client initialization)
in k8s.io/client-go/plugin/pkg/client/auth/exec.(*Authenticator).UpdateTransportConfig

What's weird is that when I comment out this line, or even change it to log.Printf, the helm provider no longer hangs.
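
For illustration, here is a minimal Go sketch of how a write to stderr can block. It rests on an assumption that this thread does not confirm: that the provider's stderr is redirected to a pipe and that nothing is reading from the other end at the moment of the write. The names writeStalls and stallTimeout are hypothetical and are not part of klog, go-plugin or the helm provider. The sketch only shows the general pipe behaviour: a small write fits in the kernel pipe buffer and returns immediately, while a write larger than the buffer (64 KiB by default on Linux) waits until a reader drains the pipe.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"time"
)

// stallTimeout is how long we wait before declaring a write "stuck".
// Hypothetical value chosen for this sketch.
const stallTimeout = 200 * time.Millisecond

// writeStalls writes n bytes into a pipe that nobody is reading and reports
// whether the write is still pending after stallTimeout. The pipe is drained
// afterwards so the writer goroutine always finishes.
func writeStalls(n int) (bool, error) {
	r, w, err := os.Pipe()
	if err != nil {
		return false, err
	}
	defer r.Close()

	done := make(chan error, 1)
	go func() {
		_, werr := w.Write(make([]byte, n))
		w.Close() // lets the reader see EOF once the data is consumed
		done <- werr
	}()

	select {
	case werr := <-done:
		// The data fit in the kernel pipe buffer, so no reader was needed.
		return false, werr
	case <-time.After(stallTimeout):
		// The writer is still waiting for buffer space.
	}

	// Start consuming the pipe, which unblocks the writer.
	if _, err := io.Copy(io.Discard, r); err != nil {
		return true, err
	}
	return true, <-done
}

func main() {
	for _, n := range []int{16, 1 << 20} {
		stalled, err := writeStalls(n)
		fmt.Printf("%d bytes: stalled=%v err=%v\n", n, stalled, err)
	}
}
```

If the real hang follows this pattern, it would be consistent with the observation that removing the klog call makes the problem disappear, but that link remains a guess.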

krzysztof-miemiec on 18 May 2020

I tried to understand how the stderr/stdout pipes are consumed over the gRPC connection, but so far I have not been able to work it out from the code.

pepov on 25 May 2020

👍4

This is resolved in the latest release

jrhouston on 26 Jun 2020

This is how I fixed a similar issue:

Set export TF_LOG=TRACE, which enables the most verbose logging.
Run terraform plan ....
In the log, I found the root cause of the issue:

dag/walk: vertex "module.kubernetes_apps.provider.helmfile (close)" is waiting for "module.kubernetes_apps.helmfile_release_set.metrics_server"

From the log, I identified the resource whose state causes the issue: module.kubernetes_apps.helmfile_release_set.metrics_server.
I removed it from the state:

terraform state rm module.kubernetes_apps.helmfile_release_set.metrics_server

Running terraform plan again should now work.

This is not the best solution, which is why I contacted the owner of this provider to fix the issue so that this workaround is no longer needed.
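
The log-reading step above can be sketched in Go. This is only an illustration: blockingResources and waitRe are hypothetical names, and the pattern assumes the TRACE lines look exactly like the "dag/walk: vertex ... is waiting for ..." lines quoted in this thread. The function collects every distinct vertex that a provider "(close)" vertex is waiting for, which is the candidate to inspect.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// waitRe matches TRACE lines of the form quoted in this thread:
//   dag/walk: vertex "A" is waiting for "B"
var waitRe = regexp.MustCompile(`dag/walk: vertex "([^"]+)" is waiting for "([^"]+)"`)

// blockingResources scans TRACE output line by line and returns, in
// first-seen order, each distinct vertex that a "(close)" vertex waits for.
func blockingResources(log string) []string {
	seen := map[string]bool{}
	var out []string
	for _, line := range strings.Split(log, "\n") {
		m := waitRe.FindStringSubmatch(line)
		if m == nil || !strings.HasSuffix(m[1], "(close)") {
			continue // not a wait line, or the waiter is not a provider close node
		}
		if !seen[m[2]] {
			seen[m[2]] = true
			out = append(out, m[2])
		}
	}
	return out
}

func main() {
	sample := `2020/05/14 12:15:14 [TRACE] dag/walk: vertex "provider.helm (close)" is waiting for "module.eks_addons.helm_release.cluster_autoscaler"
2020/05/14 12:15:19 [TRACE] dag/walk: vertex "root" is waiting for "provider.helm (close)"`
	for _, r := range blockingResources(sample) {
		fmt.Println(r) // prints module.eks_addons.helm_release.cluster_autoscaler
	}
}
```

Keep in mind that terraform state rm only makes Terraform forget the resource. It does not delete the deployed release, so the next plan will treat it as a new resource.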

abdennour on 21 Jul 2020

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!