Terraform-provider-helm: terraform stuck at refresh state during plan

Created on 14 May 2020  ·  7 Comments  ·  Source: hashicorp/terraform-provider-helm

I recently added the Istio charts, and I already have Prometheus, autoscaler, external secrets, Flux, and Fluentd CloudWatch installed, so about 30 pods in total.

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform Version and Provider Version

  • provider.aws v2.61.0
  • provider.helm v1.1.1
  • provider.kubectl (unversioned)
  • provider.kubernetes v1.11.2
  • provider.null v2.1.2
  • provider.random v2.2.1

Affected Resource(s)

Terraform Configuration Files

# Copy-paste your Terraform configurations here - for large Terraform configs,
# please use a service like Dropbox and share a link to the ZIP file. For
# security, you can also encrypt the files using our GPG public key.

Debug Output


2020/05/14 12:15:14 [TRACE] dag/walk: vertex "provider.helm (close)" is waiting for "module.eks_addons.helm_release.cluster_autoscaler"
2020/05/14 12:15:19 [TRACE] dag/walk: vertex "root" is waiting for "provider.helm (close)"
2020/05/14 12:15:19 [TRACE] dag/walk: vertex "provider.helm (close)" is waiting for "module.eks_addons.helm_release.cluster_autoscaler"
2020/05/14 12:15:24 [TRACE] dag/walk: vertex "root" is waiting for "provider.helm (close)"
2020/05/14 12:15:24 [TRACE] dag/walk: vertex "provider.helm (close)" is waiting for "module.eks_addons.helm_release.cluster_autoscaler"
2020/05/14 12:15:29 [TRACE] dag/walk: vertex "root" is waiting for "provider.helm (close)"
2020/05/14 12:15:29 [TRACE] dag/walk: vertex "provider.helm (close)" is waiting for "module.eks_addons.helm_release.cluster_autoscaler"
2020/05/14 12:15:34 [TRACE] dag/walk: vertex "root" is waiting for "provider.helm (close)"
2020/05/14 12:15:34 [TRACE] dag/walk: vertex "provider.helm (close)" is waiting for "module.eks_addons.helm_release.cluster_autoscaler"
2020/05/14 12:15:39 [TRACE] dag/walk: vertex "root" is waiting for "provider.helm (close)"
2020/05/14 12:15:39 [TRACE] dag/walk: vertex "provider.helm (close)" is waiting for "module.eks_addons.helm_release.cluster_autoscaler"
2020/05/14 12:15:44 [TRACE] dag/walk: vertex "root" is waiting for "provider.helm (close)"
2020/05/14 12:15:44 [TRACE] dag/walk: vertex "provider.helm (close)" is waiting for "module.eks_addons.helm_release.cluster_autoscaler"

Panic Output

Expected Behavior

Actual Behavior

Steps to Reproduce

  1. terraform apply

Important Factoids

References

  • GH-1234
acknowledged bug needs-investigation

All 7 comments

Terraform v0.12.25 was released recently and contains a fix for a concurrency bug (https://github.com/hashicorp/terraform/blob/v0.12.25/CHANGELOG.md). If you happen to use that version, can you try downgrading to v0.12.24 or lower?
I experienced random crashes before (https://github.com/terraform-providers/terraform-provider-helm/issues/494) and also ran into the issue you describe after upgrading to the new Terraform release.

... can you try downgrading to v0.12.24...

@krzysztof-miemiec I can consistently reproduce with 0.12.24 and 0.12.25.

Also, @amitsehgal, is this a duplicate of #458?

Having zero experience with Go, I started trying to debug this issue. I installed GoLand, learned how to use fmt.Printf, and built a dumb, yet working, "test" pipeline that overrides the helm provider used by my TF module and rewrites the checksum in the lockfile 🙈.

And I found out that it hangs in this place (sorry for no line numbers or specific stack trace):

if err := g.cfg.KubeClient.IsReachable(); err != nil {
    @ vendor/helm.sh/helm/v3/pkg/action/get.go: func (g *Get) Run(name string)
res, err := get.Run(name)
    @ helm/resource_release.go: func getRelease(cfg *action.Configuration, name string)

I noticed in this comment (https://github.com/terraform-providers/terraform-provider-helm/issues/458#issuecomment-622148508) that there's a problem with IAM Authenticator (I also use it with my EKS setup). I will try to find out more.

Edit:

I traced it down to this specific line (os.Stderr.Write() hangs inside klog):

klog.Warningf("constructing many client instances from the same exec auth config can cause performance problems during cert rotation and can exhaust available network connections; %d clients constructed calling %q", onRotateListLength, a.cmd)

when loading client for schedulingv1 (somewhere in kubernetes client initialization)
in k8s.io/client-go/plugin/pkg/client/auth/exec.(*Authenticator).UpdateTransportConfig

What's weird is that when I comment out this line, or even change it to log.Printf, the helm provider does not hang.

I can confirm what @krzysztof-miemiec reported here, the blocking goes away if either the klog call gets removed or if I manually disable the stderr/stdout redirection in the plugin:
https://github.com/hashicorp/go-plugin/blob/master/server.go#L461

I tried to understand how the stderr/stdout pipes are consumed over the gRPC connection, but I have not managed to work it out from the code so far.
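One mechanism that would be consistent with these observations, offered here as an assumption and not something confirmed in this thread, is ordinary pipe back-pressure: when stderr is replaced by a pipe and nothing reads the other end, a write blocks as soon as the kernel pipe buffer fills. The Go sketch below reproduces only that general behaviour with os.Pipe. It does not use go-plugin or klog, and the helper name writeStalls is made up for illustration.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"time"
)

// writeStalls (hypothetical helper) writes `size` bytes into a pipe that has
// no reader yet and reports whether the write was still pending after waitMs
// milliseconds. It then drains the pipe so the writer can finish.
func writeStalls(size int, waitMs int) bool {
	r, w, err := os.Pipe()
	if err != nil {
		panic(err)
	}

	done := make(chan struct{})
	go func() {
		defer close(done)
		// Blocks once the kernel pipe buffer is full and nobody is reading.
		// The error is ignored because this is only a demonstration.
		w.Write(make([]byte, size))
		w.Close()
	}()

	stalled := false
	select {
	case <-done:
		// The write completed without a reader (payload fit in the buffer).
	case <-time.After(time.Duration(waitMs) * time.Millisecond):
		stalled = true
	}

	// Start consuming: this unblocks the writer and returns at EOF.
	io.Copy(io.Discard, r)
	<-done
	r.Close()
	return stalled
}

func main() {
	// 1 MiB is assumed to be larger than the default pipe buffer.
	fmt.Println("write stalled with no reader:", writeStalls(1<<20, 200))
}
```

If the hang in the provider has the same cause, it would explain why both removing the klog call and disabling the redirection make it disappear, but that still needs to be verified against the go-plugin code.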

This is resolved in the latest release

This is how I fixed a similar issue:

  • Set export TF_LOG=TRACE, which is the most verbose logging level.
  • Run terraform plan ....
  • In the log, I found the root cause of the issue:
dag/walk: vertex "module.kubernetes_apps.provider.helmfile (close)" is waiting for "module.kubernetes_apps.helmfile_release_set.metrics_server"
  • From the log, I identified the resource whose state was causing the issue: module.kubernetes_apps.helmfile_release_set.metrics_server.

  • I removed it from the state:

terraform state rm module.kubernetes_apps.helmfile_release_set.metrics_server
  • Running terraform plan again should now work.

This is not the best solution, that's why I contacted the owner of this provider to fix the issue without this workaround.
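Picking the blocking resource out of a long TRACE log by eye is tedious, so the lookup step can be sketched as a small Go program. This is only an illustration: the function name blockers and the constant sampleLog are hypothetical, the sample lines are copied from the debug output in the original report, and the regular expression assumes the "dag/walk: vertex ... is waiting for ..." wording shown there.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// sampleLog holds a few lines copied from the debug output in the report.
const sampleLog = `2020/05/14 12:15:19 [TRACE] dag/walk: vertex "root" is waiting for "provider.helm (close)"
2020/05/14 12:15:19 [TRACE] dag/walk: vertex "provider.helm (close)" is waiting for "module.eks_addons.helm_release.cluster_autoscaler"
2020/05/14 12:15:24 [TRACE] dag/walk: vertex "root" is waiting for "provider.helm (close)"
2020/05/14 12:15:24 [TRACE] dag/walk: vertex "provider.helm (close)" is waiting for "module.eks_addons.helm_release.cluster_autoscaler"`

// waitLine captures the waiting vertex and the vertex it is waiting for.
var waitLine = regexp.MustCompile(`dag/walk: vertex "([^"]+)" is waiting for "([^"]+)"`)

// blockers (hypothetical helper) counts how often each vertex is being waited
// on. Vertices ending in "(close)" are skipped because they are themselves
// only waiting on something else.
func blockers(log string) map[string]int {
	counts := make(map[string]int)
	for _, line := range strings.Split(log, "\n") {
		m := waitLine.FindStringSubmatch(line)
		if m == nil {
			continue
		}
		awaited := m[2]
		if strings.HasSuffix(awaited, "(close)") {
			continue
		}
		counts[awaited]++
	}
	return counts
}

func main() {
	for name, n := range blockers(sampleLog) {
		fmt.Printf("%s (seen %d times)\n", name, n)
	}
}
```

The address it reports is the one that would be passed to terraform state rm in the workaround above, with the same caveat that removing state hides the hang without fixing its cause.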

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!
