❯ terraform version
Terraform v0.12.24
+ provider.azurerm v2.5.0
+ provider.null v2.1.2
+ provider.random v2.2.1
+ provider.tls v2.1.1
I am seeing this behavior across a few charts, but it's a bit random.
Resource in question:
resource "helm_release" "mongodb-sharded" {
  name       = "mongodb-sharded"
  chart      = "mongodb-sharded"
  repository = "https://charts.bitnami.com/bitnami"
  timeout    = 600
}
https://gist.github.com/lukekhamilton/8e52b1e403a89557062796a4d25af24d
When I run terraform apply, I expect it to install the helm chart.
When I run terraform apply, this helm chart and others don't finish installing and get stuck. However, when I delete the release and then install it manually with helm, it works without issue.
❯ helm ls -a
NAME             NAMESPACE  REVISION  UPDATED                                  STATUS           CHART                  APP VERSION
mongodb-sharded  default    1         2020-04-16 02:46:09.166067202 +0000 UTC  pending-install  mongodb-sharded-1.1.2  4.2.5
terraform apply
I am running this on a very standard AKS cluster.
Furthermore, a deployment I have running right now is showing me this:
❯ helm ls -a
NAME             NAMESPACE  REVISION  UPDATED                                  STATUS           CHART                  APP VERSION
mongodb-sharded  default    1         2020-04-16 03:18:01.305438381 +0000 UTC  pending-install  mongodb-sharded-1.1.2  4.2.5
And also this:
❯ kubectl get pods
NAME                                     READY  STATUS   RESTARTS  AGE
mongodb-sharded-configsvr-0              1/1    Running  0         3m38s
mongodb-sharded-mongos-6c8fb46c44-nml8b  0/1    Running  1         3m38s
mongodb-sharded-shard0-data-0            0/1    Running  0         3m38s
mongodb-sharded-shard1-data-0            0/1    Running  0         3m38s
I'm experiencing this with the prometheus operator as well (potentially the same problem as https://github.com/helm/charts/issues/21913).
Same symptoms as above. Everything in kubectl is running, helm chart stuck on 'pending-install'. As soon as terraform times out, helm chart goes to 'failed'. Installing with helm manually works no problem.
One thing that changed the behavior for me was increasing the VM size for the node pools; after that it worked without issue. However, for the life of me, I can't find any log output anywhere to help debug what is actually happening...
I'm trying to reproduce this one, so far this is working for me:
resource "helm_release" "example" {
  name       = "example"
  repository = "https://kubernetes-charts.storage.googleapis.com"
  chart      = "prometheus-operator"
}
Is there more info you can share about your environments @lukekhamilton @arlyon ? A full tf config that reproduces this issue would be super helpful for me. We have test accounts on all the major cloud providers and even a bare metal cluster we can run on.
@jrhouston Given enough tries, it goes through fine, but it is quite inconsistent. In response to this problem, I have for now split my terraform configs into two separate modules with isolated states (one for monitoring / plumbing and one for 'apps').
You can find an example here: https://github.com/arlyon/infra-code (pre split). Note that this has expired cloudflare keys populated in some of the configs, but I don't think it'll cause problems.
I'm experiencing a similar issue, but with Bitnami's nginx, as follows:
# Using helm provider ~> 1.1.1
data "helm_repository" "bitnami" {
  name = "bitnami"
  url  = "https://charts.bitnami.com/bitnami"
}
resource "helm_release" "nginx" {
  name       = "my-nginx"
  repository = data.helm_repository.bitnami.metadata[0].name
  chart      = "nginx"
  version    = "5.2.3"
  namespace  = "default"
  timeout    = 50
}
It keeps printing lines such as "helm_release.nginx: Still creating... [50s elapsed]" until it hits the timeout.
Only once it hits the timeout does the release entry appear in helm:
$ helm ls --namespace my-namespace
NAME      NAMESPACE  REVISION  UPDATED                              STATUS  CHART        APP VERSION
my-nginx  default    1         2020-04-22 16:16:25.7100555 +0100 BST  failed  nginx-5.2.3  1.17.10
The release is marked as failed.
However, the pods are up, ready and running, and the services have been created and work.
I'm sharing it here because I believe it may be related (it seems to come down to how pod readiness is detected), and it's also a relatively quick scenario to spin up and tear down as necessary.
Also, a workaround for this use case is to add wait = false to the helm_release.
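For illustration, a minimal sketch of that workaround applied to the nginx release above. It assumes the repository URL can be passed directly, as in the mongodb-sharded example earlier in this thread; only the wait argument differs from the failing configuration.

```hcl
# Sketch only: the same nginx release as above, with waiting disabled.
resource "helm_release" "nginx" {
  name       = "my-nginx"
  repository = "https://charts.bitnami.com/bitnami"
  chart      = "nginx"
  version    = "5.2.3"
  namespace  = "default"

  # Do not block until the release's resources report ready.
  # Terraform returns once Helm has submitted the manifests.
  wait = false
}
```

The trade-off is that Terraform no longer verifies that the pods actually become ready, so a genuinely broken deployment will not surface as a failed apply.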
We're going to work on this with low priority as we collect more data since the issue is hard to reproduce on demand.
I have a similar problem on GKE.
My helm_release has a high timeout of 3000, and sometimes it fails with the message:
Kubernetes Cluster Unreachable
But most of the time, the helm_release has actually been deployed in the backend.
I've had the issue with kubedb, both on AKS and on a bare-metal cluster. They have this as an open issue: https://github.com/kubedb/project/issues/504
I'm also experiencing this when deploying the Gloo helm chart via terraform. In my case, I have a tf-modules repo that I import. Gloo deploys just fine, but terraform keeps waiting for it to finish and eventually times out, even though the deployment ended minutes earlier...
I ran into this issue and solved it by deleting resources that can't be "recreated", such as Kubernetes Jobs that were created as part of a helm deployment. After that, the Terraform Helm provider worked as expected for my use case.