Terraform: helm-release and tiller module dependency; resources not deleted in order, so destroy ends in errors

Created on 13 Apr 2019  ·  13 comments  ·  Source: hashicorp/terraform

Hi There,

This may not be a bug. I saw a couple of related issues around module dependency, but I am not sure any of them are the same as this one. I need some input on workarounds, and am curious how everyone else is getting around this; please share your feedback or thoughts. If there is an existing (similar) issue, I would appreciate it if you could direct me to it.

I am able to create Helm releases on an EKS cluster with the Terraform helm_release resource, along with a service account and cluster-role-binding for Tiller. However, terraform destroy is not successful and ends in errors.

Terraform Version

Terraform v0.11.11
Helm v2.11.0

...

Terraform Configuration Files

provider "helm" {
  service_account = "${module.tiller.service_account}"

  kubernetes {
    host                   = "--"
    cluster_ca_certificate = "--"
    token                  = "--"
  }
}

module "tiller" {
  source = "../terraformmodules/helm/tiller"
  --other configurations--
}

module "cluster-autoscaler" {
  source       = "../terraformmodules/helm-releases/cluster-autoscaler" 
  --other configurations--
}

module "metrics-server" {
  source       = "../terraformmodules/helm-releases/metrics-server"
  --other configurations--
}

# Resources created in the tiller module:
resource "kubernetes_service_account" "tiller" {
  metadata {
    name      = "tiller"
    namespace = "kube-system"
  }
}

resource "kubernetes_cluster_role_binding" "tiller" {
  metadata {
    name = "tiller"
  }

  role_ref {
    kind      = "ClusterRole"
    name      = "cluster-admin"
    api_group = "rbac.authorization.k8s.io"
  }

  subject {
    kind      = "ServiceAccount"
    name      = "tiller"
    namespace = "kube-system"
    api_group = ""
  }
}

# Resources created in the helm-releases modules:
resource "helm_release" "cluster-autoscaler" {
  name      = "cluster-autoscaler"
  chart     = "${path.module}/chart"
  namespace = "kube-system"
  --set other chart values --
}

resource "helm_release" "metrics-server" {
  name      = "metrics-server"
  chart     = "${path.module}/chart"
  namespace = "kube-system"
}

Expected Behavior


Terraform destroy succeeds, deleting resources in the following order:

  1. helm-release
  2. k8s-cluster-role-binding

Actual Behavior


Error: Error applying plan:

2 error(s) occurred:

module.metrics-server.helm_release.metrics-server (destroy): 1 error(s) occurred:

helm_release.metrics-server: rpc error: code = Unknown desc = configmaps is forbidden: User "system:serviceaccount:kube-system:tiller" cannot list configmaps in the namespace "kube-system"

module.cluster-autoscaler.helm_release.cluster-autoscaler (destroy): 1 error(s) occurred:

helm_release.cluster-autoscaler: rpc error: code = Unknown desc = configmaps is forbidden: User "system:serviceaccount:kube-system:tiller" cannot list configmaps in the namespace "kube-system"

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.

Steps to Reproduce

  • terraform apply
  • terraform destroy
Additional Context

Other comments:
The cluster-role-binding is deleted before the helm releases, or Terraform attempts to delete them in parallel.

Terraform does not support depends_on for modules yet. There are ways to create a dependency between modules; the one I am aware of is passing an output of one module to another module via a variable (see the sketch below). Unfortunately, the Terraform helm_release resource does not have an attribute/argument for a cluster-role-binding, which would have helped create the necessary dependency.
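For illustration, a minimal sketch of that output-passing technique in 0.11 syntax (the tiller_service_account variable is hypothetical, and the dependency only takes effect if a resource inside the consuming module actually references it):

# In the tiller module: export a value derived from the service account
output "service_account" {
  value = "${kubernetes_service_account.tiller.metadata.0.name}"
}

# In the root configuration: pass the output into the release module,
# creating an implicit dependency between the two modules
module "metrics-server" {
  source                 = "../terraformmodules/helm-releases/metrics-server"
  tiller_service_account = "${module.tiller.service_account}"
}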

Question:
How is everyone getting around this, especially if you are using modules for tiller and helm-release? I am aware one option is to create the helm_release resources with a depends_on for the tiller module. Are there any non-hacky workarounds to create an explicit dependency between these two modules?

Labels: bug, core, v0.11

All 13 comments

FWIW, what you've described is essentially what we do, but we do it at the module level for the helm provider.

In other words, we have a tiller module and we define the namespace we want Tiller to be deployed into. Then, in our helm provider config, we set the helm namespace to module.tiller.namespace.
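For context, a sketch of the output side of that setup, assuming the tiller module wraps the kubernetes_service_account resource shown earlier:

output "namespace" {
  value = "${kubernetes_service_account.tiller.metadata.0.namespace}"
}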

Thank you for your response, John.

I believe I have the same configuration as you mentioned. By default the helm provider's Tiller namespace is 'kube-system', so in my case the service account namespace to install Tiller with and the helm provider's Tiller namespace are both set to 'kube-system'.

Even when I explicitly set the provider namespace to the Tiller namespace, terraform destroy still ends with the same error: 'configmaps is forbidden: User "system:serviceaccount:kube-system:tiller" cannot list configmaps in the namespace "kube-system"'

provider "helm" {
  service_account = "${module.tiller.service_account}"
  namespace       = "${module.tiller.service_account_namespace}"

  kubernetes {
    host                   = "--"
    cluster_ca_certificate = "--"
    token                  = "--"
  }
}

Does this look similar to what you are doing, or am I missing something?

Are you including install_tiller = "false"?

provider "helm" {
  version   = "~> 0.9"
  namespace = "${module.tiller.namespace}"

  install_tiller = "false"
}

I see the same issue on GKE. It appears to me that the problem is that the destroy order removes the role privileges Tiller needs to destroy the chart. This is what I just saw on my cluster (ran terraform destroy --parallelism 1):

Plan: 0 to add, 0 to change, 7 to destroy.

Do you really want to destroy all resources?
  Terraform will destroy all your managed infrastructure, as shown above.
  There is no undo. Only 'yes' will be accepted to confirm.

  Enter a value: yes
yes

module.kubernetes.kubernetes_role_binding.helm_role_binding: Destroying... [id=default/tiller-binding]
module.kubernetes.kubernetes_role_binding.helm_role_binding: Destruction complete after 1s
module.kubernetes.kubernetes_cluster_role_binding.helm_cluster_role_binding: Destroying... [id=tiller-cluster-binding]
module.kubernetes.kubernetes_cluster_role_binding.helm_cluster_role_binding: Destruction complete after 0s
module.kubernetes.kubernetes_role.helm_role: Destroying... [id=default/tiller-manager]
module.kubernetes.kubernetes_role.helm_role: Destruction complete after 0s
module.kubernetes.kubernetes_cluster_role_binding.provider_identity[0]: Destroying... [id=provider-admin]
module.kubernetes.kubernetes_cluster_role_binding.provider_identity[0]: Destruction complete after 0s
module.helm.helm_release.chart: Destroying... [id=test-local]

Error: rpc error: code = Unknown desc = configmaps is forbidden: User "system:serviceaccount:default:tiller" cannot list resource "configmaps" in API group "" in the namespace "default"

It looks like the first steps in the plan removed the role ref that Tiller needed. In my case, I have an implicit dependency in the helm_release resource on the service account, but no explicit dependency on the role bindings. Would it change the destruction order to export the role bindings from the module that creates them, and import them as explicit dependencies on the helm release?
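As a sketch of that idea (untested; the output name and the dummy dependencyHack value are hypothetical, and the extra value must be one the chart ignores), the module creating the RBAC resources could export the binding's name, and the release module could interpolate the imported value somewhere; since depends_on in 0.11 cannot take a variable, the interpolation trick is used instead:

# In the module that creates the RBAC resources: export the binding name
output "cluster_role_binding" {
  value = "${kubernetes_cluster_role_binding.helm_cluster_role_binding.metadata.0.name}"
}

# In the release module: interpolating the imported value gives Terraform an
# implicit dependency, so the release is destroyed before the binding
resource "helm_release" "chart" {
  name      = "test-local"
  chart     = "${path.module}/chart"
  namespace = "default"

  set {
    # dummy chart value that exists only to carry the dependency edge
    name  = "dependencyHack"
    value = "${var.cluster_role_binding}"
  }
}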

I have the same issue as @ejschoen on AKS.

Same issue in GCP. So far my workaround is to destroy the helm deployments manually first with terraform destroy --target=module.cluster.helm_release.prometheus-operator and then do an overall destroy.

Same problem with AWS EKS. @ejschoen, did you ever find a solution?

I don't think I've seen the issue recently, but I will pay attention the next time I tear down a cluster.

A provider-level depends_on would solve this in a not-so-hacky way, I'd assume.
https://github.com/hashicorp/terraform/issues/2430

This issue is really annoying; are there any workarounds that don't involve manually specifying targets?

Here is our workaround:

provider "helm" {
  service_account = kubernetes_cluster_role_binding.tiller.metadata.0.name
  namespace       = kubernetes_service_account.tiller.metadata.0.namespace
  install_tiller  = true
  kubernetes {
    config_path = local_file.kubeconfig.filename
  }
}

resource "kubernetes_service_account" "tiller" {
  metadata {
    name      = "tiller"
    namespace = "kube-system"
  }
}

resource "kubernetes_cluster_role_binding" "tiller" {
  metadata {
    name = kubernetes_service_account.tiller.metadata.0.name
  }
  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "ClusterRole"
    name      = "cluster-admin"
  }
  subject {
    kind      = "ServiceAccount"
    name      = "default"
    namespace = "kube-system"
  }
  subject {
    kind      = "ServiceAccount"
    name      = kubernetes_service_account.tiller.metadata.0.name
    namespace = kubernetes_service_account.tiller.metadata.0.namespace
  }
}

The "hack" is to give the kubernetes_cluster_role_binding the same name as the service account itself, so that you can reference the binding in the provider instead of the service account.

If you do not do that, there is no reference to the role binding and it gets removed too soon.

I hope this helps, and that depends_on for providers is around the corner :)

@pierresteiner thank you very much, this is exactly the fix I needed 👍 🎊

I am going to close this issue due to inactivity - it looks like you have a workaround, and much has changed in Terraform since this was opened.

If there is still an issue with Terraform v0.13, and there isn't already an issue open that describes your problem, please open a new issue and fill out the template. Thanks!
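For anyone landing here on a newer version: Terraform v0.13 added depends_on for modules, so this ordering can now be expressed directly (destroy order is the inverse of the declared dependency). A minimal sketch:

module "metrics-server" {
  source = "../terraformmodules/helm-releases/metrics-server"

  # destroyed before anything in module.tiller on terraform destroy
  depends_on = [module.tiller]
}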

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
