Terraform: helm-release and tiller module dependency; resources not deleted in order, so destroy ends in errors

Created on 13 Apr 2019  ·  13 comments  ·  Source: hashicorp/terraform

Hi There,

This may not be a bug. I saw a couple of related issues around module dependency, but I am not sure any of them are the same as this one. I need some input on workarounds, and am curious how everyone else is getting around this; please share your feedback or thoughts. If there is an existing (similar) issue, I would appreciate it if you could direct me to it.

I am able to create Helm releases on an EKS cluster with the Terraform helm_release resource, along with a service account and cluster-role-binding for Tiller. However, terraform destroy is not successful and ends in errors.

Terraform Version

Terraform v0.11.11
Helm v2.11.0

...

Terraform Configuration Files

provider "helm" {
  service_account = "${module.tiller.service_account}"

  kubernetes {
    host                   = "--"
    cluster_ca_certificate = "--"
    token                  = "--"
  }
}

module "tiller" {
  source = "../terraformmodules/helm/tiller"
  --other configurations--
}

module "cluster-autoscaler" {
  source       = "../terraformmodules/helm-releases/cluster-autoscaler" 
  --other configurations--
}

module "metrics-server" {
  source       = "../terraformmodules/helm-releases/metrics-server"
  --other configurations--
}

# Resources created in the tiller module:
resource "kubernetes_service_account" "tiller" {
  metadata {
    name      = "tiller"
    namespace = "kube-system"
  }
}

resource "kubernetes_cluster_role_binding" "tiller" {
  metadata {
    name = "tiller"
  }

  role_ref {
    kind      = "ClusterRole"
    name      = "cluster-admin"
    api_group = "rbac.authorization.k8s.io"
  }

  subject {
    kind      = "ServiceAccount"
    name      = "tiller"
    namespace = "kube-system"
    api_group = ""
  }
}

# Resources created in the helm-releases modules:
resource "helm_release" "cluster-autoscaler" {
  name      = "cluster-autoscaler"
  chart     = "${path.module}/chart"
  namespace = "kube-system"
  --set other chart values --
}

resource "helm_release" "metrics-server" {
  name      = "metrics-server"
  chart     = "${path.module}/chart"
  namespace = "kube-system"
}

Expected Behavior


Terraform destroy succeeds, deleting resources in the following order:

  1. helm-release
  2. k8s-cluster-role-binding

Actual Behavior


Error: Error applying plan:

2 error(s) occurred:

module.metrics-server.helm_release.metrics-server (destroy): 1 error(s) occurred:

helm_release.metrics-server: rpc error: code = Unknown desc = configmaps is forbidden: User "system:serviceaccount:kube-system:tiller" cannot list configmaps in the namespace "kube-system"

module.cluster-autoscaler.helm_release.cluster-autoscaler (destroy): 1 error(s) occurred:

helm_release.cluster-autoscaler: rpc error: code = Unknown desc = configmaps is forbidden: User "system:serviceaccount:kube-system:tiller" cannot list configmaps in the namespace "kube-system"

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.

Steps to Reproduce

  • terraform apply
  • terraform destroy
Additional Context

Other comments:
The cluster-role-binding is deleted before the helm releases, or Terraform attempts to delete them in parallel.

Terraform does not support depends_on for modules yet. There are ways to create a dependency between modules; the one I am aware of is passing an output of one module to another module via a variable (see the sketch below). Unfortunately, the Terraform helm_release resource does not have an attribute/argument for a cluster-role-binding, which would have helped create the necessary dependency.
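For illustration, a minimal sketch of that output-passing technique in 0.11 syntax (the tiller_service_account variable is hypothetical, and the dependency only takes effect if a resource inside the consuming module actually references it):

# In the tiller module: export a value derived from the service account
output "service_account" {
  value = "${kubernetes_service_account.tiller.metadata.0.name}"
}

# In the root configuration: pass the output into the release module,
# creating an implicit dependency between the two modules
module "metrics-server" {
  source                 = "../terraformmodules/helm-releases/metrics-server"
  tiller_service_account = "${module.tiller.service_account}"
}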

Question:
How is everyone getting around this, especially if you are using modules for tiller and helm-release? I am aware one option is to create the helm_release resources with a depends_on for the tiller module. Are there any non-hacky workarounds to create an explicit dependency between these two modules?

Labels: bug, core, v0.11

All 13 comments

FWIW, what you've described is essentially what we do, but we do it at the module level for the helm provider.

In other words, we have a tiller module and we define the namespace we want Tiller to be deployed into. Then, in our helm provider config, we set the helm namespace to module.tiller.namespace.
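For context, a sketch of the output side of that setup, assuming the tiller module wraps the kubernetes_service_account resource shown earlier:

output "namespace" {
  value = "${kubernetes_service_account.tiller.metadata.0.namespace}"
}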

Thank you for your response, John.

I believe I have the same configuration as you mentioned. By default the helm provider's Tiller namespace is 'kube-system', so in my case the service account namespace to install Tiller with and the helm provider's Tiller namespace are both set to 'kube-system'.

Even when I explicitly set the provider namespace to the Tiller namespace, terraform destroy still ends with the same error: 'configmaps is forbidden: User "system:serviceaccount:kube-system:tiller" cannot list configmaps in the namespace "kube-system"'

provider "helm" {
  service_account = "${module.tiller.service_account}"
  namespace       = "${module.tiller.service_account_namespace}"

  kubernetes {
    host                   = "--"
    cluster_ca_certificate = "--"
    token                  = "--"
  }
}

Does this look similar to what you are doing, or am I missing something?

Are you including install_tiller = "false"?

provider "helm" {
  version   = "~> 0.9"
  namespace = "${module.tiller.namespace}"

  install_tiller = "false"
}

I see the same issue on GKE. It appears to me that the problem is that the destroy order removes the role privileges Tiller needs to destroy the chart. This is what I just saw on my cluster (ran terraform destroy --parallelism 1):

Plan: 0 to add, 0 to change, 7 to destroy.

Do you really want to destroy all resources?
  Terraform will destroy all your managed infrastructure, as shown above.
  There is no undo. Only 'yes' will be accepted to confirm.

  Enter a value: yes
yes

module.kubernetes.kubernetes_role_binding.helm_role_binding: Destroying... [id=default/tiller-binding]
module.kubernetes.kubernetes_role_binding.helm_role_binding: Destruction complete after 1s
module.kubernetes.kubernetes_cluster_role_binding.helm_cluster_role_binding: Destroying... [id=tiller-cluster-binding]
module.kubernetes.kubernetes_cluster_role_binding.helm_cluster_role_binding: Destruction complete after 0s
module.kubernetes.kubernetes_role.helm_role: Destroying... [id=default/tiller-manager]
module.kubernetes.kubernetes_role.helm_role: Destruction complete after 0s
module.kubernetes.kubernetes_cluster_role_binding.provider_identity[0]: Destroying... [id=provider-admin]
module.kubernetes.kubernetes_cluster_role_binding.provider_identity[0]: Destruction complete after 0s
module.helm.helm_release.chart: Destroying... [id=test-local]

Error: rpc error: code = Unknown desc = configmaps is forbidden: User "system:serviceaccount:default:tiller" cannot list resource "configmaps" in API group "" in the namespace "default"

It looks like the first steps in the plan removed the role ref that Tiller needed. In my case, I have an implicit dependency in the helm_release resource on the service account, but no explicit dependency on the role bindings. Would it change the destruction order to export the role bindings from the module that creates them, and import them as explicit dependencies on the helm release?
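As a sketch of that idea (untested; the output name and the dummy dependencyHack value are hypothetical, and the extra value must be one the chart ignores), the module creating the RBAC resources could export the binding's name, and the release module could interpolate the imported value somewhere; since depends_on in 0.11 cannot take a variable, the interpolation trick is used instead:

# In the module that creates the RBAC resources: export the binding name
output "cluster_role_binding" {
  value = "${kubernetes_cluster_role_binding.helm_cluster_role_binding.metadata.0.name}"
}

# In the release module: interpolating the imported value gives Terraform an
# implicit dependency, so the release is destroyed before the binding
resource "helm_release" "chart" {
  name      = "test-local"
  chart     = "${path.module}/chart"
  namespace = "default"

  set {
    # dummy chart value that exists only to carry the dependency edge
    name  = "dependencyHack"
    value = "${var.cluster_role_binding}"
  }
}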

I have the same issue as @ejschoen on AKS.

Same issue in GCP. So far my workaround is to destroy the helm deployments manually first with terraform destroy --target=module.cluster.helm_release.prometheus-operator and then do an overall destroy.

Same problem with AWS EKS. @ejschoen, did you ever find a solution?

I don't think I've seen the issue recently, but I will pay attention the next time I tear down a cluster.

A provider-level depends_on would solve this in a not-so-hacky way, I'd assume.
https://github.com/hashicorp/terraform/issues/2430

This issue is really annoying; are there any workarounds that don't involve manually specifying targets?

Here is our workaround:

provider "helm" {
  service_account = kubernetes_cluster_role_binding.tiller.metadata.0.name
  namespace       = kubernetes_service_account.tiller.metadata.0.namespace
  install_tiller  = true
  kubernetes {
    config_path = local_file.kubeconfig.filename
  }
}

resource "kubernetes_service_account" "tiller" {
  metadata {
    name      = "tiller"
    namespace = "kube-system"
  }
}

resource "kubernetes_cluster_role_binding" "tiller" {
  metadata {
    name = kubernetes_service_account.tiller.metadata.0.name
  }
  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "ClusterRole"
    name      = "cluster-admin"
  }
  subject {
    kind      = "ServiceAccount"
    name      = "default"
    namespace = "kube-system"
  }
  subject {
    kind      = "ServiceAccount"
    name      = kubernetes_service_account.tiller.metadata.0.name
    namespace = kubernetes_service_account.tiller.metadata.0.namespace
  }
}

The "hack" is to give the kubernetes_cluster_role_binding the same name as the service account itself, so that you can reference the binding in the provider instead of the service account.

If you do not do that, there is no reference to the role binding and it gets removed too soon.

I hope this helps, and that depends_on for providers is around the corner :)

@pierresteiner thank you very much, this is exactly the fix I needed 👍 🎊

I am going to close this issue due to inactivity - it looks like you have a workaround, and much has changed in Terraform since this was opened.

If there is still an issue with Terraform v0.13, and there isn't already an issue open that describes your problem, please open a new issue and fill out the template. Thanks!
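For anyone landing here on a newer version: Terraform v0.13 added depends_on for modules, so this ordering can now be expressed directly (destroy order is the inverse of the declared dependency). A minimal sketch:

module "metrics-server" {
  source = "../terraformmodules/helm-releases/metrics-server"

  # destroyed before anything in module.tiller on terraform destroy
  depends_on = [module.tiller]
}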

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
