Terraform-provider-kubernetes: v2.0.1: Resources cannot be created. Does kubectl reference the kube config properly?

Created on 22 Jan 2021 · 15 Comments · Source: hashicorp/terraform-provider-kubernetes

Terraform version: v0.14.4
Kubernetes provider version: v2.0.1
Helm provider version: v1.3.2

Steps to Reproduce

I use a GitLab pipeline to deploy helm charts on my Kubernetes cluster by using the helm terraform provider.

provider "helm" {
  kubernetes {
    config_path = "~/.kube/config"
  }
}

Since version v2.0.1 of the Kubernetes provider, the Helm provider is no longer able to access the kube config file properly. The error message looks like this:

module.helm.helm_release.nginx-ingress-internal: Creating...
Error: configmaps is forbidden: User "system:serviceaccount:gitlab-prod:default" cannot create resource "configmaps" in API group "" in the namespace "nginx-ingress"
Error: namespaces is forbidden: User "system:serviceaccount:gitlab-prod:default" cannot create resource "namespaces" in API group "" at the cluster scope
Error: namespaces is forbidden: User "system:serviceaccount:gitlab-prod:default" cannot create resource "namespaces" in API group "" at the cluster scope
Error: namespaces is forbidden: User "system:serviceaccount:gitlab-prod:default" cannot create resource "namespaces" in API group "" at the cluster scope
Error: namespaces is forbidden: User "system:serviceaccount:gitlab-prod:default" cannot create resource "namespaces" in API group "" at the cluster scope
Error: namespaces is forbidden: User "system:serviceaccount:gitlab-prod:default" cannot create resource "namespaces" in API group "" at the cluster scope

The reason why I use Helm provider v1.3.2 is described in this bug report:
https://github.com/hashicorp/terraform-provider-helm/issues/662

Temporary solution

Revert to version v1.13.3 of the Kubernetes provider.

bug waiting-response

tantweiler

All 15 comments

@tantweiler could you share your whole config and a trace log (https://www.terraform.io/docs/internals/debugging.html)? The error message does not seem to be related to a credential error

aareet on 22 Jan 2021

Offhand, this looks related to RBAC rules in the cluster (which may have been installed by the helm chart). This command might help diagnose the permissions issues relating to the service account in the error message.

$ kubectl auth can-i create namespace --as=system:serviceaccount:gitlab-prod:default
$ kubectl auth can-i --list --as=system:serviceaccount:gitlab-prod:default

You might be able to compare that list with other users on the cluster:

kubectl auth can-i --list --namespace=default --as=system:serviceaccount:default:default

$ kubectl auth can-i create configmaps
yes

$ kubectl auth can-i create configmaps --namespace=nginx-ingress --as=system:serviceaccount:gitlab-prod:default
no

And investigate related clusterroles:

$ kubectl describe clusterrolebinding system:basic-user
Name:         system:basic-user
Labels:       kubernetes.io/bootstrapping=rbac-defaults
Annotations:  rbac.authorization.kubernetes.io/autoupdate: true
Role:
  Kind:  ClusterRole
  Name:  system:basic-user
Subjects:
  Kind   Name                  Namespace
  ----   ----                  ---------
  Group  system:authenticated


$ kubectl describe clusterrole system:basic-user
Name:         system:basic-user
Labels:       kubernetes.io/bootstrapping=rbac-defaults
Annotations:  rbac.authorization.kubernetes.io/autoupdate: true
PolicyRule:
  Resources                                      Non-Resource URLs  Resource Names  Verbs
  ---------                                      -----------------  --------------  -----
  selfsubjectaccessreviews.authorization.k8s.io  []                 []              [create]
  selfsubjectrulesreviews.authorization.k8s.io   []                 []              [create]

My guess is that the chart or Terraform config in question is responsible for creating the service account, and the [cluster] roles and rolebindings, but it might be doing so in the wrong order, or not idempotently (so you get different results on re-install vs the initial install). But we would need to see a configuration that reproduces this error. In my testing of version 2 of the providers on AKS, EKS, GKE, and minikube, I haven't seen this issue come up.
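
To illustrate the ordering point, here is a minimal sketch (not taken from the reporter's config) of how references between resources let Terraform create a namespace, a service account, a role, and a role binding in a fixed order. All names, such as `ci-tools` and `ci-deployer`, are hypothetical.

```hcl
# Hypothetical names throughout; adjust to your cluster.
resource "kubernetes_namespace" "ci_tools" {
  metadata {
    name = "ci-tools"
  }
}

resource "kubernetes_service_account" "ci_deployer" {
  metadata {
    name = "ci-deployer"
    # Referencing the namespace resource makes Terraform create it first.
    namespace = kubernetes_namespace.ci_tools.metadata[0].name
  }
}

resource "kubernetes_role" "configmap_writer" {
  metadata {
    name      = "configmap-writer"
    namespace = kubernetes_namespace.ci_tools.metadata[0].name
  }

  rule {
    api_groups = [""] # "" is the core API group
    resources  = ["configmaps"]
    verbs      = ["get", "list", "create", "update"]
  }
}

resource "kubernetes_role_binding" "ci_deployer" {
  metadata {
    name      = "ci-deployer"
    namespace = kubernetes_namespace.ci_tools.metadata[0].name
  }

  # The binding depends on both the role and the service account,
  # so it is always created last.
  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "Role"
    name      = kubernetes_role.configmap_writer.metadata[0].name
  }

  subject {
    kind      = "ServiceAccount"
    name      = kubernetes_service_account.ci_deployer.metadata[0].name
    namespace = kubernetes_namespace.ci_tools.metadata[0].name
  }
}
```

Because every dependency is expressed as a reference rather than a hard-coded string, a re-install should resolve in the same order as the initial install.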

Feel free to browse these working examples of building specific clusters and using them with Kubernetes and Helm providers. Giving the config a skim might give you some ideas for troubleshooting further.

dak1n1 on 22 Jan 2021

I have the same error. Today, all the CD pipelines to my Kubernetes cluster suddenly stopped working.

elpapi42 on 22 Jan 2021

Same here. In my case it looks like the provider failed to read the kubeconfig file and use the proper context.

alon-dotan-starkware on 23 Jan 2021

@alon-dotan-starkware @tantweiler @ElPapi42 can you share some info about your environment, how your cluster is being provisioned, and how your kubeconfig is generated so we can try to reproduce this?

AFAIK we didn't change anything about the way the kubeconfig gets loaded, just that you have to explicitly specify the path to the file now.

jrhouston on 23 Jan 2021

On second glance, this error looks like it is trying to use the default service account. I see this error when I run terraform inside a pod that doesn't have a service account associated with it. When I assign a serviceaccount with the correct permissions, I don't get the error anymore.

Are you running terraform inside a Kubernetes pod but intending to use a config file inside of the container instead of the serviceaccount token?

jrhouston on 24 Jan 2021

Hello everyone,

Let me explain in a bit more detail what I'm doing here. We run a GitLab instance within a GKE cluster. I created a pipeline in GitLab that deploys cloud infrastructure on different hyper-scalers (GCP and Azure). To authenticate against each hyper-scaler and to be able to install any kind of infrastructure component, we use service accounts (GKE) or service principals (Azure) with administrative rights. Let's have a look at the Helm part of the pipeline:

test_helmplan:
  stage: test_helmplan
  only:
    - master
  artifacts:
    paths:
      - ${HELM_PATH_TEST}/planfilehelm
      - ${HELM_PATH_TEST}/.terraform
    expire_in: 5 hrs 
  script:
    - export TF_VAR_subscription_id_test=$SUBSCRIPTION_ID_TEST
    - export TF_VAR_client_id_test=$CLIENT_ID_TEST
    - export TF_VAR_client_secret_test=$CLIENT_SECRET_TEST
    - export TF_VAR_tenant_id=$TENANT_ID
    - cd ${HELM_PATH_TEST}
    - echo ${GCP_BUCKET} > devops-tf-bucket.json
    - az login --service-principal -u ${CLIENT_ID_TEST} -p ${CLIENT_SECRET_TEST} --tenant ${TENANT_ID}
    - az aks get-credentials --resource-group ${RG_NAME_TEST} --name ${AKS_CLUSTER_NAME_TEST}
    - sed -i "s~_CI_PROJECT_PATH_~$CI_PROJECT_PATH~g" main.tf
    - terraform init
    - terraform plan -out="planfilehelm"

test_helmdeploy:
  stage: test_helmdeploy
  only:
    - master
  environment:
    name: ${AKS_CLUSTER_NAME_TEST}-environment
    on_stop: test_destroy
  script:
    - cd ${HELM_PATH_TEST}
    - echo ${GCP_BUCKET} > devops-tf-bucket.json
    - az login --service-principal -u ${CLIENT_ID_TEST} -p ${CLIENT_SECRET_TEST} --tenant ${TENANT_ID}
    - az aks get-credentials --resource-group ${RG_NAME_TEST} --name ${AKS_CLUSTER_NAME_TEST}
    - terraform init
    - terraform apply -input=false -auto-approve "planfilehelm"

This is the provider section of my main.tf file as it stands now. It pins the Kubernetes provider to version 1.13.3, since I don't run into this issue with that version, and the Helm provider to v1.3.2 (because of the state issue I mentioned in my first comment):

provider "helm" {
  kubernetes {
    config_path = "~/.kube/config"
  }
}

provider "kubernetes" {}

terraform {
  backend "gcs" {
    bucket  = "my-awesome-bucket"
    prefix  = "_CI_PROJECT_PATH_-helm-test"
    credentials = "devops-tf-bucket.json"
  }
  required_providers {
    helm = {
      version = "= 1.3.2"
    }
    kubernetes = {
      version = "= 1.13.3"
    }
  }
}

Some people here mentioned that this might be a permissions or RBAC issue. Again, the CLIENT_ID and the CLIENT_SECRET that we use to authenticate against the Azure cloud have administrative rights! With provider version v1.13.3 everything works fine, but with v2.0.1 something has changed.

tantweiler on 24 Jan 2021

@jrhouston
Here are a few more details.
We use TF to deploy mixed resources (Helm, native k8s) to local and remote k8s clusters, and we use kubectx and kubens to switch contexts.
Here is an example of a k8s and a Helm deployment:

```
{
  "resource": {
    "kubernetes_role_binding": {
      "aerospike": {
        "metadata": {
          "name": "aerospike",
          "namespace": "${var.namespace}"
        },
        "role_ref": {
          "api_group": "rbac.authorization.k8s.io",
          "kind": "Role",
          "name": "aerospike"
        },
        "subject": [
          {
            "kind": "ServiceAccount",
            "name": "aerospike",
            "namespace": "${var.namespace}"
          }
        ]
      }
    }
  }
}
```

helm chart:

```
{
  "resource": {
    "helm_release": {
      "aerospike": {
        "provider": "helm",
        "chart": "aerospike",
        "name": "aerospike",
        "namespace": "${var.namespace}",
        "repository": "s3://xxxxxxxxx/helm-repo/charts",
        "set": {
          "name": "namespace",
          "type": "string",
          "value": "${var.namespace}"
        },
        "values": [
          "${file(\"${path.module}/files/values.yaml\")}"
        ],
        "version": "5.1.0"
      }
    }
  }
}
```

providers.tf.json:

```
{
  "provider": {
    "helm": {
      "version": "2.0.0",
      "kubernetes": {
        "config_path": "~/.kube/config"
      }
    }
  }
}
```

With provider versions > 1.13.0 I get the following error:

```
Error: Get "http://localhost/apis/rbac.authorization.k8s.io/v1/namespaces/alon/rolebindings/aerospike": dial tcp [::1]:80: connect: connection refused
```

This looks like the k8s provider can't identify the right context and cluster config from the ~/.kube/config file.

alon-dotan-starkware on 24 Jan 2021

@aareet I uploaded two logfiles for each Kubernetes provider version to paste.in.

Here is the output for v2.0.1 which does not work:

https://paste.in/SuWloh

And here is the output for v1.13.3 which does work:

https://paste.in/9MjaPm

tantweiler on 24 Jan 2021

@tantweiler In your example I see your provider kubernetes block is empty, but your provider helm block has a config_path set. You need to set it in both provider blocks as both providers need to know the path to the kubeconfig. Did you try that?
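
For readers skimming the thread, a minimal sketch of what setting the path in both provider blocks could look like, assuming the kubeconfig written by `az aks get-credentials` lives at the usual `~/.kube/config` location:

```hcl
# With v2.x of the Kubernetes provider there is no implicit default
# for config_path, so it is set explicitly here.
provider "kubernetes" {
  config_path = "~/.kube/config"
}

# The Helm provider carries its own, separate Kubernetes settings.
provider "helm" {
  kubernetes {
    config_path = "~/.kube/config"
  }
}
```

The two blocks are configured independently; setting the path on one provider does not carry over to the other.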

jrhouston on 24 Jan 2021

@jrhouston holy moly! That did the trick! I always thought that the config path only had to be defined for the Helm provider, and that the Kubernetes provider itself falls back to the default, which is ~/.kube/config. In my pipeline I use the Kubernetes provider to create the namespaces first and then install the Helm releases, and the job already crashed at the point where it tried to create those namespaces. So apparently something changed in v2.0.1 such that the provider no longer looks into the default kube config file. v1.13.3 does this for sure.

From now on I will definitely define the config path for the kubernetes provider as well!

tantweiler on 25 Jan 2021

@jrhouston you said you didn't change "anything about the way the kubeconfig gets loaded". But the changelog says something different:

2.0.0 (January 21, 2021)

BREAKING CHANGES:

Remove default of ~/.kube/config for config_path (#1052)

Honestly I don't understand that. ~/.kube/config is the standard! So why remove a standard that everyone is actually using?

tantweiler on 25 Jan 2021

@tantweiler we discuss it in the upgrade guide - one of the reasons was that it was causing confusion for folks who manage multiple clusters with Terraform

aareet on 25 Jan 2021

> @jrhouston you said you didn't change "anything about the way the kubeconfig gets loaded". But the changelog says something different

I worded this poorly, sorry for the confusion! We changed how you configure the path to the config file in the provider block (i.e., you have to set it or use the KUBE_CONFIG_PATH environment variable now), but we didn't change anything about how the provider reads the file, gets contexts, and so on; we continue to defer to client-go's loader for that.
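
As a rough sketch of the environment-variable route for the Kubernetes provider: the provider block stays empty and the CI job is assumed to export `KUBE_CONFIG_PATH` before running terraform. The export line in the comment is an assumption about the pipeline, not something from the reporter's job definition.

```hcl
# Assumes the CI job runs something like this before terraform plan/apply:
#   export KUBE_CONFIG_PATH="$HOME/.kube/config"
provider "kubernetes" {}

# The Helm provider is kept explicit here; whether it honors the same
# variable depends on its version, so this sketch does not rely on it.
provider "helm" {
  kubernetes {
    config_path = "~/.kube/config"
  }
}
```

This keeps machine-specific paths out of the Terraform files, at the cost of the configuration no longer being self-describing.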

> Honestly I don't understand that. ~/.kube/config is the standard! So why remove a standard that everyone is actually using?

I responded to another user with this question on the helm provider with some backstory here: https://github.com/hashicorp/terraform-provider-helm/issues/647#issuecomment-748993380

We also talk about this in the Upgrade Guide here: https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/guides/v2-upgrade-guide#changes-in-v200

And we made an issue soliciting community reactions about these changes here: https://github.com/hashicorp/terraform-provider-kubernetes/issues/909

If you feel strongly about this change please open a new issue advocating to change it back and we can discuss it!

The tl;dr is that a set of users would get caught out by the implicit default of ~/.kube/config and by using KUBECONFIG, and have their Terraform config applied to the wrong cluster when they were managing multiple environments.
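
One way to make the target cluster explicit when managing several environments is to pin each provider instance to a named kubeconfig context. This is only a sketch: the context names `aks-test` and `aks-prod` are hypothetical and would have to match entries in your own kubeconfig.

```hcl
provider "kubernetes" {
  alias          = "test"
  config_path    = "~/.kube/config"
  config_context = "aks-test" # hypothetical context name
}

provider "kubernetes" {
  alias          = "prod"
  config_path    = "~/.kube/config"
  config_context = "aks-prod" # hypothetical context name
}

# Each resource names the provider instance it belongs to, so it
# does not follow whatever context happens to be current.
resource "kubernetes_namespace" "nginx_ingress_test" {
  provider = kubernetes.test

  metadata {
    name = "nginx-ingress"
  }
}
```

With aliases in place, switching the current context with kubectx no longer changes where Terraform applies a given resource.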

I see what's happened here, though: because you run terraform inside Kubernetes but didn't supply a path to a config file, the loader has defaulted to using the in-cluster config. Perhaps this is an argument for adding an in_cluster attribute to make that explicit too.
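
Until something like that exists, in-cluster use can be spelled out by hand. The sketch below assumes terraform runs in a pod with the service account token mounted at the default path, and that this service account actually holds the RBAC permissions the configuration needs.

```hcl
# Explicit in-cluster configuration, instead of relying on the
# loader's fallback when no config_path is given.
provider "kubernetes" {
  host                   = "https://kubernetes.default.svc"
  token                  = file("/var/run/secrets/kubernetes.io/serviceaccount/token")
  cluster_ca_certificate = file("/var/run/secrets/kubernetes.io/serviceaccount/ca.crt")
}
```

Written this way, it is visible in the config that the pod's own service account is being used, which is exactly what surprised the reporter in the errors above.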

jrhouston on 25 Jan 2021
