Terraform-aws-eks: kubernetes provider does not respect local config, can operate on other clusters

Created on 26 Jan 2020 · 11 comments · Source: terraform-aws-modules/terraform-aws-eks

I have issues

The k8s provider does not seem to work reliably when load_config_file = false, which is the case when using this module to create a new cluster. I frequently see Unauthorized errors and/or attempts to call the endpoint on localhost. In one execution, I found that this module had deleted the aws-auth config map from a cluster that was the default context in my kubeconfig but was in no way related to my Terraform run.

I'm submitting a...

[x] bug report

What is the current behavior?

Unless your cluster is defined in your kubeconfig, the provider cannot be configured reliably. This is consistent with a number of bugs in the terraform-provider-kubernetes project, most notably:

https://github.com/terraform-providers/terraform-provider-kubernetes/issues/521

Other issues report many bugs when load_config_file = false is set. This one seemed the most relevant, since it also points to long-standing issues with Terraform itself using interpolated values to configure a provider:

https://github.com/hashicorp/terraform/issues/4149

In the execution where it operated on a different cluster, deleting the config map succeeded, but the module then tried to apply the config map back to an endpoint listening on localhost. In other executions, the deletion itself ran against localhost. This suggests a timing issue: Terraform defers configuring the provider while this module is already trying to manage the config map.

If this is a bug, how to reproduce? Please include a code sample if relevant.

I could not create a cluster using the provided example unless, after the cluster was created, I added its config to my local kubeconfig.

What's the expected behavior?

Regardless of any upstream issues with Terraform and the k8s provider, this module should _never_ operate on a cluster it didn't define. Writing a local kubeconfig file and pointing the provider at it may be the best option.

Are you able to fix this problem and submit a PR? Link here if you have already.

I'm not sure what the best fix is. Given the various GitHub issues about this, I personally feel the only workaround is to always set manage_aws_auth = false until the upstream provider addresses them.

Environment details

Affected module version: 8.0.0
OS: macOS
Terraform version: 0.12.20

Any other relevant info

Source

cdaniluk



attempts to call the endpoint on localhost

I've seen this also.

max-rocket-internet on 27 Jan 2020


@cdaniluk

Which version of the kubernetes provider are you using?
Do you have env variables that conflict with the provider or the Kubernetes go-client (KUBERNETES_xxx)?
Are you running your terraform plan and apply from another Kubernetes cluster? If so, this workaround would probably help: https://github.com/terraform-providers/terraform-provider-kubernetes/issues/679#issuecomment-552119320
If not, can you share your provider definition and debug output?

barryib on 27 Jan 2020

Which version of the kubernetes provider are you using?
v1.10.0_x4
Do you have env variables that conflict with the provider or the Kubernetes go-client (KUBERNETES_xxx)?

$ set |grep KUBE
$
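A slightly stricter version of the check above, offered as a sketch: `set` also lists unexported shell variables and functions, whereas Terraform only inherits exported variables, so this looks at `env` instead. The helper name `check_kube_env` is hypothetical; matching on the `KUBE` prefix covers `KUBECONFIG`, the provider's `KUBE_*` settings and client-go's in-cluster `KUBERNETES_*` variables.

```shell
#!/bin/sh
# Sketch (hypothetical helper): print the names of exported variables whose
# names start with KUBE (KUBECONFIG, KUBE_*, KUBERNETES_*). Unlike `set`,
# `env` only shows exported variables, which is what terraform inherits.
# Returns 1 if any are found, 0 otherwise.
check_kube_env() {
  names="$(env | sed -n 's/^\(KUBE[A-Za-z0-9_]*\)=.*/\1/p')"
  if [ -n "$names" ]; then
    printf '%s\n' "$names"
    return 1
  fi
  return 0
}

# Usage: check_kube_env || echo "unset the variables above before running terraform" >&2
```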

Are you running your terraform plan and apply from another Kubernetes cluster? If so, this workaround would probably help: terraform-providers/terraform-provider-kubernetes#679 (comment)

I think this would address one of the cases I've seen (Unauthorized), but not the calls to the localhost endpoint. I see Unauthorized on subsequent calls after the cluster is provisioned, but calls to localhost when provisioning a new cluster.

If not, can you share your provider definition and debug output?

Here's my provider config, taken literally from the example:

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
  token                  = data.aws_eks_cluster_auth.cluster.token
  load_config_file       = false
  version                = "~> 1.10"
}

Let me know what debug output you would like to see.

Also, FWIW, I (sort of) figured out why the provider was talking to an entirely different cluster. I imported the config map from my cluster using the provider config above, but the import used my default kubeconfig context, which at the time pointed at another cluster. So when this module deleted the config map in order to recreate it (which is scary in itself!), it deleted it in that context and then tried to recreate it in the undefined/localhost context.
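One way to limit the damage described above, sketched under the assumption that Terraform may fall back to the current kubeconfig context during `terraform import`: check the context before importing. `require_context` is a hypothetical helper, and the context name and resource address in the usage comment (`module.eks.kubernetes_config_map.aws_auth[0]`, i.e. a module instance named `eks`) are assumptions to verify against your own configuration and `terraform plan` output.

```shell
#!/bin/sh
# Sketch (hypothetical helper): succeed only if the actual kubeconfig context
# equals the expected one; otherwise print a message and return 1.
require_context() {
  expected="$1"
  actual="$2"
  if [ "$actual" != "$expected" ]; then
    printf 'refusing to continue: current context is "%s", expected "%s"\n' \
      "$actual" "$expected" >&2
    return 1
  fi
  return 0
}

# Usage (assumed context name and resource address; adjust to your setup):
#   require_context my-new-cluster "$(kubectl config current-context)" &&
#     terraform import 'module.eks.kubernetes_config_map.aws_auth[0]' kube-system/aws-auth
```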

To be honest, given the handful of open bugs in the k8s provider, I think the ideal fix would be to support exporting the config map to a file, as previous releases did, for those of us who are wary of directly managing a resource that can permanently revoke access to the cluster. I'm running with manage_aws_auth = false right now, but that means I have to generate the config maps for new clusters by hand. I'd be happy to submit a PR along those lines.

cdaniluk on 28 Jan 2020

I think this would address one of the cases I've seen (Unauthorized), but not the calls to the localhost endpoint. I see Unauthorized on subsequent calls after the cluster is provisioned, but calls to localhost when provisioning a new cluster.

Did you try it?

Let me know what debug output you would like to see.

The kubernetes provider output.

barryib on 28 Jan 2020

@cdaniluk (cc @max-rocket-internet) I just posted on the thread you linked to. I was running into a similar problem: either Terraform would complain about a missing kubeconfig file, or I would accidentally trigger updates on other clusters (because my KUBECONFIG environment variable was being used even though I had explicitly configured a Terraform kubernetes provider).

I later found out that the culprit was actually the helm provider, which I had not explicitly configured. Because my helm provider lacked the appropriate Kubernetes settings, helm would complain that it couldn't load the default ~/.kube/config file, and whenever I happened to have KUBECONFIG set, it would use that to spin up new pods.

If you are also using helm, you might want to give that a shot.
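A related safeguard, sketched here as an assumption rather than documented provider behavior: run Terraform with `KUBECONFIG` pointing at an empty temporary file. When `KUBECONFIG` is set, client-go's default loading rules read only the files it lists instead of `~/.kube/config`, so a provider that silently falls back to kubeconfig should find no real cluster; it may error out or try localhost, but should not touch an unrelated cluster (assuming nothing else, such as `KUBE_CONFIG`, overrides it). `with_empty_kubeconfig` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch (hypothetical helper): run a command with KUBECONFIG pointing at an
# empty temporary file, then delete the file and return the command's status.
# Assumption: tools that fall back to kubeconfig will then find no cluster,
# rather than whatever your real kubeconfig's current context points at.
with_empty_kubeconfig() {
  tmp="$(mktemp)" || return 1
  status=0
  # For an external command, this assignment only affects its environment.
  KUBECONFIG="$tmp" "$@" || status=$?
  rm -f "$tmp"
  return "$status"
}

# Usage: with_empty_kubeconfig terraform plan
```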

Best of luck!

bacchuswng on 31 Jan 2020

I'm using helm, but not with Terraform, and I haven't set up a helm provider. This chart doesn't seem to set one up implicitly.

@max-rocket-internet I need to set up an environment where I can safely test this and haven't had a chance to yet. I'll try over the weekend.

cdaniluk on 31 Jan 2020

I have the same issue, with no KUBE_ env variables set.

nick4fake on 19 Feb 2020

@cdaniluk if you have the EKS cluster resource being created or updated in the same apply operation as the Kubernetes provider, things won't work as you expect. This is due to an issue in Terraform itself.

Please see https://www.terraform.io/docs/providers/kubernetes/index.html#stacking-with-managed-kubernetes-cluster-resources and the Terraform docs linked in that paragraph.
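One way to keep cluster creation out of the apply that configures the Kubernetes provider, sketched as a common workaround rather than something the linked page prescribes: create the cluster first with `-target`, then run a full apply once its endpoint and certificate exist. The address `module.eks.aws_eks_cluster.this[0]` assumes a module instance named `eks` and the module's internal resource name, so check it against your `terraform plan` output. `two_phase_apply` and `TF_BIN` are hypothetical, the latter only so the command can be swapped out for a dry run.

```shell
#!/bin/sh
# Sketch (hypothetical helper): phase 1 creates only the targeted resource
# (the EKS cluster) and its dependencies; phase 2, run only if phase 1
# succeeded, is a normal full apply in which the kubernetes provider can be
# configured from a cluster that already exists.
# TF_BIN defaults to terraform and can be overridden for a dry run.
two_phase_apply() {
  cluster_addr="$1"
  tf="${TF_BIN:-terraform}"
  "$tf" apply -target="$cluster_addr" && "$tf" apply
}

# Usage (assumed address; quote it so the shell leaves the brackets alone):
#   two_phase_apply 'module.eks.aws_eks_cluster.this[0]'
```

Terraform's documentation describes `-target` as intended for exceptional circumstances, so this fits a one-off bootstrap step better than a routine workflow.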

alexsomesan on 29 Feb 2020


@cdaniluk if you have the EKS cluster resource being created or updated in the same apply operation as the Kubernetes provider, things won't work as you expect. This is due to an issue in Terraform itself.

I'm aware of the limitation, which is why it's all the more confusing that this module makes it basically impossible to bootstrap a cluster, all for the sake of loading a simple configmap that is easy to load by hand. In previous versions, you could use a null provider to script injecting the config map, and it all worked fine. Now, not only does the k8s provider behave inconsistently (as shown by some 50 open issues in that repo, some caused by Terraform and some by the provider itself), but we can't bootstrap a new cluster. The old version of the module allowed this.

I really think adding a flag to dump the config map to the filesystem, instead of loading it via the provider, would be ideal. I don't think the Terraform issues are going away anytime soon: the issues about dynamic provider configuration, interpolation, etc. have been open forever and aren't on any roadmap.

cdaniluk on 2 Mar 2020


In case it's of use to anyone, I'm using the following to generate the aws-auth configmap in conjunction with manage_aws_auth = false:

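# Read the ARN of the worker IAM role (resource aws_iam_role.workers) from the Terraform state.
worker_iam_role="$(terraform state pull | jq -r '.resources[] | select(.type=="aws_iam_role") | select(.name=="workers") | .instances[0].attributes.arn')"

# Write the aws-auth ConfigMap. The unquoted heredoc expands $worker_iam_role;
# {{EC2PrivateDNSName}} is left literal for the authenticator to fill in.
cat << YAML > aws-auth-configmap.yaml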
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: $worker_iam_role
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
YAML
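Before applying the generated file by hand (for example with `kubectl apply -f aws-auth-configmap.yaml`), it may be worth checking that the `jq` query actually found a role: if it matches nothing, or prints `null`, the heredoc silently writes an unusable `rolearn`. The sketch below uses a hypothetical helper that only looks for an ARN of the form `arn:aws:iam::<12-digit account>:role/<name>` (also accepting the `aws-cn` and `aws-us-gov` partitions); it does not validate the YAML itself.

```shell
#!/bin/sh
# Sketch (hypothetical helper): return 0 if the file contains at least one
# rolearn line with a plausible IAM role ARN, otherwise print an error and
# return 1. It does not parse or validate the YAML structure.
validate_aws_auth() {
  file="$1"
  # Partition is aws, aws-cn or aws-us-gov; account IDs are 12 digits.
  if grep -Eq 'rolearn: arn:aws(-cn|-us-gov)?:iam::[0-9]{12}:role/[^[:space:]]+' "$file"; then
    return 0
  fi
  echo "no valid rolearn found in $file" >&2
  return 1
}

# Usage: validate_aws_auth aws-auth-configmap.yaml && kubectl apply -f aws-auth-configmap.yaml
```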