Terraform version: 0.12.24
Kubernetes provider version: 2.0.1
Kubernetes version: v1.16.15-eks-ad4801
data "aws_eks_cluster" "c" {
  name = var.k8s_name
}

data "aws_eks_cluster_auth" "c" {
  name = var.k8s_name
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.c.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.c.certificate_authority.0.data)
  token                  = data.aws_eks_cluster_auth.c.token
}
What should have happened?
Resources should have been created/modified/deleted.
What actually happened?
Error: the server has asked for the client to provide credentials
Error: Failed to update daemonset: Unauthorized
Error: Failed to update deployment: Unauthorized
Error: Failed to update deployment: Unauthorized
Error: Failed to update service account: Unauthorized
Error: Failed to update service account: Unauthorized
Error: Failed to delete Job! API error: Unauthorized
Error: Failed to update service account: Unauthorized
Error: the server has asked for the client to provide credentials
Error: the server has asked for the client to provide credentials
Error: Failed to update deployment: Unauthorized
Error: Failed to update service account: Unauthorized
Error: the server has asked for the client to provide credentials
Error: Failed to delete Job! API error: Unauthorized
Error: Failed to update daemonset: Unauthorized
No, we're just using EKS.
Hi, same problem here with Terraform v0.14.5, but with a different error message:
Error: Kubernetes cluster unreachable: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
The configuration is the same as with the previous provider version.
provider "kubernetes" {
  host                   = data.aws_eks_cluster.eks.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.eks.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.eks.token
}
Can you try running terraform refresh to see if that pulls in a new token? The token generated by aws_eks_cluster_auth is only valid for 15 minutes. For this reason, we recommend using an exec plugin to keep the token up to date automatically. Here's an example of that configuration:
provider "kubernetes" {
  host                   = data.aws_eks_cluster.eks.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.eks.certificate_authority[0].data)
  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    args        = ["eks", "get-token", "--cluster-name", var.cluster_name]
    command     = "aws"
  }
}
Alternatively, running the Kubernetes provider in a separate terraform apply from the EKS cluster creation should work every time. (I'm not sure offhand if your EKS cluster is being created in the same apply; I'm just guessing, since it's a common configuration.)
There's also a working EKS example you can compare with your configs. There are some improvements coming soon for the example, since we're working on related authentication issues.
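To make the "separate terraform apply" suggestion concrete, here is a minimal sketch of a second root module that only manages Kubernetes resources and is applied after the cluster already exists. The directory split, the `cluster_name` variable, and the `kubernetes_namespace.example` resource are assumptions for illustration, not taken from anyone's configuration in this thread. It also assumes the AWS provider picks up its region and credentials from the environment and that the AWS CLI is available where Terraform runs.

```hcl
# Hypothetical second root module (for example ./kubernetes), applied only
# after the root module that creates the EKS cluster has finished.

variable "cluster_name" {
  type        = string
  description = "Name of an EKS cluster that already exists."
}

# Because the cluster already exists, endpoint and CA data are known at plan time.
data "aws_eks_cluster" "eks" {
  name = var.cluster_name
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.eks.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.eks.certificate_authority[0].data)

  # Same exec configuration as in the example above: a token is fetched on demand.
  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    args        = ["eks", "get-token", "--cluster-name", var.cluster_name]
    command     = "aws"
  }
}

# Illustrative resource only.
resource "kubernetes_namespace" "example" {
  metadata {
    name = "example"
  }
}
```

With this layout the provider never has to be configured from values that are still unknown during the plan, which is the situation the single-apply setups run into.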
@dak1n1 I am considering this as a temporary workaround.
Not sure about the 15mins issue, as we've been using this provider for almost a year now and the token validity has never been a problem. In fact, downgrading the provider to <2.0 works as expected.
I'll try force refreshing the token and report back the results.
Thanks! And about the downgrade fixing this -- that makes sense. Depending on your provider configuration, prior to 2.0, the Kubernetes provider may have actually been reading the KUBECONFIG environment variable (despite your valid configuration which includes a token and does not reference the kubeconfig file). This was a source of confusion that we were aiming to alleviate. The authentication workflow still needs some work though.
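For contrast, this is roughly what opting in to a kubeconfig file looks like with the 2.x provider, where (as far as I understand the 2.0 changes) the file is only used when it is requested explicitly. This is a sketch; the path is an assumption for illustration.

```hcl
provider "kubernetes" {
  # Assumption: the kubeconfig lives at this path. With provider 2.x the file
  # is read only because it is named here, not picked up implicitly.
  config_path = "~/.kube/config"
}
```

A provider block that sets host, cluster_ca_certificate, and token (or exec) instead, as in the configurations above, does not reference a kubeconfig file at all.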
The KUBECONFIG issue is not present in our environment as we run Terraform in GitLab CI and never use that file to authenticate to clusters from it.
Terraform version: 0.14.5
Kubernetes provider version: 2.0.2
Kubernetes version: v1.18.9
I tried an apply with a clean state using the exec instead of the token in the kubernetes provider on the initial run when the eks cluster is created. I get the same Error: Unauthorized results for both when trying to apply my kubernetes resources.
Using the exec
```
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.26.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.0.2"
    }
  }
}

provider "aws" {
  region = var.region
}

data "aws_eks_cluster_auth" "cluster_token" {
  name = module.eks.name
}

provider "kubernetes" {
  host                   = module.eks.endpoint
  cluster_ca_certificate = base64decode(module.eks.certificate)
  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    args        = ["eks", "get-token", "--cluster-name", module.eks.name]
    command     = "aws"
  }
}
```
As stated in the comments above, the Kubernetes resources are created correctly on a retry of the pipeline, with either the token or the exec method.
@loungerider Thanks for testing this. I believe the issue in your case has to do with certain parameters passed into the Kubernetes provider which are unknown at the time of the provider initialization. I'm guessing module.eks.endpoint is unknown at plan time, but also the data source is probably being read too soon.
In the data source, the value of name = module.eks.name is likely known before the cluster is ready. So the data source will read the cluster too early, and pass invalid credentials into the Kubernetes provider. I'll show you an example that will make the data source wait until the cluster is ready:
data "aws_eks_cluster" "default" {
  name = module.eks.cluster_id
}

# This data source is only needed if you're passing the token into the provider using `token =`.
data "aws_eks_cluster_auth" "default" {
  name = module.eks.cluster_id
}

provider "kubernetes" {
  # This defers provider initialization until the cluster is ready
  host                   = data.aws_eks_cluster.default.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.default.certificate_authority[0].data)

  # This keeps the token up-to-date during subsequent applies, even if they run longer than the token TTL.
  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    args        = ["eks", "get-token", "--cluster-name", module.eks.name]
    command     = "aws"
  }
}
I'm assuming you're using the EKS module here, which has an output that waits for the cluster API to be ready (cluster_id). That's why the data source needs to know about cluster_id. Another option would be to add a depends_on explicitly to wait for this field (depends_on = [module.eks.cluster_id])
I also added a data source to read the cluster's hostname and CA cert data, so it will be able to read the new hostname and certs, if those ever change, such as on the first apply, or during cluster replacement.
A single apply scenario like this is less reliable than running apply twice, but it is possible. It just has these gotchas to be aware of.
@dak1n1 I'm getting the same errors with the following:
Terraform version: 0.14.6
Kubernetes provider version: 2.0.2
EKS version: v1.18.9 -> v1.19.6
As you can see, the only change I'm attempting is to upgrade EKS from 1.18 to 1.19. Without posting all the code, here are the relevant portions:
resource "null_resource" "wait_for_cluster" {
  depends_on = [aws_eks_cluster.cluster]

  provisioner "local-exec" {
    command     = "for i in `seq 1 60`; do if `command -v wget > /dev/null`; then wget --no-check-certificate -O - -q $ENDPOINT/healthz >/dev/null && exit 0 || true; else curl -k -s $ENDPOINT/healthz >/dev/null && exit 0 || true;fi; sleep 5; done; echo TIMEOUT && exit 1"
    interpreter = ["/bin/sh", "-c"]
    environment = {
      ENDPOINT = aws_eks_cluster.cluster.endpoint
    }
  }
}

data "aws_eks_cluster" "eks_cluster" {
  name       = aws_eks_cluster.cluster.name
  depends_on = [null_resource.wait_for_cluster]
}

data "aws_eks_cluster_auth" "eks_cluster" {
  name       = aws_eks_cluster.cluster.name
  depends_on = [null_resource.wait_for_cluster]
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.eks_cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.eks_cluster.certificate_authority.0.data)
  token                  = data.aws_eks_cluster_auth.eks_cluster.token
}

provider "helm" {
  kubernetes {
    host                   = data.aws_eks_cluster.eks_cluster.endpoint
    token                  = data.aws_eks_cluster_auth.eks_cluster.token
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.eks_cluster.certificate_authority.0.data)
  }
}
My module follows the same conventions as the module you mentioned above, except that I'm using the token instead of the exec method. We use Terraform Cloud for our workflow and I don't believe the AWS CLI is installed on those workers. The docs also warn against trying to install extra software on workers, and even if you decide to ignore that advice, doing so is kinda hacky to say the least. So IMO using the AWS CLI to generate creds should not be a solution to this issue.
I've tried running this multiple times and always get errors like these:
Error: Kubernetes cluster unreachable: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
Error: Get "http://localhost/apis/rbac.authorization.k8s.io/v1/namespaces/default/rolebindings/edit": dial tcp 127.0.0.1:80: connect: connection refused
My first question would be, is the token being stored somewhere in the state? I would assume the data source would be refreshed every run in case something changed (in this case I assume the token would be new with every run) therefore the 15 minute expiration should only be an issue on initial cluster creation where the token is created before the cluster. In the case above I would assume that should never happen due to the dependency chain of aws_eks_cluster -> null_resource -> aws_eks_cluster_auth.
If the token is refreshed every time, then why am I seeing this error when specifying an upgrade to an already provisioned cluster? The upgrade is not changing the cluster name; it should change in place. The existing cluster should be there, so the token should be created and the provider should be able to read the cluster state and make an appropriate plan. I also find it very curious that I don't see any errors like this related to resources provisioned by the helm provider. I don't know if that's because the errors in the kubernetes provider are ending the plan before it gets to helm, or if there is something different in how Helm does things that dodges this issue.
I may try downgrading my provider to < 2.0 to see if this works there. If that's the case it's not a hidden KUBECONFIG file issue as you mentioned above because we run this on TFC and don't generate a KUBECONFIG file in our TF code for clusters. If I do try this I will try to remember to post results here.
Did some further digging and we may be barking in the wrong place: https://github.com/hashicorp/terraform-provider-aws/issues/10269#issuecomment-777906069
@jw-maynard I'm glad you found that other issue! It sounds like the EKS cluster could be getting replaced rather than updated in-place. Could you do a terraform plan to confirm this? (There should be a line that tells you if a change "forces replacement").
What I saw in your configuration is what we call a "single apply" scenario, that is, a configuration which contains both the EKS cluster (aws_eks_cluster.cluster) and the Kubernetes resources that will live on that cluster. In a single apply scenario, any replacement of an underlying Kubernetes cluster will cause the Kubernetes provider to fail to initialize, unless you do a specific workaround that I'll mention below.
This is a known limitation in Terraform core, which I recently saw described well in this comment. It's a problem any time you have a provider that depends on a resource (in this case, the Kubernetes provider is dependent on information from aws_eks_cluster.cluster, which is read from the data source... but that information is not available when the provider is initialized, because, presumably, the cluster is getting replaced).
If an underlying Kubernetes cluster is going to be replaced, and you already have Kubernetes resources provisioned using the Kubernetes provider, you'll have to work around this issue by doing a terraform state rm on the module containing all the Kubernetes resources (there's an example here). That way the Kubernetes resources will be recreated on the new cluster, and the terraform plan will succeed. Otherwise, the provider tries to initialize using an empty credentials block, since it does not yet know the credentials associated with the cluster being replaced.
This workaround is only needed in single-apply scenarios where you have the cluster and the Kubernetes resources sharing a single state. In general, it's more reliable to keep the Kubernetes resources in a separate state from the EKS cluster resource (for example, a different workspace in TFC, or a different root module). Two applies will work every time, but a single apply involves some work-arounds, depending on the scenario.
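As a rough sketch of the separate-state layout on Terraform Cloud, the workspace that owns the Kubernetes resources could look up the cluster through the other workspace's outputs. The organization name, the workspace name, and the `cluster_name` output are all hypothetical here; the cluster workspace would need to export such an output. The token-based provider block is used because the AWS CLI may not be available on remote workers.

```hcl
# Workspace 2: Kubernetes resources only. The cluster lives in another workspace.
data "terraform_remote_state" "cluster" {
  backend = "remote"

  config = {
    organization = "example-org" # hypothetical organization name
    workspaces = {
      name = "eks-cluster" # hypothetical workspace that manages aws_eks_cluster
    }
  }
}

# The cluster already exists when this workspace plans, so these reads
# return real values instead of unknowns.
data "aws_eks_cluster" "cluster" {
  name = data.terraform_remote_state.cluster.outputs.cluster_name
}

data "aws_eks_cluster_auth" "cluster" {
  name = data.terraform_remote_state.cluster.outputs.cluster_name
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.cluster.token
}
```

The trade-off is the one described above: two runs instead of one, in exchange for a provider configuration that no longer depends on a resource managed in the same state.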
@dak1n1 It never gets that far because the plan errors, but I know for sure that version upgrades in EKS are an update-in-place scenario. I guess they could have introduced a bug in the AWS provider, but I don't think so.
I did a lot of digging around in logs at the TRACE level for this plan and found some differences in how a successful plan handles the two data sources compared to how it handles them in a plan where I try to upgrade the version. Unfortunately I'm not familiar enough with the inner workings of TF and its providers to know if this is fixable in the provider or not. I'm happy to share my findings privately with anyone at HashiCorp who's willing to listen. Single apply scenarios seem to be something that a fair number of people would like to be able to do when working with Kubernetes on cloud providers.
I can share what I think is the difference between the two runs. The failed one ends up here for both EKS data sources (I'm just sharing aws_eks_cluster_auth, but aws_eks_cluster has the same log line):
2021/02/21 20:39:29 [TRACE] evalReadDataPlan: module.kubernetes_cluster.data.aws_eks_cluster_auth.eks_cluster configuration is fully known, but we're forcing a read plan to be created
This appears to be coming from here https://github.com/hashicorp/terraform/blob/618a3edcd13f5231a77a699b7ba2a3fba352b7a3/terraform/eval_read_data_plan.go#L65 which tells me that n.forcePlanRead(ctx) is true. Since the successful runs hit a log line that comes from L107 (linked below), it seems the failures run into something inside the if block from L63 to L103 and fall apart there.
In a working run where the version is not updated, I don't see the above at all, but I see this:
2021/02/21 20:37:10 [TRACE] EvalReadData: module.kubernetes_cluster.data.aws_eks_cluster_auth.eks_cluster configuration is complete, so reading from provider
2021/02/21 20:37:10 [TRACE] GRPCProvider: ReadDataSource
2021-02-21T20:37:10.945Z [INFO] plugin.terraform-provider-aws_v3.29.0_x5: 2021/02/21 20:37:10 [DEBUG] Reading EKS Cluster: {
Name: "kubernetes01"
}: timestamp=2021-02-21T20:37:10.943Z
2021/02/21 20:37:10 [WARN] Provider "registry.terraform.io/hashicorp/aws" produced an unexpected new value for module.kubernetes_cluster.data.aws_eks_cluster_auth.eks_cluster.
- .token: inconsistent values for sensitive attribute
Then a call to eks/DescribeCluster. This EvalReadData appears to be logged inside the readDataSource here https://github.com/hashicorp/terraform/blob/618a3edcd13f5231a77a699b7ba2a3fba352b7a3/terraform/eval_read_data_plan.go#L107
So in the failed state it seems like the data source is not even updating for some reason. Odd considering the cluster would be updated in place. The fact that there's no read of the data source in the failure when something is changing just makes me feel like there's a logical bug somewhere maybe in core, but I don't feel knowledgeable enough to articulate it in an issue over there.
All that being said, I am aware of the pitfalls with single apply scenarios, and this certainly may be one of those issues. The unfortunate part is that, as with the EKS module you posted above, there are some things in EKS that require managing resources inside the cluster (aws-auth being a notable one), and it seems clunky to have to use two modules to fully provision one resource (EKS) to our specs.
@dak1n1 This config worked for me. Thanks!
```
data "aws_eks_cluster" "default" {
  name       = module.eks.name
  depends_on = [module.eks.name]
}

data "aws_eks_cluster_auth" "default" {
  name = module.eks.name
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.default.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.default.certificate_authority[0].data)
  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    args        = ["eks", "get-token", "--cluster-name", module.eks.name]
    command     = "aws"
  }
}
```
Using exec is not a viable solution when running in terraform cloud using remote execution. Our current thinking is to implement a workaround to essentially taint the aws_eks_cluster_auth data source so it gets refreshed for every plan. It would be ideal if the kubernetes provider had native support for getting and refreshing managed kubernetes service authentication tokens / credentials in order to support environments in which the only guaranteed tooling is terraform itself.
We faced the same issue when running destroy (introduced in Terraform 0.14). Multiple providers are actually affected: helm, kubernetes, and kubernetes-alpha. In 0.14, data sources are no longer refreshed on destroy, which causes provider issues. This was implemented as part of:
https://github.com/hashicorp/terraform/issues/15386
A related issue (now closed):
https://github.com/hashicorp/terraform/issues/27172
For example, any provider using the aws_eks_cluster_auth data source will fail on destroy:
data "aws_eks_cluster_auth" "cluster" {
  name = var.cluster_name
}
The proposed workaround is to run plan or refresh (which may not be the best solution for every team).