Terraform-provider-kubernetes: Getting x509-certificate-signed-by-unknown-authority

Created on 8 Feb 2021  ·  10 Comments  ·  Source: hashicorp/terraform-provider-kubernetes

Hi Everyone,

I have been able to access an EKS cluster created via the EKS Terraform module, with one caveat: I am unable to access the cluster securely.

Version Information

Terraform v0.14.5
+ provider registry.terraform.io/hashicorp/aws v3.26.0
+ provider registry.terraform.io/hashicorp/kubernetes v2.0.2
+ provider registry.terraform.io/hashicorp/local v2.0.0
+ provider registry.terraform.io/hashicorp/null v3.0.0
+ provider registry.terraform.io/hashicorp/random v3.0.1
+ provider registry.terraform.io/hashicorp/template v2.2.0
+ provider registry.terraform.io/hashicorp/tls v3.0.0

As far as I understand, the Kubernetes provider is not accepting the certificate generated during EKS cluster creation as valid.

Unless I set insecure = true, I am unable to access the cluster. Please find my configuration below.

_k8s-provider.tf:_

provider "kubernetes" {
    host = data.aws_eks_cluster.cluster.endpoint
    #cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
    token = data.aws_eks_cluster_auth.cluster.token
    config_path = "./kubeconfig_${var.cluster_name}"
    insecure = true
}

_eks-cluster.tf:_

data "aws_eks_cluster" "cluster" {
  name = module.eks.cluster_id
}

data "aws_eks_cluster_auth" "cluster" {
  name = module.eks.cluster_id
}

module "eks" {
    source = "terraform-aws-modules/eks/aws"
    version = "14.0.0"
    cluster_version = var.cluster_version
    cluster_name = var.cluster_name
    subnets = module.vpc.private_subnets
    cluster_endpoint_private_access = true
    cluster_create_timeout = "1h"
    vpc_id = module.vpc.vpc_id
    worker_groups = [
        {
            name = "atomstate_worker_group_one"
            instance_type = "t2.small"
            asg_desired_capacity = 1
            additional_security_group_ids = [ aws_security_group.worker_group_one.id ]
        }
    ]
    workers_group_defaults = {
        root_volume_type = "gp2"
    }
    wait_for_cluster_interpreter = ["C:\\Program Files\\Git\\bin\\sh.exe", "-c"]
    wait_for_cluster_cmd = "until curl -sk $ENDPOINT >/dev/null; do sleep 4; done"
}

As you can see, I had to comment out the cluster_ca_certificate attribute and set insecure to true.

Steps to reproduce

  1. Use the versions listed above.
  2. Create an EKS cluster using the VPC and EKS Terraform modules.
  3. Set insecure to false and uncomment cluster_ca_certificate.
  4. Run terraform apply.
  5. Get the x509 certificate error.

Expected Behavior
Access the cluster securely without the x509 certificate error.

Actual Behavior
The cluster is only accessible with insecure set to true.

References
https://discuss.hashicorp.com/t/x509-certificate-signed-by-unknown-authority/8671

bug

All 10 comments

This could happen if the value of data.aws_eks_cluster.cluster.certificate_authority.0.data is unknown when the provider is initialized. Can you try running terraform refresh and see if that pulls in a new value for the CA cert? Alternatively, a targeted apply could help:

terraform apply -target=module.eks

I have a similar configuration, but I was fetching the certificate from the EKS module like this:

https://github.com/hashicorp/terraform-provider-kubernetes/blob/master/_examples/eks/main.tf#L56

I think your configuration is a better approach though. I'll update the example config using your approach and let you know the results.

@dak1n1 I will try your suggestion mate and let you know if it has worked. Cheers!

@dak1n1 - Is instance size the reason behind this error? The error does not crop up when I change the instance type to a bigger one, such as m4.large instead of t2.small.

This is only a wild guess.

My provider block is now:

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
  token                  = data.aws_eks_cluster_auth.cluster.token
  config_path            = "./kubeconfig_${var.cluster_name}"
  insecure               = false
}

That is interesting... In my testing, I've been able to update the instance size of an EKS cluster without having the cluster get re-created, so updating the instance size shouldn't cause data.aws_eks_cluster.cluster to become unknown, and therefore shouldn't trigger the Kubernetes provider to have any authentication or certificate issues.

BTW, I did incorporate the data source you used into our EKS example, since it's a more reliable way to refer to the certificates than using the module outputs. So that part works well.

I tried out instance size t2.small and even changed it to m4.large and that worked.

Oh! You know what I just noticed... this configuration is actually using mutually-exclusive authentication options. :facepalm: Sorry I didn't see that before! When multiple ways of authenticating are specified, such as when using config_path with token, the Kubernetes provider will combine them in ways that are difficult to predict. There's a chance it could be pulling the CA cert from config_path instead of using the one you're passing into it explicitly with cluster_ca_certificate:

provider "kubernetes" {
    host = data.aws_eks_cluster.cluster.endpoint
    #cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
    token = data.aws_eks_cluster_auth.cluster.token
    config_path = "./kubeconfig_${var.cluster_name}"
    insecure = true
}

I have a fix for that provider bug that will make it easier to configure this.

In the meantime, can you try a configuration that does not specify config_path? Also check for any environment variables on the system that start with KUBE. Those are pulled into the provider config and can override statically configured settings like this, until we release that fix.

For example, this is how I check for environment variables on my system and unset one of them:

env | grep KUBE
unset KUBE_CONFIG_PATH

I usually unset any that might interfere (here's a list of them).
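A minimal sketch of that cleanup for a POSIX shell, assuming you want to clear every exported variable whose name starts with KUBE in the current session. The function names list_kube_vars and unset_kube_vars are hypothetical helpers, not part of Terraform or the provider. Note that the prefix also matches names such as KUBERNETES_SERVICE_HOST, so review the list before unsetting anything you still need.

```shell
# Hypothetical helper: print the names of exported variables starting with KUBE.
# "|| true" keeps the pipeline from failing when grep finds no match.
list_kube_vars() {
  env | { grep -E '^KUBE[A-Za-z0-9_]*=' || true; } | cut -d= -f1
}

# Hypothetical helper: unset each of those variables in the current shell.
unset_kube_vars() {
  for name in $(list_kube_vars); do
    unset "$name"
  done
}
```

Running list_kube_vars first shows what would be removed; unset_kube_vars then clears those variables for the current shell only, so a new terminal session starts with whatever your profile exports.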

Assuming there are no KUBE environment variables interfering, the following provider config should work. This is the one I had been using in my tests since I saw this issue:

data "aws_eks_cluster_auth" "default" {
  # Unquoted, so Terraform evaluates it as a reference to the module output
  name = module.eks.cluster_id
}

data "aws_eks_cluster" "default" {
  name = module.eks.cluster_id
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.default.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.default.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.default.token
}

Once the fix is released, the provider will be more straightforward in telling you exactly what options are incompatible with what other options, rather than silently combining some of the given options and ignoring some other options, as it does today.

@dak1n1 Those are some interesting observations!

I will try out the config without config_path and let you know the results.

Have a great weekend!

@dak1n1 I've applied the following provider configuration, as you suggested.

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
  token                  = data.aws_eks_cluster_auth.cluster.token
}

Removed config_path, and it has worked. I didn't see the error that's been highlighted in the issue summary.

I suspect the issue described in the summary arose from a combination of the wrong instance size and the mixed authentication settings. I say this because I was able to connect to the cluster using kubectl after changing the instance size, and no error cropped up (as mentioned in my previous message), even though I had not removed config_path at that point.

Looking forward to the new version!

I'm glad to hear it worked! I'll go ahead and close this for now. If you find the issue comes up again, we can re-open it.

Sure 👍

Another advantage of avoiding config_path in the Kubernetes provider configuration: when a user tries to destroy the cluster, Terraform doesn't fail with an error saying the kubeconfig file is not available at the path.

You can mention this in the documentation.

Have a great day!

@ackris I haven't encountered that error myself, but my team would be happy to take a look in a new GitHub issue, especially if you have a config that reproduces the issue. I wouldn't want to leave the bug there and document it as expected behavior, since we can probably fix it instead. Offhand, I think it could be solved in the configuration by ensuring that Terraform knows about the dependency between the file and the Kubernetes provider. Referencing the file by its resource name, rather than an output or hard-coded file name, would establish an implicit dependency between the two. We could document the configuration for that, if it ends up being better solved in configs than in the provider code.

