Terraform-provider-kubernetes: Unauthorized error for any admin users in accounts on terraform plan/apply except actual cluster creator

Created on 8 Feb 2021  ·  14 Comments  ·  Source: hashicorp/terraform-provider-kubernetes

Hi, I am getting an error on terraform plan for other users in the account. These users have full admin access to the AWS account, with no limits on viewing or modifying resources. I checked that the permissions and groups are the same for all of them.
If I create a cluster with one user, another user is unable to run plan, and therefore apply. If I destroy the cluster and create it with the other user, the first one is unable to run plan and apply, so this is not related to a specific user's permissions.

The issue seems to be related to the Terraform provider. I am using the same cluster config for both users.

Terraform Version, Provider Version and Kubernetes Version

Terraform v0.14.2
+ provider registry.terraform.io/hashicorp/aws v3.27.0
+ provider registry.terraform.io/hashicorp/kubernetes v2.0.2

Affected Resource(s)

Terraform Configuration Files

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    args        = ["eks", "get-token", "--cluster-name", var.cluster_name]
    command     = "aws"
  }
  config_path = "~/.kube/config.cluster-preprod"
}

module "eks" {
  source       = "terraform-aws-modules/eks/aws"
  cluster_name = var.cluster_name
  subnets      = data.terraform_remote_state.vpc.outputs.private_subnets
  vpc_id       = data.terraform_remote_state.vpc.outputs.vpc_id
  cluster_version = "1.18"
  cluster_endpoint_private_access = true
  cluster_endpoint_private_access_cidrs = ["10.0.0.0/8"]
  cluster_endpoint_public_access = true
  cluster_endpoint_public_access_cidrs = ["1.1.0.0/16"]
  write_kubeconfig = false
  create_eks  = true
  enable_irsa  = true
  manage_aws_auth = true
  manage_cluster_iam_resources = true
  kubeconfig_aws_authenticator_command = "aws"
  kubeconfig_aws_authenticator_command_args = ["eks", "get-token", "--cluster-name", var.cluster_name]

  worker_groups_launch_template = [
    {
      name                    = "masters"
      override_instance_types = var.prod_template_instance_types
      asg_max_size            = var.asg_max
      asg_desired_capacity    = var.asg_desired
      on_demand_percentage_above_base_capacity = 100
      kubelet_extra_args      = "--node-labels=node.kubernetes.io/lifecycle=normal --node-labels=environment=masters"
      #bootstrap_extra_args    = "--enable-docker-bridge true"
      public_ip               = false
      #ami_id                        = data.aws_ami.ubuntu.id
      worker_ami_name_filter = "ubuntu-eks/k8s_1.18/images/*"
      worker_ami_owner_id     = "099720109477"
      asg_recreate_on_change        = false
      key_name                      = "cluster-preprod"
      enable_monitoring             = true                        # Enables/disables detailed monitoring.
      additional_security_group_ids = [data.terraform_remote_state.bastion.outputs.security_group_id]#, 
      #                    data.terraform_remote_state.es.outputs.instances_sg,
      #                    data.terraform_remote_state.elasticache.outputs.instances_sg]
      cpu_credits                   = "unlimited" # T2/T3 unlimited mode, can be 'standard' or 'unlimited'.
      manage_aws_auth = true
      #map_roles                            = [aws_iam_policy.aws_ebs_csi_driver.arn]
      map_users                            = var.map_users
      #map_accounts                         = var.map_accounts
      #service_linked_role_arn
      suspended_processes = ["AZRebalance"]
    },{
      name                    = "preprod"
      override_instance_types = var.stage_template_instance_types
      root_volume_size        = "100"  
      spot_instance_pools     = 4
      asg_max_size            = var.asg_max
      asg_desired_capacity    = var.asg_desired
      kubelet_extra_args      = "--node-labels=node.kubernetes.io/lifecycle=spot --node-labels=environment=stage"
      #bootstrap_extra_args    = "--enable-docker-bridge true" # not needed for ubuntu
      public_ip               = false
      ami_id                        = data.aws_ami.ubuntu.id
      #worker_ami_name_filter = "ubuntu-eks/k8s_1.18/images/*"
      #worker_ami_owner_id     = "099720109477"
      asg_recreate_on_change        = true
      spot_max_price                = ""
      key_name                      = "cluster-preprod"
      enable_monitoring             = true                        # Enables/disables detailed monitoring.
      additional_security_group_ids = [data.terraform_remote_state.bastion.outputs.security_group_id]#, 
      #                    data.terraform_remote_state.es.outputs.instances_sg,
      #                    data.terraform_remote_state.elasticache.outputs.instances_sg]
      cpu_credits                   = "unlimited" # T2/T3 unlimited mode, can be 'standard' or 'unlimited'.
      manage_aws_auth = true
      #map_roles                            = [aws_iam_policy.aws_ebs_csi_driver.arn]
      map_users                            = var.map_users
      #map_accounts                         = var.map_accounts
      #service_linked_role_arn
      suspended_processes = ["AZRebalance"]
    }
  ]
}

Debug Output

2021-02-08T09:30:51.712-0800 [INFO] plugin.terraform-provider-kubernetes_v2.0.2_x5: 2021/02/08 09:30:51 [DEBUG] Kubernetes API Request Details:
---[ REQUEST ]---------------------------------------
GET /api/v1/namespaces/kube-system/configmaps/aws-auth HTTP/1.1
Host: 4E1D727206F1018F94AC71D6DDF03AAA.gr7.us-west-2.eks.amazonaws.com
User-Agent: HashiCorp/1.0 Terraform/0.14.2
Accept: application/json, */*
Authorization: Bearer k8s-aws-v1.aaaaaaaaaaabbbbbbbbbbccccccccxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxyyyyyyyyyyyyyyyyyyyyyyyyyyyyzzzzzzzzzzzzzzzzzzz
Accept-Encoding: gzip

-----------------------------------------------------: timestamp=2021-02-08T09:30:51.712-0800
2021-02-08T09:30:51.824-0800 [INFO] plugin.terraform-provider-kubernetes_v2.0.2_x5: 2021/02/08 09:30:51 [DEBUG] Kubernetes API Response Details:
---[ RESPONSE ]--------------------------------------
HTTP/2.0 401 Unauthorized
Content-Length: 129
Audit-Id: 81e16ab2-3dd4-4236-98e8-fda861497a6a
Cache-Control: no-cache, private
Content-Type: application/json
Date: Mon, 08 Feb 2021 17:30:51 GMT

{
"kind": "Status",
"apiVersion": "v1",
"metadata": {},
"status": "Failure",
"message": "Unauthorized",
"reason": "Unauthorized",
"code": 401
}

-----------------------------------------------------: timestamp=2021-02-08T09:30:51.824-0800
2021-02-08T09:30:52.470-0800 [INFO] plugin.terraform-provider-kubernetes_v2.0.2_x5: 2021/02/08 09:30:52 [DEBUG] Received error: &errors.StatusError{ErrStatus:v1.Status{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ListMeta:v1.ListMeta{SelfLink:"", ResourceVersion:"", Continue:"", RemainingItemCount:(*int64)(nil)}, Status:"Failure", Message:"Unauthorized", Reason:"Unauthorized", Details:(*v1.StatusDetails)(nil), Code:401}}: timestamp=2021-02-08T09:30:52.470-0800

Panic Output

Steps to Reproduce

  1. Run terraform apply.
  2. Switch the effective user by setting new values for the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.
  3. Run terraform plan.
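
Since step 2 switches identities only through environment variables, it may help to confirm which IAM principal Terraform actually resolves on each run. The following is a minimal sketch, assuming the AWS provider is already configured; the output name `terraform_caller_arn` is made up for illustration and is not part of the original configuration.

```hcl
# Resolves the IAM identity behind the credentials the AWS provider is using.
data "aws_caller_identity" "current" {}

# Hypothetical output name: shows the ARN of the user or role running Terraform.
output "terraform_caller_arn" {
  value = data.aws_caller_identity.current.arn
}
```

If the plan fails before any outputs are shown, the same identity can be checked from the shell with `aws sts get-caller-identity`.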

Expected Behavior

What should have happened?
terraform plan should show its result.

Actual Behavior

What actually happened?
terraform plan fails with Error: Unauthorized

Important Factoids

References

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
bug needs investigation

Most helpful comment

I got this error when I was switching from a previous version of the Kubernetes provider to the current one. My change was from

provider "kubernetes" {
  host                              =  ...
  cluster_ca_certificate = ...
  token                            = data.aws_eks_cluster_auth.cluster.token
}

to what you have, using the exec like so

provider "kubernetes" {
  host                              =  ...
  cluster_ca_certificate = ...
  exec {
      api_version = "client.authentication.k8s.io/v1alpha1"
      args        = ["eks", "get-token", "--cluster-name", var.cluster_name]
      command     = "aws"
    }
}

I reverted to the token instead and it deployed fine after that. Keep in mind that when I ran my terraform from a clean build (no .tfstate etc.), with nothing deployed yet, the command ran without a hitch. Only when I attempted to replace token with exec for an existing deployment did I see this error. Actually, I think the real reason it failed is that the clean-slate run was on a computer where I had put my credentials in, while the failing attempt happened on an AWS EC2 instance where the aws cli was installed but I hadn't assigned any role that would give it permission to do EKS activities. Case in point, when I attempted to run

> aws eks list-clusters --region <region I have my clusters in>
An error occurred (AccessDeniedException)....

so I think I would have to make sure that the EC2 instance where I was running the terraform command has the appropriate permissions (even though I was feeding the permissions to terraform via variables), because the exec command was calling the aws cli dynamically.
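
The point of the comment above is that the exec block shells out to the aws CLI, so the CLI needs working credentials of its own on the machine that runs Terraform. Here is a minimal sketch of passing a named profile to that call through the exec block's `env` argument. The variable `aws_profile` is hypothetical, and the `host` and `cluster_ca_certificate` values are borrowed from the configuration in the original report, not from this comment.

```hcl
variable "aws_profile" {
  type        = string
  description = "Hypothetical: an AWS CLI profile that exists on the machine running Terraform."
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", var.cluster_name]

    # The aws CLI runs as a separate process. It resolves credentials from its
    # environment, profile files, or instance role, not from Terraform variables.
    env = {
      AWS_PROFILE = var.aws_profile
    }
  }
}
```

On an EC2 instance, attaching an instance role with the needed EKS permissions would be an alternative to a profile.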

All 14 comments

I believe it's a duplicate of https://github.com/hashicorp/terraform-provider-aws/issues/16320 but this bug is reported in the correct provider repo.

I got hit with this very issue too, using Akeyless's AWS Producer. It generates AWS access credentials dynamically based on IAM policies, and I use those credentials in a pipeline with Terraform.

Terraform v0.14.6
+ provider registry.terraform.io/hashicorp/aws v3.27.0
+ provider registry.terraform.io/hashicorp/kubernetes v2.0.2
*snip*
module.eks.kubernetes_config_map.aws_auth[0]: Refreshing state... [id=kube-system/aws-auth]
kubernetes_service_account.aws-load-balancer-controller: Refreshing state... [id=kube-system/aws-load-balancer-controller]
module.eks.random_pet.workers[0]: Refreshing state... [id=pumped-pigeon]
module.eks.aws_autoscaling_group.workers[0]: Refreshing state... [id=aws-eks-template-worker-group-120210210172016823200000014]
Error: Unauthorized
Error: Unauthorized
Error: Unauthorized
Error: Unauthorized
Error: Kubernetes cluster unreachable: the server has asked for the client to provide credentials
Cleaning up file based variables 00:00
ERROR: Job failed: command terminated with exit code 1

And if I revert to the static, non-dynamic IAM user that created the infra:

*snip*
gitlab_group_cluster.aws_cluster: Refreshing state... [id=5151960:137133]
module.eks.random_pet.workers[0]: Refreshing state... [id=pumped-pigeon]
module.eks.aws_autoscaling_group.workers[0]: Refreshing state... [id=aws-eks-template-worker-group-120210210172016823200000014]
kubernetes_ingress.connect_ingress: Refreshing state... [id=kube-system/alb-ingress-connect-nginx]
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create
Terraform will perform the following actions:
  # aws_iam_policy.AWSLoadBalancerControllerIAMPolicy will be created
  + resource "aws_iam_policy" "AWSLoadBalancerControllerIAMPolicy" {
*snip*

We think this is related to the policy configuration of the IAM users being created - could any of you who are able to reproduce let us know what configuration is being assigned to the users?

One way you can confirm that it is a permissions issue is by adding an admin policy to the user who is unable to access the cluster and see if they are then able to access - this would point to an issue with the configuration in IAM.

As I mentioned in the first post, all my users have an admin policy attached with full permissions in the AWS account. They are able to do everything via the AWS console or CLI, including modifying or deleting the cluster, but not through terraform, where they get permission denied on one of the steps related to Kubernetes itself. And they have an identical kubeconfig from the created cluster.

@aareet I can also confirm what @vvchik is seeing. I posted on the issue @loungerider mentions here.

All users involved are aws admins on the account and can create/modify/delete any resource via cli or aws console but cannot modify/delete an eks cluster with terraform that they themselves did not create.

@vvchik in your particular case, the provider configuration seems to contain both kubeconfig and cluster cert/token

  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    args        = ["eks", "get-token", "--cluster-name", var.cluster_name]
    command     = "aws"
  }
  config_path = "~/.kube/config.cluster-preprod"

Credentials may be picked up from one source or the other, which could be causing the problem. Typically it's recommended to use only one method of authentication.
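
As an illustration of that recommendation, the provider block from the report could be reduced to a single credential source. This is a sketch only: it keeps the exec-based token and drops `config_path`, reusing the same data source and variable names as the original configuration. Keeping only `config_path` would be the other option.

```hcl
provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)

  # Single credential source: a short-lived token fetched by the aws CLI.
  # config_path is intentionally omitted so that no kubeconfig is consulted.
  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", var.cluster_name]
  }
}
```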

@rmendal can you post your configuration so we can investigate?

```hcl
terraform {
  # The configuration for this backend will be filled in by Terragrunt
  required_version = ">=0.14"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.0"
    }
    kubernetes = {
      version = "~>2.0.0"
    }
  }
}

provider "aws" {
  region = var.region
}

data "aws_eks_cluster" "cluster" {
  name = module.eks.cluster_id
}

data "aws_eks_cluster_auth" "cluster" {
  name = module.eks.cluster_id
}

data "aws_caller_identity" "current" {}

data "aws_subnet_ids" "my_subnets" {
  vpc_id = var.vpc_id
  tags = {
    key = "value"
  }
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
  token                  = data.aws_eks_cluster_auth.cluster.token
}

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "14.0.0"

  cluster_endpoint_private_access       = true
  cluster_endpoint_public_access        = false
  enable_irsa                           = true
  cluster_endpoint_private_access_cidrs = var.priv_access_cidrs
  cluster_name                          = var.cluster_name
  cluster_version                       = var.cluster_version
  create_eks                            = true
  vpc_id                                = var.vpc_id
  subnets                               = data.aws_subnet_ids.my_subnets.ids
  write_kubeconfig                      = false

  map_roles = [
    {
      "rolearn": "${aws_iam_role.eks_admin.arn}",
      "username": "${aws_iam_role.eks_admin.name}",
      "groups": [ "system:masters" ],
    },
    {
      "rolearn": "${aws_iam_role.eks_read_only.arn}",
      "username": "${aws_iam_role.eks_read_only.name}",
      "groups": [ "system:eks-read-only-group" ],
    }
  ]

  kubeconfig_aws_authenticator_env_variables = {
    AWS_PROFILE = var.profile
  }

  workers_group_defaults = {
    asg_desired_capacity = var.node_group_desired_size,
    asg_min_size = var.node_group_min_size,
    asg_max_size = var.node_group_max_size,
  }

  node_groups = [
    {
      subnets = var.node_group_a_subnets,
      instance_types = var.node_group_type,
      name = var.node_group_a_name,
      additional_tags = {
        "k8s.io/cluster-autoscaler/${var.cluster_name}" = "owned",
        "k8s.io/cluster-autoscaler/enabled" = "true",
      },
      node_group_k8s_labels = {
        Environment = var.environment,
      },
    }
  ]

  tags = {
    Environment = var.environment
    Team = var.team
  }
}
```

OK, it seems I was very wrong.

For whatever reason, I put map_users = var.map_users into worker_groups_launch_template (a copy-paste issue). That map should be in the module block.
After fixing that, my issue is gone.

Sorry for raising that.

It seems other users have a similar issue. My recommendation is to check the mappings in the aws-auth config map and check whether these users are actually able to see/edit resources inside the cluster, not just in the AWS account. They may have admin rights at the account level but be mapped wrongly and have no admin rights inside the cluster.
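
For context, EKS automatically grants cluster admin rights only to the IAM identity that created the cluster, and every other identity needs an entry in the aws-auth config map. That would explain why only the creator could run plan. Below is a trimmed sketch of the fix described above, with `map_users` moved to the module level. The ARN and username are hypothetical placeholders, and the fragment omits the other module arguments from the original configuration.

```hcl
module "eks" {
  source       = "terraform-aws-modules/eks/aws"
  cluster_name = var.cluster_name
  # ... remaining arguments as in the original configuration ...

  manage_aws_auth = true

  # Module-level argument: these entries are rendered into the aws-auth config map.
  map_users = [
    {
      userarn  = "arn:aws:iam::111122223333:user/second-admin" # hypothetical ARN
      username = "second-admin"                                # hypothetical name
      groups   = ["system:masters"]
    },
  ]

  worker_groups_launch_template = [
    {
      name = "masters"
      # map_users does not belong here. Per the comment above, placing it
      # inside a worker group did not produce the intended aws-auth mapping.
    },
  ]
}
```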

> My recommendation is to check mappings in the aws_auth config map and check if these users are actually able to see/edit resources inside of the cluster, not the AWS account, they may have admin rights in the account level but could be mapped wrong and have no admin rights inside of the cluster.

Other users and I are able to interact with the cluster as system:masters, based on the code in my previous post. The assumable role is mapped to the correct context in our kubeconfig files.

That said, my issue still stands. If a colleague creates a cluster using the code in my previous post, no one else will be able to plan/apply/destroy it in any way, only them.

@rmendal Isn't the problem that your provider in that code is not assuming the role you created, the one that has the system:masters mapping in aws_auth? I don't know what your .aws/config files look like, but if you're not assuming the role, either through the default profile in that file or by explicitly specifying it in your provider block, then you wouldn't have access to the cluster. Right? Maybe?
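
One way to read this suggestion is sketched below: have the AWS provider assume the mapped role, so that the token produced by `data.aws_eks_cluster_auth` is issued for that role and not for the individual user. The variable `eks_admin_role_arn` is hypothetical, and the sketch assumes the role already exists outside this configuration and that every user is allowed to assume it.

```hcl
variable "eks_admin_role_arn" {
  type        = string
  description = "Hypothetical: ARN of the pre-existing role mapped to system:masters in aws-auth."
}

provider "aws" {
  region = var.region

  # All AWS calls, including the EKS token request, are made as this role.
  assume_role {
    role_arn = var.eks_admin_role_arn
  }
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
  # The token is generated with the AWS provider's credentials, so it now
  # authenticates as the assumed role instead of the calling user.
  token = data.aws_eks_cluster_auth.cluster.token
}
```

With an exec-based provider block, the rough equivalent would be adding `--role-arn` to the `aws eks get-token` arguments.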

@vvchik Are you satisfied with the solution based on your findings? If so I would like to go ahead and close this issue.

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!
