Aws-load-balancer-controller: Controller trying to use wrong token for service account

Created on 2 Nov 2020 · 10 Comments · Source: kubernetes-sigs/aws-load-balancer-controller

I'm probably missing something obvious, but it seems like the controller pod is trying to read some other secret instead of the IAM token it is configured to use for its ServiceAccount...?

Installed through helm:

kubectl apply -k "github.com/aws/eks-charts/stable/aws-load-balancer-controller//crds?ref=master"
helm upgrade -i aws-load-balancer-controller eks/aws-load-balancer-controller --set clusterName=ctdemo-development-services --set serviceAccount.create=false --set serviceAccount.name=ctdemo-development-eks-service-account-load-balancer --set vpcId=vpc-... --set region=us-east-1 -n kube-system

Pod errors out with:

{"level":"info","ts":1604336648.555419,"msg":"version","GitVersion":"v2.0.0","GitCommit":"1028fa4f363a9a8e37b07ff6a093b7b422923512","BuildDate":"2020-10-21T22:17:18+0000"}
{"level":"error","ts":1604336648.5561078,"logger":"setup","msg":"unable to build REST config","error":"open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory"}

To test IAM roles for service accounts, I deployed an nginx pod in the kube-system namespace using the same service account as above (ctdemo-development-eks-service-account-load-balancer). From a kubectl exec session in that pod, I am able to assume the role:

root@nginx:/# aws sts assume-role-with-web-identity --role-arn $AWS_ROLE_ARN --role-session-name anyname --web-identity-token file://$AWS_WEB_IDENTITY_TOKEN_FILE --duration-seconds 1000
{
    "Credentials": {
        "AccessKeyId": "...",
        "SecretAccessKey": "...",
        "SessionToken": "...",
        "Expiration": "2020-11-02T17:28:44Z"
    },
    "SubjectFromWebIdentityToken": "system:serviceaccount:kube-system:ctdemo-development-eks-service-account-load-balancer",
    "AssumedRoleUser": {
        "AssumedRoleId": "...:anyname",
        "Arn": "arn:aws:sts::...:assumed-role/ctdemo-development-eks-service-account-load-balancer/anyname"
    },
    "Provider": "arn:aws:iam::...:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/...",
    "Audience": "sts.amazonaws.com"
}

Note that I'm using a Fargate EKS cluster.

$ kubectl -n kube-system describe deployment.apps/aws-load-balancer-controller
Name:                   aws-load-balancer-controller
Namespace:              kube-system
CreationTimestamp:      Mon, 02 Nov 2020 11:56:27 -0500
Labels:                 app.kubernetes.io/instance=aws-load-balancer-controller
                        app.kubernetes.io/managed-by=Helm
                        app.kubernetes.io/name=aws-load-balancer-controller
                        app.kubernetes.io/version=v2.0.0
                        helm.sh/chart=aws-load-balancer-controller-1.0.4
Annotations:            deployment.kubernetes.io/revision: 1
                        meta.helm.sh/release-name: aws-load-balancer-controller
                        meta.helm.sh/release-namespace: kube-system
Selector:               app.kubernetes.io/instance=aws-load-balancer-controller,app.kubernetes.io/name=aws-load-balancer-controller
Replicas:               1 desired | 1 updated | 1 total | 0 available | 1 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           app.kubernetes.io/instance=aws-load-balancer-controller
                    app.kubernetes.io/name=aws-load-balancer-controller
  Annotations:      prometheus.io/port: 8080
                    prometheus.io/scrape: true
  Service Account:  ctdemo-development-eks-service-account-load-balancer
  Containers:
   aws-load-balancer-controller:
    Image:       602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-load-balancer-controller:v2.0.0
    Ports:       9443/TCP, 8080/TCP
    Host Ports:  0/TCP, 0/TCP
    Command:
      /controller
    Args:
      --cluster-name=ctdemo-development-services
      --ingress-class=alb
      --aws-region=us-east-1
      --aws-vpc-id=vpc-...
    Liveness:     http-get http://:61779/healthz delay=30s timeout=10s period=10s #success=1 #failure=2
    Environment:  <none>
    Mounts:
      /tmp/k8s-webhook-server/serving-certs from cert (ro)
  Volumes:
   cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  aws-load-balancer-tls
    Optional:    false
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Progressing    True    NewReplicaSetAvailable
  Available      False   MinimumReplicasUnavailable
OldReplicaSets:  <none>
NewReplicaSet:   aws-load-balancer-controller-77b6f47888 (1/1 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  40m   deployment-controller  Scaled up replica set aws-load-balancer-controller-77b6f47888 to 1

$ kubectl -n kube-system describe pod aws-load-balancer-controller-77b6f47888-ks6w4
Name:                 aws-load-balancer-controller-77b6f47888-ks6w4
Namespace:            kube-system
Priority:             2000001000
Priority Class Name:  system-node-critical
Node:                 fargate-ip-10-10-2-29.ec2.internal/10.10.2.29
Start Time:           Mon, 02 Nov 2020 11:57:39 -0500
Labels:               app.kubernetes.io/instance=aws-load-balancer-controller
                      app.kubernetes.io/name=aws-load-balancer-controller
                      eks.amazonaws.com/fargate-profile=ctdemo-development-services
                      pod-template-hash=77b6f47888
Annotations:          kubernetes.io/psp: eks.privileged
                      prometheus.io/port: 8080
                      prometheus.io/scrape: true
Status:               Running
IP:                   10.10.2.29
IPs:
  IP:           10.10.2.29
Controlled By:  ReplicaSet/aws-load-balancer-controller-77b6f47888
Containers:
  aws-load-balancer-controller:
    Container ID:  containerd://dc7f44ee0adf7d2f5e37f7827f169c4a9f9bb910e7f0bea0f087cc11d1fefd29
    Image:         602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-load-balancer-controller:v2.0.0
    Image ID:      602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-load-balancer-controller@sha256:1a9e08d2766785e9e6320bcdf2298f9f4fbb134ed7da4ff27aec9d7f176232de
    Ports:         9443/TCP, 8080/TCP
    Host Ports:    0/TCP, 0/TCP
    Command:
      /controller
    Args:
      --cluster-name=ctdemo-development-services
      --ingress-class=alb
      --aws-region=us-east-1
      --aws-vpc-id=vpc-...
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Mon, 02 Nov 2020 12:34:49 -0500
      Finished:     Mon, 02 Nov 2020 12:34:49 -0500
    Ready:          False
    Restart Count:  12
    Liveness:       http-get http://:61779/healthz delay=30s timeout=10s period=10s #success=1 #failure=2
    Environment:
      AWS_DEFAULT_REGION:           us-east-1
      AWS_REGION:                   us-east-1
      AWS_ROLE_ARN:                 arn:aws:iam::...:role/ctdemo-development-eks-service-account-load-balancer
      AWS_WEB_IDENTITY_TOKEN_FILE:  /var/run/secrets/eks.amazonaws.com/serviceaccount/token
    Mounts:
      /tmp/k8s-webhook-server/serving-certs from cert (ro)
      /var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  aws-iam-token:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  86400
  cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  aws-load-balancer-tls
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  42m                   fargate-scheduler  Successfully assigned kube-system/aws-load-balancer-controller-77b6f47888-ks6w4 to fargate-ip-10-10-2-29.ec2.internal
  Normal   Pulling    42m                   kubelet            Pulling image "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-load-balancer-controller:v2.0.0"
  Normal   Pulled     41m                   kubelet            Successfully pulled image "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-load-balancer-controller:v2.0.0"
  Normal   Started    40m (x4 over 41m)     kubelet            Started container aws-load-balancer-controller
  Normal   Created    39m (x5 over 41m)     kubelet            Created container aws-load-balancer-controller
  Normal   Pulled     39m (x4 over 41m)     kubelet            Container image "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-load-balancer-controller:v2.0.0" already present on machine
  Warning  BackOff    117s (x191 over 41m)  kubelet            Back-off restarting failed container


wr0ngway

Most helpful comment

OK, thanks for the help. Your responses helped me figure it out. I was creating the ServiceAccount in Terraform, where the resource has an attribute automount_service_account_token that is set to false by default. After setting it to true, the API server token is now mounted and the pod starts successfully. Closing this issue now, but it may be worth mentioning in the docs for others who are not using eksctl :)

wr0ngway on 2 Nov 2020
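
For readers who want to confirm the same cause on their own cluster, the following is a minimal sketch using kubectl. The function names and the NS/SA variables are illustrative, not part of the controller or the chart; automountServiceAccountToken is the standard ServiceAccount field that the Terraform attribute in the comment above maps to. The patch is only a quick check: if Terraform manages the ServiceAccount, the change most likely needs to be made there or it may be reverted on the next apply.

```shell
# Illustrative values taken from the thread; adjust for your cluster.
NS=kube-system
SA=ctdemo-development-eks-service-account-load-balancer

# Print the ServiceAccount's automountServiceAccountToken field.
# Empty output means the field is unset, which Kubernetes treats as true.
sa_automount() {
  kubectl -n "$1" get serviceaccount "$2" \
    -o jsonpath='{.automountServiceAccountToken}'
}

# Turn automounting of the API server token back on for the ServiceAccount.
sa_enable_automount() {
  kubectl -n "$1" patch serviceaccount "$2" \
    -p '{"automountServiceAccountToken": true}'
}

# Usage (left commented out so that sourcing this file changes nothing):
# sa_automount "$NS" "$SA"
# sa_enable_automount "$NS" "$SA"
# kubectl -n "$NS" rollout restart deployment/aws-load-balancer-controller
```

The restart at the end is there because the token volume is added when a pod is created, so pods that already exist keep running without it until they are replaced.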


All 10 comments

Can you verify if the serviceaccount exists in the kube-system namespace?

kubectl describe serviceaccounts -n kube-system ctdemo-development-eks-service-account-load-balancer

kishorj on 2 Nov 2020

It does. Note that the nginx test pod using that account is also running in the kube-system namespace, and I have no problem using the aws cli to assume the role when running with that service account.

$ kubectl describe serviceaccounts -n kube-system ctdemo-development-eks-service-account-load-balancer
Name:                ctdemo-development-eks-service-account-load-balancer
Namespace:           kube-system
Labels:              <none>
Annotations:         eks.amazonaws.com/role-arn: arn:aws:iam::...:role/ctdemo-development-eks-service-account-load-balancer
Image pull secrets:  <none>
Mountable secrets:   ctdemo-development-eks-service-account-load-balancer-tokenrjsmv
Tokens:              ctdemo-development-eks-service-account-load-balancer-tokenrjsmv
Events:              <none>

wr0ngway on 2 Nov 2020

Note also the mounts in the failing pod:

      AWS_WEB_IDENTITY_TOKEN_FILE:  /var/run/secrets/eks.amazonaws.com/serviceaccount/token
    Mounts:
      /tmp/k8s-webhook-server/serving-certs from cert (ro)
      /var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)

which seems correct, but I don't know enough to say why the controller is still trying to open /var/run/secrets/kubernetes.io/serviceaccount/token

wr0ngway on 2 Nov 2020

@wr0ngway
The "/var/run/secrets/kubernetes.io/serviceaccount/token" file is for access to the Kubernetes API, not AWS IAM (so it's not related to IAM roles for service accounts).

Are you using a customized EKS cluster or a kops cluster, or do you have a webhook that removed the volume source?

M00nF1sh on 2 Nov 2020
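
The distinction between the two token files can be checked from inside any pod that has a shell, for example the nginx test pod from the original report, which uses the same ServiceAccount. This is a small sketch, not something from the controller itself; check_tokens and its optional ROOT argument are hypothetical names, and ROOT exists only so the function can be tried outside a pod.

```shell
# check_tokens [ROOT]
# Report whether each of the two token files mentioned in this thread exists.
# ROOT is an optional path prefix for trying this outside a pod; inside a pod,
# call the function with no arguments.
check_tokens() {
  root="${1:-}"
  for f in \
    /var/run/secrets/kubernetes.io/serviceaccount/token \
    /var/run/secrets/eks.amazonaws.com/serviceaccount/token
  do
    if [ -f "$root$f" ]; then
      echo "present: $f"   # the file (or the symlink target) exists
    else
      echo "missing: $f"
    fi
  done
}
```

A pod that reports the kubernetes.io token as missing and the eks.amazonaws.com token as present matches the failure in this issue: AWS credentials work, but the client cannot build its in-cluster Kubernetes API configuration.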

I have this from my running setup -

      AWS_WEB_IDENTITY_TOKEN_FILE:  /var/run/secrets/eks.amazonaws.com/serviceaccount/token
    Mounts:
      /tmp/k8s-webhook-server/serving-certs from cert (ro)
      /var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from aws-load-balancer-controller-token-pjt5m (ro)

As @M00nF1sh mentioned, the token to access K8s API server is not mounted on the controller pod.

kishorj on 2 Nov 2020
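
The same comparison can be made without reading the full describe output. The sketch below assumes only standard kubectl and its jsonpath output; pod_mount_paths is a hypothetical helper name, and the pod name in the usage line is the one from the original report.

```shell
# pod_mount_paths NAMESPACE POD
# Print one mount path per line for every container in the pod.
pod_mount_paths() {
  kubectl -n "$1" get pod "$2" \
    -o jsonpath='{range .spec.containers[*].volumeMounts[*]}{.mountPath}{"\n"}{end}'
}

# Usage (commented out so that sourcing this file runs nothing):
# pod_mount_paths kube-system aws-load-balancer-controller-77b6f47888-ks6w4
```

If /var/run/secrets/kubernetes.io/serviceaccount does not appear in the list, the API server token was not mounted into the pod.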

Nothing custom that I know of, just a basic AWS Fargate EKS cluster that I set up via Terraform. I could have messed up the config somewhere, but I was able to set up kubernetes-dashboard on there (also via helm), and that seems to be working. For the LB controller, I am just following the docs and installing via helm.

wr0ngway on 2 Nov 2020

> As @M00nF1sh mentioned, the token to access K8s API server is not mounted on the controller pod.

What controls this behavior? Maybe I can trigger it manually, or at least dig around till I find where my setup differs for this.

wr0ngway on 2 Nov 2020

From what I gather from:

https://kubernetes.io/docs/tasks/administer-cluster/access-cluster-api/#directly-accessing-the-rest-api-1

Since I'm assuming a pod can't be associated with multiple SAs, the ServiceAccount would also need permissions set up to access the k8s API server...? I created the service account and told the helm install to use it instead of creating its own, so maybe I need to add more permissions to the SA I created? Any idea where to look for that?

wr0ngway on 2 Nov 2020
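
One way to answer the permissions question directly is to ask the API server what the ServiceAccount is allowed to do. This is a sketch using kubectl auth can-i with impersonation; sa_can_i is a hypothetical helper name, and the verb and resource in the usage line are only examples. Note that this checks RBAC, which is separate from the missing token file reported in the error above: a ServiceAccount can have all the right permissions and still fail if its token is not mounted.

```shell
# sa_can_i NAMESPACE SERVICEACCOUNT VERB RESOURCE
# Ask the API server whether the ServiceAccount may perform VERB on RESOURCE
# in NAMESPACE. kubectl prints "yes" or "no".
sa_can_i() {
  kubectl -n "$1" auth can-i "$3" "$4" \
    --as="system:serviceaccount:$1:$2"
}

# Usage (commented out so that sourcing this file runs nothing):
# sa_can_i kube-system ctdemo-development-eks-service-account-load-balancer list ingresses
```

Running this requires that your own user is allowed to impersonate service accounts, which cluster admins typically are.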