I'm probably missing something obvious, but it seems like the controller pod is trying to access some other secret instead of the IAM token configured for its ServiceAccount.
Installed through helm:
kubectl apply -k "github.com/aws/eks-charts/stable/aws-load-balancer-controller//crds?ref=master"
helm upgrade -i aws-load-balancer-controller eks/aws-load-balancer-controller --set clusterName=ctdemo-development-services --set serviceAccount.create=false --set serviceAccount.name=ctdemo-development-eks-service-account-load-balancer --set vpcId=vpc-... --set region=us-east-1 -n kube-system
Pod errors out with:
{"level":"info","ts":1604336648.555419,"msg":"version","GitVersion":"v2.0.0","GitCommit":"1028fa4f363a9a8e37b07ff6a093b7b422923512","BuildDate":"2020-10-21T22:17:18+0000"}
{"level":"error","ts":1604336648.5561078,"logger":"setup","msg":"unable to build REST config","error":"open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory"}
To test IAM roles for service accounts, I deployed an nginx pod in the kube-system namespace using the same service account as above (ctdemo-development-eks-service-account-load-balancer). From a kubectl exec into that pod, I am able to assume the role:
root@nginx:/# aws sts assume-role-with-web-identity --role-arn $AWS_ROLE_ARN --role-session-name anyname --web-identity-token file://$AWS_WEB_IDENTITY_TOKEN_FILE --duration-seconds 1000
{
"Credentials": {
"AccessKeyId": "...",
"SecretAccessKey": "...",
"SessionToken": "...",
"Expiration": "2020-11-02T17:28:44Z"
},
"SubjectFromWebIdentityToken": "system:serviceaccount:kube-system:ctdemo-development-eks-service-account-load-balancer",
"AssumedRoleUser": {
"AssumedRoleId": "...:anyname",
"Arn": "arn:aws:sts::...:assumed-role/ctdemo-development-eks-service-account-load-balancer/anyname"
},
"Provider": "arn:aws:iam::...:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/...",
"Audience": "sts.amazonaws.com"
}
Note that I'm using a Fargate EKS cluster.
$ kubectl -n kube-system describe deployment.apps/aws-load-balancer-controller
Name: aws-load-balancer-controller
Namespace: kube-system
CreationTimestamp: Mon, 02 Nov 2020 11:56:27 -0500
Labels: app.kubernetes.io/instance=aws-load-balancer-controller
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=aws-load-balancer-controller
app.kubernetes.io/version=v2.0.0
helm.sh/chart=aws-load-balancer-controller-1.0.4
Annotations: deployment.kubernetes.io/revision: 1
meta.helm.sh/release-name: aws-load-balancer-controller
meta.helm.sh/release-namespace: kube-system
Selector: app.kubernetes.io/instance=aws-load-balancer-controller,app.kubernetes.io/name=aws-load-balancer-controller
Replicas: 1 desired | 1 updated | 1 total | 0 available | 1 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: app.kubernetes.io/instance=aws-load-balancer-controller
app.kubernetes.io/name=aws-load-balancer-controller
Annotations: prometheus.io/port: 8080
prometheus.io/scrape: true
Service Account: ctdemo-development-eks-service-account-load-balancer
Containers:
aws-load-balancer-controller:
Image: 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-load-balancer-controller:v2.0.0
Ports: 9443/TCP, 8080/TCP
Host Ports: 0/TCP, 0/TCP
Command:
/controller
Args:
--cluster-name=ctdemo-development-services
--ingress-class=alb
--aws-region=us-east-1
--aws-vpc-id=vpc-...
Liveness: http-get http://:61779/healthz delay=30s timeout=10s period=10s #success=1 #failure=2
Environment: <none>
Mounts:
/tmp/k8s-webhook-server/serving-certs from cert (ro)
Volumes:
cert:
Type: Secret (a volume populated by a Secret)
SecretName: aws-load-balancer-tls
Optional: false
Conditions:
Type Status Reason
---- ------ ------
Progressing True NewReplicaSetAvailable
Available False MinimumReplicasUnavailable
OldReplicaSets: <none>
NewReplicaSet: aws-load-balancer-controller-77b6f47888 (1/1 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 40m deployment-controller Scaled up replica set aws-load-balancer-controller-77b6f47888 to 1
$ kubectl -n kube-system describe pod aws-load-balancer-controller-77b6f47888-ks6w4
Name: aws-load-balancer-controller-77b6f47888-ks6w4
Namespace: kube-system
Priority: 2000001000
Priority Class Name: system-node-critical
Node: fargate-ip-10-10-2-29.ec2.internal/10.10.2.29
Start Time: Mon, 02 Nov 2020 11:57:39 -0500
Labels: app.kubernetes.io/instance=aws-load-balancer-controller
app.kubernetes.io/name=aws-load-balancer-controller
eks.amazonaws.com/fargate-profile=ctdemo-development-services
pod-template-hash=77b6f47888
Annotations: kubernetes.io/psp: eks.privileged
prometheus.io/port: 8080
prometheus.io/scrape: true
Status: Running
IP: 10.10.2.29
IPs:
IP: 10.10.2.29
Controlled By: ReplicaSet/aws-load-balancer-controller-77b6f47888
Containers:
aws-load-balancer-controller:
Container ID: containerd://dc7f44ee0adf7d2f5e37f7827f169c4a9f9bb910e7f0bea0f087cc11d1fefd29
Image: 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-load-balancer-controller:v2.0.0
Image ID: 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-load-balancer-controller@sha256:1a9e08d2766785e9e6320bcdf2298f9f4fbb134ed7da4ff27aec9d7f176232de
Ports: 9443/TCP, 8080/TCP
Host Ports: 0/TCP, 0/TCP
Command:
/controller
Args:
--cluster-name=ctdemo-development-services
--ingress-class=alb
--aws-region=us-east-1
--aws-vpc-id=vpc-...
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Mon, 02 Nov 2020 12:34:49 -0500
Finished: Mon, 02 Nov 2020 12:34:49 -0500
Ready: False
Restart Count: 12
Liveness: http-get http://:61779/healthz delay=30s timeout=10s period=10s #success=1 #failure=2
Environment:
AWS_DEFAULT_REGION: us-east-1
AWS_REGION: us-east-1
AWS_ROLE_ARN: arn:aws:iam::...:role/ctdemo-development-eks-service-account-load-balancer
AWS_WEB_IDENTITY_TOKEN_FILE: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
Mounts:
/tmp/k8s-webhook-server/serving-certs from cert (ro)
/var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
aws-iam-token:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 86400
cert:
Type: Secret (a volume populated by a Secret)
SecretName: aws-load-balancer-tls
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 42m fargate-scheduler Successfully assigned kube-system/aws-load-balancer-controller-77b6f47888-ks6w4 to fargate-ip-10-10-2-29.ec2.internal
Normal Pulling 42m kubelet Pulling image "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-load-balancer-controller:v2.0.0"
Normal Pulled 41m kubelet Successfully pulled image "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-load-balancer-controller:v2.0.0"
Normal Started 40m (x4 over 41m) kubelet Started container aws-load-balancer-controller
Normal Created 39m (x5 over 41m) kubelet Created container aws-load-balancer-controller
Normal Pulled 39m (x4 over 41m) kubelet Container image "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-load-balancer-controller:v2.0.0" already present on machine
Warning BackOff 117s (x191 over 41m) kubelet Back-off restarting failed container
Can you verify if the serviceaccount exists in the kube-system namespace?
kubectl describe serviceaccounts -n kube-system ctdemo-development-eks-service-account-load-balancer
It does. Note that the nginx test pod using that account is also running in the kube-system namespace, and I have no problem using the aws cli to assume the role when running with that service account.
$ kubectl describe serviceaccounts -n kube-system ctdemo-development-eks-service-account-load-balancer
Name: ctdemo-development-eks-service-account-load-balancer
Namespace: kube-system
Labels: <none>
Annotations: eks.amazonaws.com/role-arn: arn:aws:iam::...:role/ctdemo-development-eks-service-account-load-balancer
Image pull secrets: <none>
Mountable secrets: ctdemo-development-eks-service-account-load-balancer-tokenrjsmv
Tokens: ctdemo-development-eks-service-account-load-balancer-tokenrjsmv
Events: <none>
Note also the mounts in the failing pod:
AWS_WEB_IDENTITY_TOKEN_FILE: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
Mounts:
/tmp/k8s-webhook-server/serving-certs from cert (ro)
/var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
which seems correct, but I don't know enough to say why the controller is still trying to open /var/run/secrets/kubernetes.io/serviceaccount/token
@wr0ngway
The "/var/run/secrets/kubernetes.io/serviceaccount/token" file is for access to the Kubernetes API, not AWS IAM (so it's not related to IAM roles for service accounts).
Are you using a customized EKS cluster or a kops cluster, or has some webhook removed the volume source?
I have this from my running setup -
AWS_WEB_IDENTITY_TOKEN_FILE: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
Mounts:
/tmp/k8s-webhook-server/serving-certs from cert (ro)
/var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
/var/run/secrets/kubernetes.io/serviceaccount from aws-load-balancer-controller-token-pjt5m (ro)
As @M00nF1sh mentioned, the token to access K8s API server is not mounted on the controller pod.
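For anyone comparing their own pod against the output above, here is a minimal sketch of that check. It only looks for the Kubernetes API token mount path in `kubectl describe pod` output. The function name `has_api_token_mount` is made up for illustration, and the pod name in the usage comment is the one from this issue.

```shell
#!/usr/bin/env bash
# Hypothetical helper (the name is illustrative, not part of kubectl).
# Reads `kubectl describe pod` output on stdin and succeeds only if the
# Kubernetes API token mount path appears in it.
has_api_token_mount() {
  # -F matches the path literally, so the dots are not regex wildcards.
  grep -qF '/var/run/secrets/kubernetes.io/serviceaccount'
}

# Usage (needs access to the cluster):
#   kubectl -n kube-system describe pod aws-load-balancer-controller-77b6f47888-ks6w4 \
#     | has_api_token_mount && echo "API token mounted" || echo "API token missing"
```

The aws-iam-token mount lives under /var/run/secrets/eks.amazonaws.com/, so it does not match; only the mount used to reach the API server does.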
Nothing custom that I know of, just a basic AWS Fargate EKS cluster that I set up via Terraform. I could have messed up the config somewhere, but I was able to set up kubernetes-dashboard on there (also via helm), and that seems to be working. For the LB controller, I'm just following the docs and installing via helm.
> As @M00nF1sh mentioned, the token to access K8s API server is not mounted on the controller pod.
What controls this behavior? Maybe I can trigger it manually, or at least dig around till I find where my setup differs for this.
From what I gather from:
https://kubernetes.io/docs/tasks/administer-cluster/access-cluster-api/#directly-accessing-the-rest-api-1
Since I'm assuming a pod can't be associated with multiple SAs, would the ServiceAccount also need permissions set up for access to the k8s API server? I created the service account myself and told the helm install to use it instead of creating its own, so maybe I need to add more permissions to the SA I created? Any idea where to look for that?
Ok, thanks for the help. Your responses helped me figure it out. I was creating the service account in Terraform. It has an attribute, automount_service_account_token, which is set to false by default. After setting it to true, the API server token is now mounted and the pod starts successfully. Closing this issue now, but it may be something you should mention in the docs for others avoiding the use of eksctl :)
Thanks for sharing your issue. This experience will be useful for other Terraform users as well. I agree, this is something we need to document.
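For others who hit this, a rough sketch of checking and flipping the same setting directly with kubectl. It assumes the ServiceAccount field `automountServiceAccountToken` is what the Terraform attribute maps to. The helper names `sa_automount` and `sa_enable_automount` are made up for illustration. If the ServiceAccount is managed by Terraform, a manual patch may be reverted on the next apply, so the lasting fix is still the Terraform attribute.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (names are illustrative). Both take: <namespace> <serviceaccount>.

# Print the ServiceAccount's automountServiceAccountToken field.
# Prints "false" or "true"; prints nothing if the field is unset.
sa_automount() {
  local ns="$1" name="$2"
  kubectl get serviceaccount "$name" -n "$ns" \
    -o jsonpath='{.automountServiceAccountToken}'
}

# Turn on automatic mounting of the Kubernetes API token for the ServiceAccount.
sa_enable_automount() {
  local ns="$1" name="$2"
  kubectl patch serviceaccount "$name" -n "$ns" \
    -p '{"automountServiceAccountToken": true}'
}

# Usage (needs access to the cluster):
#   sa_automount kube-system ctdemo-development-eks-service-account-load-balancer
#   sa_enable_automount kube-system ctdemo-development-eks-service-account-load-balancer
#   kubectl -n kube-system rollout restart deployment/aws-load-balancer-controller
```

The restart in the last usage line matters because the token volume is added when a pod is created, so pods that already exist will not pick up the change until they are recreated.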