Amazon-vpc-cni-k8s: Cannot run metrics helper with IAM roles mapped to k8s service accounts

Created on 18 Oct 2019  ·  14 Comments  ·  Source: aws/amazon-vpc-cni-k8s

I've set up my EKS cluster to block pod access to the EC2 metadata endpoint and instead obtain IAM policies via roles mapped to service accounts (via OpenID Connect).

Turns out that the cni metrics helper wants to reach that endpoint. Since I'm blocking pod access to the EC2 metadata with (calico) network policies, I've allowed that one pod (metrics helper) to reach the endpoint. What happens next is that, since the pod can reach the EC2 metadata endpoint, it assumes the worker role instead of the role I created for it.

I'm stuck in between the setup with IAM roles mapped to k8s service accounts and running the metrics helper. Is there a way to have both?

bug good first issue help wanted

All 14 comments

@miguelaferreira Were you actually able to get the IAM roles via OIDC working? The documentation here (https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts-minimum-sdk.html) says that the minimum Go SDK version is 1.23.13, but I believe amazon-vpc-cni-k8s is using 1.21.7 (https://github.com/aws/amazon-vpc-cni-k8s/blob/master/go.mod)

Yes, I got it working. Instead of the manifest on that same documentation (step 3 in _For all other Kubernetes versions_) I used this other one that pulls in version v1.5.4.

I've made a terraform module that is able to upgrade the CNI plugin that is installed by default in EKS, and another one that sets up the IAM side of things.

@miguelaferreira There is an issue with ip rules going missing in v1.5.4 (#641), please try the v1.5.5 release candidate instead.

I've tried that version @mogren but I still get the same output.

With the network policy blocking access to the EC2 metadata endpoint, the pod complains that it needs that access:

....
E1029 09:59:22.901290       1 cni-metrics-helper.go:99] Failed to create publisher: publisher: unable to obtain EC2 service client: EC2MetadataRequestError: failed to get EC2 instance identity document
caused by: RequestError: send request failed
caused by: Get http://169.254.169.254/latest/dynamic/instance-identity/document: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

Without the network policy blocking access to the EC2 metadata endpoint, the pod assumes the role of the worker node and then complains because that role does not have access to CloudWatch:

...
E1029 10:06:08.331326       1 publisher.go:173] Unable to publish CloudWatch metrics: AccessDenied: User: arn:aws:sts::111111111111:assumed-role/cluster-worker-123050886700000005/i-04eXXXXXXX22 is not authorized to perform: cloudwatch:PutMetricData

@miguelaferreira Oh, did you add that permission though? It's not available in the managed CNI policy by default. See https://docs.aws.amazon.com/eks/latest/userguide/cni-metrics-helper.html#install-metrics-helper for details

@mogren I'm not sure what permission you are referring to. But if that's the policy to allow the pod to call cloudwatch:PutMetricData, then yes I have put that policy in a role that I assign to the SA that runs the pod (according to instructions here https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html).

When using the role for the SA, I have to block access to the EC2 metadata endpoint; otherwise the pod assumes the role of the worker node (arn:aws:sts::111111111111:assumed-role/cluster-worker-123050886700000005/i-04eXXXXXXX22), which is not allowed to call cloudwatch:PutMetricData. However, when I block access to the EC2 metadata endpoint (and the pod assumes the correct role, which is allowed to call cloudwatch:PutMetricData), the pod complains that it cannot reach the EC2 metadata endpoint.

Does that clarify the problem?

Ah, thanks @miguelaferreira for the explanation. This requires some more work from our side.

@mogren is there any progress towards supporting running the metrics helper with IAM roles mapped to k8s service accounts?

@miguelaferreira Sorry, not yet, but thanks for pinging me about it. Similar changes should be done to the ipamd pod (aws-node) as well.

@mogren I was checking back on this issue when I re-read your comment. I'm not sure what needs to change in the ipamd pod but I can confirm it works perfectly with IAM roles mapped to k8s service accounts. I have the metadata endpoint blocked on my cluster and the ipamd pods are using the role I assign to them.

# extract of aws-node pod manifest
   containers:
    - env:
      - name: AWS_VPC_K8S_CNI_LOGLEVEL
        value: DEBUG
      - name: MY_NODE_NAME
        valueFrom:
          fieldRef:
            apiVersion: v1
            fieldPath: spec.nodeName
      - name: AWS_ROLE_ARN
        value: arn:aws:iam::1234567890:role/kube-system-aws-node
      - name: AWS_WEB_IDENTITY_TOKEN_FILE
        value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
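
For anyone who wants to check the same thing from inside a container, here is a minimal Go sketch that verifies the two variables from the extract above are present and that the token file exists. The type `irsaEnv` and the helpers `readIRSAEnv` and `missing` are hypothetical names made up for this illustration; they are not part of amazon-vpc-cni-k8s or the AWS SDK, and the check says nothing about whether a given SDK version actually uses the web identity token.

```go
package main

import (
	"fmt"
	"os"
)

// irsaEnv (hypothetical type) holds the two variables shown in the
// aws-node manifest extract.
type irsaEnv struct {
	RoleARN   string
	TokenFile string
}

// readIRSAEnv reads the variables through the supplied lookup function so
// the logic can be exercised without touching the real environment.
func readIRSAEnv(getenv func(string) string) irsaEnv {
	return irsaEnv{
		RoleARN:   getenv("AWS_ROLE_ARN"),
		TokenFile: getenv("AWS_WEB_IDENTITY_TOKEN_FILE"),
	}
}

// missing lists the variables that are unset; an empty result means both
// are present.
func (e irsaEnv) missing() []string {
	var out []string
	if e.RoleARN == "" {
		out = append(out, "AWS_ROLE_ARN")
	}
	if e.TokenFile == "" {
		out = append(out, "AWS_WEB_IDENTITY_TOKEN_FILE")
	}
	return out
}

func main() {
	env := readIRSAEnv(os.Getenv)
	if m := env.missing(); len(m) > 0 {
		fmt.Println("service account role not injected; missing:", m)
		return
	}
	// The variable only names a path; confirm the projected token is mounted.
	if _, err := os.Stat(env.TokenFile); err != nil {
		fmt.Println("token file not readable:", err)
		return
	}
	fmt.Println("web identity configured for role", env.RoleARN)
}
```

If both variables are set and the token file is mounted, the pod has what it needs on the Kubernetes side; whether the process then uses that role depends on how its AWS SDK session is created.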

@miguelaferreira Have you applied the same above changes (for AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE) to the CNI metrics helper Deployment.template.spec?

@jaypipes I'm not sure I understand what you are asking. But the way I have been doing this is to annotate a service account and then the pod spec gets extended with these extra env vars. I have done this consistently with several deployments in my cluster.

@miguelaferreira yes, sorry for being unclear.

@mogren I believe I have found the source of this problem.

Note that the CNI metrics helper instantiates the AWS SDK Session object differently than ipamd.

Here is the CNI metrics helper instantiating its Publisher's session:

https://github.com/aws/amazon-vpc-cni-k8s/blob/71538acafa41548b1121aa532c50cce571b02e9d/pkg/publisher/publisher.go#L92-L108

and here is where the Metrics client ends up instantiating its session:

https://github.com/aws/amazon-vpc-cni-k8s/blob/71538acafa41548b1121aa532c50cce571b02e9d/pkg/ec2wrapper/ec2wrapper.go#L32-L40

Note that in the latter case, we call GetInstanceIdentityDocument(), which is defined here:

https://github.com/aws/aws-sdk-go/blob/e80315117c6955364974702b89f67d6f0a7247e3/aws/ec2metadata/api.go#L103

which queries IMDS for the instance-identity/document path.

I think something to do with GetInstanceIdentityDocument() and the difference between the publisher and the metrics client is the source of the issue here.
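
To make the failure mode concrete, here is a rough standard-library Go sketch of what a call like GetInstanceIdentityDocument() boils down to: an HTTP GET against the IMDS path from the error log, with a client timeout. It is not the SDK's implementation. `fetchRegion` and `parseRegion` are hypothetical helpers, the 2 second timeout is arbitrary, and reading the `region` field is only an assumption about why the caller wants the document. When a network policy blocks 169.254.169.254, the GET fails with a timeout error much like the one in the log above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// identityDocPath is the IMDS path that appears in the error log in this thread.
const identityDocPath = "/latest/dynamic/instance-identity/document"

// parseRegion (hypothetical helper) extracts the "region" field from an
// instance identity document.
func parseRegion(doc []byte) (string, error) {
	var v struct {
		Region string `json:"region"`
	}
	if err := json.Unmarshal(doc, &v); err != nil {
		return "", fmt.Errorf("decode identity document: %w", err)
	}
	if v.Region == "" {
		return "", fmt.Errorf("identity document has no region")
	}
	return v.Region, nil
}

// fetchRegion (hypothetical helper) requests the identity document from the
// metadata service at baseURL using a plain GET and returns its region.
func fetchRegion(baseURL string, timeout time.Duration) (string, error) {
	client := &http.Client{Timeout: timeout}
	resp, err := client.Get(baseURL + identityDocPath)
	if err != nil {
		// This is the branch taken when a network policy blocks IMDS.
		return "", fmt.Errorf("query IMDS: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("IMDS returned %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", fmt.Errorf("read identity document: %w", err)
	}
	return parseRegion(body)
}

func main() {
	region, err := fetchRegion("http://169.254.169.254", 2*time.Second)
	if err != nil {
		fmt.Println("IMDS unreachable:", err)
		return
	}
	fmt.Println("region from IMDS:", region)
}
```

The point of the sketch is that this lookup is independent of which credentials the session resolves, so a pod can hold the right role via its service account and still fail at startup if it insists on reaching IMDS.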

/cc @micahhausler

As @jaypipes mentioned, we need to look at the IPAMD and metrics helper code to understand how the session is set up. That should clarify why the behavior is different.
