Please note that I am not reporting an immediately exploitable vulnerability, but rather a change request to implement a best practice and reduce the permission scope to least privilege.
Currently, EC2 Instances that comprise EKS managed nodegroups are expected to have an IAM Policy (AmazonEKS_CNI_Policy) assigned to them via their EC2 Instance Role that allows them broad access to manage Elastic Network Interfaces (ENIs). The requirement is discussed in our documentation.
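For context, the attachment in question looks like this today (a sketch using the AWS CLI; the node role name is a placeholder):

```
# Attach the broad ENI-management policy to the node instance role
# (the step this request aims to make unnecessary).
aws iam attach-role-policy \
  --role-name <NodeInstanceRole> \
  --policy-arn arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
```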
This Instance Role and associated Policy were once required so that the CNI controller pods, which are deployed as a DaemonSet on each worker node, could manage ENI associations with the Instances comprising the worker nodes. However, granting each Instance these permissions implicitly granted the same permissions to every container running on the node, whether or not those containers actually needed them. (While this privilege could be withheld from containers by denying them access to the EC2 Metadata Endpoint, doing so would also deny them access to other information or credentials they might legitimately require.)
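(For illustration, the metadata-endpoint restriction mentioned above is typically an iptables rule on each node; a sketch assuming the VPC CNI's default eni+ veth interface naming. Note that it blocks IMDS for all pods, including those that legitimately need it:)

```
# Drop traffic from pod veth interfaces to the EC2 metadata endpoint (IMDS).
iptables --insert FORWARD 1 --in-interface eni+ \
  --destination --jump DROP
```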
With the availability of IAM Roles for Service Accounts (IRSA), this requirement is obsolete. Instead, the CNI DaemonSet pods should be associated with a ServiceAccount that is tied to an IAM Role with permission to manipulate the ENIs.
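Concretely, associating the DaemonSet with such a role comes down to annotating its ServiceAccount and recycling the pods. A minimal sketch, using an account ID placeholder and the role name eks-cni-plugin that appears later in this thread:

```
# Point the aws-node ServiceAccount at an IAM role carrying the ENI permissions.
kubectl annotate serviceaccount -n kube-system aws-node \
  eks.amazonaws.com/role-arn=arn:aws:iam::<ACCOUNT_ID>:role/eks-cni-plugin

# Restart the CNI pods so they pick up the web-identity credentials.
kubectl rollout restart daemonset -n kube-system aws-node
```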
Once that is done, the worker node role will no longer need the ability to manage ENIs, thereby improving the overall security posture for all customers.
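At that point the policy can simply be detached from the node role (again a sketch; the role name is a placeholder):

```
aws iam detach-role-policy \
  --role-name <NodeInstanceRole> \
  --policy-arn arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
```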
We do provide instructions in our docs on how to accomplish this task. We'll investigate enabling this by default:
https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts-cni-walkthrough.html
Another note: the next-gen CNI (#398) will remove the need for the long-running daemon on every node that requires broad VPC permissions.
@mikestef9 ,
following the steps in https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts-cni-walkthrough.html is cumbersome, because it requires us to create a new role for every worker/node group to be started. So I need to attach an unnecessary (see below) policy just to remove it right after the worker nodes start up. Moreover, this doesn't work well with tools like Terraform.
This is how the AmazonEKS_CNI_Policy becomes unnecessary on the worker nodes:
Once the node group is created, run:

```
kubectl describe pods -n kube-system aws-node-xyzabc | grep AWS_ROLE_ARN
```

It will show:

```
AWS_ROLE_ARN: arn:aws:iam::XXX:role/eks-cni-plugin
```
As you can see, the AmazonEKS_CNI_Policy is not even needed in the worker node role. Moreover, considering the steps above, we think the AmazonEKS_CNI_Policy is misplaced on the worker nodes, since conceptually it belongs to the "aws-node" DaemonSet, which is by design part of the control plane setup.
So essentially, steps 1 to 4 could be part of the EKS managed service control plane setup; it just requires one more role ARN/name as an input parameter (the role carrying the AmazonEKS_CNI_Policy).
There might also be an easy migration path for this: if the additional role ARN/name is given as input, provision the control plane as above (and drop the enforcement of the CNI policy on the node role); otherwise, keep enforcing the CNI policy on the node role. See the sketch below for how that role can be pre-created today.
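To make that ordering concrete: the role carrying the AmazonEKS_CNI_Policy can already be created before any node group exists, as long as the cluster's OIDC provider is known. A sketch, with <ACCOUNT_ID> and <OIDC_PROVIDER> as placeholders:

```
# Trust policy letting the kube-system/aws-node ServiceAccount assume the role.
cat > trust-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Federated": "arn:aws:iam::<ACCOUNT_ID>:oidc-provider/<OIDC_PROVIDER>"
    },
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
      "StringEquals": {
        "<OIDC_PROVIDER>:sub": "system:serviceaccount:kube-system:aws-node"
      }
    }
  }]
}
EOF

aws iam create-role --role-name eks-cni-plugin \
  --assume-role-policy-document file://trust-policy.json
aws iam attach-role-policy --role-name eks-cni-plugin \
  --policy-arn arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
```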
What do you think?
You are absolutely correct that AmazonEKS_CNI_Policy is not required on the worker node roles if you are using IAM Roles for Service Accounts on the CNI. By default, the CNI does not use IRSA, so the policy is required to prevent the common case where the user uses a role that is missing the policy, which would keep the worker nodes from becoming ready.
There is also a bit of a chicken/egg problem. To create the IRSA role for the CNI, you need the IAM OIDC identity provider; to create that, the cluster must already exist. Creating the roles and the OIDC identity provider is a very privileged operation, so it is problematic for EKS to create them on the user's behalf in many cases.
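For reference, the per-cluster ordering looks roughly like this with eksctl (a sketch; my-cluster is a placeholder):

```
# 1. Create the IAM OIDC identity provider for the already-created cluster.
eksctl utils associate-iam-oidc-provider --cluster my-cluster --approve

# 2. Create the IRSA role and bind it to the aws-node ServiceAccount.
eksctl create iamserviceaccount \
  --cluster my-cluster \
  --namespace kube-system \
  --name aws-node \
  --attach-policy-arn arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy \
  --override-existing-serviceaccounts \
  --approve
```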
We are evaluating on how to make the situation more streamlined.
@brycecarman -- That five-step plan (see above) was specifically meant to show what needs to be done, and in which order, to resolve the chicken/egg situation: how everything can be set up before the node groups come into existence.
This will be solved by #252