Why do you want this feature?
We use AWS organizations service control policies together with managed IAM permissions boundary policies for team accounts. This allows us to:
In this model a role may be created _only_ if a permissions boundary policy ARN is provided at the same time. Because eksctl doesn't do this, it fails to create the service and instance roles, which then aborts the cluster and node group creation.
What feature/behavior/change do you want?
Please provide a way to specify a permissions boundary policy ARN for the service role, and a separate one for the instance role.
(Having distinct parameters lets each role be constrained differently.)
Please support these parameters both as CLI arguments and within the configuration file, with the CLI argument having precedence.
@gatesjd this is very interesting, thanks for suggesting this feature. Could you please illustrate it a little more e.g. with a permission boundary spec that one can use for the EKS service role?
@errordeveloper, I can see how a sample spec would be useful for consumers of the requested feature, and possibly for testing it. I don't have that yet, but I'll try to illustrate the problem a little more.
Permissions boundary policies are all custom by design. For the EKS service role, I can see different organizations taking radically different approaches. At one extreme you might see a minimal permissions boundary, which implicitly trusts that eksctl itself follows the _principle of least privilege_ when it defines the role. At the other extreme, you might see a fully constrained policy that fits the service role like a glove (and which therefore must be updated whenever the service role needs new permissions). I expect most organizations using permissions boundaries to fall somewhere in between.
The "Delegating Responsibility to Others Using Permissions Boundaries" section of the Permissions Boundaries for IAM Entities page provides a pretty good illustration. The DelegatedUserBoundary policy is a simple example that allows user creation to be delegated, while ensuring that new users have the XCompanyBoundaries policy. (See particularly Task 4 where user creation fails unless the boundary policy is assigned.)
Our organization's delegated permissions management works exactly the same, except that we enforce the boundary policy assignment when creating roles as well as users. And like the example, role creation fails unless the boundary policy is assigned.
Once we're able to pass in the permissions boundary policy ARNs, I can start with a minimally constrained policy for each role, then tighten them down after everything is working.
@errordeveloper, what more information do you require?
See also this comment: https://github.com/aws/containers-roadmap/issues/456#issuecomment-530343470.
I may be able to provide an example of a policy that is not too restrictive, but restrictive enough to cause a problem.
For example, my organization's permission boundary restricts things like: creating instances outside of a selected list of regions; creating instances over a certain size limit; and creating or modifying any VPC resources (the VPC for our network has already been designed by pros, so we are not expected to modify it, only to pick an existing network based on our security posture and requirements).
Common permission boundaries probably also prevent granting permission to modify any role boundary, and prevent creating an IAM role that does not at least attach the boundary of its parent.
I will see if I can export the specific boundary that we are using, if it will help to have as concrete example, but those are some sensible defaults that you can imagine using in a Role Permissions Boundary.
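As a rough illustration of the restrictions described above, a boundary of that kind could be expressed as a CloudFormation managed policy along the following lines. This is a sketch under assumptions, not the policy from this comment: the resource name, policy name, region list, and instance type list are all made up for the example.

```yaml
# Sketch only: names, regions, and instance types are illustrative placeholders.
AWSTemplateFormatVersion: "2010-09-09"
Resources:
  LabBoundary:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      ManagedPolicyName: lab-boundary   # hypothetical name
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          # A boundary only caps permissions, so it has to allow something
          # before the Deny statements below carve exceptions out of it.
          - Sid: AllowByDefault
            Effect: Allow
            Action: "*"
            Resource: "*"
          # No instances outside the selected regions.
          - Sid: ApprovedRegionsOnly
            Effect: Deny
            Action: ec2:RunInstances
            Resource: "*"
            Condition:
              StringNotEquals:
                "aws:RequestedRegion": [us-east-1, us-east-2]
          # No instances above a chosen size.
          - Sid: ApprovedInstanceTypesOnly
            Effect: Deny
            Action: ec2:RunInstances
            Resource: "arn:aws:ec2:*:*:instance/*"
            Condition:
              StringNotLike:
                "ec2:InstanceType": ["t3.*", "m5.large", "m5.xlarge"]
          # The network is pre-built; creating or deleting VPC resources is blocked.
          - Sid: NoVpcChanges
            Effect: Deny
            Action:
              - ec2:CreateVpc
              - ec2:DeleteVpc
              - ec2:CreateSubnet
              - ec2:CreateInternetGateway
              - ec2:DeleteInternetGateway
            Resource: "*"
          # Boundaries cannot be stripped from existing roles.
          - Sid: NoBoundaryRemoval
            Effect: Deny
            Action: iam:DeleteRolePermissionsBoundary
            Resource: "*"
```

A tool that creates its own VPC would be stopped by the NoVpcChanges statement here even if role creation succeeded, so a boundary like this interacts with more than just IAM.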
I'm afraid I still haven't communicated the real need. Sensible defaults for permissions boundary policies aren't the point. Eksctl would never be in the position to define one, outside of its own unit tests.
Organizations can and do have service control policies in place which will not allow any role to be created unless it is constrained by an org-defined boundary policy. Period.
Without a way to pass a permissions boundary policy ARN to eksctl, which it then supplies to the roles it creates, role creation is aborted.
There's no need for eksctl to take responsibility for what constitutes a reasonable permissions boundary policy. So long as eksctl is conservative in the permissions it requires (as it seems to be), and lets the customer supply boundary policies when it creates roles, its part is done.
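To make the guardrail concrete, the statement below sketches the kind of rule being described: role creation is denied unless the request attaches a specific boundary policy. It is written as a CloudFormation managed policy only so that it is valid YAML; a real service control policy would carry the same statement as JSON. The account ID, the policy names, and the resource name are placeholders, not values from this thread.

```yaml
# Sketch only: account ID and names are placeholders.
AWSTemplateFormatVersion: "2010-09-09"
Resources:
  RequireBoundaryOnRoles:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      ManagedPolicyName: require-boundary-on-roles   # hypothetical name
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          # iam:PermissionsBoundary is the condition key carrying the boundary
          # ARN supplied with the request. With a negated operator, a request
          # that supplies no boundary at all also matches and is denied.
          - Sid: DenyCreateRoleWithoutBoundary
            Effect: Deny
            Action: iam:CreateRole
            Resource: "*"
            Condition:
              StringNotEquals:
                "iam:PermissionsBoundary": "arn:aws:iam::111111111111:policy/org-boundary"
```

Under a rule like this, any tool that calls iam:CreateRole without a way to pass the boundary ARN through fails at that call, regardless of how modest the role's own permissions are.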
This is AWS-specific. Is there another cloud provider with a generally available model similar to this delegated boundary role permission?
Whether there is or there isn't something like that for any other cloud provider, or whether this is an AWS-only issue, this is something I need in order to be able to use eks and eksphemeral. It has been around for a while, and seems to be something that was already supported in Terraform back in the 0.11 series.
This is bar none the single thing that I trip over most frequently in Kubernetes or other (Serverless or what have you) quickstart guides. My AWS "Lab" environment enforces the delegated boundary on me: the only way I can create a role is with a role boundary attached. So any guide or utility that tries to abstract away role creation and its complexity should provide a way to attach a boundary (but most simply don't).
It's not just a problem for my lab or ephemeral clusters, though. The lab is designed this way to mirror our production environment, where the same policy is enforced. So we would have this same issue if we tried promoting an EKS environment into prod that was managed through eksctl.
This issue report still has the "awaiting more information" label, and I'm assuming that's because we still haven't provided a specific example of a policy... here's a gist with an example policy.
https://gist.github.com/kingdonb/cceb6e5f0db4ae7980fd1ab130e2c72e
This policy document is, I believe, the one we have in use in my lab environment as described.
For me, I would just ask for a way to attach this single boundary to any roles that are created by EKSctl initialization (I think this boundary actually specifies that all roles must attach the same boundary), but for others whose policies differ I could see the value in providing the separate boundaries for the service role and instance role, as it was described in the opening issue report.
Similar to what @kingdonb has mentioned, this is critical for using in environments where you want to allow users to create roles/policies but not get outside a "box" that blocks certain activity. It really shouldn't matter what is in the permission boundary (that will vary from organization to organization), but the idea is that it needs to be able to attach one.
@errordeveloper, I believe you can remove the "awaiting more information" label, yes?
You know what would work just as well for me, and probably less effort to implement, is if I could supply my own roles to eksctl by ARN.
I learned that this RolePermissionBoundary scheme is actually something that is only in place in our lab. What's far more likely to happen in the prod VPC and account is, my InfoSec group just isn't going to grant me permission to create a role at all – they're just going to ask me to provide a role spec, and they're going to read it and scrutinize every line asking "why do you need that" until they are satisfied that we are observing the POLP and only asking for what we need.
Then after a long and involved process, they're going to come back with something that hopefully provides enough permissions for EKSctl to do its job, and I will need to pass it in.
I wasn't able to find a way that is documented to do that, either. Fortunately they have provided this Permission Boundary scheme so we don't have to go through all of that for the lab environment (I can create my own roles without review, as long as I attach the boundary and agree to observe the other lab rules, like "no PII and no access to any production data".)
Do you think that one or the other use case is easier to build or more important to support?
@gatesjd @kingdonb can you create your own roles, with permission boundaries attached, and supply them to eksctl in the config file, instead of getting eksctl to create them?
iam:
  serviceRoleARN: "arn:aws:iam::11111:role/eks-base-service-role"
nodeGroups:
  - name: ng-1
    instanceType: m5.large
    desiredCapacity: 3
    iam:
      instanceProfileARN: "arn:aws:iam::11111:instance-profile/eks-nodes-base-role"
      instanceRoleARN: "arn:aws:iam::1111:role/eks-nodes-base-role"
...
Is there currently any role that eksctl creates that you can’t already supply your own instead using the config file?
I'm not aware of any role eksctl creates that we couldn't produce manually.
Assuming the service role only ever needs the 2 policies listed in the AWS docs, we could create it once and use the same role for every cluster.
Since the instance role varies depending on the add-on policies, creating multiple clusters (or just nodegroups) using different combinations of add-ons would mean (_should_ mean) creating different instance role variants. Still doable, it just has more potential for duplication and more room for error.
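A pre-created service role of the kind discussed here might look roughly like the CloudFormation template below. It is a sketch under assumptions: it takes "the 2 policies listed in the AWS docs" to be AmazonEKSClusterPolicy and AmazonEKSServicePolicy, reuses the role name from the example config above, and introduces a PermissionsBoundaryArn parameter that is purely illustrative.

```yaml
# Sketch only: parameter and resource names are illustrative.
AWSTemplateFormatVersion: "2010-09-09"
Parameters:
  PermissionsBoundaryArn:
    Type: String
    Description: ARN of the org-defined permissions boundary policy
Resources:
  EksServiceRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: eks-base-service-role
      # The boundary is attached at creation time, which is what the
      # org-level rule requires.
      PermissionsBoundary: !Ref PermissionsBoundaryArn
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service: eks.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        # Assumed to be the two policies the comment refers to.
        - arn:aws:iam::aws:policy/AmazonEKSClusterPolicy
        - arn:aws:iam::aws:policy/AmazonEKSServicePolicy
Outputs:
  ServiceRoleArn:
    Description: Value to use for iam.serviceRoleARN in the eksctl config
    Value: !GetAtt EksServiceRole.Arn
```

The instance role would need a similar template per add-on combination, which is where the duplication mentioned above comes from.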
Depending on how #956 produces exported templates, another workaround might be possible:
Use eksctl export-templates to get the role creation templates.
Create the roles from those templates with the permissions boundary attached, and add their ARNs to the eksctl config file.
Use eksctl normally to create and update the cluster.
Eksctl has been fantastic to make working with EKS clusters easy. Thank you for providing such a quality tool.
Our organization's move to delegated permissions management has on balance made many things simpler. Unfortunately, this is one that became more complex.
I looked before and I couldn't find the place in eksctl source where roles are created, and I did not realize the interplay that addons could have, but that makes perfect sense. I would love to have the roles created by eksctl just this once with a boundary attached, because it seems a bit daunting to extract this information from the addons and/or docs myself; but if I only need to supply my own instanceProfileARN and instanceRoleARN, I am certain that will work.
InfoSec prefers to develop our roles iteratively, and they will thank me if I can avoid a proliferation of roles that are created by each new application or EKS instance, so there are fewer roles and role policies to review when that comes time! I just found #508 which is also apparently my issue, and I will try to implement this now, thanks very much for the example config!
I got hung up here:
$ eksctl create cluster --config-file=eksctl-config.yaml
[ℹ] eksctl version 0.8.0
[ℹ] using region us-east-2
[ℹ] setting availability zones to [us-east-2a us-east-2c us-east-2b]
[ℹ] subnets for us-east-2a - public:192.168.0.0/19 private:192.168.96.0/19
[ℹ] subnets for us-east-2c - public:192.168.32.0/19 private:192.168.128.0/19
[ℹ] subnets for us-east-2b - public:192.168.64.0/19 private:192.168.160.0/19
[ℹ] nodegroup "ng-1" will use "ami-082bb518441d3954c" [AmazonLinux2/1.14]
[ℹ] using Kubernetes version 1.14
[ℹ] creating EKS cluster "test-cluster-c-1" in "us-east-2" region
[ℹ] 1 nodegroup (ng-1) was included (based on the include/exclude rules)
[ℹ] will create a CloudFormation stack for cluster itself and 1 nodegroup stack(s)
[ℹ] if you encounter any issues, check CloudFormation console or try 'eksctl utils describe-stacks --region=us-east-2 --cluster=test-cluster-c-1'
[ℹ] CloudWatch logging will not be enabled for cluster "test-cluster-c-1" in "us-east-2"
[ℹ] you can enable it with 'eksctl utils update-cluster-logging --region=us-east-2 --cluster=test-cluster-c-1'
[ℹ] Kubernetes API endpoint access will use default of {publicAccess=true, privateAccess=false} for cluster "test-cluster-c-1" in "us-east-2"
[ℹ] 2 sequential tasks: { create cluster control plane "test-cluster-c-1", create nodegroup "ng-1" }
[ℹ] building cluster stack "eksctl-test-cluster-c-1-cluster"
[ℹ] deploying stack "eksctl-test-cluster-c-1-cluster"
[✖] unexpected status "ROLLBACK_IN_PROGRESS" while waiting for CloudFormation stack "eksctl-test-cluster-c-1-cluster"
[ℹ] fetching stack events in attempt to troubleshoot the root cause of the failure
[!] AWS::EC2::InternetGateway/InternetGateway: DELETE_IN_PROGRESS
[✖] AWS::EC2::InternetGateway/InternetGateway: CREATE_FAILED – "Resource creation cancelled"
[✖] AWS::EC2::EIP/NATIP: CREATE_FAILED – "The maximum number of addresses has been reached. (Service: AmazonEC2; Status Code: 400; Error Code: AddressLimitExceeded; Request ID: f950e047-18e7-47e5-9ded-cdd9b4bc11d7)"
[✖] AWS::EC2::VPC/VPC: CREATE_FAILED – "API: ec2:CreateVpc You are not authorized to perform this operation."
[ℹ] 1 error(s) occurred and cluster hasn't been created properly, you may wish to check CloudFormation console
[ℹ] to cleanup resources, run 'eksctl delete cluster --region=us-east-2 --name=test-cluster-c-1'
[✖] waiting for CloudFormation stack "eksctl-test-cluster-c-1-cluster" to reach "CREATE_COMPLETE" status: ResourceNotReady: failed waiting for successful resource state
[✖] failed to create cluster "test-cluster-c-1"
I've tried to create the cluster using the following config:
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: test-cluster-c-1
  region: us-east-2
vpc:
  subnets:
    Private:
      us-east-1a:
        id: "subnet-0f9cb0b6f426b0ffd"
        cidr: "172.22.147.0/25"
      eu-north-1b:
        id: "subnet-0fd58f23ebc443029"
        cidr: "172.22.147.128/25"
iam:
  serviceRoleARN: "arn:aws:iam::209773529123:role/cluster-api-role"
nodeGroups:
  - name: ng-1
    instanceType: m5.large
    desiredCapacity: 3
    iam:
      instanceProfileARN: "arn:aws:iam::209773529123:role/cluster-api-role"
      instanceRoleARN: "arn:aws:iam::209773529123:role/cluster-api-role"
I'm not sure that I have selected the right subnets here, but that doesn't seem to be the issue. eksctl seems to be trying to create and destroy an InternetGateway, which has led me to #365 and https://github.com/weaveworks/eksctl/issues/204#issuecomment-450450945. I think creating and deleting InternetGateways is on the list of things we're not able to do. I will let you know if I'm able to figure this out.
The issue I'm having seems to be documented pretty thoroughly in those comments, and I see some 🎉 replies so I'm optimistic, wish me luck I guess!
By default eksctl will create a new VPC and/or NAT for the cluster. Looks like your user does not have rights for that. You can disable it.
https://eksctl.io/usage/vpc-networking/#nat-gateway
Also, the instance profile ARN needs to be an instance profile, not a role.
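Applied to the config above, those two points might look something like the fragment below. Treat it as a sketch: the vpc.nat.gateway setting is taken from the linked NAT gateway page, the account ID and profile/role names are placeholders, and whether eksctl still tries to create a VPC depends on it picking up the existing subnets, which this fragment does not address.

```yaml
# Sketch only: account ID and names are placeholders.
vpc:
  nat:
    gateway: Disable   # do not create a NAT gateway for the cluster
nodeGroups:
  - name: ng-1
    instanceType: m5.large
    desiredCapacity: 3
    iam:
      # An instance profile ARN uses the instance-profile/ path...
      instanceProfileARN: "arn:aws:iam::111111111111:instance-profile/eks-nodes-base-role"
      # ...while the role ARN uses role/.
      instanceRoleARN: "arn:aws:iam::111111111111:role/eks-nodes-base-role"
```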
@paulbes I see you are working on this. This is very exciting, as I was about to start creating my own roles outside of eksctl. I would much prefer to have eksctl create the roles and allow me to specify the permissions boundary to use when creating them.
I can build off your fork and test it out if your work is complete.
It should be fairly complete. It is missing some test coverage, the names might change, and perhaps some of the options need to be made available as CLI commands.
I have tested it manually towards our AWS accounts, which have permissions boundaries set, and it worked. You can take a look at examples/17-permissions-boundary.yaml to see how you could configure this yourself.
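For readers without the repository at hand, a config using this feature might look roughly like the following. The field names serviceRolePermissionsBoundary and instanceRolePermissionsBoundary are given from memory of the eksctl config schema and, as noted above, the names were still subject to change at this point, so check them against examples/17-permissions-boundary.yaml. The cluster name, region, account ID, and policy path are placeholders.

```yaml
# Sketch only: verify field names against examples/17-permissions-boundary.yaml.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: boundary-demo      # placeholder
  region: us-east-2        # placeholder

iam:
  # Boundary attached to the cluster service role that eksctl creates.
  serviceRolePermissionsBoundary: "arn:aws:iam::111111111111:policy/org-boundary"

nodeGroups:
  - name: ng-1
    instanceType: m5.large
    desiredCapacity: 3
    iam:
      # Boundary attached to this nodegroup's instance role; it can differ
      # from the service role's boundary.
      instanceRolePermissionsBoundary: "arn:aws:iam::111111111111:policy/org-boundary"
```

Keeping the two settings separate matches the original request that each role can be constrained differently.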
Resolved with #1638. Thanks @paulbes and @martina-if!
This is perfect, thanks @paulbes!