While destroying the cluster I get the following error:
Error: Error applying plan:
2 error(s) occurred:
* module.eks.aws_iam_service_linked_role.elasticloadbalancing (destroy): 1 error(s) occurred:
* aws_iam_service_linked_role.elasticloadbalancing: Error waiting for role (arn:aws:iam::<my-account>:role/aws-service-role/elasticloadbalancing.amazonaws.com/AWSServiceRoleForElasticLoadBalancing) to be deleted: unexpected state 'FAILED', wanted target 'SUCCEEDED'. last error: %!s(<nil>)
* module.vpc.aws_vpc.this (destroy): 1 error(s) occurred:
* aws_vpc.this: DependencyViolation: The vpc 'vpc-0df03bd251f8445a6' has dependencies and cannot be deleted.
status code: 400, request id: 91485bb5-a679-4617-beb6-f4bb3112cdf4
Set create_elb_service_linked_role to true:
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "1.6.0"
  ...
  # Create an ELB service-linked role to be able to deploy load balancers
  create_elb_service_linked_role = true
  ...
}
I expected that module.eks.aws_iam_service_linked_role.elasticloadbalancing could be destroyed without any problems.
Can you run aws iam get-service-linked-role-deletion-status and related commands to see why it didn't delete? I think you need to get the deletion-task-id
I am not seeing a way to get the deletion-task-id from the Terraform error.
What I tried:
$ aws iam delete-service-linked-role --role-name AWSServiceRoleForElasticLoadBalancing
{
"DeletionTaskId": "task/aws-service-role/elasticloadbalancing.amazonaws.com/AWSServiceRoleForElasticLoadBalancing/f6bed8e0-5acb-4db1-ac4f-eb09098074af"
}
and afterwards:
$ aws iam get-service-linked-role-deletion-status --deletion-task-id "task/aws-service-role/elasticloadbalancing.amazonaws.com/AWSServiceRoleForElasticLoadBalancing/f6bed8e0-5acb-4db1-ac4f-eb09098074af"
{
"Status": "SUCCEEDED"
}
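The two commands above can be chained so the deletion task ID does not have to be copied by hand. This is a minimal sketch, not something taken from the module: the function name wait_for_slr_deletion and the POLL_SECONDS variable are made up for illustration, while the two aws iam subcommands and the global --query/--output options are standard AWS CLI.

```shell
# Sketch: delete a service-linked role and poll until the deletion task ends.
# wait_for_slr_deletion and POLL_SECONDS are hypothetical names.
wait_for_slr_deletion() {
  role_name="$1"

  # Submit the deletion and keep only the bare task ID.
  task_id=$(aws iam delete-service-linked-role \
    --role-name "$role_name" \
    --query 'DeletionTaskId' --output text) || return 1

  while :; do
    status=$(aws iam get-service-linked-role-deletion-status \
      --deletion-task-id "$task_id" \
      --query 'Status' --output text) || return 1
    case "$status" in
      SUCCEEDED) echo "deleted: $role_name"; return 0 ;;
      FAILED)    echo "deletion failed: $role_name" >&2; return 1 ;;
      *)         sleep "${POLL_SECONDS:-5}" ;;  # NOT_STARTED or IN_PROGRESS
    esac
  done
}

# Usage (requires AWS credentials):
#   wait_for_slr_deletion AWSServiceRoleForElasticLoadBalancing
```

If the status comes back as FAILED, running get-service-linked-role-deletion-status without --query should show the full response, which is where AWS reports why the role could not be deleted.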
Then I ran into another issue:
$ terraform destroy
Error: Error applying plan:
1 error(s) occurred:
* module.vpc.aws_vpc.this (destroy): 1 error(s) occurred:
* aws_vpc.this: DependencyViolation: The vpc 'vpc-0d957d0b6f14f7a4c' has dependencies and cannot be deleted.
status code: 400, request id: 2b0fc64f-958e-4627-945a-6f49d0881652
To fix this, I went into the UI, deleted it from there, and ran terraform destroy again. These last two actions fixed it:
$ terraform destroy
...
Destroy complete! Resources: 0 destroyed.
...
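A DependencyViolation on a VPC usually means something inside it still exists. Before resorting to the console, it may help to list what is left. The sketch below is an assumption about where to look (network interfaces and non-default security groups are common leftovers of Kubernetes-created load balancers), and list_vpc_leftovers is a hypothetical helper name.

```shell
# Sketch: list resources that commonly block VPC deletion.
# list_vpc_leftovers is a hypothetical helper; pass the VPC ID as $1.
list_vpc_leftovers() {
  vpc_id="$1"

  echo "# network interfaces"
  aws ec2 describe-network-interfaces \
    --filters "Name=vpc-id,Values=${vpc_id}" \
    --query 'NetworkInterfaces[].[NetworkInterfaceId,Description]' \
    --output text || return 1

  echo "# non-default security groups"
  aws ec2 describe-security-groups \
    --filters "Name=vpc-id,Values=${vpc_id}" \
    --query "SecurityGroups[?GroupName!='default'].[GroupId,GroupName]" \
    --output text || return 1
}

# Usage (requires AWS credentials):
#   list_vpc_leftovers vpc-0d957d0b6f14f7a4c
```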
Is there something else I can try?
Hmm, I don't really know how it's supposed to work, but there seem to be many problems with this resource. Maybe we should just remove it from the module and let people sort it out outside the module?
@max-rocket-internet I believe what happens is this: if your EKS cluster is up and you tear it down without first terminating all the services on the cluster, the ELBs tied to those services are not deleted. The role then cannot be deleted because each ELB is still up with 0/0 instances and still uses that role.
Once I manually delete those ELBs and run terraform destroy again, it deletes the role.
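The manual cleanup described above could be scripted roughly as follows. This is only a sketch: delete_vpc_elbs is a hypothetical helper name, it covers classic ELBs only (aws elb, not aws elbv2), and deleting the Kubernetes Services of type LoadBalancer before tearing the cluster down is probably the cleaner route.

```shell
# Sketch: delete classic ELBs left behind in a VPC before `terraform destroy`.
# delete_vpc_elbs is a hypothetical helper; pass the VPC ID as $1.
delete_vpc_elbs() {
  vpc_id="$1"

  # Names of all classic load balancers that live in this VPC.
  names=$(aws elb describe-load-balancers \
    --query "LoadBalancerDescriptions[?VPCId=='${vpc_id}'].LoadBalancerName" \
    --output text) || return 1

  # Text output is whitespace-separated, so plain word splitting is enough.
  for name in $names; do
    echo "deleting ELB: $name"
    aws elb delete-load-balancer --load-balancer-name "$name" || return 1
  done
}

# Usage (requires AWS credentials):
#   delete_vpc_elbs vpc-0df03bd251f8445a6 && terraform destroy
```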
@max-rocket-internet the deletion fails because the "AWSServiceRoleForElasticLoadBalancing" role is still in use by a load balancer. However, the load balancer was deleted shortly before the cluster was torn down. Have a look at the attached screenshot.

I am not sure whether there is a waiting period after deleting the load balancer. If there is, then maybe the role should be deleted in the very last step.
Maybe it would also be possible to detach the role from the EKS cluster, just in case somebody did not delete the load balancers before destroying the EKS cluster.
I'm open to suggestions.
Should we just remove this resource from the module?
@max-rocket-internet let's keep this issue open. Maybe somebody else will report the same problem.
In addition, I am not sure whether this is actually a bug in AWS, that is, something that has nothing to do with this module or Terraform. Keep in mind that the AWS bug theory is just wild speculation on my part.
All good, makes sense 👍
I gather this boils down to a resource ownership problem: this module and Terraform, while having helped create the role, don't own it. Kubernetes owns the resource and would need to destroy it before Terraform can safely spin the rest of the components down. I don't think we have a good way to control deletion of k8s-owned resources at destroy time... that feels out of scope. This sort of problem highlights the difficulty of mixing two declarative systems that each want to own the world. It's hard to manage if they don't play nicely together.
I would most favor the approach of Terraform creating the role through its native k8s provider, so that it would own that state, but I don't think that is compatible or has been achieved just yet. Something to keep in mind when we get there.
@brandoconnor I agree with your statement, but I deleted the resources attached to this role, which means the deletion request should go through without any problems.
I ran into an issue where I am unable to create the cluster because the AWSServiceRoleForElasticLoadBalancing service-linked role is in use by an ECS cluster and therefore already exists. I am unable to delete the AWSServiceRoleForElasticLoadBalancing role because it is in use. As a result, I am stuck.
I don't think it is a good idea to have a service linked role. I would prefer a normal role attached to the worker or cluster which can create LoadBalancers. The service-linked role seems to be a limitation. In addition, I would prefer that it use a different name, such as "cluster-name-loadbalancer-role". A different name would allow isolating the role from other AWS-managed resources.
@Jeeppler
I don't think it is a good idea to have a service linked role. I would prefer a normal role attached to the worker or cluster which can create LoadBalancers.
This is not how it works. By default the cluster runs with arn:aws:iam::aws:policy/AmazonEKSClusterPolicy anyway and already has this access in that policy.
I'm still not sure what the correct way to solve this is. I'll have a look this week if I have time.
@max-rocket-internet thanks for the clarification. However, if this is not how it works and the AmazonEKSClusterPolicy already grants permissions to create load balancers, then why do I even have to set create_elb_service_linked_role = true?
@Jeeppler
then why do I even have to set create_elb_service_linked_role = true?
No idea 😂 It changed a few months back. I think it was an AWS IAM change, not a TF thing. I think we need to move the resource outside of the module and the user needs to manage this themselves. But I'll check with AWS or the documentation and find something more definitive.
OK I think we aren't meant to create the Service Linked Role at all.
The default EKS cluster policy arn:aws:iam::aws:policy/AmazonEKSClusterPolicy now includes:
{
  "Effect": "Allow",
  "Action": "iam:CreateServiceLinkedRole",
  "Resource": "*",
  "Condition": {
    "StringLike": {
      "iam:AWSServiceName": "elasticloadbalancing.amazonaws.com"
    }
  }
}
So perhaps we hit this issue before AWS had this in their policy? And we solved it by adding the service linked role manually but really the EKS service itself is supposed to create it.
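To check whether the managed policy in a given account currently contains that statement, something like the following could be used. It is a rough sketch: policy_allows_slr_creation is a hypothetical helper name, and the check is a plain text match on the action name rather than a real evaluation of the policy.

```shell
# Sketch: report whether a managed policy mentions iam:CreateServiceLinkedRole.
# policy_allows_slr_creation is a hypothetical helper; the policy ARN defaults
# to the EKS cluster policy named in this thread.
policy_allows_slr_creation() {
  policy_arn="${1:-arn:aws:iam::aws:policy/AmazonEKSClusterPolicy}"

  # Managed policies are versioned; look up the default version first.
  version=$(aws iam get-policy --policy-arn "$policy_arn" \
    --query 'Policy.DefaultVersionId' --output text) || return 1

  # Fetch that version's document and search it for the action name.
  aws iam get-policy-version \
    --policy-arn "$policy_arn" --version-id "$version" \
    --query 'PolicyVersion.Document' --output json \
    | grep -q 'iam:CreateServiceLinkedRole'
}

# Usage (requires AWS credentials):
#   policy_allows_slr_creation && echo "EKS can create the role itself"
```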
Here's the original issue for reference: https://github.com/terraform-aws-modules/terraform-aws-eks/issues/87
So I think we remove it now. Does that make sense? For people who already have it, they can either leave it as is or delete all ELBs, then delete the service linked role and then have EKS recreate the ELBs and its own service linked role.
@max-rocket-internet thanks for researching this. This seems like a good idea. Is there any code change necessary in this Terraform EKS module? Or does it mean we can just use create_elb_service_linked_role = false to create a cluster and everything should work?
we can just use create_elb_service_linked_role = false to create a cluster and everything should work?
Correct.
And I think we should completely remove this resource and that parameter from the module.
@max-rocket-internet that seems like a good idea.
@max-rocket-internet this role gives an ELB the ability to work with EC2 instances (for example, to create a security group for a node when you use a service with type: LoadBalancer). Do I understand that correctly?
@okgolove
I don't think it's quite like that. The arn:aws:iam::aws:policy/AmazonEKSClusterPolicy policy already grants access to create SGs and ELBs, so k8s can do this without the service-linked role. TBH it's not entirely clear how AWS uses the service-linked role in our situation. From the docs:
A service-linked role is a unique type of IAM role that is linked directly to an AWS service. Service-linked roles are predefined by the service and include all the permissions that the service requires to call other AWS services on your behalf. The linked service also defines how you create, modify, and delete a service-linked role. A service might automatically create or delete the role. It might allow you to create, modify, or delete the role as part of a wizard or process in the service. Or it might require that you use IAM to create or delete the role. Regardless of the method, service-linked roles make setting up a service easier because you don’t have to manually add the necessary permissions for the service to complete actions on your behalf.
So I can't really say whether or how the EKS service uses it. But the original error was "not authorized to perform: iam:CreateServiceLinkedRole", and now the EKS service can create its own service-linked role, so I don't think we need to create it in this module.
PR: https://github.com/terraform-aws-modules/terraform-aws-eks/pull/160
Ok, in this case it looks like we don't need to use this linked role. Thank you for the feedback!
PR is now merged so I'll close this issue.