Aws-load-balancer-controller: ALB ingress controller is looking for the instances in an unknown VPC to place the target groups

Created on 11 Mar 2019 · 30 Comments · Source: kubernetes-sigs/aws-load-balancer-controller

After deploying an ingress resource to the ingress controller, the error below repeats in the logs. I searched for the VPC in my account, but no such VPC exists.

Error adding targets to target group arn:aws:elasticloadbalancing:ap-southeast-1:*:targetgroup/a6106773-73ee60071b3e98f408a/d9a7**: InvalidTarget: The following targets are not in the target group VPC 'vpc-0dc2960bdb7e9': 'i-0e62a05e40c2', 'i-0e3e11eb9294', 'i-0b*718cdd87091

lifecycle/rotten

All 30 comments

Hi,
Do you mean that vpc-0dc2960bdb7e9 is not your cluster's VPC?
By default, the ALB ingress controller infers the cluster's VPC by querying ec2metadata from the controller pod. Are you running any sidecar that intercepts the ec2metadata call (such as kube2iam)?
If so, you can specify the VPC ID and region manually via --aws-vpc-id=YourVPCID and --aws-region=YourClusterRegion.
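The lookup order described above (explicit flag first, instance metadata otherwise) can be sketched as follows. This is a minimal illustration, not the controller's actual code: `resolve_vpc_id` and `fake_fetch` are hypothetical names, the `fetch` callable stands in for an HTTP GET against the EC2 instance metadata endpoint, and IMDSv2 session tokens are ignored for brevity.

```python
from typing import Callable, Optional

# Well-known EC2 instance metadata base URL.
IMDS_BASE = "http://169.254.169.254/latest/meta-data"


def resolve_vpc_id(fetch: Callable[[str], str], override: Optional[str] = None) -> str:
    """Return the VPC ID to use: an explicit override wins, else ask instance metadata."""
    if override:
        # Corresponds to passing --aws-vpc-id to the controller.
        return override
    # The primary interface's MAC is needed to build the vpc-id metadata path.
    mac = fetch(f"{IMDS_BASE}/mac").strip()
    return fetch(f"{IMDS_BASE}/network/interfaces/macs/{mac}/vpc-id").strip()


def fake_fetch(url: str) -> str:
    """Hypothetical stand-in for the metadata endpoint, used only for illustration."""
    responses = {
        f"{IMDS_BASE}/mac": "0a:1b:2c:3d:4e:5f\n",
        f"{IMDS_BASE}/network/interfaces/macs/0a:1b:2c:3d:4e:5f/vpc-id": "vpc-0example",
    }
    return responses[url]
```

If something between the pod and the metadata endpoint answers these requests differently, the inferred VPC ID can be wrong, which is why the explicit override exists.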

I actually have the same issue when deploying a chart with an ALB ingress using Helm. It happens if I deploy my chart on a freshly created EKS cluster immediately after bringing the cluster up and installing the ALB ingress controller. If I wait about 10 minutes after installing the ALB ingress controller and then install my chart, the ingress is created successfully. The error message contains the correct instance IDs but the wrong VPC ID. The VPC it reports does not exist in any region in my AWS account, and across several tests on freshly created EKS clusters the reported VPC ID changes each time. I have no idea where it's getting those VPC values from. I don't have any other add-ons or sidecars that could be responsible for this. I just have these running:

kube-system   alb-ingress-controller-55fdf469dc-wsdn2   1/1   Running   0   1h
kube-system   aws-node-df7jv                            1/1   Running   0   1h
kube-system   aws-node-npbq6                            1/1   Running   0   1h
kube-system   aws-node-zd7w6                            1/1   Running   0   1h
kube-system   coredns-7d77776957-hn9r9                  1/1   Running   0   1h
kube-system   coredns-7d77776957-ts8vl                  1/1   Running   0   1h
kube-system   kube-proxy-2c8dr                          1/1   Running   0   1h
kube-system   kube-proxy-cc4rc                          1/1   Running   0   1h
kube-system   kube-proxy-vqjf9                          1/1   Running   0   1h
kube-system   tiller-deploy-85744d9bfb-jzwbg            1/1   Running   0   1h

Here is an example of one such error from the ingress controller:

The instance IDs were correct but the VPC was not and this was not a VPC running in ANY region within my account.

E0326 22:41:44.279073 1 targets.go:80] default/ui: status code: 400, request id: 54b2ece6-5018-11e9-b675-db068043e3e4
E0326 22:41:44.279176 1 :0] kubebuilder/controller "msg"="Reconciler error" "error"="failed to reconcile targetGroups due to failed to reconcile targetGroup targets due to InvalidTarget: The following targets are not in the target group VPC 'vpc-04e6ea299853b63eb': 'i-08a5793d40144befe', 'i-031442f34a892b4eb', 'i-0b4c0698d1be8726e'\n\tstatus code: 400, request id: 54b2ece6-5018-11e9-b675-db068043e3e4" "Controller"="alb-ingress-controller" "Request"={"Namespace":"default","Name":"ui"}

I'm also noticing that it fails on the first ingress that I create but if I delete that ingress and then reapply the exact same manifest to recreate the ingress it works the second time.

Just encountered the same issue on a newly created EKS cluster.

Versions:
Image version: docker.io/amazon/aws-alb-ingress-controller:v1.0.1
EKS Kubernetes version: 1.12

The load balancer (ALB) was deployed into the correct cluster VPC, but the Target Group was associated with a different VPC not found in any region in the account.

Logs were the same as campee reported. Recreating the ingress controller and redeploying charts that created ingress resources resolved the issue.

Edit: both --aws-vpc-id and --aws-region were specified as arguments in the ingress controller deployment.

@huntermassey
This is super weird... are you using an EKS cluster with VPC peering?
Could you share your AWS accountID and clusterName with me ([email protected])? And also the erroneous vpcID for the targetGroup (if you still have the logs).

I am also having this issue on a brand new EKS cluster.

E0413 01:11:21.259596 1 :0] kubebuilder/controller "msg"="Reconciler error" "error"="failed to reconcile targetGroups due to failed to reconcile targetGroup targets due to InvalidTarget: The following targets are not in the target group VPC 'vpc-03caf487fb0a0174d': 'i-0f8e658161b8af7e7', 'i-093bf9b038f794fc0', 'i-0914c426325445091'\n

I've flipped through all of my regions in AWS and I have the default AWS VPC and a VPC created by terraform (which is also creating the EKS cluster) and I cannot find VPC ID vpc-03caf487fb0a0174d anywhere.

[edit]
It's probably also worth noting that this is an otherwise unused AWS account. The only resources in the entire account are the items mentioned here, an S3 bucket and some users/roles/policies.

I just encountered this bug.

alb-ingress-controller attempts to use an existing target group if your ingress spec doesn't change, even if your VPC does.

For example, if you were to build your entire stack in CloudFormation, then delete it, then recreate it, alb-ingress-controller uses the target groups from your first VPC.

Maybe implementing something to cleanup target groups when deleting your cluster/vpc/ingress or having alb-ingress-controller check your configured VPC ID before using a bad target group?

@stevenoctopus you just saved me a day of debugging this issue 🤣 ..... Yeah, this should be what happened.
The unique IDs for AWS resources (LB/TG) are computed from clusterName, namespace, and ingressName/svcName, and do not include the vpcID.
Users should delete the ingresses before tearing down the cluster (the ALB ingress controller will then delete the AWS resources; we'll use a finalizer to ensure the AWS resources are deleted before the ingress objects are removed).
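To illustrate why a recreated cluster can pick up an old target group, here is a minimal sketch of a name derived only from clusterName, namespace, and service name. The hashing scheme, the `Cluster` type, and `target_group_name` are assumptions made for illustration; the controller's real naming algorithm is not shown in this thread.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    name: str
    vpc_id: str


def target_group_name(cluster: Cluster, namespace: str, service: str) -> str:
    """Derive a deterministic name from cluster, namespace, and service.

    cluster.vpc_id is deliberately not part of the input, mirroring the
    behavior described above.
    """
    key = f"{cluster.name}/{namespace}/{service}".encode()
    # Target group names are limited to 32 characters, so truncate the digest.
    return hashlib.sha256(key).hexdigest()[:32]


old = Cluster(name="demo", vpc_id="vpc-old")
new = Cluster(name="demo", vpc_id="vpc-new")  # same cluster name, rebuilt in a new VPC
same_name = target_group_name(old, "default", "ui") == target_group_name(new, "default", "ui")
```

Because both clusters map to the same name, a target group left behind by the first cluster looks like a match to the second one, even though it belongs to the deleted VPC.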

+1 - in my case we had just torn down a cluster+VPC before standing the whole thing up again. Looked at CloudTrail logs, the erroneous VPC for the target group was the one we had deleted.

So for those of us with automation to stand up/tear down clusters & VPCs with CloudFormation etc., we will need to delete all ingress resources prior to deleting the cluster to avoid this issue?

Any plans to factor in VPC ID in unique ID, or something else so this extra step isn't required?

@huntermassey Factoring the vpcID into the uniqueID wouldn't solve the root problem; it would silently bypass it (you would be left with unused targetGroups dangling in your AWS account). Instead, we can validate the vpcID and provide a better error message for this case.

I think deleting all ingresses as part of the automation is the best way to go. (Once we add support for finalizers, deleting all namespaces should also work :D)
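The validation idea mentioned above could look roughly like the sketch below. It assumes the target group is a dict shaped like an entry in the ELBv2 DescribeTargetGroups response (which carries `TargetGroupArn` and `VpcId`); `VpcMismatchError` and `ensure_same_vpc` are hypothetical names, not part of the controller.

```python
class VpcMismatchError(Exception):
    """Raised when an existing target group belongs to a different VPC."""


def ensure_same_vpc(target_group: dict, cluster_vpc_id: str) -> dict:
    """Return the target group if it lives in the cluster's VPC, else raise a clear error."""
    tg_vpc = target_group.get("VpcId")
    if tg_vpc != cluster_vpc_id:
        raise VpcMismatchError(
            f"target group {target_group.get('TargetGroupArn')} is in VPC {tg_vpc}, "
            f"but the cluster is in VPC {cluster_vpc_id}; it was probably left behind "
            "by a previous cluster and should be deleted"
        )
    return target_group
```

Failing early with a message like this would point at the leftover target group directly, instead of surfacing as an InvalidTarget error when targets are registered.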

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@fejta-bot: Closing this issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Reopening this as we need another solution for those restoring from etcd backup....the controller needs to re-verify the metadata somehow....

@Morriz: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen


Can somebody reopen this?

I also agree this should be reopened. Any time I deploy an ingress for the first time, I have to delete it and re-deploy it in order to make it work.

I'm still having this problem with 1.1.5 and terraform 0.12.23.

What's the solution from @stevenoctopus's suggestion?

  • How can we have the alb-ingress-controller check for the configured VPC ID?
  • How can we automatically clean up stale target groups?

Maybe implementing something to cleanup target groups when deleting your cluster/vpc/ingress or having alb-ingress-controller check your configured VPC ID before using a bad target group?
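As a rough sketch of the cleanup question, the function below picks out target groups that sit in a VPC other than the cluster's and are not attached to any load balancer. It assumes dicts shaped like entries in the ELBv2 DescribeTargetGroups response (`TargetGroupArn`, `VpcId`, `LoadBalancerArns`); `stale_target_groups` and the `owned` predicate are hypothetical, and how to recognize controller-owned target groups is left to the caller.

```python
from typing import Callable, Iterable, List


def stale_target_groups(
    target_groups: Iterable[dict],
    cluster_vpc_id: str,
    owned: Callable[[dict], bool] = lambda tg: True,
) -> List[str]:
    """Return ARNs of owned, unattached target groups that are outside the cluster's VPC."""
    return [
        tg["TargetGroupArn"]
        for tg in target_groups
        if owned(tg)
        and tg.get("VpcId") != cluster_vpc_id
        and not tg.get("LoadBalancerArns")  # skip anything still behind a load balancer
    ]
```

The returned list is only a set of candidates. An account may hold unrelated target groups in other VPCs, so the `owned` predicate should be narrowed and the list reviewed before anything is deleted.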

Yup, this is still an issue, seeing this exact behavior after tearing down a VPC & EKS cluster to upgrade to EKS 1.15.

/reopen

@knight42: Reopened this issue.

In response to this:

/reopen


Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@fejta-bot: Closing this issue.


@Morriz Is your case caused by #889 (comment)?

It most probably is, but is there no fix coming then?

/reopen

Would be great to have an option to update target groups with the autodiscovered VPC ID if it's out of date. Unfortunately deleting a VPC doesn't require removing target groups, so if a cluster gets torn down improperly at any time it causes an issue going forward without manual intervention.

@maracle6: You can't reopen an issue/PR unless you authored it or you are a collaborator.

