Aws-load-balancer-controller: When nodes are cordoned they are not removed from the target group

Created on 11 Feb 2020 · 13 Comments · Source: kubernetes-sigs/aws-load-balancer-controller

I am running EKS 1.14. Deployments with the ingress controller seem to work fine, but when I drain or cordon a node it stays in the target group, meaning it's still routable via the ALB.

As a result, when I run a load test and then drain a node, I see 504
errors whenever traffic is sent to the node while it is being terminated.

Is this the desired behaviour or a bug?

kind/bug

paulalex

All 13 comments

Hi, this is not the desired behavior; we'll fix it in the next release.
Currently we rely on EC2's health status for nodes.
/kind bug

M00nF1sh on 12 Feb 2020

Check out https://github.com/keikoproj/lifecycle-manager
This removed the 504s for us by adding the exclude-balancer label and initiating target deregistration when a termination hook is received.

eytan-avisror on 22 Feb 2020
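
The flow described in the comment above (label the node, start deregistration, then let the instance terminate) can be sketched roughly as follows. This is only an illustration, not lifecycle-manager's actual code: the function name, the injected callables, and the `example.com/exclude-balancer` label key are all hypothetical placeholders for the real Kubernetes and AWS calls.

```python
from typing import Callable, Iterable, List


def handle_termination_hook(
    node_name: str,
    instance_id: str,
    target_group_arns: Iterable[str],
    label_node: Callable[[str, str, str], None],
    deregister: Callable[[str, str], None],
    is_drained: Callable[[str, str], bool],
    complete_hook: Callable[[str], None],
    sleep: Callable[[float], None],
    poll_seconds: float = 5.0,
    max_polls: int = 60,
    exclude_label: str = "example.com/exclude-balancer",  # hypothetical label key
) -> bool:
    """Return True if every target group drained before the hook was completed."""
    # 1. Label the node so it is excluded from load balancer registration.
    label_node(node_name, exclude_label, "true")

    # 2. Start deregistering the instance from every target group.
    pending: List[str] = list(target_group_arns)
    for arn in pending:
        deregister(arn, instance_id)

    # 3. Poll until each target group reports the instance as drained,
    #    giving up after max_polls rounds.
    for _ in range(max_polls):
        pending = [arn for arn in pending if not is_drained(arn, instance_id)]
        if not pending:
            break
        sleep(poll_seconds)

    # 4. Release the instance in either case; the return value tells the
    #    caller whether draining actually finished first.
    complete_hook(instance_id)
    return not pending
```

Injecting the callables keeps the ordering logic separate from any particular client library, so the sequence can be exercised without a cluster or an AWS account.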

Is there a rough ETA on the release of v1.1.6? We believe we are also hitting this and if it's in the region of a few weeks we may just wait, but if it's a month plus we will look more deeply at workarounds.

gshutler on 25 Feb 2020

Just wondering how this will be implemented: will the controller reconcile on node events (such as drain)? What happens if the draining node is a member of hundreds of target groups? How will clients know how long they should wait after a drain? What about terminations initiated by the ASG? There seems to be an inherent race with this approach (which is faster, termination or deregistration?).

eytan-avisror on 25 Feb 2020

@gshutler The ETA for the new release is before Mar 08 (it's a task on my sprint board this week).
@eytan-avisror Yes, the controller will reconcile on node events and check each node's status. Updating all target groups is unavoidable.
I didn't understand the question "How will clients know until when they should wait post drain".
If the termination comes from the ASG, a broken connection is guaranteed.
There is also an unavoidable case for long-lived sessions (WebSocket), where traffic flows ALB -> nodeA -> pod (on nodeB) and we cordon nodeA: there is no way for the pod to initiate a proper protocol-specific graceful shutdown.

Let's get https://github.com/kubernetes-sigs/aws-alb-ingress-controller/pull/955 merged to favor IP mode, which doesn't have these issues.

M00nF1sh on 26 Feb 2020
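
To make "reconcile on node events and check the node's status" concrete, here is a minimal sketch of the eligibility check such a reconcile could apply. It assumes nodes are plain dicts shaped like the Kubernetes Node JSON (`spec.unschedulable`, `status.conditions`); the function names are hypothetical and this is not the controller's actual implementation.

```python
from typing import Any, Dict, List


def is_node_eligible_target(node: Dict[str, Any]) -> bool:
    """A node is a valid target only if it is schedulable and Ready."""
    # A cordoned (or draining) node has spec.unschedulable set to true.
    if node.get("spec", {}).get("unschedulable", False):
        return False
    # Require an explicit Ready=True condition; anything else is not eligible.
    for cond in node.get("status", {}).get("conditions", []):
        if cond.get("type") == "Ready":
            return cond.get("status") == "True"
    return False


def desired_targets(nodes: List[Dict[str, Any]]) -> List[str]:
    """Names of nodes that should remain registered in a target group."""
    return [n["metadata"]["name"] for n in nodes if is_node_eligible_target(n)]
```

A reconcile loop would compare this desired list with the targets currently registered and deregister the difference, which is why every target group containing the node has to be touched.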

Thanks @M00nF1sh
To clarify the question: if a node is being drained and starts deregistering from target groups, how can whoever issued the drain/termination be sure that deregistration has completed, so that it is safe to proceed with termination?
Imagine an account with 300+ target groups, API throttling, etc.
If a node is being drained or terminated, you only have so much time until it goes down.

eytan-avisror on 26 Feb 2020
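
The constraint raised above (many target groups, API throttling, a hard time limit) suggests polling with backoff against a deadline. The sketch below is one hedged way to express that; `wait_until_deregistered`, its parameters, and the injected `is_drained`, `sleep`, and `clock` callables are hypothetical, standing in for real AWS status calls and real time.

```python
from typing import Callable, Iterable, Set


def wait_until_deregistered(
    target_groups: Iterable[str],
    is_drained: Callable[[str], bool],
    deadline_seconds: float,
    sleep: Callable[[float], None],
    clock: Callable[[], float],
    base_delay: float = 1.0,
    max_delay: float = 30.0,
) -> Set[str]:
    """Poll with exponential backoff; return target groups still not drained."""
    start = clock()
    remaining = set(target_groups)
    delay = base_delay
    while remaining:
        # Only re-check target groups that have not drained yet.
        remaining = {tg for tg in remaining if not is_drained(tg)}
        if not remaining:
            break
        # Stop if the next wait would overrun the time budget.
        if (clock() - start) + delay > deadline_seconds:
            break
        sleep(delay)
        # Back off to reduce pressure on a throttled API.
        delay = min(delay * 2, max_delay)
    return remaining
```

A non-empty result means the deadline expired first, which is exactly the race discussed in this thread: the caller then has to choose between extending the wait and accepting some failed requests.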

Great, thanks @M00nF1sh

gshutler on 26 Feb 2020

@eytan-avisror
There is no good way to solve this problem as far as I know. One way I can imagine is for the controller to feed draining info back to the node via labels/finalizers. But I don't think it's needed once we use IP mode.
Any ideas?

M00nF1sh on 27 Feb 2020

Definitely not needed with IP mode. However, moving to it is not a trivial change and has other implications, and I feel there should be no errors in either mode if we can help it.
We use lifecycle hooks to solve this problem for instance mode (see my first comment about using lifecycle-manager). Adding a label or publishing an event when the work is done would be extremely useful (especially a label, which is more reliable); we could then simply have the lifecycle hook wait for that label instead of hitting the AWS APIs so hard from both controllers.

eytan-avisror on 27 Feb 2020

@eytan-avisror
Yeah, if there are a lot of target groups, a fixed wait time doesn't work for instances.
I think this can be added in V2, where we plan to periodically check the draining status to decide whether to remove the securityGroup rules for pods/nodes.
I'm wondering whether we can add a finalizer when nodes are added as backends, and remove the finalizer once the node has finished draining from the targetGroup. (It could be one finalizer per targetGroup or a single finalizer for all targetGroups.)

M00nF1sh on 27 Feb 2020
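
The finalizer idea floated above could look roughly like this on the Node object. This is a sketch of the proposal only, not something the controller is known to implement: the finalizer name is hypothetical, nodes are modeled as plain dicts shaped like Kubernetes Node JSON, and the single-finalizer variant is assumed.

```python
from typing import Any, Dict

# Hypothetical finalizer name, used for illustration only.
DRAIN_FINALIZER = "example.com/target-group-drain"


def add_drain_finalizer(node: Dict[str, Any]) -> Dict[str, Any]:
    """Attach the finalizer when the node is registered as a backend."""
    finalizers = node.setdefault("metadata", {}).setdefault("finalizers", [])
    if DRAIN_FINALIZER not in finalizers:
        finalizers.append(DRAIN_FINALIZER)
    return node


def reconcile_deleting_node(node: Dict[str, Any], drained: bool) -> Dict[str, Any]:
    """Drop the finalizer only once deletion was requested and draining is done."""
    meta = node.get("metadata", {})
    if meta.get("deletionTimestamp") and drained:
        meta["finalizers"] = [
            f for f in meta.get("finalizers", []) if f != DRAIN_FINALIZER
        ]
    return node
```

While the finalizer is present, the API server keeps the Node object in a deleting state, so the object's removal waits on whichever component is responsible for clearing the finalizer.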

@M00nF1sh
You mean the finalizer will be on the node object? It sounds like the right thing to do, but it might be problematic if it means node objects cannot be deleted until they are deregistered, no? What if the controller is down and no one is around to remove the finalizer? Will nodes be stuck in deleting if they are terminated? I'm not sure how the cluster will behave with a node finalizer in edge cases, but we should definitely check.

eytan-avisror on 27 Feb 2020

A finalizer on the node also won't help if a termination has already been issued. To be fully safe, the "finalizer" is needed on the EC2 instance, and in a sense this is what lifecycle hooks accomplish, since you can keep the instance alive until you are ready for it to terminate.
I think if you simply add a label that says the instance is deregistered from all target groups, lifecycle hooks will be very easy to use to get zero errors on scale-downs, whether they are user initiated or ASG initiated (including AZRebalance).

eytan-avisror on 27 Feb 2020
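
The label-based handshake suggested above could be consumed by a lifecycle hook handler along these lines. This is a sketch under stated assumptions: the label key `example.com/deregistered`, the function names, and the injected `get_node` and `sleep` callables are all hypothetical, and the node is again a plain dict shaped like Kubernetes Node JSON.

```python
from typing import Any, Callable, Dict


def ready_to_terminate(
    node: Dict[str, Any],
    label_key: str = "example.com/deregistered",  # hypothetical label key
    expected: str = "true",
) -> bool:
    """True once the node carries the 'deregistered from all target groups' label."""
    labels = node.get("metadata", {}).get("labels") or {}
    return labels.get(label_key) == expected


def await_deregistered_label(
    get_node: Callable[[str], Dict[str, Any]],
    node_name: str,
    sleep: Callable[[float], None],
    attempts: int = 30,
    interval: float = 10.0,
) -> bool:
    """Poll the node object (not the AWS APIs) until the label appears."""
    for attempt in range(attempts):
        if ready_to_terminate(get_node(node_name)):
            return True
        if attempt < attempts - 1:
            sleep(interval)
    return False
```

Because the hook handler only reads the Node object, the AWS target group APIs are queried by a single component, which addresses the concern about two controllers hitting them at once.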

@M00nF1sh any updates on when the next release can roughly be expected?