Aws-load-balancer-controller: Stickiness not working with ALB

Created on 14 Mar 2019 · 14 comments · Source: kubernetes-sigs/aws-load-balancer-controller

Hi folks,
I have a two-node EKS cluster with one instance of my application running on each node. Everything works great: my Ingress DNS name reaches the pod on each EC2 instance, and the target group shows each instance as available and healthy. The problem starts when I enable stickiness. I do receive the AWSALB cookie, but subsequent requests still jump between the two EC2 instances. We added a blurb to our theme JSPs that shows the name of the pod being accessed. Stickiness is applied at the node (instance) level, so how can I be bouncing randomly between the two EC2 instances listed in the target group?

My ALB annotations on our Ingress:
alb.ingress.kubernetes.io/scheme: internet-facing
alb.ingress.kubernetes.io/target-group-attributes: stickness.enabled=true,stinkyness.ib_cookie.duration_seconds=1200
alb.ingress.kubernetes.io/target-type: instance

I was using the ALB ingress controller image v1 and updated to v1.1.2, but the behavior is the same. My application's Service is exposed via NodePort, as in the 2048 example. Again, everything works like a champ. Enabling stickiness does provide the cookie, but it seemingly does nothing to keep the client going to the same worker node.



georgefridrich

All 14 comments

I hope this is useful to others: it was resolved by using this in the ALB annotations:

alb.ingress.kubernetes.io/target-type: ip

Now I'm a little confused about why instance mode wouldn't work as expected, so I'll leave this open for now and see if we get a comment on that. Cheers!

georgefridrich on 14 Mar 2019
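
A toy model may help explain why target-type: ip fixes this. With ip, the target group registers pod IPs directly. The AWSALB cookie records which target the load balancer picked, so it pins the client to a pod rather than to a node. This is a simplification under stated assumptions. The real AWSALB cookie is encrypted and opaque, while here it simply carries the target's address. First-time clients are assumed to be assigned round-robin. The StickyTargetGroup class and the pod addresses are hypothetical.

```python
import itertools


class StickyTargetGroup:
    """Hypothetical toy model of lb_cookie stickiness on an ALB target group.

    Simplifications (assumptions): a client without a cookie is assigned a
    target round-robin, and the "cookie" is just the chosen target's address.
    The real AWSALB cookie is encrypted and opaque.
    """

    def __init__(self, targets):
        self.targets = list(targets)
        self._next_target = itertools.cycle(self.targets)

    def route(self, cookie=None):
        """Return (target, cookie_to_set) for one request."""
        if cookie in self.targets:
            # A valid cookie pins the request to the target it names.
            return cookie, cookie
        # No (valid) cookie: pick the next target and issue a cookie for it.
        target = next(self._next_target)
        return target, target


# target-type: ip -> the registered targets are pod IPs (hypothetical addresses).
tg = StickyTargetGroup(["10.0.1.15:8080", "10.0.2.27:8080"])
target, cookie = tg.route()                      # first request, no cookie yet
seen = {tg.route(cookie)[0] for _ in range(10)}  # follow-ups present the cookie
print(seen)  # {'10.0.1.15:8080'}: every follow-up request hits the same pod
```

In instance mode the same cookie names an EC2 instance instead, so only the node is pinned. Which pod serves the request is then decided on that node by the NodePort Service.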

Hi,
The behavior you observed happens because the NodePort on each worker node may proxy traffic to any pod in your cluster, whether it runs on the same EC2 instance or a different one.

M00nF1sh on 15 Mar 2019
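
To illustrate that explanation, here is a minimal simulation. It assumes the default externalTrafficPolicy: Cluster, and kube-proxy in iptables mode, which picks among a Service's endpoints at random. The ALB cookie keeps every request on node-a, yet the pod serving each request still varies. The node and pod names mirror the two-node setup in the issue but are otherwise hypothetical, and nodeport_cluster_policy is a model, not kube-proxy's code.

```python
import random

# Hypothetical two-node cluster from the issue: one app pod per node.
PODS = {"node-a": ["pod-a"], "node-b": ["pod-b"]}


def nodeport_cluster_policy(receiving_node, rng):
    """Model of externalTrafficPolicy: Cluster (the default).

    The NodePort on the receiving node may forward to any ready endpoint in
    the cluster, local or remote, so receiving_node does not affect the pick.
    """
    all_pods = [pod for pods in PODS.values() for pod in pods]
    return rng.choice(all_pods)


rng = random.Random(0)   # fixed seed so the run is repeatable
sticky_node = "node-a"   # the AWSALB cookie pins every request to this instance
pods_hit = [nodeport_cluster_policy(sticky_node, rng) for _ in range(50)]
print(sorted(set(pods_hit)))  # in practice both pods show up
```

On average, about half the requests cross over to pod-b on node-b even though the ALB never sent anything to node-b. That matches the pod name flipping in the JSP blurb.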

Thanks M00nF1sh, however I'm still scratching my head over how that is OK when you have stickiness enabled and the target type set to instance. Shouldn't that send the client to the same instance (EC2 worker node) each time if the AWSALB cookie is valid? IP sends traffic to the pod, which is very cool; it works great and is a valid way to use it, but instance should stick to the instance. Or, at a minimum, if the target type is set to instance, stickiness should not be configurable in the Ingress, since it basically does nothing in EKS. Beyond that, it provides a cookie that has no use at that point either.

georgefridrich on 15 Mar 2019

@georgefridrich Take a look here and work your way up. I agree with @M00nF1sh: in instance mode the Kubernetes Service is published as a NodePort so that the ALB's target group can talk to the service.

Since pod selection happens at the Service level, I know of no mechanism so far that would let the Service choose the same pod each time, which is what actual stickiness would require. Remember, the routing is to containers, not nodes.

https://v1-11.docs.kubernetes.io/docs/concepts/services-networking/service/#the-gory-details-of-virtual-ips

@M00nF1sh The follow-up question is: how does stickiness work with AWS ELBs, since those use NodePorts as well?

ecout on 26 Mar 2019

Check this link out. I don't know enough about the ALB ingress controller at this point, but a NAT issue in NodePort mode is a plausible cause. The solution offered at the link below is to use, as described in the Kubernetes docs, an HAProxy that handles IP persistence across hops:
https://nishadikirielle.blogspot.com/2016/03/load-balancing-kubernetes-services-and.html

For the record, in the Kubernetes world the term is session affinity, not stickiness. That's how I was able to find it.

ecout on 29 Mar 2019

You could try setting the Service's externalTrafficPolicy to Local. This forces requests that reach a node to be routed only to pods on that node. There are side effects, though, regarding instance health in the load balancer and the distribution of traffic between pods.

https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.10/#servicespec-v1-core

https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/#preserving-the-client-source-ip

rifelpet on 29 Mar 2019
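
Here is a rough model of the Local policy and its side effects, under these assumptions:
- The cluster layout is hypothetical: node-a runs two replicas, node-b runs one, and node-c runs none.
- The health check is reduced to "does this node have a local pod".
- The ALB spreads new clients evenly across healthy instances.

kube_proxy_local and healthy_instances are illustrative names, not Kubernetes APIs.

```python
import random

# Hypothetical layout: node-a runs two replicas, node-b one, node-c none.
PODS = {"node-a": ["pod-a1", "pod-a2"], "node-b": ["pod-b"], "node-c": []}


def kube_proxy_local(receiving_node, rng):
    """Model of externalTrafficPolicy: Local.

    Only endpoints on the receiving node are eligible; with none, the
    request is dropped (returned here as None).
    """
    local_pods = PODS[receiving_node]
    return rng.choice(local_pods) if local_pods else None


def healthy_instances():
    """Simplified health check: an instance passes only if it has a local pod."""
    return [node for node, pods in PODS.items() if pods]


rng = random.Random(1)
# One pod on the node: pinning the node via the ALB cookie also pins the pod.
print({kube_proxy_local("node-b", rng) for _ in range(20)})  # {'pod-b'}
# No local pod: traffic is dropped, so node-c fails its health check.
print(kube_proxy_local("node-c", rng))                       # None
print(healthy_instances())                                   # ['node-a', 'node-b']

# Uneven spread: with traffic split evenly per healthy instance, pod-b gets
# about 1/2 of new clients while pod-a1 and pod-a2 get about 1/4 each.
share = {}
for node in healthy_instances():
    for pod in PODS[node]:
        share[pod] = (1 / len(healthy_instances())) / len(PODS[node])
print(share)  # {'pod-a1': 0.25, 'pod-a2': 0.25, 'pod-b': 0.5}
```

On node-a, which runs two replicas, the cookie pins the node but kube-proxy still chooses between pod-a1 and pod-a2. So Local only yields pod-level stickiness when each node runs exactly one replica, as in the original two-node setup.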

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot on 27 Jun 2019

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle rotten

fejta-bot on 27 Jul 2019

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close

fejta-bot on 26 Aug 2019

@fejta-bot: Closing this issue.

k8s-ci-robot on 26 Aug 2019

/reopen

wdalmut on 15 Apr 2020

@wdalmut: You can't reopen an issue/PR unless you authored it or you are a collaborator.

k8s-ci-robot on 15 Apr 2020

According to the documentation, it should be:

alb.ingress.kubernetes.io/target-group-attributes: stickiness.enabled=true,stickiness.lb_cookie.duration_seconds=1200

Instead of:
alb.ingress.kubernetes.io/target-group-attributes: stickness.enabled=true,stinkyness.ib_cookie.duration_seconds=1200

eduardomourar on 17 Apr 2020
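
The misspelled keys are easy to miss because the annotation value is just a comma-separated list of key=value pairs. Here is a hypothetical pre-flight check, not something the controller provides, that splits the value and flags keys outside a known set. KNOWN_KEYS is an assumed, hand-picked subset of ELBv2 target group attribute names and is not exhaustive. Check the AWS target group attributes documentation for the full list.

```python
# Partial, hand-picked subset of ELBv2 target group attribute keys (assumption:
# not exhaustive).
KNOWN_KEYS = {
    "stickiness.enabled",
    "stickiness.type",
    "stickiness.lb_cookie.duration_seconds",
    "deregistration_delay.timeout_seconds",
    "slow_start.duration_seconds",
    "load_balancing.algorithm.type",
}


def parse_tg_attributes(value):
    """Split a target-group-attributes value ("k1=v1,k2=v2") into a dict and
    return it with a sorted list of keys not found in KNOWN_KEYS."""
    attrs = {}
    for pair in value.split(","):
        key, _, val = pair.strip().partition("=")
        attrs[key] = val
    unknown = sorted(key for key in attrs if key not in KNOWN_KEYS)
    return attrs, unknown


original = "stickness.enabled=true,stinkyness.ib_cookie.duration_seconds=1200"
fixed = "stickiness.enabled=true,stickiness.lb_cookie.duration_seconds=1200"
print(parse_tg_attributes(original)[1])
# ['stickness.enabled', 'stinkyness.ib_cookie.duration_seconds']
print(parse_tg_attributes(fixed)[1])
# []
```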

Hi @georgefridrich, have you found a solution that supports the instance target type? Thanks,