Ingress-nginx: enabling session affinity sends all traffic to a single pod

Created on 7 Sep 2018 · 21 comments · Source: kubernetes/ingress-nginx

Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG

NGINX Ingress controller version:

0.18.0 and 0.19.0

Kubernetes version (use kubectl version):

1.10.7

Environment:

  • Cloud provider or hardware configuration: GKE
  • OS (e.g. from /etc/os-release): COS
  • Install tools:
  • Others:

Using an external GCP TCP load balancer (L4) as the ingress IP.

What happened:
With session affinity enabled, traffic goes to a single pod only.

What you expected to happen:
Multiple requests (e.g. with curl -vk ..) should be spread across the different backends.

How to reproduce it (as minimally and precisely as possible):
Working on a simpler repro...

Anything else we need to know:

  • The service works internally: curl against the internal service name gets spread across both pods.
  • Both endpoints are alive and responsive.
  • The cookie is being set just fine; each curl request results in a new cookie (see the curl sketch below).
  • The nginx config looks like it finds both backend endpoints.
  • Disabling affinity on the ingress results in traffic being spread across both pods again.
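
For completeness, this is roughly how I'm checking it from outside (the host comes from the configuration dump below; the exact path is just an example):

# first request: note the Set-Cookie header and which pod answers
curl -vk https://openam.prod.frk8s.net/openam/ 2>&1 | grep -i 'set-cookie'

# repeat a few requests without replaying the cookie; each one gets a fresh
# cookie, yet the responses all come from the same pod
for i in 1 2 3 4 5; do
  curl -sk -D - -o /dev/null https://openam.prod.frk8s.net/openam/ | grep -i 'set-cookie'
done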

The configuration output is below; the service in question is "openam".

[  
   {  
      "name":"prod-openam-80",
      "service":{  
         "metadata":{  
            "creationTimestamp":null
         },
         "spec":{  
            "ports":[  
               {  
                  "name":"openam",
                  "protocol":"TCP",
                  "port":80,
                  "targetPort":8080
               }
            ],
            "selector":{  
               "app":"openam",
               "release":"openam-prod"
            },
            "clusterIP":"10.0.28.178",
            "type":"ClusterIP",
            "sessionAffinity":"ClientIP",
            "sessionAffinityConfig":{  
               "clientIP":{  
                  "timeoutSeconds":10800
               }
            }
         },
         "status":{  
            "loadBalancer":{  

            }
         }
      },
      "port":80,
      "secure":false,
      "secureCACert":{  
         "secret":"",
         "caFilename":"",
         "pemSha":""
      },
      "sslPassthrough":false,
      "endpoints":[  
         {  
            "address":"10.4.0.19",
            "port":"8080",
            "maxFails":0,
            "failTimeout":0
         },
         {  
            "address":"10.4.3.22",
            "port":"8080",
            "maxFails":0,
            "failTimeout":0
         }
      ],
      "sessionAffinityConfig":{  
         "name":"cookie",
         "cookieSessionAffinity":{  
            "name":"INGRESSCOOKIE",
            "hash":"md5",
            "locations":{  
               "openam.prod.frk8s.net":[  
                  "/openam"
               ]
            }
         }
      }
   },
   {  
      "name":"prod-openidm-80",
      "service":{  
         "metadata":{  
            "creationTimestamp":null
         },
         "spec":{  
            "ports":[  
               {  
                  "name":"openidm",
                  "protocol":"TCP",
                  "port":80,
                  "targetPort":8080,
                  "nodePort":30606
               }
            ],
            "selector":{  
               "app":"openidm",
               "release":"openidm-prod"
            },
            "clusterIP":"10.0.24.160",
            "type":"NodePort",
            "sessionAffinity":"None",
            "externalTrafficPolicy":"Cluster"
         },
         "status":{  
            "loadBalancer":{  

            }
         }
      },
      "port":80,
      "secure":false,
      "secureCACert":{  
         "secret":"",
         "caFilename":"",
         "pemSha":""
      },
      "sslPassthrough":false,
      "endpoints":[  
         {  
            "address":"10.4.2.18",
            "port":"8080",
            "maxFails":0,
            "failTimeout":0
         },
         {  
            "address":"10.4.1.14",
            "port":"8080",
            "maxFails":0,
            "failTimeout":0
         }
      ],
      "sessionAffinityConfig":{  
         "name":"cookie",
         "cookieSessionAffinity":{  
            "name":"route",
            "hash":"sha1",
            "locations":{  
               "openidm.prod.frk8s.net":[  
                  "/"
               ]
            }
         }
      }
   },
   {  
      "name":"upstream-default-backend",
      "service":{  
         "metadata":{  
            "creationTimestamp":null
         },
         "spec":{  
            "ports":[  
               {  
                  "name":"http",
                  "protocol":"TCP",
                  "port":80,
                  "targetPort":"http"
               }
            ],
            "selector":{  
               "app":"nginx-ingress",
               "component":"default-backend",
               "release":"nginx"
            },
            "clusterIP":"10.0.22.133",
            "type":"ClusterIP",
            "sessionAffinity":"None"
         },
         "status":{  
            "loadBalancer":{  

            }
         }
      },
      "port":0,
      "secure":false,
      "secureCACert":{  
         "secret":"",
         "caFilename":"",
         "pemSha":""
      },
      "sslPassthrough":false,
      "endpoints":[  
         {  
            "address":"10.4.2.4",
            "port":"8080",
            "maxFails":0,
            "failTimeout":0
         }
      ],
      "sessionAffinityConfig":{  
         "name":"",
         "cookieSessionAffinity":{  
            "name":"",
            "hash":""
         }
      }
   }
]

All 21 comments

Given that we are using a GCP L4 TCP load balancer, is it possible that the hashing algorithm is using the IP of the GCP load balancer instead of the client's? Would this explain why it always goes to the same pod?

For reference, this is how we are installing the helm chart:

helm install --namespace nginx --name nginx  \
    --set rbac.create=true \
    --set controller.service.loadBalancerIP=$IP \
    --set controller.publishService.enabled=true \
    --set controller.stats.enabled=true \
    --set controller.service.externalTrafficPolicy=Local \
    --set controller.service.type=LoadBalancer \
    stable/nginx-ingress

If we set the image version back to < 0.18.0, we get load balanced requests.
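
(For reference, pinning the controller back to a pre-0.18 tag with the same chart looks roughly like this; controller.image.tag is the chart value and 0.17.1 is one of the pre-0.18 tags:)

helm upgrade nginx stable/nginx-ingress \
    --namespace nginx \
    --reuse-values \
    --set controller.image.tag=0.17.1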

@wstrange can you post a minimal Ingress manifest to reproduce this?

I'm trying to replicate this. It looks like it does not happen with http. Something to do with https / ssl. I'll keep testing.

Update: Can't replicate yet with a simple test headers app, even on SSL. Sigh..

We've also encountered exactly the same symptoms on two separate occasions in the last two weeks, where all our load goes to a single pod when using session affinity. From what I remember, we did not experience this in versions prior to 0.18.0.

Were there any changes to how session affinity is handled in later versions? I can't see anything about it in the release notes.

We are currently also unable to reproduce this, so it's hard to find the root cause.
Any ideas what might be the issue here?

We cannot replicate this with a simple echo headers application, but we do see it in a more complex deployment of our Java application.

What is the logic used to calculate the backend pod to steer the session to? This might help us narrow down how this happens.

We have the same problem with a 3-pod deployment and HTTP load balancing. Sometimes (not always) one pod does not receive any HTTP traffic; the traffic is instead sent to one of the remaining pods.
We receive 3 different INGRESSCOOKIE values, but two of them proxy to the same pod. Even if no cookie is set on the request, the pod in question does not receive any requests.

We assume this is the same problem reported here and that it was introduced by the dynamic configuration of backends in https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.18.0.
Disabling this feature by adding --enable-dynamic-configuration=false to the args section solved the issue for us.

Of course this is just a workaround, and we would like to solve the underlying issue with the lua balancer.
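
(For reference, with the stable/nginx-ingress chart used earlier in this thread, the flag can also be passed without editing the Deployment by hand. This is a sketch that assumes the chart's controller.extraArgs map, which renders each key/value as an extra --key=value controller argument:)

helm upgrade nginx stable/nginx-ingress \
    --namespace nginx \
    --reuse-values \
    --set-string controller.extraArgs.enable-dynamic-configuration=false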

We are experiencing the same problem (on 0.19.0 and 0.20.0).

Thank you @svenbs for suggesting disabling dynamic configuration. Using --enable-dynamic-configuration=false gives us the desired results.

We did, however, notice that the generated cookie's domain differs when dynamic configuration is enabled: with dynamic configuration the cookie's domain is .somedomain.com, while without it the domain is somedomain.com. Having both cookies around results in undesired behavior (see image).
[screenshot: both cookie domain variants present for the same site]
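
(A quick way to compare the two cases is to look at the Domain attribute in the Set-Cookie header directly; the hostname here is just an example:)

curl -sk -D - -o /dev/null https://openidm.prod.frk8s.net/ | grep -i 'set-cookie'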

Given that others are seeing this issue, and it seems to be hard to reproduce with a simple echoheaders sample, is there any way to log more debug / diagnostic information showing how the dynamic configuration module arrives at its decisions on pod backends?

I am guessing there is some timing issue, i.e. some pod is ready before the others, or briefly reports as not live, etc.
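
(In the meantime, one thing that can help is raising the controller's log verbosity and watching the backend/endpoint sync messages. The deployment name below is only an example based on the helm install shown earlier; adjust it for your release:)

# append --v=3 to the controller args (higher values are noisier); assumes the
# controller is the first container in the pod and already has an args list
kubectl -n nginx patch deployment nginx-nginx-ingress-controller --type=json \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--v=3"}]'

# then follow the logs and grep for backend updates
kubectl -n nginx logs deploy/nginx-nginx-ingress-controller -f | grep -i backend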

Anyone having this issue, please try:

quay.io/kubernetes-ingress-controller/nginx-ingress-controller:dev
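
(For example, by pointing the controller Deployment at that tag; the deployment and container names below are placeholders for your release:)

kubectl -n nginx set image deployment/nginx-nginx-ingress-controller \
  nginx-ingress-controller=quay.io/kubernetes-ingress-controller/nginx-ingress-controller:dev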

/close

@ElvinEfendi: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ElvinEfendi To which tag will this fix be deployed?
--enable-dynamic-configuration=false fixed the problem for us on 0.18, but I'm afraid that upgrading to a newer version without the fix would break the deployment again.

@ElvinEfendi To which tag will this fix be deployed?

@nelsonfassis it will be included in 0.21.0

We are still seeing this issue on 0.21.0

Anyone else?

Yes, we have the same issue on 0.22.0 as well.

We're still seeing this on 0.21.0.

Me too, I see the same issue in 0.22.0.
When I create the ingress, everything is OK, but after a few minutes (doing nothing) nginx starts to route all requests to the same pod.

Can you try the latest version, quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.23.0, and see whether this is still an issue?

@ElvinEfendi After updating to 0.23.0, I no longer see this problem. Thanks very much for your suggestion.

@m7luffy You're welcome! In that case the bug was most likely related to https://github.com/kubernetes/ingress-nginx/pull/3809#issuecomment-467036731, which was fixed in 0.23.0.
