Ingress-nginx: enabling session affinity sends all traffic to a single pod

Created on 7 Sep 2018 · 21 comments · Source: kubernetes/ingress-nginx

Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG

NGINX Ingress controller version:

0.18.0 and 0.19.0

Kubernetes version (use kubectl version):

1.10.7

Environment:

  • Cloud provider or hardware configuration: GKE
  • OS (e.g. from /etc/os-release): COS
  • Install tools:
  • Others:

Using an external GCP TCP load balancer (L4) as the ingress IP.

What happened:
With session affinity enabled, traffic goes to a single pod only.

What you expected to happen:
Multiple requests (e.g. with curl -vk ..) should be spread across the different backends.

How to reproduce it (as minimally and precisely as possible):
Working on a simpler repro...

Anything else we need to know:

  • The service works internally: curl against the internal service name gets spread across both pods.
  • Both endpoints are alive and responsive.
  • The cookie is being set just fine; each curl request results in a new cookie (see the curl sketch below).
  • The nginx config looks like it finds both backend endpoints.
  • Disabling affinity on the ingress results in traffic being spread across both pods again.
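
For completeness, this is roughly how I'm checking it from outside (the host comes from the configuration dump below; the exact path is just an example):

# first request: note the Set-Cookie header and which pod answers
curl -vk https://openam.prod.frk8s.net/openam/ 2>&1 | grep -i 'set-cookie'

# repeat a few requests without replaying the cookie; each one gets a fresh
# cookie, yet the responses all come from the same pod
for i in 1 2 3 4 5; do
  curl -sk -D - -o /dev/null https://openam.prod.frk8s.net/openam/ | grep -i 'set-cookie'
done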

The configuration output is below; the service in question is "openam".

[  
   {  
      "name":"prod-openam-80",
      "service":{  
         "metadata":{  
            "creationTimestamp":null
         },
         "spec":{  
            "ports":[  
               {  
                  "name":"openam",
                  "protocol":"TCP",
                  "port":80,
                  "targetPort":8080
               }
            ],
            "selector":{  
               "app":"openam",
               "release":"openam-prod"
            },
            "clusterIP":"10.0.28.178",
            "type":"ClusterIP",
            "sessionAffinity":"ClientIP",
            "sessionAffinityConfig":{  
               "clientIP":{  
                  "timeoutSeconds":10800
               }
            }
         },
         "status":{  
            "loadBalancer":{  

            }
         }
      },
      "port":80,
      "secure":false,
      "secureCACert":{  
         "secret":"",
         "caFilename":"",
         "pemSha":""
      },
      "sslPassthrough":false,
      "endpoints":[  
         {  
            "address":"10.4.0.19",
            "port":"8080",
            "maxFails":0,
            "failTimeout":0
         },
         {  
            "address":"10.4.3.22",
            "port":"8080",
            "maxFails":0,
            "failTimeout":0
         }
      ],
      "sessionAffinityConfig":{  
         "name":"cookie",
         "cookieSessionAffinity":{  
            "name":"INGRESSCOOKIE",
            "hash":"md5",
            "locations":{  
               "openam.prod.frk8s.net":[  
                  "/openam"
               ]
            }
         }
      }
   },
   {  
      "name":"prod-openidm-80",
      "service":{  
         "metadata":{  
            "creationTimestamp":null
         },
         "spec":{  
            "ports":[  
               {  
                  "name":"openidm",
                  "protocol":"TCP",
                  "port":80,
                  "targetPort":8080,
                  "nodePort":30606
               }
            ],
            "selector":{  
               "app":"openidm",
               "release":"openidm-prod"
            },
            "clusterIP":"10.0.24.160",
            "type":"NodePort",
            "sessionAffinity":"None",
            "externalTrafficPolicy":"Cluster"
         },
         "status":{  
            "loadBalancer":{  

            }
         }
      },
      "port":80,
      "secure":false,
      "secureCACert":{  
         "secret":"",
         "caFilename":"",
         "pemSha":""
      },
      "sslPassthrough":false,
      "endpoints":[  
         {  
            "address":"10.4.2.18",
            "port":"8080",
            "maxFails":0,
            "failTimeout":0
         },
         {  
            "address":"10.4.1.14",
            "port":"8080",
            "maxFails":0,
            "failTimeout":0
         }
      ],
      "sessionAffinityConfig":{  
         "name":"cookie",
         "cookieSessionAffinity":{  
            "name":"route",
            "hash":"sha1",
            "locations":{  
               "openidm.prod.frk8s.net":[  
                  "/"
               ]
            }
         }
      }
   },
   {  
      "name":"upstream-default-backend",
      "service":{  
         "metadata":{  
            "creationTimestamp":null
         },
         "spec":{  
            "ports":[  
               {  
                  "name":"http",
                  "protocol":"TCP",
                  "port":80,
                  "targetPort":"http"
               }
            ],
            "selector":{  
               "app":"nginx-ingress",
               "component":"default-backend",
               "release":"nginx"
            },
            "clusterIP":"10.0.22.133",
            "type":"ClusterIP",
            "sessionAffinity":"None"
         },
         "status":{  
            "loadBalancer":{  

            }
         }
      },
      "port":0,
      "secure":false,
      "secureCACert":{  
         "secret":"",
         "caFilename":"",
         "pemSha":""
      },
      "sslPassthrough":false,
      "endpoints":[  
         {  
            "address":"10.4.2.4",
            "port":"8080",
            "maxFails":0,
            "failTimeout":0
         }
      ],
      "sessionAffinityConfig":{  
         "name":"",
         "cookieSessionAffinity":{  
            "name":"",
            "hash":""
         }
      }
   }
]

All 21 comments

Given that we are using a GCP L4 TCP load balancer, is it possible that the hashing algorithm is using the IP of the GCP load balancer instead of the client's? Would this explain why it always goes to the same pod?

For reference, this is how we are installing the helm chart:

helm install --namespace nginx --name nginx  \
    --set rbac.create=true \
    --set controller.service.loadBalancerIP=$IP \
    --set controller.publishService.enabled=true \
    --set controller.stats.enabled=true \
    --set controller.service.externalTrafficPolicy=Local \
    --set controller.service.type=LoadBalancer \
    stable/nginx-ingress

If we set the image version back to < 0.18.0, we get load balanced requests.
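
(For reference, pinning the controller back to a pre-0.18 tag with the same chart looks roughly like this; controller.image.tag is the chart value and 0.17.1 is one of the pre-0.18 tags:)

helm upgrade nginx stable/nginx-ingress \
    --namespace nginx \
    --reuse-values \
    --set controller.image.tag=0.17.1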

@wstrange can you post a minimal Ingress manifest to reproduce this?

I'm trying to replicate this. It looks like it does not happen with http. Something to do with https / ssl. I'll keep testing.

Update: Can't replicate yet with a simple test headers app, even on SSL. Sigh..

We've also encountered exactly the same symptoms on two separate occasions in the last two weeks, where all our load goes to a single pod when using session affinity. From what I remember, we did not experience this in versions prior to 0.18.0.

Were there any changes to how session affinity is handled in later versions? I can't see anything about it in the release notes.

We are currently also unable to reproduce this, so it's hard to find the root cause.
Any ideas what might be the issue here?

We cannot replicate this with a simple echo headers application, but we do see it in a more complex deployment of our Java application.

What is the logic used to calculate the backend pod to steer the session to? This might help us narrow down how this happens.

We have the same problem with a 3-pod deployment and HTTP load balancing. Sometimes (not always) one pod does not receive any HTTP traffic; the traffic is instead sent to one of the remaining pods.
We receive 3 different INGRESSCOOKIE values, but two of them proxy to the same pod. Even if no cookie is set on the request, the pod in question does not receive any requests.

We assume this is the same problem reported here and that it was introduced by the dynamic configuration of backends in https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.18.0.
Disabling this feature by adding --enable-dynamic-configuration=false to the args section solved the issue for us.

Of course this is just a workaround, and we would like to solve the underlying issue with the lua balancer.
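
(For reference, with the stable/nginx-ingress chart used earlier in this thread, the flag can also be passed without editing the Deployment by hand. This is a sketch that assumes the chart's controller.extraArgs map, which renders each key/value as an extra --key=value controller argument:)

helm upgrade nginx stable/nginx-ingress \
    --namespace nginx \
    --reuse-values \
    --set-string controller.extraArgs.enable-dynamic-configuration=false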

We are experiencing the same problem (on 0.19.0 and 0.20.0).

Thank you @svenbs for suggesting disabling dynamic configuration. Using --enable-dynamic-configuration=false gives us the desired results.

We did, however, notice that the generated cookie's domain differs when dynamic configuration is enabled: with dynamic configuration the cookie's domain is .somedomain.com, while without it the domain is somedomain.com. Having both cookies around results in undesired behavior (see image).
[screenshot: both cookie domain variants present for the same site]
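
(A quick way to compare the two cases is to look at the Domain attribute in the Set-Cookie header directly; the hostname here is just an example:)

curl -sk -D - -o /dev/null https://openidm.prod.frk8s.net/ | grep -i 'set-cookie'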

Given that others are seeing this issue, and it seems to be hard to reproduce with a simple echoheaders sample, is there any way to log more debug / diagnostic information showing how the dynamic configuration module arrives at its decisions on pod backends?

I am guessing there is some timing issue, i.e. some pod is ready before the others, or briefly reports as not live, etc.
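
(In the meantime, one thing that can help is raising the controller's log verbosity and watching the backend/endpoint sync messages. The deployment name below is only an example based on the helm install shown earlier; adjust it for your release:)

# append --v=3 to the controller args (higher values are noisier); assumes the
# controller is the first container in the pod and already has an args list
kubectl -n nginx patch deployment nginx-nginx-ingress-controller --type=json \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--v=3"}]'

# then follow the logs and grep for backend updates
kubectl -n nginx logs deploy/nginx-nginx-ingress-controller -f | grep -i backend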

Anyone having this issue, please try:

quay.io/kubernetes-ingress-controller/nginx-ingress-controller:dev
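
(For example, by pointing the controller Deployment at that tag; the deployment and container names below are placeholders for your release:)

kubectl -n nginx set image deployment/nginx-nginx-ingress-controller \
  nginx-ingress-controller=quay.io/kubernetes-ingress-controller/nginx-ingress-controller:dev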

/close

@ElvinEfendi: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ElvinEfendi To which tag will this fix be deployed?
--enable-dynamic-configuration=false fixed the problem for us on 0.18, but I'm afraid that upgrading to a newer version without the fix would break the deployment again.

@ElvinEfendi To which tag will this fix be deployed?

@nelsonfassis it will be included in 0.21.0

We are still seeing this issue on 0.21.0

Anyone else?

Yes, we have the same issue on 0.22.0 as well.

We're still seeing this on 0.21.0.

Me too, I see the same issue in 0.22.0.
When I create the ingress, everything is OK, but after a few minutes (doing nothing) nginx starts to route all requests to the same pod.

Can you try the latest version, quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.23.0, and see whether this is still an issue?

@ElvinEfendi After updating to 0.23.0, I no longer see this problem. Thanks very much for your suggestion.

@m7luffy You're welcome! In that case the bug was most likely related to https://github.com/kubernetes/ingress-nginx/pull/3809#issuecomment-467036731, which was fixed in 0.23.0.
