NGINX Ingress controller version: nginx-ingress-controller:0.22.0
Kubernetes version: v1.12.3
Environment:
Kernel (e.g. uname -a): Linux 4.15.0
What happened:
I have a cluster running 5 pods (replicas) of the same gRPC server deployment and multiple clients (about 80) running outside the cluster. The clients connect to the backend pods through an nginx-ingress configured with the GRPC annotation. Occasionally I observe that one or more pods receive a disproportionate number of connections:

The reader will notice that between 12:30 and 14:30 one pod was handling nearly 80% of all incoming connections! Sometimes this skew lasts only about an hour (I have a configuration snippet with grpc_read_timeout 3600s; set), and sometimes it lasts for several hours.
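For context, that timeout is the kind of thing set through the configuration-snippet annotation on the Ingress; a minimal sketch of the annotation block only, not copied from the original manifest:

```yaml
metadata:
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
    # keep long-lived gRPC streams open for up to an hour
    nginx.ingress.kubernetes.io/configuration-snippet: |
      grpc_read_timeout 3600s;
```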
What you expected to happen:
I would expect connections to be roughly uniformly balanced across the pods, for example:

How to reproduce it (as minimally and precisely as possible):
It is unclear how to reproduce this other than running gRPC servers with both unary and streaming handlers across multiple pods, reachable through a ClusterIP Service exposed via nginx-ingress (using the default round-robin load balancer) at a DNS endpoint, and observing the behavior over several hours.
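For anyone trying to reproduce, the setup described above roughly corresponds to manifests like the following. This is a sketch only: the names, port, and host are placeholders and are not taken from the report.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: grpc-server            # placeholder; fronts the 5-replica Deployment
spec:
  type: ClusterIP
  selector:
    app: grpc-server
  ports:
  - name: grpc
    port: 50051
    targetPort: 50051
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: grpc-server
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
spec:
  # TLS section omitted for brevity
  rules:
  - host: grpc.example.com     # placeholder DNS endpoint
    http:
      paths:
      - backend:
          serviceName: grpc-server
          servicePort: 50051
```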
@natemurthy please update to 0.24.1 and disable reuse-port in the configuration configmap
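For reference, reuse-port is a key in the controller's ConfigMap; a minimal sketch, assuming the commonly used nginx-configuration ConfigMap in the ingress-nginx namespace (the actual name and namespace depend on how the controller was deployed and its --configmap flag):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-configuration    # assumed name; must match the controller's --configmap flag
  namespace: ingress-nginx     # assumed namespace
data:
  # use a single listening socket shared by all worker processes
  # instead of one per worker (SO_REUSEPORT)
  reuse-port: "false"
```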
Can you point to a specific issue resolved in 0.24.1 that fixes this? I will give this a try, but it will take some time to verify because our ingress controller is a shared resource across many organizations' pods and namespaces.
@aledbf I have confirmed that your recommendation works as desired. You can see the changes applied at around 10:02 in the graph below. Closing this out. Thank you for the support!
