Ingress-nginx: Upgrading node pools without dropping requests on GKE

Created on 21 Jan 2019 · 11 comments · Source: kubernetes/ingress-nginx

Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see https://kubernetes.io/docs/tasks/debug-application-cluster/troubleshooting/.): yes

What keywords did you search in NGINX Ingress controller issues before filing this one? (If you have found any duplicates, you should instead reply there.): gke upgrade node pool


Is this a BUG REPORT or FEATURE REQUEST? (choose one): feature request

NGINX Ingress controller version: 0.19

Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.1", GitCommit:"eec55b9ba98609a46fee712359c7b5b365bdd920", GitTreeState:"clean", BuildDate:"2018-12-13T10:39:04Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"10+", GitVersion:"v1.10.9-gke.5", GitCommit:"d776b4deeb3655fa4b8f4e8e7e4651d00c5f4a98", GitTreeState:"clean", BuildDate:"2018-11-08T20:33:00Z", GoVersion:"go1.9.3b4", Compiler:"gc", Platform:"linux/amd64"}

Environment: N/A

  • Cloud provider or hardware configuration: Google Cloud
  • OS (e.g. from /etc/os-release): container-optimized OS
  • Kernel (e.g. uname -a): Linux gke-cluster-1-app-01-2ea0b09c-5bcb 4.4.111+ #1 SMP Fri Aug 10 11:48:29 PDT 2018 x86_64 Intel(R) Xeon(R) CPU @ 2.30GHz GenuineIntel GNU/Linux

  • Install tools: GKE

  • Others:

What happened:
Node pool upgrades cordon the host(s) running nginx-ingress pods. Cordoning causes the host to be removed from the upstream TCP load balancer without first draining connections, so any in-flight requests are dropped.

What you expected to happen:
Some method by which a node can be removed from the upstream LB with connection draining, so that in-flight requests coming through the ingress controller are not dropped.

How to reproduce it (as minimally and precisely as possible):
Perform a node-pool upgrade on a pool running nginx-ingress controller pods.

Anything else we need to know:
To phrase the problem as I understand it as succinctly as possible:

  • nginx-ingress on GKE uses a TCP (L4) load balancer in front of the nginx-ingress controller pods.
  • GCP TCP load balancers do not support connection draining.
  • When a GKE node is cordoned, it is removed from that upstream TCP LB, per https://github.com/kubernetes/kubernetes/issues/65013.
  • When the node is removed from the LB, any in-flight requests that arrived through the now-cordoned node and are still being serviced by backend pods are dropped.

Therefore it is not possible to upgrade GKE node pools using nginx-ingress without disruption.

Is the above assessment correct?
If so, are there known workarounds or feature requests that would allow a non-disruptive node pool upgrade when using nginx-ingress?

All 11 comments

Therefore it is not possible to upgrade GKE node pools using nginx-ingress without disruption.

@thockin can you confirm this, please?

I'd like to add some more context to hopefully get some attention on this.

It's clear from GCP's documentation that connection draining is only supported for HTTPS load balancers, SSL proxy load balancers, and internal load balancers: https://cloud.google.com/load-balancing/docs/enabling-connection-draining

In addition to node pool upgrades, it seems that this behavior could also be seen when a GKE cluster with cluster autoscaling enabled scales down and removes a node running an nginx-ingress controller pod, since the behavior during scale-down is the same as during upgrade (cordon + drain the node). We currently work around this by isolating nginx-ingress to its own node pool and disabling autoscaling, which is less than ideal because we'd like to be able to scale up ingress along with our applications under high load.
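For anyone wanting to replicate that isolation, a minimal sketch is below. The pool name `ingress-pool`, the `dedicated=ingress:NoSchedule` taint, and the namespace are illustrative assumptions, not values from this thread; `cloud.google.com/gke-nodepool` is the label GKE applies per node pool.

```yaml
# Sketch: pin the controller to a dedicated (non-autoscaled) node pool via
# nodeSelector, and tolerate an assumed taint placed on that pool.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-ingress-controller
  namespace: ingress-nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx-ingress
  template:
    metadata:
      labels:
        app: nginx-ingress
    spec:
      nodeSelector:
        cloud.google.com/gke-nodepool: ingress-pool   # keep controller pods on the dedicated pool
      tolerations:
        - key: dedicated                              # assumed taint on the dedicated pool
          value: ingress
          effect: NoSchedule
      containers:
        - name: nginx-ingress-controller
          image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.19.0
          ports:
            - containerPort: 80
            - containerPort: 443
```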

I'm honestly a bit surprised I can't find more about this issue in the community, since nginx-ingress seems to be a very popular ingress solution for GKE. This issue came up almost immediately for my team when we tested a node pool upgrade with even a small amount of traffic flowing through the Ingress. I would expect that not being able to gracefully remove a node from the cluster would be a major production consideration for anybody running on GKE.

My only other thought on a general workaround for this is to ensure requests are retried at the client layer, which may be good practice depending on the situation, but which I can't imagine everybody using nginx-ingress on GKE is doing.

@brandentimm there is nothing we can do in ingress-nginx with this issue. I suggest you continue the conversation in https://github.com/kubernetes/kubernetes/issues/65013

@aledbf I can attempt to tug on this issue more in that ticket, but I thought it worthwhile to start a conversation here as well to at least confirm that this is expected behavior. It doesn't seem to be documented anywhere, regardless of whether something can be done about it.

Please pardon me if this is a naive question, but I noticed that for AWS nginx-ingress supports either L4 or L7 load balancing. Would swapping in the HTTPS Load Balancer for GKE instead of the TCP Load Balancer not address the issue of connection draining, so that any time a node is cordoned we at least don't drop in-flight requests?

I can attempt to tug on this issue more in that ticket, but I thought it worthwhile to start a conversation here as well to at least confirm that this is expected behavior

This project is too specific. I think the main repo is the right place. Maybe you can ask in the gke slack channel too?

Would swapping in the HTTPS Load Balancer for GKE instead of the TCP Load Balancer not address the issue of connection draining, so that any time a node is cordoned we at least don't drop in-flight requests?

I don't know; I don't use GKE. How would you define such a change? Some additional annotation on the Service?

Good call-out @aledbf. It looks like they currently only support creating internal load balancers through a Service annotation, so they must be managing this directly through a Google API for the GCLB Ingress, which uses this LB type.
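For illustration, the annotation surface referred to here is roughly the snippet below; it only switches the L4 LB to an internal one and there is no equivalent annotation to request the HTTPS (L7) LB for a Service. The namespace and selector are assumed names.

```yaml
# GKE lets a Service of type LoadBalancer request an *internal* TCP LB via an
# annotation, but offers no annotation to put the HTTPS (L7) LB in front of it.
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
  annotations:
    cloud.google.com/load-balancer-type: "Internal"
spec:
  type: LoadBalancer
  selector:
    app: nginx-ingress
  ports:
    - name: http
      port: 80
      targetPort: 80
    - name: https
      port: 443
      targetPort: 443
```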

Looks like this is a dead end currently. I've opened a feature request to support connection draining for TCP LBs on GCE here: https://issuetracker.google.com/issues/123457325

To use L7 instead of L4, you would define your nginx Service of type NodePort, then create a GCE ingress (and to do that properly we need to formalize ingress classes) that has just a default backend to your nginx. You do get some benefits - better protection against network level attacks, for example.
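A rough sketch of that setup might look like the following, assuming an `ingress-nginx` namespace and an `app: nginx-ingress` pod label (both illustrative). The `kubernetes.io/ingress.class: gce` annotation selects the GCE controller; one caveat to verify is the GCLB health check, which by default expects an HTTP 200 from the backends.

```yaml
# Expose the nginx-ingress controller pods via NodePort instead of a TCP LB.
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
spec:
  type: NodePort
  selector:
    app: nginx-ingress
  ports:
    - name: http
      port: 80
      targetPort: 80
---
# GCE (L7) Ingress whose only backend is the nginx Service above; the HTTPS LB
# it provisions supports connection draining, unlike the TCP LB.
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: nginx-via-gclb
  namespace: ingress-nginx
  annotations:
    kubernetes.io/ingress.class: gce
spec:
  backend:
    serviceName: ingress-nginx
    servicePort: 80
```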

That said, there IS a way for GCP to drain (kind of) but it requires some fairly deep changes to controllers. I will noodle on it and see if I can write up a bug.

BTW - if you just @ me, I probably won't see it (too many). If you assign to me, I will. Or ping me on slack.

Just wanted to note I'm also experiencing this issue. As a newcomer to Kubernetes and ingress-nginx, my initial reaction was to assume I was doing something wrong. The more I came to understand how ingress-nginx and Kubernetes work, the more sense it made why I was seeing downtime. If there isn't an adequate workaround at this time, I think the documentation needs to spell out that this is a legitimate problem on GKE deployments.

Although this problem exists, there are ways around it (it's on the more manual side though, so if you have GKE running with node auto-upgrades enabled, the following likely won't help).

When we create a new node pool and want to move our ingress controllers over to it, we simply start up more ingress-nginx pods on the new node pool, then update the Kubernetes Service in front of the ingress-nginx pods so the TCP LB starts sending traffic to the new pods in the new node pool (run different labels on the new pods so you can address them in the Service separately from the existing ones). New traffic starts to use the new pods in the new pool, while existing traffic (mainly long-lived connections) carries on via the pods running on the old node pool. Once that traffic has all moved over, clean up the old pods.

We use this method and don't notice any lost requests when doing so. Setting the pods' graceful termination period high enough also helps; I've seen traffic continue to go to the old pods for well over 10-15 minutes. I put this down to HTTP/2 connections and HTTP keep-alive, so if clients connecting to your services hold really long-lived connections, expect this period to be extended.
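A minimal sketch of the selector-switch approach described above, under stated assumptions: the `pool: blue` / `pool: green` labels, the `new-pool` node pool name, and the 10-minute termination grace period are all illustrative, not values from this thread.

```yaml
# Service in front of the ingress-nginx pods. Flipping the selector from
# pool: blue (old node pool) to pool: green (new node pool) sends new
# connections to the new pods while existing connections finish on the old ones.
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
spec:
  type: LoadBalancer
  selector:
    app: nginx-ingress
    pool: green          # was "blue"; edit this label to cut traffic over
  ports:
    - name: http
      port: 80
      targetPort: 80
    - name: https
      port: 443
      targetPort: 443
---
# New controller Deployment scheduled onto the new node pool, labeled so the
# Service above can address it separately from the old pods.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-ingress-green
  namespace: ingress-nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx-ingress
      pool: green
  template:
    metadata:
      labels:
        app: nginx-ingress
        pool: green
    spec:
      nodeSelector:
        cloud.google.com/gke-nodepool: new-pool   # assumed name of the new pool
      terminationGracePeriodSeconds: 600          # give long-lived connections time to finish
      containers:
        - name: nginx-ingress-controller
          image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.19.0
```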

@markfermor how do you know the old pods have no traffic anymore? Is there a call you can make to nginx to figure that out?

@dvaldivia you could either look at the nginx metrics (once the number of requests has dropped way down, the metrics should tell you when it reaches 0), or tail the logs of the old pods to see when the last access-log message was printed.
