Ingress-nginx: Ingress nginx OOM

Created on 21 Oct 2019 · 10 comments · Source: kubernetes/ingress-nginx

Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see https://kubernetes.io/docs/tasks/debug-application-cluster/troubleshooting/.): Already asked in the Slack channel; no answer.

What keywords did you search in NGINX Ingress controller issues before filing this one? (If you have found any duplicates, you should instead reply there.): memory, OOM, nginx, nginx-ingress


Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG

NGINX Ingress controller version: 0.26.1

Kubernetes version (use kubectl version):
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T11:05:50Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: hardware
  • OS (e.g. from /etc/os-release): reproduced both on ubuntu16 and ubuntu18
  • Kernel (e.g. uname -a): 4.15.0-65-generic
  • Install tools:
  • Others:

What happened: memory started leaking, and after a few hours the container was killed by the OOM killer

What you expected to happen: no memory leaks

How to reproduce it (as minimally and precisely as possible): sustained traffic of ~10-15k RPS

Anything else we need to know:
The main process uses more and more memory until it is killed by the OOM killer. I added a location to check the Lua garbage collection (https://github.com/kubernetes/ingress-nginx/issues/3314#issuecomment-433875622); it reports only 1-5 MB, so the Lua heap does not account for the growth. No errors or warnings were observed in the nginx log.
(screenshot: controller memory usage climbing until the OOM kill)
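For reference, a minimal sketch of that kind of GC-check endpoint, wired in through the controller ConfigMap's server-snippet key (the ConfigMap name, namespace, and /lua-gc path here are illustrative; the exact snippet in the linked comment may differ):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-configuration      # must match the controller's --configmap flag
  namespace: ingress-nginx
data:
  # Adds a debug location to every server block rendered by the controller.
  server-snippet: |
    location /lua-gc {
      content_by_lua_block {
        -- collectgarbage("count") returns the current Lua heap size in KB
        ngx.say(string.format("Lua GC: %.1f KB", collectgarbage("count")))
      }
    }
```

Curling that path then shows whether the Lua VM is what is growing; in this case it stayed at a few MB.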

All 10 comments

I checked the profiler and found that the metrics collector is a possible source of the problem.
(screenshot: memory profiler output)
So I disabled metrics on one host and left them enabled on another (nearly identical servers with the same traffic).
(screenshot: memory usage comparison, metrics enabled vs. disabled)
As you can see, the server with metrics enabled leaks memory, and it also uses more CPU.
Is there any way to reconfigure the metrics (for example, enable only a subset of them) to avoid the memory leak and the high CPU usage?
These metrics are really useful, and I don't want to switch to log parsing (https://github.com/martin-helmich/prometheus-nginxlog-exporter) or any other nginx metrics collector.
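One way to run that A/B comparison is to flip the controller's metrics flag on one instance. A sketch, assuming a controller release that supports the --enable-metrics flag (it defaults to true); the container name and other args are illustrative:

```yaml
# Pod template fragment of the controller Deployment/DaemonSet:
containers:
  - name: nginx-ingress-controller
    image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.26.1
    args:
      - /nginx-ingress-controller
      - --configmap=$(POD_NAMESPACE)/nginx-configuration
      - --enable-metrics=false   # instance B: in-process metrics collector off
```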

Regards,
Andrii

There was indeed an increase in memory usage after we upgraded to 0.26.1. The nginx pods consume 700-800 Mi on average at 0 QPS.

I'm getting sudden timeouts after nginx-ingress has been running for a few days (6-7), with no apparent error in the logs, as if the requests were not being processed at all. This behaviour started after upgrading to 0.26.1. I rolled back to 0.24.1 and everything works smoothly. I'm not sure what data or information I could provide to help you debug this.

Having the same big issue right now in production.
It starts with almost no memory usage, and after 1-2 hours it has consumed it all; it grows pretty fast.

(screenshot: memory consumption graph, 2019-12-19)
I'll try 0.24.1 as suggested by @davidcodesido

Having the same issue with 0.24.1 :(

It doesn't always happen; it randomly starts climbing by gigabytes over ~30 minutes, then the server collapses and stabilizes again.

Please test quay.io/kubernetes-ingress-controller/nginx-ingress-controller-amd64:dev-1.17.7-1
This image contains the current master and https://github.com/kubernetes/ingress-nginx/pull/4863
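For anyone else testing: swapping in the test image is a one-line override on the controller DaemonSet (a sketch; the container name and manifest layout depend on your install):

```yaml
# DaemonSet pod template fragment; only the image line changes:
spec:
  template:
    spec:
      containers:
        - name: nginx-ingress-controller
          image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller-amd64:dev-1.17.7-1
```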

Hello,
@aledbf
I'm using the helm chart for ingress-nginx and got an error when trying to use this tag: it does not satisfy the tag condition checked in
https://github.com/helm/charts/blob/master/stable/nginx-ingress/templates/controller-daemonset.yaml#L64-L73
Could you change the tag name to satisfy that condition?

Regards,
Andrii

> Please test quay.io/kubernetes-ingress-controller/nginx-ingress-controller-amd64:dev-1.17.7-1
> This image contains the current master and #4863

@aledbf In my case I'm still having the same issue :(

As soon as I telnet to a specific nginx port, it suddenly starts looping, logging that the port is not reachable, and goes OOM after a few minutes.

And yes, even if I close the connection it keeps logging that the port is not reachable, and I have to kill the pod manually.

Hello,
I tried the new release, ingress-nginx 0.28.0, with metrics enabled: no memory problems so far.
(screenshot: flat memory usage graph)
I'll wait one more day to confirm that everything is OK.
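For reference, the upgrade expressed as chart values for the stable/nginx-ingress chart (a sketch; verify the value names against your chart version):

```yaml
# values.yaml fragment:
controller:
  image:
    repository: quay.io/kubernetes-ingress-controller/nginx-ingress-controller
    tag: "0.28.0"
  metrics:
    enabled: true   # metrics back on; no leak observed on 0.28.0
```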

Regards,
Andrii

Hello,

confirmed: no memory problems on 0.28.0

Regards,
Andrii
