Ingress-nginx: Ingress nginx OOM

Created on 21 Oct 2019 · 10 comments · Source: kubernetes/ingress-nginx

Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see https://kubernetes.io/docs/tasks/debug-application-cluster/troubleshooting/.): Already asked in the Slack channel; no answer.

What keywords did you search in NGINX Ingress controller issues before filing this one? (If you have found any duplicates, you should instead reply there.): memory, OOM, nginx, nginx-ingress


Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG

NGINX Ingress controller version: 0.26.1

Kubernetes version (use kubectl version):
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T11:05:50Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: hardware
  • OS (e.g. from /etc/os-release): reproduced both on ubuntu16 and ubuntu18
  • Kernel (e.g. uname -a): 4.15.0-65-generic
  • Install tools:
  • Others:

What happened: memory started leaking, and after a few hours the container was killed by the OOM killer

What you expected to happen: no memory leaks

How to reproduce it (as minimally and precisely as possible): sustained traffic of ~10-15k RPS

Anything else we need to know:
The main process uses more and more memory until it is killed by the OOM killer. I added a location to check the Lua garbage collection (https://github.com/kubernetes/ingress-nginx/issues/3314#issuecomment-433875622); it reports only 1-5 MB, so the Lua heap does not account for the growth. No errors or warnings were observed in the nginx log.
(screenshot: controller memory usage climbing until the OOM kill)
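For reference, a minimal sketch of that kind of GC-check endpoint, wired in through the controller ConfigMap's server-snippet key (the ConfigMap name, namespace, and /lua-gc path here are illustrative; the exact snippet in the linked comment may differ):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-configuration      # must match the controller's --configmap flag
  namespace: ingress-nginx
data:
  # Adds a debug location to every server block rendered by the controller.
  server-snippet: |
    location /lua-gc {
      content_by_lua_block {
        -- collectgarbage("count") returns the current Lua heap size in KB
        ngx.say(string.format("Lua GC: %.1f KB", collectgarbage("count")))
      }
    }
```

Curling that path then shows whether the Lua VM is what is growing; in this case it stayed at a few MB.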

All 10 comments

I checked the profiler and found that the metrics collector is a possible source of the problem.
(screenshot: memory profiler output)
So I disabled metrics on one host and left them enabled on another (nearly identical servers with the same traffic).
(screenshot: memory usage comparison, metrics enabled vs. disabled)
As you can see, the server with metrics enabled leaks memory, and it also uses more CPU.
Is there any way to reconfigure the metrics (for example, enable only a subset of them) to avoid the memory leak and the high CPU usage?
These metrics are really useful, and I don't want to switch to log parsing (https://github.com/martin-helmich/prometheus-nginxlog-exporter) or any other nginx metrics collector.
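One way to run that A/B comparison is to flip the controller's metrics flag on one instance. A sketch, assuming a controller release that supports the --enable-metrics flag (it defaults to true); the container name and other args are illustrative:

```yaml
# Pod template fragment of the controller Deployment/DaemonSet:
containers:
  - name: nginx-ingress-controller
    image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.26.1
    args:
      - /nginx-ingress-controller
      - --configmap=$(POD_NAMESPACE)/nginx-configuration
      - --enable-metrics=false   # instance B: in-process metrics collector off
```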

Regards,
Andrii

There was indeed an increase in memory usage after we upgraded to 0.26.1. The nginx pods consume 700-800 Mi on average at 0 QPS.

I'm getting sudden timeouts after nginx-ingress has been running for a few days (6-7), with no apparent error in the logs, as if the requests were not being processed at all. This behaviour started after upgrading to 0.26.1. I rolled back to 0.24.1 and everything works smoothly. I'm not sure what data or information I could provide to help you debug this.

Having the same big issue right now in production.
It starts with almost no memory usage, and after 1-2 hours it has consumed it all; it grows pretty fast.

(screenshot: memory consumption graph, 2019-12-19)
I'll try 0.24.1 as suggested by @davidcodesido

Having the same issue with 0.24.1 :(

It doesn't always happen; it randomly starts climbing by gigabytes over ~30 minutes, then the server collapses and stabilizes again.

Please test quay.io/kubernetes-ingress-controller/nginx-ingress-controller-amd64:dev-1.17.7-1
This image contains the current master and https://github.com/kubernetes/ingress-nginx/pull/4863
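For anyone else testing: swapping in the test image is a one-line override on the controller DaemonSet (a sketch; the container name and manifest layout depend on your install):

```yaml
# DaemonSet pod template fragment; only the image line changes:
spec:
  template:
    spec:
      containers:
        - name: nginx-ingress-controller
          image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller-amd64:dev-1.17.7-1
```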

Hello,
@aledbf
I'm using the helm chart for ingress-nginx and got an error when trying to use this tag: it does not satisfy the tag condition checked in
https://github.com/helm/charts/blob/master/stable/nginx-ingress/templates/controller-daemonset.yaml#L64-L73
Could you change the tag name to satisfy that condition?

Regards,
Andrii

> Please test quay.io/kubernetes-ingress-controller/nginx-ingress-controller-amd64:dev-1.17.7-1
> This image contains the current master and #4863

@aledbf In my case I'm still having the same issue :(

As soon as I telnet to a specific nginx port, it suddenly starts looping, logging that the port is not reachable, and goes OOM after a few minutes.

And yes, even if I close the connection it keeps logging that the port is not reachable, and I have to kill the pod manually.

Hello,
I tried the new release, ingress-nginx 0.28.0, with metrics enabled: no memory problems so far.
(screenshot: flat memory usage graph)
I'll wait one more day to confirm that everything is OK.
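For reference, the upgrade expressed as chart values for the stable/nginx-ingress chart (a sketch; verify the value names against your chart version):

```yaml
# values.yaml fragment:
controller:
  image:
    repository: quay.io/kubernetes-ingress-controller/nginx-ingress-controller
    tag: "0.28.0"
  metrics:
    enabled: true   # metrics back on; no leak observed on 0.28.0
```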

Regards,
Andrii

Hello,

confirmed: no memory problems on 0.28.0

Regards,
Andrii
