Victoriametrics: vminsert: too many open files on "open /proc/self/stat"

Created on 8 Nov 2019 · 12Comments · Source: VictoriaMetrics/VictoriaMetrics

Describe the bug

vminsert fails to start up - too many open files after starting up debug handlers.

Environment

Pretty standard helm deployment of VictoriaMetrics.

2 replicas of vminsert - happens on both simultaneusly
1 replica of vmstorage
5 Kubernetes clusters reporting metrics to VM via Prometheus remote_write
No resource limits configured.

Probably connection storm after vminsert restart, making it open >1024 file descriptors for open connections.

Version

vminsert-20190828-210741-tags-v1.27.0-cluster-0-g8f81908b

Additional context

2019-11-08T14:01:56.279+0000    info    lib/logger/flag.go:11   build version: vminsert-20190828-210741-tags-v1.27.0-cluster-0-g8f81908b
2019-11-08T14:01:56.279+0000    info    lib/logger/flag.go:12   command line flags
2019-11-08T14:01:56.279+0000    info    lib/logger/flag.go:20   flag "graphiteListenAddr" = ""
2019-11-08T14:01:56.279+0000    info    lib/logger/flag.go:20   flag "http.disableResponseCompression" = "false"
2019-11-08T14:01:56.279+0000    info    lib/logger/flag.go:20   flag "httpListenAddr" = ":8480"
2019-11-08T14:01:56.279+0000    info    lib/logger/flag.go:20   flag "influxMeasurementFieldSeparator" = "_"
2019-11-08T14:01:56.279+0000    info    lib/logger/flag.go:20   flag "influxSkipSingleField" = "false"
2019-11-08T14:01:56.279+0000    info    lib/logger/flag.go:20   flag "loggerLevel" = "INFO"
2019-11-08T14:01:56.279+0000    info    lib/logger/flag.go:20   flag "maxConcurrentInserts" = "16"
2019-11-08T14:01:56.279+0000    info    lib/logger/flag.go:20   flag "maxInsertRequestSize" = "33554432"
2019-11-08T14:01:56.279+0000    info    lib/logger/flag.go:20   flag "maxLabelsPerTimeseries" = "30"
2019-11-08T14:01:56.279+0000    info    lib/logger/flag.go:20   flag "memory.allowedPercent" = "60"
2019-11-08T14:01:56.279+0000    info    lib/logger/flag.go:20   flag "opentsdbHTTPListenAddr" = ""
2019-11-08T14:01:56.279+0000    info    lib/logger/flag.go:20   flag "opentsdbListenAddr" = ""
2019-11-08T14:01:56.279+0000    info    lib/logger/flag.go:20   flag "rpc.disableCompression" = "false"
2019-11-08T14:01:56.279+0000    info    lib/logger/flag.go:20   flag "storageNode" = "victoria-metrics-vmstorage-0.victoria-metrics-vmstorage.default.svc.cluster.local:8400"
2019-11-08T14:01:56.279+0000    info    lib/logger/flag.go:20   flag "version" = "false"
2019-11-08T14:01:56.279+0000    info    app/vminsert/main.go:41 initializing netstorage for storageNodes [victoria-metrics-vmstorage-0.victoria-metrics-vmstorage.default.svc.cluster.local:8400]...
2019-11-08T14:01:56.280+0000    info    lib/memory/memory.go:33 limiting caches to 4797053337 bytes, leaving 3198035559 bytes to the OS according to -memory.allowedPercent=60
2019-11-08T14:01:56.280+0000    info    app/vminsert/main.go:47 successfully initialized netstorage in 557.877µs
2019-11-08T14:01:56.281+0000    info    lib/httpserver/httpserver.go:48 starting http server at http://:8480/
2019-11-08T14:01:56.281+0000    info    lib/httpserver/httpserver.go:49 pprof handlers are exposed at http://:8480/debug/pprof/
2019/11/08 14:02:32 ERROR: cannot open /proc/self/stat: open /proc/self/stat: too many open files
2019/11/08 14:03:02 ERROR: cannot open /proc/self/stat: open /proc/self/stat: too many open files
2019-11-08T14:03:03.960+0000    error   lib/httpserver/httpserver.go:359    error in "/insert/1/prometheus": cannot read prompb.WriteRequest: cannot read compressed request: unexpected EOF
2019-11-08T14:03:06.960+0000    error   lib/httpserver/httpserver.go:359    error in "/insert/1/prometheus": cannot read prompb.WriteRequest: cannot read compressed request: unexpected EOF
2019-11-08T14:03:08.466+0000    error   lib/httpserver/httpserver.go:359    error in "/insert/1/prometheus": cannot read prompb.WriteRequest: cannot read compressed request: unexpected EOF
2019-11-08T14:03:13.465+0000    error   lib/httpserver/httpserver.go:359    error in "/insert/1/prometheus": cannot read prompb.WriteRequest: cannot read compressed request: unexpected EOF
2019-11-08T14:03:20.746+0000    info    app/vminsert/main.go:67 service received signal terminated
2019-11-08T14:03:20.746+0000    info    app/vminsert/main.go:69 gracefully shutting down the service at ":8480"

enhancement question

Source

zytek

Most helpful comment

As this happened to me even though we run VM on a quite small scale I would propose an ops friendly solution that does not require to tweak underlying host (changing system default limits, or default dockerd limits applied to ALL containers running on the node)

We could add support for increasing limits straight from our helm chart:

add CAP_SYS_RESOURCE capability to the pod (no need to run privileged container)
execute ulimit -n before starting vminsert from shell wrapper script or straight from binary (https://stackoverflow.com/questions/17817204/how-to-set-ulimit-n-from-a-golang-program)

That would be the most ops friendly option. Cassandra went this way: https://github.com/instaclustr/cassandra-operator/pull/269/files

zytek on 13 Nov 2019

👍3

All 12 comments

Any ideas how to easily bump open file limit on helm deployment of VM?

zytek on 8 Nov 2019

@tenmozes , could you help with this?

valyala on 8 Nov 2019

Probably this would help increasing ulimit -n on K8S.

valyala on 8 Nov 2019

@zytek hi, thank for an issue, this is docker setting you need to increase limits for docker daemon on worker machines.
try to set ulimit in docker config

also you may find useful information here: https://github.com/kubernetes/kubernetes/issues/3595

tenmozes on 8 Nov 2019

We could add support for increasing limits straight from our helm chart:

add CAP_SYS_RESOURCE capability to the pod (no need to run privileged container)
execute ulimit -n before starting vminsert from shell wrapper script or straight from binary (https://stackoverflow.com/questions/17817204/how-to-set-ulimit-n-from-a-golang-program)

That would be the most ops friendly option. Cassandra went this way: https://github.com/instaclustr/cassandra-operator/pull/269/files

zytek on 13 Nov 2019

👍3

This sounds reasonable!
@tenmozes , could you look into this?

valyala on 13 Nov 2019

I can work on implementing chart/docker changes but I can't find original Dockerfile in any of the repos, can you point me to one ? _edit: found them, plus the ones on "cluster" branch_

Also, after thinking about this for a while I have some other improvement ideas:

Delay starting http / tsdb listeners to after vminsert is fully initialised (debug http server started, metrics exposed, etc), so that open file limit would manifest itself with new connection throttling rather than failure to start the process.
_EDIT: from quick code inspection it seems this is not feasible_
Wrap http listeners with netutil.LimitListener so we can always limit max number of client connections. We could either expose max client connections as a flag, or calculate it by reading process limits (syscall.Getrlimit() )

zytek on 17 Nov 2019

@zytek, thank you
the cluster version is in separate branch https://github.com/VictoriaMetrics/VictoriaMetrics/tree/cluster/

https://github.com/VictoriaMetrics/VictoriaMetrics/tree/cluster/app/vmrestore
https://github.com/VictoriaMetrics/VictoriaMetrics/tree/cluster/app/vmbackup
https://github.com/VictoriaMetrics/VictoriaMetrics/tree/cluster/app/vmselect
https://github.com/VictoriaMetrics/VictoriaMetrics/tree/cluster/app/vminsert
https://github.com/VictoriaMetrics/VictoriaMetrics/tree/cluster/app/vmstorage
Please check deployment folder, you find Dockerfile in it

@valyala what do you think about improvements

tenmozes on 17 Nov 2019

Delay starting http / tsdb listeners to after vminsert is fully initialised (debug http server started, metrics exposed, etc), so that open file limit would manifest itself with new connection throttling rather than failure to start the process.

This is hard to achieve, since debug http server expects all the VictoriaMetrics components are already initialized when it starts - it should return valid response with all the metrics from all the components when /metrics page is requested.

Wrap http listeners with netutil.LimitListener so we can always limit max number of client connections. We could either expose max client connections as a flag, or calculate it by reading process limits (syscall.Getrlimit() )

This sounds reasonable. Let's clarify the purpose of this change - it will limit the number of incoming TCP connections to vminsert / vmselect, so spare connections can be always established to debug http server for reading /metrics or /deubg/pprof/* pages. It this correct?

valyala on 17 Nov 2019

Wrap http listeners with netutil.LimitListener so we can always limit max number of client connections. We could either expose max client connections as a flag, or calculate it by reading process limits (syscall.Getrlimit() )

This sounds reasonable. Let's clarify the purpose of this change - it will limit the number of incoming TCP connections to vminsert / vmselect, so spare connections can be always established to debug http server for reading /metrics or /deubg/pprof/* pages. It this correct?

After careful inspection of attached logs I believe that the problem cannot be fixed by limiting number of accepted connections by http listener. None of the errors in the logs seem to trigger os.Exit / log.Fatal etc. The app receives termination signal after almost 2 minutes, so it seems that it is due to livenessProbe failing - for the same reasons as open() call failed earlier - no free file handles, so vminsert did not responds to ping from kubernetes and was terminated.

Adding support for increasing nofile limits on container startup should be sufficient here.

zytek on 19 Nov 2019

If you use OS with systemd, then try increase parameters LimitNOFILE & LimitNPROC.
ex:

cat /etc/systemd/system/victoriametrics.service
[Unit]
Description=VictoriaMetrics
After=network.target

[Service]
Type=simple
Environment="GOMAXPROCS=24"
User=victoriametrics
LimitNOFILE=500000 #!
LimitNPROC=500000 #!
StartLimitBurst=5
StartLimitInterval=0
Restart=on-failure
RestartSec=1
PIDFile=/run/victoriametrics/victoriametrics.pid
ExecStart=/usr/local/bin/victoria-metrics-prod -storageDataPath /vm/dt -maxConcurrentInserts 1024 -memory.allowedPercent 60 -retentionPeriod 24
ExecStop=/bin/kill -s SIGTERM $MAINPID

[Install]
WantedBy=multi-user.target