Haproxy: HAProxy reloads result in fd count increase

Created on 9 Jul 2020 · 6 Comments · Source: haproxy/haproxy

Output of `haproxy -vv` and `uname -a`

HA-Proxy version 2.2.0 2020/07/07 - https://haproxy.org/
Status: long-term supported branch - will stop receiving fixes around Q2 2025.
Known bugs: http://www.haproxy.org/bugs/bugs-2.2.0.html
Running on: Linux 4.15.0-42-generic #45-Ubuntu SMP Thu Nov 15 19:32:57 UTC 2018 x86_64
Build options :
  TARGET  = linux-glibc
  CPU     = generic
  CC      = gcc
  CFLAGS  = -O2 -g -Wall -Wextra -Wdeclaration-after-statement -fwrapv -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-stringop-overflow -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference
  OPTIONS = USE_PCRE=1 USE_LINUX_TPROXY=1 USE_LINUX_SPLICE=1 USE_LIBCRYPT=1 USE_OPENSSL=1 USE_ZLIB=1 USE_SYSTEMD=1

Feature list : +EPOLL -KQUEUE +NETFILTER +PCRE -PCRE_JIT -PCRE2 -PCRE2_JIT +POLL -PRIVATE_CACHE +THREAD -PTHREAD_PSHARED +BACKTRACE -STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H +GETADDRINFO +OPENSSL -LUA +FUTEX +ACCEPT4 +ZLIB -SLZ +CPU_AFFINITY +TFO +NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL +SYSTEMD -OBSOLETE_LINKER +PRCTL +THREAD_DUMP -EVPORTS

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_THREADS=64, default=16).
Built with OpenSSL version : OpenSSL 1.1.0g  2 Nov 2017
Running on OpenSSL version : OpenSSL 1.1.0g  2 Nov 2017
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2
Built with network namespace support.
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE version : 8.39 2016-06-14
Running on PCRE version : 8.39 2016-06-14
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Encrypted password support via crypt(3): yes
Built with gcc compiler version 7.3.0
Built with the Prometheus exporter as a service

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
            fcgi : mode=HTTP       side=BE        mux=FCGI
       <default> : mode=HTTP       side=FE|BE     mux=H1
              h2 : mode=HTTP       side=FE|BE     mux=H2
       <default> : mode=TCP        side=FE|BE     mux=PASS

Available services :
    prometheus-exporter

Available filters :
    [SPOE] spoe
    [COMP] compression
    [TRACE] trace
    [CACHE] cache
    [FCGI] fcgi-app


# uname -a
Linux hostname-xxx 4.15.0-42-generic #45-Ubuntu SMP Thu Nov 15 19:32:57 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

What's the configuration?

global
    user haproxy
    group haproxy
    nbproc 1
    nbthread 16
    cpu-map auto:1/1-16 0-15
    log /dev/log local2
    log /dev/log local0 notice
    chroot /path/to/haproxy
    pidfile /var/run/haproxy.pid
    daemon
    master-worker
    maxconn 200000
    hard-stop-after 1h
    stats socket /path/to/haproxy/socket mode 660 level admin expose-fd listeners
    tune.ssl.cachesize 3000000
    tune.ssl.lifetime 60000
    ssl-default-bind-ciphers ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256
    ssl-default-bind-options ssl-min-ver TLSv1.2 ssl-max-ver TLSv1.2
    server-state-file /path/to/haproxy/haproxy_server_states
    tune.bufsize 40960

defaults
  mode http
  log global
  retries 3
  timeout http-request 10s
  timeout queue 10s
  timeout connect 10s
  timeout client 1m
  timeout server 1m
  timeout tunnel 10m
  timeout client-fin 30s
  timeout server-fin 30s
  timeout check 10s
  option httplog
  option forwardfor except 127.0.0.0/8
  option redispatch
  load-server-state-from-file global

Steps to reproduce the behavior

Use version 2.1.x or 2.2.x.
Have your systemd unit file dump the server state to a file during the reload.
Have the global section read the server-state file that the systemd reload dumped.
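
The unit file itself is not shown in this report, so the following is only a sketch of what the reload-time dump might look like. It assumes `socat` is installed and reuses the socket and state-file paths from the configuration above. `dump_server_state` is a hypothetical helper name, not something HAProxy ships.

```shell
#!/bin/sh
# Sketch: dump HAProxy server state before a reload.
# Assumptions: socat is installed; the paths passed in mirror the
# "stats socket" and "server-state-file" lines of the configuration.
dump_server_state() {
    sock=$1
    out=$2
    tmp="$out.tmp"
    # "show servers state" is the runtime API command that prints the state table.
    echo "show servers state" | socat stdio "$sock" > "$tmp" || return 1
    # Rename into place so HAProxy never reads a half-written file.
    mv "$tmp" "$out"
}

# Example call from an ExecReload= helper (not executed here):
# dump_server_state /path/to/haproxy/socket /path/to/haproxy/haproxy_server_states
```

Writing to a temporary file and renaming it is a design choice of this sketch, so that a reload racing with the dump sees either the old file or the complete new one.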

Actual behavior

We use HAProxy in our environment and recently upgraded from 2.0 to 2.1.0.

Now every time we reload, the number of open fds increases by 1. This was not the case in version 2.0.

The fd limit for the root user is set to 1024. After 1024 reloads, the haproxy process starts to fail and reloads no longer work.
New changes are not applied.

This happens in HAProxy 2.2 as well.

For background, we reload HAProxy very frequently, so we hit the fd limit quickly and our services start to fail.

The issue does not happen if we do not read the server-state file during reload.
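
To estimate how many reloads remain before the limit described above is reached, one option is to compare the process's soft limit with its current fd count. This is a rough sketch that relies on the Linux `/proc` layout; `fd_headroom` is a hypothetical helper name.

```shell
#!/bin/sh
# fd_headroom PID: print (soft "Max open files" limit) - (currently open fds).
# Assumes Linux /proc; run as root or as the owner of the process.
fd_headroom() {
    pid=$1
    # The soft limit is the fourth field of the "Max open files" row.
    limit=$(awk '/^Max open files/ {print $4}' /proc/"$pid"/limits)
    # Each entry in /proc/PID/fd is one open descriptor.
    open=$(ls /proc/"$pid"/fd | wc -l)
    echo $((limit - open))
}
```

With one descriptor leaked per reload, the printed number approximates how many more reloads the master process can take.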

Below is the experiment, and it shows the fd count increase.

root@haproxynode:/usr/local/src/haproxy-2.2.0# ls -l /proc/$(systemctl status haproxy |grep 'Main PID' |awk '{print $3}')/fd
total 0
lr-x------ 1 root root 64 Jul  9 09:11 0 -> /dev/null
lrwx------ 1 root root 64 Jul  9 09:11 1 -> 'socket:[254090076]'
lrwx------ 1 root root 64 Jul  9 09:11 10 -> 'socket:[254090089]'
lrwx------ 1 root root 64 Jul  9 09:11 2 -> 'socket:[254090076]'
lrwx------ 1 root root 64 Jul  9 09:11 3 -> 'socket:[254090081]'
lrwx------ 1 root root 64 Jul  9 09:11 4 -> 'anon_inode:[eventpoll]'
lr-x------ 1 root root 64 Jul  9 09:11 5 -> /path/to/haproxy/server-state
lr-x------ 1 root root 64 Jul  9 09:11 6 -> 'pipe:[254090092]'
l-wx------ 1 root root 64 Jul  9 09:11 7 -> 'pipe:[254090092]'

root@haproxynode:/usr/local/src/haproxy-2.2.0# systemctl reload haproxy

root@haproxynode:/usr/local/src/haproxy-2.2.0# ls -l /proc/$(systemctl status haproxy |grep 'Main PID' |awk '{print $3}')/fd
total 0
lr-x------ 1 root root 64 Jul  9 09:11 0 -> /dev/null
lrwx------ 1 root root 64 Jul  9 09:11 1 -> 'socket:[254090076]'
l-wx------ 1 root root 64 Jul  9 09:11 10 -> 'pipe:[254090150]'
lrwx------ 1 root root 64 Jul  9 09:11 2 -> 'socket:[254090076]'
lrwx------ 1 root root 64 Jul  9 09:11 4 -> 'socket:[254090139]'
lr-x------ 1 root root 64 Jul  9 09:11 5 -> /path/to/haproxy/server-state
lrwx------ 1 root root 64 Jul  9 09:11 6 -> 'anon_inode:[eventpoll]'
lr-x------ 1 root root 64 Jul  9 09:11 7 -> /path/to/haproxy/server-state
lr-x------ 1 root root 64 Jul  9 09:11 8 -> 'pipe:[254090150]'
lrwx------ 1 root root 64 Jul  9 09:11 9 -> 'socket:[254090148]'

The number of fds open on the server-state file increased to 2. Is this a known bug, or do we have to do something different in newer versions?
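
The before/after comparison above can be scripted so the leak is easy to watch across many reloads. The sketch below assumes Linux `/proc` and enough privileges to read the master process's fd directory. `count_fds_to` is a hypothetical helper name.

```shell
#!/bin/sh
# count_fds_to PID PATH: print how many descriptors of PID point at PATH.
count_fds_to() {
    pid=$1
    target=$2
    count=0
    for fd in /proc/"$pid"/fd/*; do
        # Each entry is a symlink; readlink yields the path of the open file.
        if [ "$(readlink "$fd" 2>/dev/null)" = "$target" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```

Called with the master PID and `/path/to/haproxy/server-state` after each `systemctl reload haproxy`, the count would be expected to grow by one per reload on an affected version.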

Expected behavior

A reload should not increase the fd count each time and leave the extra descriptors idle forever.

Do you have any idea what may have caused this?

No. I know reloads are causing this, but I am not sure about the internals.

Do you have an idea how to solve the issue?

fixed bug

Source

VigneshSP94

All 6 comments

da29fe2360e61a5ed4acd283765b20addd5a3ea8

$ g bisect good
da29fe2360e61a5ed4acd283765b20addd5a3ea8 is the first bad commit
commit da29fe2360e61a5ed4acd283765b20addd5a3ea8
Author: Baptiste Assmann <[email protected]>
Date:   Thu Jun 13 13:24:29 2019 +0200

    MEDIUM: server: server-state global file stored in a tree

    Server states can be recovered from either a "global" file (all backends)
    or a "local" file (per backend).

    The way the algorithm to parse the state file was first implemented was good
    enough for a low number of backends and servers per backend.
    Basically, for each backend the state file (global or local) is opened,
    parsed entirely and for each line we check if it contains data related to
    a server from the backend we're currently processing.
    We must read the file entirely, just in case some lines for the current
    backend are stored at the end of the file.
    This does not scale at all!

    This patch changes the behavior above for the "global" file only. Now,
    the global file is read and parsed once and all lines it contains are
    stored in a tree, for faster discovery.
    This result in way much less fopen, fgets, and strcmp calls, which make
    loading of very big state files very quick now.

 include/types/server.h |  11 ++
 src/server.c           | 412 +++++++++++++++++++++++++++++++++----------------
 2 files changed, 294 insertions(+), 129 deletions(-)
$ g bisect log
git bisect start
# bad: [5254321d1447bc72a22f0381a0225175d42e6704] BUILD: tcp: condition TCP keepalive settings to platforms providing them
git bisect bad 5254321d1447bc72a22f0381a0225175d42e6704
# bad: [32bf97fb6048e0fb7afe8c336e6a1594fbde9430] [RELEASE] Released version 2.2-dev3
git bisect bad 32bf97fb6048e0fb7afe8c336e6a1594fbde9430
# good: [9dc6b97429ce0f5be142fa9b920bf0ef0a714d73] [RELEASE] Released version 2.1-dev0
git bisect good 9dc6b97429ce0f5be142fa9b920bf0ef0a714d73
# bad: [34779c34fcb483b91339a1c4c8d74da5ad7ff530] CLEANUP: ssl: remove old TODO commentary
git bisect bad 34779c34fcb483b91339a1c4c8d74da5ad7ff530
# bad: [e40f274878eb70946a1792f5ef142ec0d57ac9c4] BUILD: trace: make the lockon_ptr const to silence a warning without threads
git bisect bad e40f274878eb70946a1792f5ef142ec0d57ac9c4
# bad: [2ab5c38359340c52abce3516e572b838a30b1754] BUG/MINOR: checks: do not exit tcp-checks from the middle of the loop
git bisect bad 2ab5c38359340c52abce3516e572b838a30b1754
# bad: [37243bc61f9c5cf88d1fe96a016e5f2f7e5e0c60] BUG/MEDIUM: mux-h1: Don't release h1 connection if there is still data to send
git bisect bad 37243bc61f9c5cf88d1fe96a016e5f2f7e5e0c60
# bad: [ad03288e6b28d816abb443cf8c6d984a72bb91a6] BUG/MINOR: mworker/cli: don't output a \n before the response
git bisect bad ad03288e6b28d816abb443cf8c6d984a72bb91a6
# bad: [da29fe2360e61a5ed4acd283765b20addd5a3ea8] MEDIUM: server: server-state global file stored in a tree
git bisect bad da29fe2360e61a5ed4acd283765b20addd5a3ea8
# good: [d4376302377e4f51f43a183c2c91d929b27e1ae3] MINOR: sample: Add sha2([<bits>]) converter
git bisect good d4376302377e4f51f43a183c2c91d929b27e1ae3
# first bad commit: [da29fe2360e61a5ed4acd283765b20addd5a3ea8] MEDIUM: server: server-state global file stored in a tree