Haproxy: Segmentation fault with mysql and retry-on option

Created on 13 May 2020 · 3 Comments · Source: haproxy/haproxy

Output of haproxy -vv and uname -a

$ docker run --rm haproxy:2.1.4 haproxy -vv

HA-Proxy version 2.1.4 2020/04/02 - https://haproxy.org/
Status: stable branch - will stop receiving fixes around Q1 2021.
Known bugs: http://www.haproxy.org/bugs/bugs-2.1.4.html
Build options :
  TARGET  = linux-glibc
  CPU     = generic
  CC      = gcc
  CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-old-style-declaration -Wno-ignored-qualifiers -Wno-clobbered -Wno-missing-field-initializers -Wno-implicit-fallthrough -Wno-stringop-overflow -Wno-cast-function-type -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference
  OPTIONS = USE_PCRE2=1 USE_PCRE2_JIT=1 USE_GETADDRINFO=1 USE_OPENSSL=1 USE_LUA=1 USE_ZLIB=1

Feature list : +EPOLL -KQUEUE -MY_EPOLL -MY_SPLICE +NETFILTER -PCRE -PCRE_JIT +PCRE2 +PCRE2_JIT +POLL -PRIVATE_CACHE +THREAD -PTHREAD_PSHARED -REGPARM -STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H -VSYSCALL +GETADDRINFO +OPENSSL +LUA +FUTEX +ACCEPT4 -MY_ACCEPT4 +ZLIB -SLZ +CPU_AFFINITY +TFO +NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL -SYSTEMD -OBSOLETE_LINKER +PRCTL +THREAD_DUMP -EVPORTS

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_THREADS=64, default=4).
Built with OpenSSL version : OpenSSL 1.1.1d  10 Sep 2019
Running on OpenSSL version : OpenSSL 1.1.1d  10 Sep 2019
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with Lua version : Lua 5.3.3
Built with network namespace support.
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE2 version : 10.32 2018-09-10
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with the Prometheus exporter as a service

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
              h2 : mode=HTTP       side=FE|BE     mux=H2
            fcgi : mode=HTTP       side=BE        mux=FCGI
       <default> : mode=HTTP       side=FE|BE     mux=H1
       <default> : mode=TCP        side=FE|BE     mux=PASS

Available services :
    prometheus-exporter

Available filters :
    [SPOE] spoe
    [CACHE] cache
    [FCGI] fcgi-app
    [TRACE] trace
    [COMP] compression

$ docker run --rm haproxy:2.1.4 uname -a
Linux fc16cad4f85f 5.4.33 #1-NixOS SMP Fri Apr 17 08:50:26 UTC 2020 x86_64 GNU/Linux

# Also tested on Debian 8
$ uname -a
Linux dev-web1 3.16.0-8-amd64 #1 SMP Debian 3.16.64-2 (2019-04-01) x86_64 GNU/Linux

What's the configuration?

global
  log stdout format raw local0

listen 3306_tcp_mysql
  bind :3306
  mode tcp
  retry-on all-retryable-errors
  server mysql1 mysql:3306 check

Steps to reproduce the behavior

The problem was first noticed on servers running Debian 8, where haproxy 2.1.4 was installed from the official package. I then reproduced it with Docker and a minimal configuration file. The problem disappears if the retry-on keyword is removed.

  1. git clone https://github.com/abra7134/haproxy-bugreport
  2. docker-compose up
  3. telnet localhost 3306

Actual behavior

As soon as a client connects to port 3306 (for example with telnet), HAProxy crashes with a segmentation fault. The problem also reproduces on the 2.0 branch. For some reason it occurs only when proxying to MySQL (tried versions 5.6 and 5.7); other TCP services such as Gearman, RabbitMQ and Redis are not affected.

haproxy_1  | [WARNING] 133/123305 (7) : Server 3306_tcp_mysql/mysql1 is UP, reason: Layer4 check passed, check duration: 0ms. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
haproxy_1  | [ALERT] 133/123320 (1) : Current worker #1 (7) exited with code 139 (Segmentation fault)
haproxy_1  | [ALERT] 133/123320 (1) : exit-on-failure: killing every processes with SIGTERM
haproxy_1  | [WARNING] 133/123320 (1) : All workers exited. Exiting... (139)

Expected behavior

HAProxy should proxy the connection normally, without a segmentation fault.

Do you have any idea what may have caused this?

Do you have an idea how to solve the issue?

Labels: medium, fixed, core, bug

All 3 comments

Can be easily reproduced with current HEAD (d645574fd49d19da9513d19f78648de2c9bc670a):

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00000000005b2162 in htx_get_blk_ptr (htx=0x27792b0, blk=0x990e09a2) at include/common/htx.h:565
565             return ((void *)htx->blocks + blk->addr);
[Current thread is 1 (Thread 0x7fe1eca45a80 (LWP 31657))]
(gdb) bt full
#0  0x00000000005b2162 in htx_get_blk_ptr (htx=0x27792b0, blk=0x990e09a2) at include/common/htx.h:565
No locals.
#1  0x00000000005b25ba in http_get_stline (htx=0x27792b0) at src/http_htx.c:69
        blk = 0x990e09a2
#2  0x0000000000559b41 in si_cs_recv (cs=0x278b540) at src/stream_interface.c:1378
        htx = 0x27792b0
        sl = 0x7ffddcba9630
        conn = 0x278b350
        si = 0x278b0b0
        ic = 0x278ae20
        ret = 110
        max = 16384
        cur_read = 0
        read_poll = 4
        flags = 0
#3  0x0000000000558b3b in si_cs_io_cb (t=0x278b300, ctx=0x278b0b0, state=0) at src/stream_interface.c:805
        si = 0x278b0b0
        cs = 0x278b540
        ret = 0
#4  0x000000000059faf4 in run_tasks_from_list (list=0x9b82f0 <task_per_thread+48>, max=67) at src/task.c:345
        process = 0x558aa5 <si_cs_io_cb>
        t = 0x278b300
        state = 0
        ctx = 0x278b0b0
        done = 0
#5  0x000000000059ff6a in process_runnable_tasks () at src/task.c:446
        tt = 0x9b82c0 <task_per_thread>
        lrq = 0x0
        grq = 0x0
        t = 0x2720830
        max_processed = 200
        done = 0
        tmp_list = 0x0
#6  0x000000000052bfa7 in run_poll_loop () at src/haproxy.c:2807
        next = 238489322
        wake = 0
#7  0x000000000052c49e in run_thread_poll_loop (data=0x0) at src/haproxy.c:2979
        ptaf = 0x8a2c60 <per_thread_alloc_list>
        ptif = 0x8a2c70 <per_thread_init_list>
        ptdf = 0x0
        ptff = 0x0
        init_left = 0
        init_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}
        init_cond = {__data = {__lock = 0, __futex = 2, __total_seq = 1, __wakeup_seq = 1, __woken_seq = 1, __mutex = 0x8b2ec0 <init_mutex>, __nwaiters = 0, __broadcast_seq = 1},
          __size = "\000\000\000\000\002\000\000\000\001\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\300.\213\000\000\000\000\000\000\000\000\000\001\000\000", __align = 8589934592}
#8  0x000000000052e049 in main (argc=3, argv=0x7ffddcba9c18) at src/haproxy.c:3681
        blocked_sig = {__val = {18446744067199990583, 18446744073709551615 <repeats 15 times>}}
        old_sig = {__val = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 32, 140728306670112, 0, 0, 0}}
        i = 2
        err = 0
        retry = 200
        limit = {rlim_cur = 1048576, rlim_max = 1048576}
        errmsg = "\000\000\000\000\000\000\000\000\260\017\002\000\000\000\000\000P0q\002\000\000\000\000\030\234\272\334\375\177\000\000\003\000\000\000\000\000\000\000\312m\310\353\341\177\000\000\000\000\000\000\000\000\000\000\b\000\000\000\000\000\000\000袣\272\334\375\177\000\000\340\354\211\000\000\000\000\000\070\234\272\334\375\177\000\000\324^R\000\000\000\000\000\000\000\000"
        pidfd = -1
(gdb)

I'm not sure retry-on should be allowed in mode tcp at all: apart from retry-on conn-failure, it doesn't really make sense there. Otherwise, all-retryable-errors would need a different meaning depending on the mode in use.

Indeed, it makes no sense. I was sure I had checked that the proxy was in mode http before attempting to store the data; quite obviously not. I'll fix that.