Envoy: Envoy Crashes at 300+ req/s using hot-restarter and systemd

Created on 28 Feb 2018 · 4 Comments · Source: envoyproxy/envoy

Description:
I created a systemd service that calls hot-restarter.py to start Envoy, then load tested it with several hundred requests per second (300+ req/s).
Envoy crashed under that load.

Envoy version: 1.5.0

Repro steps:
Enable systemd
Start envoy.service using systemd
Give it loads of traffic

_My envoy.service:_

[Unit]
Description=Envoy meeeen
After=network.target
[Service]
User=root
Type=simple
ExecStart=/etc/envoy/hot-restarter.py /etc/envoy/start-envoy.sh
ExecStartPre=/etc/envoy/check_envoy.sh
ExecReload=/etc/envoy/reload_envoy.sh $MAINPID
ExecStop=/bin/kill -15 $MAINPID
TimeoutStopSec=10
KillMode=process
[Install]
WantedBy=multi-user.target

_start-envoy.sh:_

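#!/bin/bash
set -e
# Validate the config before starting; exit non-zero if validation fails.
if ! /usr/sbin/envoy -c /etc/envoy/config.yaml --mode validate --base-id 6969; then
    exit 1
fi
# RESTART_EPOCH is set by hot-restarter.py on each (re)start.
exec /usr/sbin/envoy -c /etc/envoy/config.yaml --restart-epoch "$RESTART_EPOCH"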

_check_envoy.sh:_

#!/bin/bash

set -e
if [ -s /etc/envoy/config.yaml ]; then
    /usr/sbin/envoy -c /etc/envoy/config.yaml --mode validate
else
    echo "File /etc/envoy/config.yaml is empty!"
    exit 1
fi

_reload_envoy.sh:_

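#!/bin/bash

set -e
# PID of hot-restarter.py, passed by systemd as $MAINPID in ExecReload.
MAIN_PID=$1
/usr/sbin/envoy -c /etc/envoy/config.yaml --mode validate --base-id 6969
# SIGHUP (signal 1) asks hot-restarter.py to hot restart Envoy.
kill -HUP "$MAIN_PID"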

Config:
envoy.yaml:

static_resources:
  listeners:
  - address: #http-address
      socket_address:
        address: 0.0.0.0
        port_value: 80
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        config:
          codec_type: AUTO
          stat_prefix: ingress_http
          access_log:
          - name: envoy.file_access_log
            config:
              path: /var/log/envoy/http-access.log
          http_filters:
          - name: envoy.router
          route_config:
            virtual_hosts: #http-hosts
            - name: redirect-https
              require_tls: all
              domains:
              - example.com
            - name: example
              domains:
              - example.com
              routes:
              - match:
                  prefix: ""
                route:
                  cluster: example
  - address: #https-address
      socket_address:
        address: 0.0.0.0
        port_value: 443
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        config:
          codec_type: AUTO
          stat_prefix: ingress_http
          access_log:
          - name: envoy.file_access_log
            config:
              path: /var/log/envoy/http-access.log
          http_filters:
          - name: envoy.router
          route_config:
            virtual_hosts: #https-hosts
            - name: example
              domains:
              - example.com
              routes:
              - match:
                  prefix: ""
                route:
                  cluster: example
  clusters:
  - name: example
    type: STRICT_DNS
    connect_timeout:
      seconds: 60
      nanos: 0
    lb_policy: ROUND_ROBIN
    hosts:
    - socket_address:
        address: example-backend.com
        port_value: 80
admin:
  access_log_path: /dev/null
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 8001

Call Stack: (syslog)

Feb 28 07:53:18 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:53:18.281][6748][info][config] source/server/listener_manager_impl.cc:482] all dependencies initialized. starting workers
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.610][6751][critical][assert] source/common/network/address_impl.cc:112] assert failure: fd != -1
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.611][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:101] Caught Aborted, suspect faulting address 0x1a5c
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.611][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:85] Backtrace obj x-gnu/libc.so.6> thr<6751> (use tools/stack_decode.py):
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.612][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #0 0x7fbbd94a2428
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.612][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #1 0x7fbbd94a4029
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.612][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:93] thr<6751> obj
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.612][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #2 0x9acf51
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.612][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #3 0x9ad503
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.613][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #4 0x6fec76
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.613][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #5 0x5f1e6c
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.613][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #6 0x691149
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.613][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #7 0x690f50
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.613][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #8 0x684d42
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.613][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #9 0x68320c
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.614][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #10 0x683534
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.614][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #11 0x8898a4
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.614][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #12 0x885fbb
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.614][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #13 0x77cac1
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.614][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #14 0x77c225
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.614][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #15 0x7a58fd
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.615][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #16 0x7a3908
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.615][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #17 0x7a392c
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.615][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #18 0x7aeae7
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.615][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #19 0x7a43e0
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.615][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #20 0x7a42ca
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.615][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #21 0x779a66
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.615][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #22 0x703589
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.615][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #23 0x703605
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.615][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #24 0x6fd5a9
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.615][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #25 0x6fe15f
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.615][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #26 0x6fdf2a
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.615][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #27 0x6fbf28
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #28 0x6ff269
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #29 0x5f84ed
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #30 0x5f74fd
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #31 0x5f752d
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #32 0xa344d1
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #33 0xa34c2e
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #34 0x5f28c7
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #35 0x5e5007
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #36 0x5e4b97
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #37 0x5e56e6
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #38 0x4a1d31
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #39 0xa3eb9f
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #40 0xa3ebc4
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:93] thr<6751> obj
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #41 0x7fbbd9b476b9
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:93] thr<6751> obj
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #42 0x7fbbd957441c
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:97] end backtrace thread 6751

bug

All 4 comments

Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.610][6751][critical][assert] source/common/network/address_impl.cc:112] assert failure: fd != -1

Seems it is related to your ulimit settings?
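An `fd != -1` assertion is consistent with a socket call failing because the process has run out of file descriptors. The following is a minimal sketch of that failure mode, not Envoy code: the function name `sockets_until_emfile` and the limit of 64 are illustrative assumptions. It lowers the soft open-file limit for the current process, opens sockets until the kernel refuses, and reports the error code.

```python
import errno
import resource
import socket


def sockets_until_emfile(soft_limit=64):
    """Lower the soft RLIMIT_NOFILE, then open sockets until creation fails.

    Returns (number_of_sockets_opened, errno_of_the_failure).
    """
    old_soft, old_hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # The soft limit may never exceed the hard limit.
    if old_hard != resource.RLIM_INFINITY:
        soft_limit = min(soft_limit, old_hard)
    opened = []
    failure = None
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, old_hard))
        while True:
            opened.append(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
    except OSError as exc:
        # At the descriptor limit, socket() fails with EMFILE.
        failure = exc.errno
    finally:
        for sock in opened:
            sock.close()
        # Restore the original limits for the rest of the process.
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, old_hard))
    return len(opened), failure
```

Calling `sockets_until_emfile()` on Linux should return a count below the chosen limit together with `errno.EMFILE` ("Too many open files"), since descriptors already held by the process count against the limit.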

ulimit -n
65536

That's the ulimit setting.
Is there any requirement from envoy to bump it up?

@yudiandreanp can you provide a core dump or a fully resolved stack trace if you can repro this? It's hard to tell what is happening from the report.
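The backtrace in the report only contains raw frame addresses, and the log itself points at tools/stack_decode.py for resolving them. As a first step, a small sketch like the one below could pull the frame numbers and addresses out of the syslog text so they can be handed to a symbolizer. The regular expression and the function name `extract_frames` are assumptions based on the log lines shown in this issue, not part of Envoy's tooling.

```python
import re

# Matches lines such as: "... backtrace.h:95] thr<6751> #2 0x9acf51"
FRAME_RE = re.compile(r"thr<(\d+)> #(\d+) (0x[0-9a-fA-F]+)")


def extract_frames(log_text):
    """Return a list of (frame_number, address) tuples found in log_text."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append((int(match.group(2)), match.group(3)))
    return frames
```

Lines without a frame address, such as the `thr<6751> obj` markers, are skipped. Addresses that belong to shared libraries rather than the Envoy binary would need to be resolved against those libraries separately.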

It turned out that it really was an open file limit problem.
Systemd doesn't respect the global ulimit config in /etc/security/limits.conf and applies its own defaults.

I had to add

LimitNOFILE=65536

in the systemd [Service] section to raise the limit.

That resolved the problem. Thanks!
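Because `ulimit -n` reports the limit of the shell it is run in, it can differ from the limit of a service started by systemd. One way to check what a running process actually got is to read `/proc/<pid>/limits` on Linux. The sketch below assumes the usual layout of that file, with a "Max open files" row followed by the soft and hard values; the function names are illustrative.

```python
def parse_max_open_files(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits.

    Values are returned as strings because a limit may read 'unlimited'.
    """
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            # Remaining columns: soft limit, hard limit, units.
            soft, hard = line[len(label):].split()[:2]
            return soft, hard
    raise ValueError("no 'Max open files' row found")


def read_max_open_files(pid):
    """Read the open-file limits of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as limits_file:
        return parse_max_open_files(limits_file.read())
```

Running `read_max_open_files(pid)` with the PID of the Envoy process before and after adding `LimitNOFILE` should show whether the new value took effect.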
