Description:
I created a systemd service that calls hot-restarter.py to start Envoy. Under a load test of hundreds of RPS, Envoy crashed.
Envoy version: 1.5.0
Repro steps:
Enable systemd
Start envoy.service using systemd
Give it loads of traffic
_My envoy.service:_
[Unit]
Description=Envoy meeeen
After=network.target
[Service]
User=root
Type=simple
ExecStart=/etc/envoy/hot-restarter.py /etc/envoy/start-envoy.sh
ExecStartPre=/etc/envoy/check_envoy.sh
ExecReload=/etc/envoy/reload_envoy.sh $MAINPID
ExecStop=/bin/kill -15 $MAINPID
TimeoutStopSec=10
KillMode=process
[Install]
WantedBy=multi-user.target
_start-envoy.sh:_
#!/bin/bash
set -e
# Validate the config first; bail out if validation fails.
if ! /usr/sbin/envoy -c /etc/envoy/config.yaml --mode validate --base-id 6969; then
    exit 1
fi
exec /usr/sbin/envoy -c /etc/envoy/config.yaml --restart-epoch $RESTART_EPOCH
_check_envoy.sh:_
#!/bin/bash
set -e
if [ -s /etc/envoy/config.yaml ]; then
/usr/sbin/envoy -c /etc/envoy/config.yaml --mode validate;
else
echo "File /etc/envoy/config.yaml is empty!"
exit 1;
fi
_reload_envoy.sh:_
#!/bin/bash
set -e
export MAIN_PID=$1
/usr/sbin/envoy -c /etc/envoy/config.yaml --mode validate --base-id 6969;
kill -1 $MAIN_PID;
Config:
envoy.yaml:
static_resources:
  listeners:
  - address: #http-address
      socket_address:
        address: 0.0.0.0
        port_value: 80
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        config:
          codec_type: AUTO
          stat_prefix: ingress_http
          access_log:
          - name: envoy.file_access_log
            config:
              path: /var/log/envoy/http-access.log
          http_filters:
          - name: envoy.router
          route_config:
            virtual_hosts: #http-hosts
            - name: redirect-https
              require_tls: all
              domains:
              - example.com
            - name: example
              domains:
              - example.com
              routes:
              - match:
                  prefix: ""
                route:
                  cluster: example
  - address: #https-address
      socket_address:
        address: 0.0.0.0
        port_value: 443
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        config:
          codec_type: AUTO
          stat_prefix: ingress_http
          access_log:
          - name: envoy.file_access_log
            config:
              path: /var/log/envoy/http-access.log
          http_filters:
          - name: envoy.router
          route_config:
            virtual_hosts: #https-hosts
            - name: example
              domains:
              - example.com
              routes:
              - match:
                  prefix: ""
                route:
                  cluster: example
  clusters:
  - name: example
    type: STRICT_DNS
    connect_timeout:
      seconds: 60
      nanos: 0
    lb_policy: ROUND_ROBIN
    hosts:
    - socket_address:
        address: example-backend.com
        port_value: 80
admin:
  access_log_path: /dev/null
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 8001
Call Stack: (syslog)
Feb 28 07:53:18 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:53:18.281][6748][info][config] source/server/listener_manager_impl.cc:482] all dependencies initialized. starting workers
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.610][6751][critical][assert] source/common/network/address_impl.cc:112] assert failure: fd != -1
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.611][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:101] Caught Aborted, suspect faulting address 0x1a5c
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.611][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:85] Backtrace obj x-gnu/libc.so.6> thr<6751> (use tools/stack_decode.py):
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.612][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #0 0x7fbbd94a2428
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.612][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #1 0x7fbbd94a4029
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.612][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:93] thr<6751> obj
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.612][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #2 0x9acf51
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.612][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #3 0x9ad503
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.613][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #4 0x6fec76
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.613][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #5 0x5f1e6c
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.613][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #6 0x691149
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.613][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #7 0x690f50
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.613][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #8 0x684d42
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.613][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #9 0x68320c
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.614][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #10 0x683534
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.614][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #11 0x8898a4
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.614][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #12 0x885fbb
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.614][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #13 0x77cac1
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.614][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #14 0x77c225
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.614][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #15 0x7a58fd
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.615][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #16 0x7a3908
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.615][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #17 0x7a392c
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.615][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #18 0x7aeae7
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.615][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #19 0x7a43e0
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.615][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #20 0x7a42ca
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.615][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #21 0x779a66
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.615][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #22 0x703589
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.615][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #23 0x703605
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.615][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #24 0x6fd5a9
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.615][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #25 0x6fe15f
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.615][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #26 0x6fdf2a
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.615][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #27 0x6fbf28
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #28 0x6ff269
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #29 0x5f84ed
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #30 0x5f74fd
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #31 0x5f752d
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #32 0xa344d1
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #33 0xa34c2e
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #34 0x5f28c7
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #35 0x5e5007
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #36 0x5e4b97
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #37 0x5e56e6
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #38 0x4a1d31
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #39 0xa3eb9f
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #40 0xa3ebc4
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:93] thr<6751> obj
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #41 0x7fbbd9b476b9
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:93] thr<6751> obj
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #42 0x7fbbd957441c
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:97] end backtrace thread 6751
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.610][6751][critical][assert] source/common/network/address_impl.cc:112] assert failure: fd != -1
Seems it is related to your ulimit settings?
ulimit -n
65536
That's the ulimit setting.
Is there any requirement from envoy to bump it up?
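Note that `ulimit -n` reports the limit of the shell it is run in, which is not necessarily the limit of a process that was started some other way. On Linux, the limit in effect for an already running process can be read from `/proc/<pid>/limits`. A minimal sketch (the function name `nofile_limit` is made up for illustration):

```shell
#!/bin/bash
# Print the soft "Max open files" limit of a process (Linux /proc only).
# Usage: nofile_limit [pid]   (defaults to the current shell)
nofile_limit() {
    pid="${1:-$$}"
    # The relevant line looks like:
    #   Max open files   1024   4096   files
    # so the soft limit is the 4th whitespace-separated field.
    awk '/^Max open files/ { print $4 }' "/proc/${pid}/limits"
}

# Limit of the current shell; should match `ulimit -n`.
nofile_limit

# Limit of the unit's main process, using the PID that systemd reports:
#   nofile_limit "$(systemctl show -p MainPID envoy.service | cut -d= -f2)"
```

If the two numbers differ, the value from `/proc` is the one that matters for the running process.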
@yudiandreanp can you provide a core dump or a fully resolved stack trace if you can repro this? It's hard to tell what is happening from the report.
It turned out that it really is an open file limit problem.
systemd doesn't respect the global ulimit config in /etc/security/limits.conf and has its own defaults.
I had to add
LimitNOFILE=65536
in the systemd [Service] section to bump its limit up
That resolved the problem. Thanks!
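For anyone who would rather not edit the unit file directly, the same setting can also live in a drop-in file. A rough sketch, assuming the unit is named envoy.service as in the report; the function name `write_nofile_dropin` and the file name `limits.conf` are arbitrary choices for this example, not anything defined by Envoy or systemd:

```shell
#!/bin/bash
# Write a systemd drop-in that sets the open-file limit for one unit.
# Usage: write_nofile_dropin <unit> <limit> [systemd-dir]
write_nofile_dropin() {
    unit="$1"
    limit="$2"
    base="${3:-/etc/systemd/system}"
    dir="${base}/${unit}.d"
    mkdir -p "$dir" || return 1
    # Files named <unit>.d/*.conf are merged into the unit's settings.
    printf '[Service]\nLimitNOFILE=%s\n' "$limit" > "${dir}/limits.conf"
}

# Typical use as root, followed by a reload and restart so the new limit
# applies to the Envoy process:
#   write_nofile_dropin envoy.service 65536
#   systemctl daemon-reload
#   systemctl restart envoy.service
```

Afterwards, `systemctl show envoy.service -p LimitNOFILE` should print the configured value.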