Envoy: Envoy in container does not get PID 1 / does not receive SIGTERM

Created on 9 Nov 2020 · 3 Comments · Source: envoyproxy/envoy

When using one of the official Envoy Docker images like envoyproxy/envoy-alpine, by default Envoy will start as a child of PID 1 instead of PID 1 directly. This will prevent a graceful shutdown of the container, as Envoy will never receive the SIGTERMs sent by the container runtime.

Repro steps

```sh
$ docker run -d envoyproxy/envoy-alpine:v1.16.0
<container-id>

$ docker exec <container-id> ps
# Envoy is on PID 7, not 1
PID   USER     TIME  COMMAND
    1 root      0:00 sh /docker-entrypoint.sh envoy -c /etc/envoy/envoy.yaml
    7 envoy     0:00 envoy -c /etc/envoy/envoy.yaml
   18 root      0:00 ps

$ docker stop -t 120 <container-id>
# SIGTERM is sent to PID 1, but sh has no signal handler.
# The container keeps running for 120 seconds until a SIGKILL is sent to PID 1.
```

This issue affects images v1.15.0 onwards and affects non-Alpine images as well.
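The behavior in the repro can be approximated outside Docker. The sketch below is only an illustration, not the actual entrypoint: a hypothetical wrapper shell starts `sleep 60` as a child (a stand-in for Envoy), and SIGTERM is sent to the wrapper only, the way `docker stop` signals only PID 1. One difference to keep in mind: outside a container the wrapper is not PID 1, so SIGTERM terminates it instead of being dropped. In both cases the child never receives the signal.

```shell
#!/bin/sh
# Sketch: a wrapper shell that runs its command as a child does not pass SIGTERM on.
# The wrapper and "sleep 60" are hypothetical stand-ins for the entrypoint and Envoy.
pidfile=$(mktemp)

# Wrapper: start a long-running child, record the child's PID, then wait for it.
sh -c 'sleep 60 & echo $! > "$1"; wait' wrapper "$pidfile" &
wrapper_pid=$!

# Wait until the child's PID has been recorded.
while [ ! -s "$pidfile" ]; do sleep 1; done
child_pid=$(cat "$pidfile")

# Signal only the wrapper, as the container runtime signals only PID 1.
kill -TERM "$wrapper_pid"
wait "$wrapper_pid" 2>/dev/null || true

# Check whether the child is still alive after the wrapper was signalled.
if kill -0 "$child_pid" 2>/dev/null; then
    child_state="still running"
else
    child_state="exited"
fi
echo "child after SIGTERM to wrapper: $child_state"

# Clean up the stand-in child and the temporary file.
kill -KILL "$child_pid" 2>/dev/null || true
rm -f "$pidfile"
```

The child has to be killed explicitly at the end, which mirrors what happens in the container once the runtime falls back to SIGKILL after the timeout.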

Cause

Executing Envoy as root sidesteps the problem:

```sh
$ docker run -e ENVOY_UID=0 -d envoyproxy/envoy-alpine:v1.16.0
<container-id>

$ docker exec <container-id> ps
PID   USER     TIME  COMMAND
    1 root      0:00 envoy -c /etc/envoy/envoy.yaml
   16 root      0:00 ps

$ docker stop -t 120 <container-id>
# container shuts down immediately

$ docker logs <container-id> 2>&1 | tail -n 4
[2020-11-09 11:59:10.801][1][warning][main] [source/server/server.cc:617] caught SIGTERM
[2020-11-09 11:59:10.801][1][info][main] [source/server/server.cc:738] shutting down server instance
[2020-11-09 11:59:10.801][1][info][main] [source/server/server.cc:685] main dispatch loop exited
[2020-11-09 11:59:10.802][1][info][main] [source/server/server.cc:731] exiting
```

It seems like su-exec does not replace PID 1 like exec does:

https://github.com/envoyproxy/envoy/blob/49cbae46d7d4bad644adcc4f809e3499e44406da/ci/docker-entrypoint.sh#L18-L30

I'm not sure su-exec is meant to work within a shell script? For example, this works:

```Dockerfile
FROM alpine:latest
RUN apk add --no-cache su-exec && adduser --no-create-home -S envoy
ENTRYPOINT ["su-exec", "envoy"]
CMD ["ps"]
```

```sh
$ docker run $(docker build . -q)
PID   USER     TIME  COMMAND
    1 envoy    0:00 ps
```

However, when su-exec is invoked from a shell script that is set as the entrypoint (/entrypoint.sh in the output below), it does not:

```sh
$ docker run $(docker build . -q)
PID   USER     TIME  COMMAND
    1 root      0:00 sh /entrypoint.sh ps
    7 envoy    0:00 ps
```

All 3 comments

Hi @anatolebeuzon, thanks for raising this.

I'm kind of surprised that su-exec is not working as exec does; I thought that was its purpose.

I can confirm what you have shown above. I'll look into this a bit further.

@anatolebeuzon

I did a quick rebuild and replaced the su-exec line with the following:

exec su-exec envoy "${@}"

and everything seems to work correctly.

Can you confirm this fixes the problem?

If it works and you are happy to open a PR, please do; otherwise, I can.

This fixes the problem indeed. Thanks! PR: https://github.com/envoyproxy/envoy/pull/13946
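Why the `exec` prefix makes the difference can be seen without Docker or su-exec. The following is a minimal sketch under stated assumptions: the `sh -c` wrappers are hypothetical stand-ins for the entrypoint script, and the inner `sh -c "echo \$\$"` stands in for `su-exec envoy ...`. A shell normally forks a child for each external command, so the command gets a new PID. With `exec`, the command replaces the shell process and inherits its PID, which inside the container is PID 1.

```shell
#!/bin/sh
# Sketch: compare the PID of a wrapper shell with the PID of the command it starts.

# Without exec, the wrapper forks a child for the command and waits for it.
# The trailing "true" keeps the inner command from being the wrapper's last
# command, which some shells would otherwise exec implicitly.
forked=$(sh -c 'echo $$; sh -c "echo \$\$"; true')

# With exec, the command replaces the wrapper process and keeps its PID.
replaced=$(sh -c 'echo $$; exec sh -c "echo \$\$"')

# First output line is the wrapper's PID, second line is the command's PID.
forked_wrapper=$(printf '%s\n' "$forked" | sed -n 1p)
forked_command=$(printf '%s\n' "$forked" | sed -n 2p)
replaced_wrapper=$(printf '%s\n' "$replaced" | sed -n 1p)
replaced_command=$(printf '%s\n' "$replaced" | sed -n 2p)

echo "without exec: wrapper=$forked_wrapper command=$forked_command"
echo "with exec:    wrapper=$replaced_wrapper command=$replaced_command"
```

The PIDs differ in the first line of output and match in the second. This is consistent with the Dockerfile experiment in the issue, where su-exec launched directly as the entrypoint ran `ps` as PID 1.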
