Something is off when beanstalkd is run in a Docker container as the first process (PID 1): it does not react to termination signals.
% docker run -it --pid=container:bean --net=container:bean --cap-add sys_admin alpine sh
/ # kill -TERM 1
/ # kill -TERM 1
/ # kill -KILL 1
/ # kill -INT 1
/ # kill -HUP 1
/ #
I attached strace to the process. Notice that nothing shows up for the -KILL signal:
% docker run -t --pid=container:bean --net=container:bean --cap-add sys_admin --cap-add sys_ptrace strace
strace: Process 1 attached
epoll_pwait(6, 0x7ffff54d60dc, 1, 3600000, NULL, 8) = -1 EINTR (Interrupted system call)
--- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=14, si_uid=0} ---
epoll_pwait(6, 0x7ffff54d60dc, 1, 3600000, NULL, 8) = -1 EINTR (Interrupted system call)
--- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=14, si_uid=0} ---
epoll_pwait(6, 0x7ffff54d60dc, 1, 3600000, NULL, 8) = -1 EINTR (Interrupted system call)
--- SIGINT {si_signo=SIGINT, si_code=SI_USER, si_pid=14, si_uid=0} ---
epoll_pwait(6, 0x7ffff54d60dc, 1, 3600000, NULL, 8) = -1 EINTR (Interrupted system call)
--- SIGHUP {si_signo=SIGHUP, si_code=SI_USER, si_pid=14, si_uid=0} ---
epoll_pwait(6,
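The missing -KILL entry matches the rule documented in pid_namespaces(7): other processes in the namespace can only send PID 1 the signals for which it has established a handler, and SIGKILL can never have a handler. (The SIGTERM, SIGINT and SIGHUP entries still appear, presumably because a traced process reports signals to its tracer that would otherwise be discarded.) To see which signals a container's PID 1 will accept, one option is to read the SigCgt ("signals caught") mask in /proc/<pid>/status, for example from a sidecar started with --pid=container:bean. The helper below is a minimal sketch; signal_is_caught is a hypothetical name, and it assumes the Linux layout in which SigCgt is a hexadecimal mask where bit sig - 1 stands for signal number sig.

```c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: returns 1 if process `pid` has a handler installed
 * for `sig`, 0 if it does not, and -1 on error. Pass pid 0 for the calling
 * process. It parses the "SigCgt:" line of /proc/<pid>/status, a hex mask
 * in which bit (sig - 1) corresponds to signal number sig. */
int signal_is_caught(pid_t pid, int sig)
{
    char path[64];
    char line[256];
    FILE *f;
    int result = -1;

    if (sig < 1 || sig > 64)
        return -1;
    if (pid == 0)
        snprintf(path, sizeof path, "/proc/self/status");
    else
        snprintf(path, sizeof path, "/proc/%d/status", (int)pid);

    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "SigCgt:", 7) == 0) {
            /* strtoull skips the whitespace after the colon. */
            unsigned long long mask = strtoull(line + 7, NULL, 16);
            result = (int)((mask >> (sig - 1)) & 1ULL);
            break;
        }
    }
    fclose(f);
    return result;
}
```

From the sidecar, signal_is_caught(1, SIGTERM) returning 0 would mean that a plain kill -TERM 1 is dropped; grep SigCgt /proc/1/status shows the same mask without any code.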
Dockerfile that uses the Alpine build of beanstalkd 1.10:
FROM alpine
RUN apk add --no-cache beanstalkd
RUN mkdir /beanstalkd
EXPOSE 11300
ENTRYPOINT ["/usr/bin/beanstalkd", "-z", "650000000"]
CMD ["-b", "/beanstalkd", "-f", "300"]
I tried debian:jessie with the same result, so the problem is not specific to Alpine Linux.
The code where it happens:
r = epoll_wait(epfd, &ev, 1, (int)(timeout/1000000));
if (r == -1 && errno != EINTR) {
twarn("epoll_wait");
exit(1);
}
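This means that on a regular Linux host, when I send -TERM to the process, epoll_wait never returns: the OS terminates the process immediately. In Docker, however, the call returns with r == -1 and errno == EINTR, so the check above treats it as an ordinary interruption and beanstalkd keeps running. The reason is that inside the container beanstalkd is PID 1 of its PID namespace, and Linux does not apply the default action of a signal (such as termination for SIGTERM) to that process: only signals for which it has installed a handler are delivered, with the exception of SIGKILL and SIGSTOP sent from an ancestor namespace.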
This issue is critical. Right now Docker kills beanstalkd with SIGKILL after the stop timeout (10 seconds by default). This leads to slow restarts and to praying that the kill did not corrupt the WAL file. _Ugh..._ And I had been wondering why restarts of beanstalkd containers were so slow. 😭
@ysmolsky Hi! This problem still exists. When we run beanstalkd as PID 1 in a container, it does not handle SIGTERM and keeps running. After 10 seconds it exits with code 137 (128 + 9, i.e. killed by SIGKILL).
beanstalkd 1.11
Uhm, v1.11 does not have this fix. You should try v1.12 for that.
@ysmolsky Thanks a lot! I need to test it now.
I will save you some time: do not bother, it is broken in v1.12 too. I have just tested it, and looking at the code, I made the mistake of assuming that a process can send SIGKILL to itself even when it has PID 1. Linux blocks this by design.
@GenaANTG you should try latest commit from master branch. It contains the fix.
@ysmolsky Thanks a lot!