Nixpkgs: ntpd.service does not shut down cleanly

Created on 14 Dec 2016 · 8 Comments · Source: NixOS/nixpkgs

Issue description

When trying to run systemctl stop ntpd.service (either directly, or indirectly e.g. as part of the shutdown process) for the first time after boot, the command will wait for 90 seconds, then presumably kill the process. Output in journalctl -x when doing this:

Dec 14 00:06:31 ouroboros sudo[4324]: pam_unix(sudo:session): session opened for user root by (uid=0)
Dec 14 00:06:31 ouroboros systemd[1]: Stopping NTP Daemon...
Dec 14 00:08:01 ouroboros systemd[1]: ntpd.service: State 'stop-sigterm' timed out. Killing.
Dec 14 00:08:01 ouroboros systemd[1]: ntpd.service: Killing process 1069 (ntpd) with signal SIGKILL.
Dec 14 00:08:01 ouroboros systemd[1]: ntpd.service: Main process exited, code=killed, status=31/SYS
Dec 14 00:08:01 ouroboros systemd[1]: Stopped NTP Daemon.
Dec 14 00:08:01 ouroboros systemd[1]: ntpd.service: Unit entered failed state.
Dec 14 00:08:01 ouroboros systemd[1]: ntpd.service: Failed with result 'signal'.
Dec 14 00:08:01 ouroboros sudo[4324]: pam_unix(sudo:session): session closed for user root

During the period that systemd is waiting for the service to exit cleanly, ps afx reports that the ntpd process is actually defunct.

However, after systemctl stop ntpd.service finally completes, the service can then be started and stopped without any failures.

Steps to reproduce

1. Boot the system
2. Run systemctl stop ntpd.service

Technical details

System: NixOS 17.03pre96925.1c50bdd (Gorilla)
Nix version: nix-env (Nix) 1.11.4
Nixpkgs version: 17.03pre96925.1c50bdd

Source

kierdavis

All 8 comments

P.S. my configuration.nix is split into about 12 files to make it easier for me to find options, but much harder to attach it to a GitHub issue. At some point in the near future I'll merge as much of it as I can into one file and attach it, in case it helps diagnose this problem. It seems like the problem must be fairly system- or configuration-specific, since if this issue affected other people, the extra 90 seconds added to the shutdown time would surely be noticeable.

kierdavis on 14 Dec 2016

ntpd probably hangs in some system call during that period. The 90 seconds should give you enough time to use strace -p to find out which one. This should more or less explain what is going wrong on your machine.
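
A minimal sketch of that approach, assuming root access and that strace is installed. The function name trace_during_stop and the output path under /tmp are made up for illustration and are not part of ntpd or systemd:

```sh
#!/bin/sh
# Attach strace to a unit's main process, then ask systemd to stop the unit.
# The unit name defaults to ntpd.service, as in this report.
trace_during_stop() {
    unit=${1:-ntpd.service}
    # "systemctl show -p MainPID" prints a line such as "MainPID=1069".
    pid=$(systemctl show -p MainPID "$unit" | cut -d= -f2)
    if [ -z "$pid" ] || [ "$pid" = 0 ]; then
        echo "no main PID for $unit" >&2
        return 1
    fi
    # -f follows every thread; -o writes the trace to a file (illustrative path).
    strace -f -p "$pid" -o "/tmp/$unit.strace" &
    tracer=$!
    systemctl stop "$unit"
    # strace exits by itself once the traced process is gone.
    wait "$tracer"
}
```

Calling trace_during_stop with no argument and then reading /tmp/ntpd.service.strace should show which system call the process sits in during the 90-second wait.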

Mic92 on 14 Dec 2016

Hi,
same issue here, but when I boot, ntpd is already defunct. I'm attaching some info in the hope that it is useful:

$ ps aux | grep ntp
ntp        894  0.0  0.0      0     0 ?        Zsl  13:26   0:00 [ntpd] <defunct>

strace at that moment:

strace -f -s99999 -v -p 894
strace: Process 894 attached
futex(0x55ae0f586508, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, ffffffff

strace while systemctl stop ntpd.service is executed:

) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
futex(0x55ae0f586508, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, ffffffff <unfinished ...>

and strace when the process is finally killed:

+++ killed by SIGKILL +++

areina on 14 Dec 2016

Thanks. I didn't actually check whether ntpd was defunct or not before
stopping it, so it's quite likely we have the same problem.

EDIT: formatting

kierdavis on 14 Dec 2016

Can confirm that ntpd is defunct at startup and that strace gives the same output as in @areina's comment.

kierdavis on 14 Dec 2016

Ok, this is just an ordinary mutex. In this situation I would have expected the process to be reaped by systemd - maybe the SIGTERM signal was masked by ntpd? I wonder if systemd.services.ntpd.serviceConfig.Type = "simple"; and services.ntp.extraFlags = ["-n"] could fix the problem.
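
One way to test the masked-SIGTERM guess is to read the signal masks that Linux exposes in /proc. The sketch below is only an illustration: the helper names mask_has_signal and show_sigterm_disposition are made up, it handles signals 1 to 32 only, and it assumes the masks are printed as 16 hex digits, as they are on 64-bit Linux.

```sh
#!/bin/sh
# Succeed if signal number $2 is set in hex mask $1, in the format used by
# the SigBlk, SigIgn and SigCgt fields of /proc/<pid>/status.
mask_has_signal() {
    mask=$1
    signum=$2
    # Keep only the low 8 hex digits (signals 1..32) to stay well inside
    # the range of shell arithmetic.
    low=${mask#"${mask%????????}"}
    [ $(( (0x$low >> (signum - 1)) & 1 )) -eq 1 ]
}

# For each thread of the given PID, report whether SIGTERM (signal 15 on
# Linux) is blocked, ignored or caught. Masks are per thread, hence the loop.
show_sigterm_disposition() {
    pid=$1
    for status in /proc/"$pid"/task/*/status; do
        tid=${status%/status}
        tid=${tid##*/}
        for field in SigBlk SigIgn SigCgt; do
            mask=$(awk -v f="$field:" '$1 == f { print $2 }' "$status")
            if mask_has_signal "$mask" 15; then
                state="set"
            else
                state="not set"
            fi
            echo "thread $tid $field: SIGTERM $state"
        done
    done
}
```

Running show_sigterm_disposition with the PID of the defunct ntpd (894 in the ps output above) would list one line per thread and field. A thread with SIGTERM set in SigBlk or SigIgn would support the masking theory.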

Mic92 on 14 Dec 2016

@Mic92 sadly adding those options doesn't seem to have affected it at all - ntpd is still defunct at startup, and the same strace output is produced when trying to stop the service.

kierdavis on 14 Dec 2016

The patch that supposedly fixes this was merged.

joachifm on 6 Apr 2017

🎉3
