The first time systemctl stop ntpd.service runs after boot (either directly or indirectly, e.g. as part of the shutdown process), the command waits for 90 seconds and then kills the process with SIGKILL. Output in journalctl -x when doing this:
Dec 14 00:06:31 ouroboros sudo[4324]: pam_unix(sudo:session): session opened for user root by (uid=0)
Dec 14 00:06:31 ouroboros systemd[1]: Stopping NTP Daemon...
Dec 14 00:08:01 ouroboros systemd[1]: ntpd.service: State 'stop-sigterm' timed out. Killing.
Dec 14 00:08:01 ouroboros systemd[1]: ntpd.service: Killing process 1069 (ntpd) with signal SIGKILL.
Dec 14 00:08:01 ouroboros systemd[1]: ntpd.service: Main process exited, code=killed, status=31/SYS
Dec 14 00:08:01 ouroboros systemd[1]: Stopped NTP Daemon.
Dec 14 00:08:01 ouroboros systemd[1]: ntpd.service: Unit entered failed state.
Dec 14 00:08:01 ouroboros systemd[1]: ntpd.service: Failed with result 'signal'.
Dec 14 00:08:01 ouroboros sudo[4324]: pam_unix(sudo:session): session closed for user root
During the period that systemd is waiting for the service to exit cleanly, ps afx reports that the ntpd process is actually defunct.
However, after systemctl stop ntpd.service finally completes, the service can then be started and stopped without any failures.
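For anyone who wants to check for the defunct state without reading through ps afx by eye, here is a rough sketch. The helper names is_defunct_stat and pid_is_defunct are made up for illustration, and it assumes a procps-style ps that accepts -o stat=.

```shell
# Return success if a ps STAT string (e.g. "Zsl") marks a zombie (defunct) process.
is_defunct_stat() {
    case "$1" in
        Z*) return 0 ;;
        *)  return 1 ;;
    esac
}

# Hypothetical helper: look up the STAT column for a PID and test it.
# Returns 2 if ps cannot find the PID (assumes ps supports "-o stat=").
pid_is_defunct() {
    stat=$(ps -o stat= -p "$1" 2>/dev/null) || return 2
    is_defunct_stat "$(printf '%s' "$stat" | tr -d ' ')"
}

# Usage, with the PID from the log above:
#   pid_is_defunct 1069 && echo "ntpd is defunct"
```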
P.S. My configuration.nix is split into about 12 files to make it easier for me to find options, but much harder to attach it to a GitHub issue. At some point in the near future I'll merge as much of it as I can into one file and attach it, in case it helps diagnose this problem. It seems like the problem must be fairly system- or configuration-specific, since if this issue affected other people, the extra 90 seconds added to shutdown would surely be noticeable.
ntpd probably hangs in some system call during that period. The 90 seconds should give you enough time to use strace -p to find out which one. This should more or less explain what is going wrong on your machine.
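Something along these lines might make it easier to attach in time. It is only a sketch: parse_main_pid and trace_unit are names invented here, and it assumes that systemctl show -p MainPID prints a single line of the form MainPID=<pid> and that strace is installed and run as root.

```shell
# Extract the PID from a line such as "MainPID=1069".
parse_main_pid() {
    printf '%s\n' "${1#MainPID=}"
}

# Hypothetical helper: attach strace to the main process of a unit.
trace_unit() {
    unit=${1:-ntpd.service}
    pid=$(parse_main_pid "$(systemctl show -p MainPID "$unit")")
    # systemd reports MainPID=0 when the unit has no main process.
    if [ -z "$pid" ] || [ "$pid" = "0" ]; then
        echo "no main process for $unit" >&2
        return 1
    fi
    # -f also attaches to the other threads of the process.
    strace -f -p "$pid"
}

# Usage (as root), in one terminal:
#   trace_unit ntpd.service
# then in another terminal:
#   systemctl stop ntpd.service
```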
Hi,
Same issue here, but when I boot, ntpd is already defunct. I'm attaching some info that I hope will be useful:
$ ps aux | grep ntp
ntp 894 0.0 0.0 0 0 ? Zsl 13:26 0:00 [ntpd] <defunct>
strace at that moment:
strace -f -s99999 -v -p 894
strace: Process 894 attached
futex(0x55ae0f586508, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, ffffffff
strace while systemctl stop ntpd.service is executed:
) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
futex(0x55ae0f586508, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, ffffffff <unfinished ...>
and strace when the process is finally killed:
+++ killed by SIGKILL +++
Thanks. I didn't actually check whether ntpd was defunct or not before
stopping it, so it's quite likely we have the same problem.
EDIT: formatting
Can confirm that ntpd is defunct at startup and that strace gives the same output as in @areina's comment.
Ok, this is just an ordinary mutex. In this situation I would have expected the process to be reaped by systemd. Maybe the SIGTERM signal was masked by ntpd? I wonder if systemd.services.ntpd.serviceConfig.Type = "simple"; and services.ntp.extraFlags = ["-n"] could fix the problem.
@Mic92 Sadly, adding those options doesn't seem to have had any effect: ntpd is still defunct at startup, and the same strace output is produced when trying to stop the service.
The patch that supposedly fixes this was merged.