This looks like the same issue as #738: zerotier-one.service currently does not exit in a timely manner, so shutdown is blocked when the network is not connected (e.g. when Wi-Fi is not connected on my laptop).
$ journalctl -b -1 -u zerotier-one.service outputs:
Dec 18 22:40:13 KBUMSIK-LG-Arch zerotier-one[617]: sendto: Network is unreachable
Dec 18 22:45:18 KBUMSIK-LG-Arch zerotier-one[617]: sendto: Network is unreachable
Dec 18 22:50:23 KBUMSIK-LG-Arch zerotier-one[617]: sendto: Network is unreachable
Dec 18 22:55:45 KBUMSIK-LG-Arch systemd[1]: Stopping ZeroTier One...
Dec 18 22:57:15 KBUMSIK-LG-Arch systemd[1]: zerotier-one.service: State 'stop-sigterm' timed out. Killing.
Dec 18 22:57:15 KBUMSIK-LG-Arch systemd[1]: zerotier-one.service: Killing process 617 (zerotier-one) with signal SIGKILL.
Dec 18 22:57:15 KBUMSIK-LG-Arch systemd[1]: zerotier-one.service: Main process exited, code=killed, status=9/KILL
Dec 18 22:57:15 KBUMSIK-LG-Arch systemd[1]: zerotier-one.service: Failed with result 'timeout'.
Dec 18 22:57:15 KBUMSIK-LG-Arch systemd[1]: Stopped ZeroTier One.
OS: Arch Linux 64-bit
uname -a: Linux KBUMSIK-LG-Arch 4.19.8-arch1-1-ARCH #1 SMP PREEMPT Sat Dec 8 13:49:11 UTC 2018 x86_64 GNU/Linux
ZeroTier version: 1.2.12 (from the Arch Linux package)
systemd version: systemd-239.303-1
FWIW also seeing this on FreeBSD.
Have you tried this? https://github.com/zerotier/ZeroTierOne/issues/738#issuecomment-387697755
@obadz Yes. I checked it and that line is already applied in my version.
This is applied on my system as well and I am seeing similar behavior. The following is an strace of the main process with the -f flag, captured when I issued a systemctl stop zerotier-one.service command:
The process received the SIGTERM, but continued to run.
Systemd issued the SIGKILL after the timeout (I believe).
[pid 14637] <... restart_syscall resumed> ) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
[pid 14639] <... select resumed> ) = ? ERESTARTNOHAND (To be restarted if no handler)
[pid 14628] <... select resumed> ) = ? ERESTARTNOHAND (To be restarted if no handler)
[pid 14637] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=1, si_uid=0} ---
[pid 14637] --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=1, si_uid=0} ---
[pid 14639] select(12, [9 11], [], [], NULL <unfinished ...>
[pid 14637] write(5, "\20", 1 <unfinished ...>
[pid 14628] select(36, [7 8 10 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35], [], [], {tv_sec=0, tv_usec=474915} <unfinished ...>
[pid 14637] <... write resumed> ) = 1
[pid 14637] rt_sigreturn({mask=[]}) = -1 EINTR (Interrupted system call)
[pid 14637] socket(AF_INET, SOCK_DGRAM, IPPROTO_IP) = 36
[pid 14637] fcntl(36, F_GETFL) = 0x2 (flags O_RDWR)
[pid 14637] fcntl(36, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 14637] openat(AT_FDCWD, "/proc/net/route", O_RDONLY) = 37
[pid 14637] fstat(37, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
[pid 14637] read(37, "Iface\tDestination\tGateway \tFlags"..., 1024) = 640
[pid 14637] close(37) = 0
[pid 14637] connect(36, {sa_family=AF_INET, sin_port=htons(5351), sin_addr=inet_addr("192.168.5.1")}, 16) = 0
[pid 14637] sendto(36, "\0\0", 2, 0, NULL, 0) = 2
[pid 14637] gettimeofday({tv_sec=1547931566, tv_usec=750067}, NULL) = 0
[pid 14637] select(1024, [36], NULL, NULL, {tv_sec=0, tv_usec=249866}) = 1 (in [36], left {tv_sec=0, tv_usec=248928})
[pid 14637] recvfrom(36, 0x7f658e742e60, 16, 0, 0x7f658e742e70, [16]) = -1 ECONNREFUSED (Connection refused)
[pid 14637] gettimeofday({tv_sec=1547931566, tv_usec=751390}, NULL) = 0
[pid 14637] close(36) = 0
[pid 14637] socket(AF_UNIX, SOCK_STREAM, 0) = 36
[pid 14637] setsockopt(36, SOL_SOCKET, SO_RCVTIMEO, "\3\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 0
[pid 14637] setsockopt(36, SOL_SOCKET, SO_SNDTIMEO, "\3\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 0
[pid 14637] connect(36, {sa_family=AF_UNIX, sun_path="/var/run/minissdpd.sock"}, 110) = -1 ENOENT (No such file or directory)
[pid 14637] close(36) = 0
[pid 14637] socket(AF_INET, SOCK_DGRAM, IPPROTO_IP) = 36
[pid 14637] setsockopt(36, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
[pid 14637] setsockopt(36, SOL_IP, IP_MULTICAST_TTL, "\2", 1) = 0
[pid 14637] bind(36, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
[pid 14637] sendto(36, "M-SEARCH * HTTP/1.1\r\nHOST: 239.2"..., 94, 0, {sa_family=AF_INET, sin_port=htons(1900), sin_addr=inet_addr("239.255.255.250")}, 16) = 94
[pid 14637] poll([{fd=36, events=POLLIN}], 1, 5000 <unfinished ...>
[pid 14628] <... select resumed> ) = 0 (Timeout)
[pid 14637] <... poll resumed> ) = 0 (Timeout)
[pid 14637] close(36) = 0
[pid 14637] nanosleep({tv_sec=300, tv_nsec=0}, <unfinished ...>
[pid 14639] <... select resumed> ) = 1 (in [9])
[pid 14639] read(9, "33\0\0\203\204\6\251J8\220\217\206\335`\ftr\1\25\21\1\376\200\0\0\0\0\0\0\4\251"..., 10064) = 331
[pid 14639] gettimeofday({tv_sec=1547931581, tv_usec=333972}, NULL) = 0
[pid 14639] select(12, [9 11], [], [], NULL) = 1 (in [9])
[pid 14639] read(9, "33\0\0\203\204\6\251J8\220\217\206\335`\ftr\1\25\21\1\376\200\0\0\0\0\0\0\4\251"..., 10064) = 331
[pid 14639] gettimeofday({tv_sec=1547931611, tv_usec=334451}, NULL) = 0
[pid 14639] select(12, [9 11], [], [], NULL) = 1 (in [9])
[pid 14639] read(9, "33\0\0\203\204\6\251J8\220\217\206\335`\ftr\1\25\21\1\376\200\0\0\0\0\0\0\4\251"..., 10064) = 331
[pid 14639] gettimeofday({tv_sec=1547931641, tv_usec=333917}, NULL) = 0
[pid 14639] select(12, [9 11], [], [], NULL <unfinished ...>) = ?
[pid 14637] <... nanosleep resumed> <unfinished ...>) = ?
[pid 14639] +++ killed by SIGKILL +++
[pid 14637] +++ killed by SIGKILL +++
+++ killed by SIGKILL +++
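Reading the trace above: after SIGTERM arrives, thread 14637 finishes its NAT-PMP and UPnP probes and then enters a 300-second nanosleep, and nothing appears to wake it before systemd sends SIGKILL. The five-minute spacing of the sendto errors in the journal looks consistent with that loop. Below is a minimal sketch of the usual remedy, written in Python purely for illustration. ZeroTier itself is C++, and PeriodicWorker and its members are hypothetical names, not taken from its source. The idea is to replace the fixed sleep with a wait on a stop event that the shutdown path sets.

```python
import threading
import time


class PeriodicWorker:
    """Hypothetical stand-in for a background thread that works every `interval` seconds."""

    def __init__(self, interval: float) -> None:
        self.interval = interval
        self.iterations = 0
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        while not self._stop.is_set():
            self.iterations += 1  # placeholder for the real periodic work
            # Event.wait() returns as soon as stop() sets the event, whereas
            # time.sleep(self.interval) would block for the whole interval.
            self._stop.wait(self.interval)

    def start(self) -> None:
        self._thread.start()

    def stop(self, timeout: float = 5.0) -> bool:
        """Request shutdown; return True if the thread exited within `timeout`."""
        self._stop.set()
        self._thread.join(timeout)
        return not self._thread.is_alive()


if __name__ == "__main__":
    worker = PeriodicWorker(interval=300)  # same interval as the nanosleep in the trace
    worker.start()
    time.sleep(0.1)
    print("stopped cleanly:", worker.stop())
```

With a plain sleep in place of the event wait, stop() would have to wait out the remainder of the 300 seconds, which is the behavior the trace seems to show.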
I am attempting the fix used in NixOS (https://github.com/NixOS/nixpkgs/pull/49423): removing After and using BindsTo.
BindsTo did not resolve the issue.
On a fresh installation of Ubuntu 18.04.2 LTS, applying https://github.com/zerotier/ZeroTierOne/issues/738#issuecomment-387697755 does not solve the problem. It still hangs while shutting down.
A pretty annoying minor bug in my opinion, as it prevents shutdowns. Previous workarounds with the service unit have failed for me. Any updates on this?
Personally I have:
/etc/systemd/system/multi-user.target.wants/zerotier-one.service :
[Service]
TimeoutSec=10
ZT1 has 10 seconds to do the deed or it gets the axe. Works well.
It's obviously not an optimal solution by any means, but it beats waiting either 120 seconds or indefinitely, depending on which environment it's running under.
I'll try the timeout option. Thanks for the tip @Arffeh.
Why was this bug closed? It seems to be still an issue and the timeout is a dirty trick.
This bug really should be reopened - it's not fixed, only addressed with a cheap hack to make it less noticeable.
I'm still experiencing this issue on Ubuntu 18.04 and ZT-One 1.4.6.
Ubuntu 20.04 - still an issue :/