ZeroTierOne: 1.2.8 systemd service does not exit in a timely manner

Created on 8 May 2018 · 17 Comments · Source: zerotier/ZeroTierOne

Summary

With 1.2.8, the zerotier-one service no longer exits in a timely manner, instead being killed after a 1m30 timeout. 1.2.6 works as expected (terminates cleanly on system halt/reboot).

Details

Something in the 1.2.6->1.2.8 changeset is preventing the zerotier-one service from exiting in a timely manner. On system reboot or poweroff the service is killed after a 1m30 timeout.

I've tested PR #722 against 1.2.6 and it's not that bit.

I'll start bisecting but am submitting now in case there's a known/obviously suspect commit.

Log output

May 07 22:00:06 box systemd[1]: Stopping ZeroTier One...
May 07 22:01:36 box systemd[1]: zerotier-one.service: State 'stop-sigterm' timed out. Killing.
May 07 22:01:36 box systemd[1]: zerotier-one.service: Killing process 1371 (zerotier-one) with signal SIGKILL.
May 07 22:01:36 box systemd[1]: zerotier-one.service: Main process exited, code=killed, status=9/KILL
May 07 22:01:36 box systemd[1]: zerotier-one.service: Failed with result 'timeout'.
May 07 22:01:36 box systemd[1]: Stopped ZeroTier One

System details

OS: Manjaro Linux 64-bit
Tested kernels: 4.16.7
ZeroTier version: 1.2.8 (using packaging files from https://www.archlinux.org/packages/community/x86_64/zerotier-one/)
systemd version: 238.76

All 17 comments

That was quicker than I thought. There's a change to service/OneService.cpp here:

https://github.com/zerotier/ZeroTierOne/compare/1.2.6...1.2.8?diff=unified&name=1.2.8#diff-6bd361de78e54e7db32d9b4e502a88b0

After reverting this with the below patch I can shut down/reboot without systemd triggering the "kill" timer.

I can't see the #ifdef __NetBSD__ section having any effect on a Linux system (unless clang is doing something _very_ odd) so it's pointing towards the "Check global blacklists" section. I see a Mutex::Lock which might be involved?

I'll keep running with this configuration to test it more fully; the issue might rely on other factors before it manifests.

--- a/service/OneService.cpp
+++ b/service/OneService.cpp
@@ -120,10 +120,6 @@ namespace ZeroTier { typedef WindowsEthernetTap EthernetTap; }
 #include "../osdep/BSDEthernetTap.hpp"
 namespace ZeroTier { typedef BSDEthernetTap EthernetTap; }
 #endif // __FreeBSD__
-#ifdef __NetBSD__
-#include "../osdep/NetBSDEthernetTap.hpp"
-namespace ZeroTier { typedef NetBSDEthernetTap EthernetTap; }
-#endif // __NetBSD__
 #ifdef __OpenBSD__
 #include "../osdep/BSDEthernetTap.hpp"
 namespace ZeroTier { typedef BSDEthernetTap EthernetTap; }
@@ -2418,22 +2414,7 @@ class OneServiceImpl : public OneService
                    return false;
            }
        }
+
-       {
-           // Check global blacklists
-           const std::vector<InetAddress> *gbl = (const std::vector<InetAddress> *)0;
-           if (ifaddr.ss_family == AF_INET) {
-               gbl = &_globalV4Blacklist;
-           } else if (ifaddr.ss_family == AF_INET6) {
-               gbl = &_globalV6Blacklist;
-           }
-           if (gbl) {
-               Mutex::Lock _l(_localConfig_m);
-               for(std::vector<InetAddress>::const_iterator a(gbl->begin());a!=gbl->end();++a) {
-                   if (a->containsAddress(ifaddr))
-                       return false;
-               }
-           }
-       }
        {
            Mutex::Lock _l(_nets_m);
            for(std::map<uint64_t,NetworkState>::const_iterator n(_nets.begin());n!=_nets.end();++n) {

This can be more trivially worked around with a change to the systemd unit:

--- a/debian/zerotier-one.service
+++ b/debian/zerotier-one.service
@@ -1,6 +1,7 @@
 [Unit]
 Description=ZeroTier One
 After=network.target
+Wants=network-online.target

 [Service]
 ExecStart=/usr/sbin/zerotier-one

I don't think this is the ideal solution though:

What does this mean for me, a Developer?

If you are a developer, instead of wondering what to do about network.target, please just fix your program to be friendly to dynamically changing network configuration. That way you will make your users happy because things just start to work, and you will get fewer bug reports as your stuff is just rock solid. You also make the boot faster for your users, as they don't have to delay arbitrary services for the network anymore (which is particularly annoying for folks with slow address assignment replies from a DHCP server).

https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/

I mount CIFS shares with x-systemd.automount in fstab. I have to unmount them manually before shutdown, or else I get a timeout because the host is unavailable.
This only happens with ZeroTier hosts; with hosts on the normal local LAN it works fine.

Could this help with zerotier closing down before network so that the mounts unmount at the right time?

My system is KDE neon (Ubuntu 16.04).

Hello,

This issue affects me too in F28.

Will the "Wants" line be the official fix?

Thanks!

I think this is done. I see Wants=network-online.target in the unit file.

I am using Ubuntu 18.04, and I was having this very issue just today, which led me here. I saw what jonathonf wrote about the systemd unit, which had me looking at the .service file located at:

/lib/systemd/system/zerotier-one.service

which looks like this:

[Unit]
Description=ZeroTier One
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/usr/sbin/zerotier-one
Restart=always
KillMode=process

[Install]
WantedBy=multi-user.target

This ^^ was still causing the issue, so I changed it to this:

[Unit]
Description=ZeroTier One
After=network.target
Wants=network-online.target

[Service]
ExecStart=/usr/sbin/zerotier-one
Restart=always
KillMode=process

[Install]
WantedBy=multi-user.target

Which works just fine now :)
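
One possible reading of why the After= change helps, offered as an assumption rather than a confirmed diagnosis: systemd stops units in the reverse of their start order, so the target a unit is ordered After= determines what is still running while zerotier-one shuts down. To check which variant of the unit is actually installed, the relevant lines can be pulled out of the file. This is a minimal sketch; the function name unit_network_deps is hypothetical and only used for illustration.

```shell
#!/bin/sh
# Sketch: print the ordering, dependency and stop-timeout lines of a unit file.
# unit_network_deps is a hypothetical helper name, not part of systemd.
unit_network_deps() {
    # $1 is the path to a unit file, e.g. the packaged zerotier-one.service
    grep -E '^(After|Wants|TimeoutStopSec)=' "$1"
}
```

For example, `unit_network_deps /lib/systemd/system/zerotier-one.service` lists only those settings. `systemctl cat zerotier-one.service` shows the full unit together with any drop-ins that apply to it.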

This worked for me too. Thanks @JackPShafer

Still an issue

Same here.

It just works for me too! Thank you!

For me it still hangs for 1min30s on shutdown even with this change:
https://github.com/zerotier/ZeroTierOne/issues/738#issuecomment-418460657

@guraltsev same here

@JackPShafer's comment is the solution :)

Note: It is still a problem in 1.4.4 in Ubuntu 18.04.

@AlexisTM Will this change be replicated to the Fedora RPMs?

https://askubuntu.com/questions/659267/how-do-i-override-or-configure-systemd-services
http://man7.org/linux/man-pages/man1/systemctl.1.html ("edit")

Just fyi, even if it did get changed/fixed, I don't think a release is going out any time in the near future

Per @guraltsev, the service file change above alone did not solve the issue for me. I additionally used the 'systemctl edit' method to add a TimeoutStopSec value much lower than the system default, which at least mitigated the issue. I'm using encrypted home directories, and the process for them seems to stay running until ZeroTier dies. Could that be related to the issue in general?

[Unit]
Description=ZeroTier One
After=network.target
Wants=network-online.target

[Service]
ExecStart=/usr/sbin/zerotier-one
Restart=always
KillMode=process
TimeoutStopSec=10

[Install]
WantedBy=multi-user.target
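
The TimeoutStopSec setting can also live in a drop-in instead of a modified copy of the packaged unit, which is the kind of file `systemctl edit` creates. The sketch below writes such a drop-in by hand. It assumes the unit name zerotier-one.service and the 10-second value from the comment above; write_dropin and DROPIN_ROOT are hypothetical names used only for this example. Only the timeout is placed in the drop-in, because systemd documents that dependencies such as After= can be added in a drop-in but not removed, so the After= change still needs a full copy of the unit.

```shell
#!/bin/sh
# Sketch: write a drop-in that shortens the stop timeout for zerotier-one.
# DROPIN_ROOT is a hypothetical knob for this example; on a real system the
# directory is /etc/systemd/system and writing there requires root.
DROPIN_ROOT="${DROPIN_ROOT:-/etc/systemd/system}"
UNIT="zerotier-one.service"

write_dropin() {
    dir="$DROPIN_ROOT/$UNIT.d"
    mkdir -p "$dir" || return 1
    # Only the [Service] override goes here; the rest of the packaged unit
    # stays in effect unchanged.
    printf '[Service]\nTimeoutStopSec=10\n' > "$dir/override.conf"
}
```

After running write_dropin as root, `systemctl daemon-reload` makes systemd pick up the change, and `systemctl show zerotier-one.service -p TimeoutStopUSec` reports the stop timeout now in effect.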