Zerotierone: Running ZeroTierOne on a system with a large number of IPs results in a disconnect loop

Created on 1 May 2017 · 5 Comments · Source: zerotier/ZeroTierOne

When attempting to start ZeroTier on several development systems with a large number of IPs bound, ZeroTier starts briefly and then enters a disconnect loop: the service remains available for 30-60 seconds at a time, followed by 30-120 seconds of downtime.

NICs are bound in the standard numbered alias format (e.g. eth0:0, eth0:1, [...]).

We'd only be interested in binding ports on 1.1.1.3.

interfacePrefixBlacklist was also tried as an alternative to blacklisting by range, with the same result.

This occurs both with default package configuration and with a local.conf file as follows:

{
    "physical": {
        "1.1.1.3/32": {
            "blacklist": false
        },
        "1.1.2.0/24": {
            "blacklist": true
        },
        "1.1.3.0/24": {
            "blacklist": true
        },
        "1.1.4.0/24": {
            "blacklist": true
        }
    },
    "settings": {
        "primaryPort": 12345,
        "portMappingEnabled": false,
        "softwareUpdate": "apply",
        "softwareUpdateChannel": "release"
    }
}
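As a rough illustration of what the configuration above is meant to express, the sketch below evaluates the `physical` section against individual local addresses. It assumes that the most specific matching prefix decides and that unmatched addresses are allowed. That rule, the function name `is_blacklisted`, and the sample addresses are assumptions made for illustration, not ZeroTier's actual matching code.

```python
import ipaddress

# The "physical" section from the local.conf above, written as a Python dict.
PHYSICAL = {
    "1.1.1.3/32": {"blacklist": False},
    "1.1.2.0/24": {"blacklist": True},
    "1.1.3.0/24": {"blacklist": True},
    "1.1.4.0/24": {"blacklist": True},
}


def is_blacklisted(address, physical=PHYSICAL):
    """Return True if the most specific matching prefix blacklists the address.

    Assumption (not confirmed by the thread): longest-prefix match wins and
    addresses that match no entry are allowed.
    """
    ip = ipaddress.ip_address(address)
    best = None  # (prefix length, blacklist flag) of the most specific match
    for cidr, rule in physical.items():
        net = ipaddress.ip_network(cidr)
        if ip in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, bool(rule.get("blacklist", False)))
    return best[1] if best else False
```

Under these assumptions, `is_blacklisted("1.1.1.3")` is false while any address in 1.1.2.0/24 is rejected, which is the intent described in the report.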

Although I'm unfamiliar with the code base, it seems from observation with netstat and similar tools that ZeroTier disconnects once a certain number of IPs (~64) are bound to the configured port, then begins a reset cycle that loops back to the initial state.
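To check whether a host is near the observed threshold, one option is to count the IPv4 addresses the kernel reports. The sketch below is a minimal example that parses the one-line-per-address output of `ip -o -4 addr show` on Linux. The helper names and the default limit of 64 (taken from the rough observation above, not from the ZeroTier source) are assumptions.

```python
import subprocess


def parse_ipv4_addresses(ip_output):
    """Return the address/prefix strings found in `ip -o -4 addr show` output."""
    found = []
    for line in ip_output.splitlines():
        fields = line.split()
        if "inet" in fields:
            # The token right after "inet" is the address in CIDR notation.
            found.append(fields[fields.index("inet") + 1])
    return found


def local_ipv4_addresses():
    """Run `ip -o -4 addr show` (Linux, iproute2) and parse its output."""
    result = subprocess.run(
        ["ip", "-o", "-4", "addr", "show"],
        capture_output=True, text=True, check=True,
    )
    return parse_ipv4_addresses(result.stdout)


def exceeds_limit(addresses, limit=64):
    """True if the host has more addresses than the assumed limit."""
    return len(addresses) > limit
```

Calling `exceeds_limit(local_ipv4_addresses())` on an affected system would show whether the address count is above the point where the loop was observed.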

Any guidance is appreciated; I'm sure many larger scale deployments would benefit from the ability to specify an interface bind limitation. Apologies for not being familiar enough to submit a pull request myself.

bug

williamheinz

All 5 comments

Wow... I've never tested it with this many IPs before. What kind of configuration do you have and what are you trying to do here? A high throughput mesh fabric or something?

adamierymenko on 3 May 2017

This may be a static buffer or some other kind of silly limitation somewhere.

adamierymenko on 3 May 2017

You pretty much nailed it, an edge node on a mesh network used for data processing. Unfortunately for security / policy reasons, we can only run something like ZeroTier on certain edge systems. In turn, these systems also have to have a significant number of virtual interfaces (128 - 256) across 8x physical 10GbE SFP+ ports in order to peer with all platform segments.

Despite the odd configuration and high throughput, we had no other issues whatsoever besides this simple hangup. A quick fix would be to offer explicit interface / IP binding which bypasses enumeration. I can think of quite a few deployments which would benefit from explicit binding versus blacklisting due to odd / dynamic network configurations, internal policy, or a combination of the two.
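For context, "explicit binding" in the general sockets sense means opening a socket on one named local address instead of walking every interface. The following is a generic sketch of that idea using Python's standard `socket` module. It is not ZeroTier code, and the helper name `bind_udp` is hypothetical.

```python
import socket


def bind_udp(address, port):
    """Bind one UDP socket to a single, explicitly chosen local address."""
    # Pick the address family from the literal: a colon means IPv6.
    family = socket.AF_INET6 if ":" in address else socket.AF_INET
    sock = socket.socket(family, socket.SOCK_DGRAM)
    try:
        sock.bind((address, port))
    except OSError:
        # Close the socket before re-raising so a failed bind does not leak it.
        sock.close()
        raise
    return sock
```

With the addresses from this report, the call would be `bind_udp("1.1.1.3", 12345)`. No other interface or alias is touched, so the number of aliases on the host does not matter.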

That said, time permitting, I'll try to provide additional details or debug information that might help identify the root cause if it's not easily discernible currently.

For background and context, I evaluated quite a few options in the simple-configuration SDN/Overlay/VPN network industry prior to pitching ZeroTier to our team. This included everything from early stage to enterprise such as: PeerVPN, WireGuard, tinc, ZeroTier, SoftEther, OpenContrail, OpenVPN-AS, Cradlepoint NetCloud etc.

ZeroTier was both straightforward enough to be set up by non-technical individuals who needed access and still offered very competitive network performance, which was impressive to say the least.

williamheinz on 5 May 2017

👍1

Wow. That is a disgusting amount of interfaces. Amazing.

Arffeh on 5 May 2017

👍2

I doubled the max number of bound sockets in dev to 256, which is ridiculous. :)

adamierymenko on 9 Jan 2018
