Substrate: Local nodes keep disconnecting.

Created on 2 Dec 2019 · 15 Comments · Source: paritytech/substrate

Two nodes running on localhost keep disconnecting from each other. The disconnect appears to be initiated by libp2p, not by sync or a reputation change. The logs contain no useful information about why the disconnect happened.

Node 1 started as:

polkadot -d /tmp/polkadot/ -lpeerset=trace,sync=trace,sub-libp2p=trace,libp2p=trace --out-peers 0

Node2 started as:

polkadot -d /tmp/polkadot2 --reserved-nodes /ip4/127.0.0.1/tcp/30333/p2p/QmYiJFSrLQdvWJLLF64RnWGvk92zSjaN49EP9rtNLDbNEo --reserved-only -l peerset=trace,sub-libp2p=trace,libp2p=trace

Nodes stay connected for about 10 seconds before a disconnect and reconnect happens.

Nodes should stay connected.
Logs should have disconnect reason.

I2-bug 🐜

arkpar

👍1

Most helpful comment

In my opinion the solution is to handle multiple simultaneous connections per node (https://github.com/libp2p/rust-libp2p/issues/912).
There have already been several tricky issues caused by the decision of enforcing a unique connection, proving that it probably wasn't a good idea.

tomaka on 13 Jan 2020

👍2

All 15 comments

For what it's worth, disabling the discovery mechanism fixes the issue. It's still unclear to me what is happening.

tomaka on 3 Dec 2019

What seems to happen is the following:

1. Node 2 connects to Node 1.
2. Both nodes also perform regular (Kusama?) DHT discovery queries in regular (exponentially increasing) intervals, starting with seconds and capped at 60 seconds.
3. It appears that local nodes connecting to the (Kusama?) DHT put their local address into the DHT, i.e. /ip4/127.0.0.1/tcp/30333. This is obviously necessary for nodes in the same local network to discover and talk to each other through the DHT, but these are also seen by remote nodes.
4. As a result, Node 2 will discover a node with a different peer ID from Node 1 but with address /ip4/127.0.0.1/tcp/30333. It will try to connect to it as part of the DHT lookups.
5. Thus Node 2 will make another connection attempt to Node 1, thinking it is some other node (i.e. expecting a different peer ID this time). It still has the existing connection to Node 1 for the moment.
6. Node 1 receives the incoming connection from Node 2, replacing the existing connection (i.e. dropping it) due to the single-connection-per-peer policy. The reason(s) for always taking the new connection over the old, even if a node has the role of listener in both connections (i.e. it is not a "simultaneous connect" scenario) are not entirely clear to me (@tomaka?).
7. Node 2 finishes setting up the new connection accepted by Node 1, but then discovers that the expected peer ID does not match the actual peer ID, so it drops this connection (MITM protection). (Node 2 usually gets a BrokenPipe error on the old connection through the StreamMuxer shortly afterwards.)
8. Node 1 gets a ConnectionReset error on the new connection.

This sequence more or less repeats ad infinitum, since Node 2's discovery will continue to encounter these peer IDs differing from that of Node 1 with address /ip4/127.0.0.1/tcp/30333 to which it will try to connect during DHT lookups.
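
The sequence above can be sketched as a small simulation. This is only an illustrative model under stated assumptions, not rust-libp2p code: `Policy`, `Listener`, and `count_reconnects` are hypothetical names, and the model assumes that Node 2 (which has Node 1 as a reserved node) reconnects right after each drop.

```rust
use std::collections::HashMap;

// Hypothetical identifiers for illustration only (not libp2p types).
type PeerId = &'static str;
type ConnId = u32;

/// How a listener resolves a second inbound connection from a peer
/// it is already connected to.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Policy {
    /// Drop the existing connection in favor of the new one
    /// (the behavior described in the steps above).
    PreferNew,
    /// Keep the existing connection and refuse the new one.
    KeepExisting,
}

/// Node 1's view: at most one connection per remote peer.
struct Listener {
    policy: Policy,
    connections: HashMap<PeerId, ConnId>,
    next_id: ConnId,
}

impl Listener {
    fn new(policy: Policy) -> Self {
        Listener { policy, connections: HashMap::new(), next_id: 0 }
    }

    /// Handle an inbound connection from `remote` and return the id of
    /// the connection that is live on the listener afterwards.
    fn accept(&mut self, remote: PeerId) -> ConnId {
        let new_id = self.next_id;
        self.next_id += 1;
        match (self.connections.get(&remote).copied(), self.policy) {
            // The old connection survives; the new one is refused.
            (Some(existing), Policy::KeepExisting) => existing,
            // No previous connection, or the new one replaces the old one.
            _ => {
                self.connections.insert(remote, new_id);
                new_id
            }
        }
    }

    /// Remove connection `id` if it is the live one for `remote`.
    fn close(&mut self, remote: PeerId, id: ConnId) {
        if self.connections.get(&remote) == Some(&id) {
            self.connections.remove(&remote);
        }
    }
}

/// Simulate `lookups` DHT lookups on Node 2, each dialing Node 1's address
/// while expecting a different peer ID. Returns how often the original
/// Node 1 <-> Node 2 connection had to be re-established.
fn count_reconnects(policy: Policy, lookups: u32) -> u32 {
    let mut node1 = Listener::new(policy);
    let mut live = node1.accept("node2"); // initial, legitimate connection
    let mut reconnects = 0;
    for _ in 0..lookups {
        // Node 2 dials 127.0.0.1:30333 expecting some other peer.
        let after_dial = node1.accept("node2");
        if after_dial != live {
            // Node 1 replaced the old connection; Node 2 then notices the
            // peer ID mismatch and closes the new connection as well.
            node1.close("node2", after_dial);
            // Assumption: Node 2 reconnects to its reserved node right away.
            live = node1.accept("node2");
            reconnects += 1;
        }
    }
    reconnects
}
```

In this model every lookup that hits the shared address costs one reconnect under `Policy::PreferNew`, while `Policy::KeepExisting` leaves the original connection untouched.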

If I'm not mistaken, this seems to be an interesting way for a node C to directly influence the connectivity between some nodes A and B, e.g. by advertising public addresses of A in the DHT under its own peer ID. When node B picks these up during lookups, it would disturb the connection between A and B in the above manner. This may be primarily a pitfall of the single-connection-per-node policy, though it may possibly be prevented by establishing a preference for the old (existing) connection, if a node receives a second connection as a listener when it is already a listener in the existing connection, but I'm not entirely clear about all the possible consequences at the moment.

romanb on 13 Jan 2020

👍1

> The reason(s) for always taking the new connection over the old, even if a node has the role of listener in both connections (i.e. it is not a "simultaneous connect" scenario) are not entirely clear to me (@tomaka?).

The reason is that typically when a node opens a new connection, it's because the old one is dead.

Example situation: Node 1 and Node 2 are connected. Node 2 loses its Internet connection, realizes it, and kills all existing sockets. No FIN is actually being sent because of no Internet access. Node 2 then gains back its Internet connection and tries to re-connect to Node 1. Node 1 isn't aware that the previous connection is dead.
In this situation, the new connection is the right choice.

tomaka on 13 Jan 2020

👍2

> Node 1 isn't aware that the previous connection is dead.

Doesn't libp2p have keep-alive or ping protocol to handle this?

arkpar on 14 Jan 2020

> Doesn't libp2p have keep-alive or ping protocol to handle this?

It does, but it takes something like 30 seconds to trigger.

I'm not actually sure that my scenario above is realistic, but the general idea is that we expect that when a node opens a second connection it is because the existing one is unusable.

tomaka on 14 Jan 2020

> > Doesn't libp2p have keep-alive or ping protocol to handle this?
>
> It does, but it takes something like 30 seconds to trigger.
>
> I'm not actually sure that my scenario above is realistic, but the general idea is that we expect that when a node opens a second connection it is because the existing one is unusable.

Note though that even when permitting multiple connections per peer, which I'm currently looking into, you will want to have a configurable limit (per peer). In a sense, the current single-connection-per-peer policy can be seen as a hard-coded limit of 1. Whatever the limit, I don't think it is a good idea to enforce it by dropping existing connections in favor of new ones at the lower networking layers. Rather, timely detection of broken connections is up to application protocols (or can be achieved by configuring timeouts on a lower-layer protocol), which also decide what exactly "timely" means according to their requirements. The ping protocol can be configured and used for this purpose, if desired.
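
As a rough sketch of that idea (a hypothetical table, not the rust-libp2p API): a per-peer limit is enforced by refusing connections over the limit, and dead connections are removed separately once no pong has been seen within a configurable timeout. `PeerConnections`, `try_add`, and `reap_dead` are made-up names, and timestamps are modeled as `Duration`s since node start to keep the example deterministic.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Hypothetical per-peer connection table (illustrative only).
struct PeerConnections {
    /// Configurable limit; the single-connection policy corresponds to 1.
    max_per_peer: usize,
    /// How long a connection may go without a pong before it counts as dead.
    ping_timeout: Duration,
    /// For each peer, the time of the last pong seen on each connection.
    last_pong: HashMap<String, Vec<Duration>>,
}

impl PeerConnections {
    fn new(max_per_peer: usize, ping_timeout: Duration) -> Self {
        PeerConnections { max_per_peer, ping_timeout, last_pong: HashMap::new() }
    }

    /// Register a new connection at time `now`. Existing connections are
    /// never evicted here: a connection over the limit is refused instead.
    fn try_add(&mut self, peer: &str, now: Duration) -> bool {
        let conns = self.last_pong.entry(peer.to_string()).or_default();
        if conns.len() >= self.max_per_peer {
            return false;
        }
        conns.push(now);
        true
    }

    /// Drop connections whose last pong is older than `ping_timeout`.
    /// Returns how many connections were removed.
    fn reap_dead(&mut self, now: Duration) -> usize {
        let timeout = self.ping_timeout;
        let mut removed = 0;
        for conns in self.last_pong.values_mut() {
            let before = conns.len();
            conns.retain(|&seen| now.saturating_sub(seen) <= timeout);
            removed += before - conns.len();
        }
        removed
    }
}
```

With `max_per_peer` set to 1 and a 30-second timeout, a second connection from the same peer is refused until the first one has been reaped, which trades a slower recovery from dead connections for stability of live ones.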

romanb on 16 Jan 2020

How exactly will allowing multiple connections solve this issue? This is a fairly straightforward scenario, where none of the peers misbehave or lose connectivity. Why would we want multiple connections here?

> As a result, Node 2 will discover a node with a different peer ID from Node 1 but with address /ip4/127.0.0.1/tcp/30333. It will try to connect to it as part of the DHT lookups.

It seems to me that it should not attempt dialing an address that's already connected in the first place.

arkpar on 7 Feb 2020

> How exactly will allowing multiple connections solve this issue? This is a fairly straightforward scenario, where none of the peers misbehave or lose connectivity. Why would we want multiple connections here?

As I explained in an earlier comment, the immediate cause of this issue is that the "listener" closes its existing connection, preferring the new over the old connection in the attempt to enforce a single connection per peer (the "dialer" then later, upon discovering the peer ID mismatch closes the new connection as well, and the connect/disconnect dance begins in this way). While my first reaction was that it doesn't seem right that new connections are preferred over old ones in this scheme, and I'd rather swap that around, @tomaka had some concerns on doing that. In any case, removing the single-connection-per-peer policy is a strictly more general and desirable solution, not just in light of this issue. In this particular scenario the "listener" then no longer has to make a choice between these connections, the old connection remains unaffected, the "dialer" eventually closes its new connection attempt upon discovering the peer ID mismatch.

> > As a result, Node 2 will discover a node with a different peer ID from Node 1 but with address /ip4/127.0.0.1/tcp/30333. It will try to connect to it as part of the DHT lookups.
>
> It seems to me that it should not attempt dialing an address that's already connected in the first place.

In general and at the level of libp2p, I don't think it is desirable to disallow multiple connections to the same address. Of course - and I think that is what you are referring to - in the context of a specific protocol, like Kademlia, one could argue that connections should be uniquely identified by such an address. However, the (logical) overlay network of Kademlia only operates opaquely on a uniformly distributed keyspace, which also contains the node / peer IDs. Kademlia only uniquely identifies peers by these IDs and addresses to connect to are a secondary implementation artifact. In the scenario here Kademlia sees a different peer ID for the same address, i.e. another peer that supposedly also has that address (among others, possibly). While you could argue that it should disregard the peer ID in this case, seeing that it already has a connection to the same address, even though with a different peer ID, I'm really not sure this is a good idea. If multiple peers are seen advertising the same address, who is to decide which is "right", i.e. which to connect to, respectively which connection to keep and which others to ignore?

romanb on 8 Feb 2020

> Example situation: Node 1 and Node 2 are connected. Node 2 loses its Internet connection, realizes it, and kills all existing sockets. No FIN is actually being sent because of no Internet access. Node 2 then gains back its Internet connection and tries to re-connect to Node 1. Node 1 isn't aware that the previous connection is dead.
> In this situation, the new connection is the right choice.

I'd argue that a new connection should not replace an existing one. Dead connections will eventually drop because we have keep-alive or ping protocols. Waiting for 30 seconds to restore connectivity is fine for substrate.

> If multiple peers are seen advertising the same address, who is to decide which is "right", i.e. which to connect to, respectively which connection to keep and which others to ignore?

You don't drop existing connections. Otherwise it's an attack vector.

I'd like to clarify that "multiple connections" being discussed are to support connections to the same address with different node IDs. Multiple connections to the same address/node_id still won't be allowed, right?

Regarding multiple connections to the same node ID, we probably don't want that in substrate/polkadot. The proposed use case, "let's allow the second connection because the first one might actually be dead", sounds like a hack. What if the first one never closes after all? It looks like we are struggling with managing connections even now, when duplicates are not allowed. This looks like it will introduce a lot of unneeded complexity for no good reason.

Additionally, can an additional authentication mechanism be introduced to the Kademlia layer? Devp2p discovery would not propagate unconfirmed addresses. "Confirmed" here means that there was a signed UDP ping/pong exchange with that address first.
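
To illustrate the kind of rule meant here, a minimal sketch follows. It is a hypothetical address book, not code from devp2p or libp2p-kad: `AddressBook`, `learn`, `confirm`, and `advertisable` are made-up names, and the signed ping/pong exchange itself is assumed to happen elsewhere and is only represented by the call to `confirm`.

```rust
use std::collections::HashMap;

/// Hypothetical address book (illustrative only).
#[derive(Default)]
struct AddressBook {
    /// (peer id, address) -> whether a signed ping/pong exchange succeeded.
    entries: HashMap<(String, String), bool>,
}

impl AddressBook {
    /// Record an address heard from a third party. It starts unconfirmed
    /// and never downgrades an address that is already confirmed.
    fn learn(&mut self, peer: &str, addr: &str) {
        self.entries
            .entry((peer.to_string(), addr.to_string()))
            .or_insert(false);
    }

    /// Mark an address as confirmed. Assumption: the caller has completed
    /// a signed ping/pong exchange with `peer` at `addr` beforehand.
    fn confirm(&mut self, peer: &str, addr: &str) {
        self.entries.insert((peer.to_string(), addr.to_string()), true);
    }

    /// Addresses of `peer` that may be passed on to other nodes:
    /// only the confirmed ones, sorted for a stable result.
    fn advertisable(&self, peer: &str) -> Vec<String> {
        let mut addrs = Vec::new();
        for ((p, a), confirmed) in &self.entries {
            if p.as_str() == peer && *confirmed {
                addrs.push(a.clone());
            }
        }
        addrs.sort();
        addrs
    }
}
```

Under such a rule, an address like /ip4/127.0.0.1/tcp/30333 that was only heard about for some remote peer ID would stay in the book but would not be propagated further.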

arkpar on 10 Feb 2020

> > Example situation: Node 1 and Node 2 are connected. Node 2 loses its Internet connection, realizes it, and kills all existing sockets. No FIN is actually being sent because of no Internet access. Node 2 then gains back its Internet connection and tries to re-connect to Node 1. Node 1 isn't aware that the previous connection is dead.
> > In this situation, the new connection is the right choice.
>
> I'd argue that a new connection should not replace an existing one. Dead connections will eventually drop because we have keep-alive or ping protocols. Waiting for 30 seconds to restore connectivity is fine for substrate.

I agree, hence my first reaction was to change that, as I mentioned at the end of my first comment. The same thing (i.e. not dropping the existing connection) also happens with libp2p-core permitting multiple connections.

> > If multiple peers are seen advertising the same address, who is to decide which is "right", i.e. which to connect to, respectively which connection to keep and which others to ignore?
>
> You don't drop existing connections. Otherwise it's an attack vector.

Sure, I hinted at the same thing at the end of my first comment. I think we are on the same page here.

> I'd like to clarify that "multiple connections" being discussed are to support connections to the same address with different node IDs. Multiple connections to the same address/node_id still won't be allowed, right?

I see no reason for a general-purpose networking library like libp2p to disallow that, so yes, that will be allowed.

> Regarding multiple connections to the same node ID, we probably don't want that in substrate/polkadot. The proposed use case, "let's allow the second connection because the first one might actually be dead", sounds like a hack. What if the first one never closes after all? It looks like we are struggling with managing connections even now, when duplicates are not allowed. This looks like it will introduce a lot of unneeded complexity for no good reason.

It's fine if substrate/polkadot do not intentionally make use of multiple connections per peer. Indeed, in https://github.com/libp2p/rust-libp2p/pull/1440 even libp2p-swarm retains these semantics. Nevertheless, two peers may connect to each other "simultaneously" and that is the part where trying to enforce a single connection per peer at all times adds complexity that is removed in https://github.com/libp2p/rust-libp2p/pull/1440. If even a temporary second connection is undesirable for a specific application protocol, it is up to that protocol to decide which connection to close and when.

> Additionally, can an additional authentication mechanism be introduced to the Kademlia layer? Devp2p discovery would not propagate unconfirmed addresses. "Confirmed" here means that there was a signed UDP ping/pong exchange with that address first.

That would need to be laid out in more detail in order for me to make an informed comment. In general I expressed my desire in the past to allow better curation of Kademlia's k-buckets through the public API offered by libp2p-kad, i.e. to provide more control over which peers and addresses are in the routing table (and thus advertised to others) at any time. This may or may not already be sufficient to implement such a use-case. There is also some related work proposed in https://github.com/libp2p/rust-libp2p/issues/1352, though that is primarily a means for prioritizing entries in already full k-buckets.

romanb on 10 Feb 2020

We are seeing this behavior in our private network as well, due to IP reuse among the nodes. Here's a way of reproducing it using Docker:

First create an internal docker network for this:

docker network create \
  --internal \
  --subnet 172.19.0.0/16 \
  --opt "com.docker.network.bridge.name=substrate" \
  substrate

Start nodes alice, bob, and charlie:

docker run --rm --name alice --network substrate --ip 172.19.1.1 \
  parity/substrate:2.0.0-646e7fb \
  --chain local \
  --validator \
  --node-key 0000000000000000000000000000000000000000000000000000000000000001 \
  --alice \
  --no-mdns

docker run --rm --name bob --network substrate --ip 172.19.1.2 \
  parity/substrate:2.0.0-646e7fb \
  --chain local \
  --validator \
  --node-key 0000000000000000000000000000000000000000000000000000000000000002 \
  --bob \
  --no-mdns \
  --bootnodes /ip4/172.19.1.1/tcp/30333/p2p/QmRpheLN4JWdAnY7HGJfWFNbfkQCb6tFf4vvA6hgjMZKrR



docker run --rm --name charlie --network substrate --ip 172.19.1.3 \
  parity/substrate:2.0.0-646e7fb \
  --chain local \
  --validator \
  --node-key 0000000000000000000000000000000000000000000000000000000000000003 \
  --charlie \
  --no-mdns \
  --bootnodes /ip4/172.19.1.1/tcp/30333/p2p/QmRpheLN4JWdAnY7HGJfWFNbfkQCb6tFf4vvA6hgjMZKrR



Then stop bob and charlie, and start charlie again, this time reusing bob's IP address (172.19.1.2):

docker run --rm --name charlie --network substrate --ip 172.19.1.2 \
  parity/substrate:2.0.0-646e7fb \
  --chain local \
  --validator \
  --node-key 0000000000000000000000000000000000000000000000000000000000000003 \
  --charlie \
  --no-mdns \
  --bootnodes /ip4/172.19.1.1/tcp/30333/p2p/QmRpheLN4JWdAnY7HGJfWFNbfkQCb6tFf4vvA6hgjMZKrR

Then charlie will repeatedly try to connect to alice; each connection stays up for a very short moment and then gets dropped.

And alice will repeatedly try to reach bob at 172.19.1.2, but every time it sees charlie's peer ID instead, so the connection gets dropped. This is a problem on its own too, as bob will never be at that IP again.

ghost on 23 Feb 2020

Should have been fixed by #5278, although I didn't verify that it actually is.

tomaka on 9 Apr 2020

It is now much worse

2020-04-09 11:55:41.460 main-tokio- TRACE sync  Connecting QmX6ck5cwxsiSUbWrCZNKUeY188AAb4dvreYEFh6BtMcPQ
2020-04-09 11:55:41.460 main-tokio- TRACE sync  New peer QmX6ck5cwxsiSUbWrCZNKUeY188AAb4dvreYEFh6BtMcPQ Status { version: 6, min_supported_version: 3, roles: FULL, best_number: 1181553, best_hash: 0x06ba23f8e56cd2b99ef5998bb5eab05bd6ed81afc76699b9f313abcaf59e92d1, genesis_hash: 0xb0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe, chain_status: [] }
2020-04-09 11:55:41.460 main-tokio- DEBUG sync  Connected QmX6ck5cwxsiSUbWrCZNKUeY188AAb4dvreYEFh6BtMcPQ
2020-04-09 11:55:45.808 main-tokio- TRACE sync  QmX6ck5cwxsiSUbWrCZNKUeY188AAb4dvreYEFh6BtMcPQ disconnected
2020-04-09 11:55:45.824 main-tokio- TRACE sync  QmX6ck5cwxsiSUbWrCZNKUeY188AAb4dvreYEFh6BtMcPQ disconnected
2020-04-09 11:55:51.164 main-tokio- TRACE sync  QmX6ck5cwxsiSUbWrCZNKUeY188AAb4dvreYEFh6BtMcPQ disconnected
2020-04-09 11:55:51.264 main-tokio- TRACE sync  QmX6ck5cwxsiSUbWrCZNKUeY188AAb4dvreYEFh6BtMcPQ disconnected
2020-04-09 11:55:53.20 main-tokio- TRACE sync  QmX6ck5cwxsiSUbWrCZNKUeY188AAb4dvreYEFh6BtMcPQ disconnected
2020-04-09 11:55:53.30 main-tokio- TRACE sync  QmX6ck5cwxsiSUbWrCZNKUeY188AAb4dvreYEFh6BtMcPQ disconnected
2020-04-09 11:55:53.148 main-tokio- TRACE sync  QmX6ck5cwxsiSUbWrCZNKUeY188AAb4dvreYEFh6BtMcPQ disconnected
2020-04-09 11:55:56.144 main-tokio- TRACE sync  QmX6ck5cwxsiSUbWrCZNKUeY188AAb4dvreYEFh6BtMcPQ disconnected
2020-04-09 11:55:56.233 main-tokio- TRACE sync  QmX6ck5cwxsiSUbWrCZNKUeY188AAb4dvreYEFh6BtMcPQ disconnected
2020-04-09 11:56:03.729 main-tokio- TRACE sync  QmX6ck5cwxsiSUbWrCZNKUeY188AAb4dvreYEFh6BtMcPQ disconnected
2020-04-09 11:56:03.872 main-tokio- TRACE sync  QmX6ck5cwxsiSUbWrCZNKUeY188AAb4dvreYEFh6BtMcPQ disconnected
2020-04-09 11:56:03.904 main-tokio- TRACE sync  QmX6ck5cwxsiSUbWrCZNKUeY188AAb4dvreYEFh6BtMcPQ disconnected
2020-04-09 11:56:05.132 main-tokio- TRACE sync  QmX6ck5cwxsiSUbWrCZNKUeY188AAb4dvreYEFh6BtMcPQ disconnected

The peer is reported as disconnected immediately after connecting. After that, the TCP connection stays open and block requests from that peer still come through, so the peer still considers the connection to be active.

Also, there are multiple disconnect notifications.

Update: Apparently this is resolved by https://github.com/paritytech/substrate/pull/5595

arkpar on 9 Apr 2020

Does this affect nodes behind NAT? E.g. if we have two nodes running behind a NAT router, they share the same IP address, but of course with different port numbers and peer IDs. Is there any reference on how the DHT stores the peer ID?