Lisk-sdk: P2P errors while syncing with testnet

Created on 16 May 2019 · 9 comments · Source: LiskHQ/lisk-sdk

Expected behavior

While syncing with testnet or mainnet, the node shouldn't disconnect excessively or log the errors listed below.

Actual behavior

Too many P2P errors are logged while syncing with the test network:

14:46:17.406Z ERROR lisk-framework: Event 'rpc-request' was aborted due to a bad connection
14:46:13.092Z ERROR lisk-framework: Unable to match an address to the peer
14:46:13.073Z ERROR lisk-framework: Connection id does not match with corresponding peer

Steps to reproduce

Start a 2.0.0-alpha.4 node on the test network.

Which version(s) does this affect? (Environment, OS, etc...)

2.0.0

bug elementP2P

All 9 comments

These errors also occur in an isolated 2.0.0-alpha.x network.
To debug this, use any of the nodes from the network: http://alpha7-seed-01.liskdev.net:6040/networkMonitor

Move these messages to the trace log level once they have been classified as regular behavior.
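A minimal sketch of what that reclassification could look like, assuming the three messages from the report end up being treated as regular behavior (the thread has not decided that). `REGULAR_P2P_MESSAGES` and `levelFor` are hypothetical names for illustration, not part of lisk-sdk.

```typescript
// Hypothetical sketch: choose a log level for a P2P message.
// The list of "regular" messages is an assumption taken from the report above.
type Level = "trace" | "error";

const REGULAR_P2P_MESSAGES: ReadonlyArray<string> = [
  "Event 'rpc-request' was aborted due to a bad connection",
  "Unable to match an address to the peer",
  "Connection id does not match with corresponding peer",
];

// Returns "trace" for messages classified as regular, "error" otherwise.
function levelFor(message: string): Level {
  for (const known of REGULAR_P2P_MESSAGES) {
    if (message.indexOf(known) !== -1) {
      return "trace";
    }
  }
  return "error";
}
```

Anything not on the list stays at the error level, so a new or unexpected failure would still be visible.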

While debugging, I found that these two errors come specifically from peers running 1.6 versions:

Error: Unable to match an address to the peer

Version of connected peer: '1.6.0-rc.3', and also some older versions


Error: Connection id does not match with corresponding peer

Version of connected peer: '1.6.0-rc.4'

Closing this issue, as the following errors no longer occur:

14:46:13.092Z ERROR lisk-framework: Unable to match an address to the peer
14:46:13.073Z ERROR lisk-framework: Connection id does not match with corresponding peer

Only "14:46:17.406Z ERROR lisk-framework: Event 'rpc-request' was aborted due to a bad connection" still occurs, and very rarely.

In version 2.0.0-alpha.10 the node is still trying to connect to itself frequently:

09:30:35.899Z DEBUG lisk-framework: Peer disconnect event: Outbound connection of peer 134.209.90.103:5000 was closed with code 4101 and reason: Peer cannot connect to itself
09:30:35.900Z ERROR lisk-framework: Event 'rpc-request' was aborted due to a bad connection
09:30:35.900Z DEBUG lisk-framework: Socket connection closed with status code 4101 and reason: Peer cannot connect to itself

This is not the correct behavior.

@4miners This is because the node doesn't remember who it is, so it keeps rediscovering itself through peers. This problem shouldn't lead to any negative side effects, though, and it will fix itself once we have implemented the LIP partial view feature.

My view is that it would be better to just let the node connect to and interact with itself. This would lead to simpler, more consistent logic with fewer possible states. At worst, letting the node send itself a transaction/signature/block once in a while would be a very minor performance overhead (probably less than 1% in a network with 100 peers).

We can avoid this error occurring frequently by adding our public IP to an in-memory blacklist in P2P the first time we see it. That way, it will always be discarded before these errors are thrown, even if it comes from the peer list of other peers during discovery. So we will only see this error once.

That said it's probably a pretty easy fix.

@ishantiw Yes I think this is fine. We do something like this in the 0.x and 1.x versions; the node bans itself.
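The in-memory blacklist idea discussed above could be sketched roughly as follows. This is only an illustration under stated assumptions: `PeerInfo`, `SelfAddressFilter` and its methods are hypothetical names, not lisk-sdk APIs. The only value taken from the thread is close code 4101 ("Peer cannot connect to itself").

```typescript
// Hypothetical sketch of a self-address blacklist; not the lisk-sdk implementation.
const SELF_CONNECT_CODE = 4101; // close code from the logs in this thread

interface PeerInfo {
  ipAddress: string;
  wsPort: number;
}

class SelfAddressFilter {
  // IPs found to be this node itself; kept in memory only, lost on restart.
  private readonly selfIps = new Set<string>();

  // Call when an outbound connection closes. If the close code says we
  // connected to ourselves, remember that IP.
  onOutboundClose(peer: PeerInfo, code: number): void {
    if (code === SELF_CONNECT_CODE) {
      this.selfIps.add(peer.ipAddress);
    }
  }

  // Drop remembered IPs from a peer list received during discovery.
  filterDiscovered(peers: ReadonlyArray<PeerInfo>): PeerInfo[] {
    return peers.filter(p => !this.selfIps.has(p.ipAddress));
  }
}
```

With this shape, the first self-connection still produces the 4101 close, but later discovery rounds would discard the address before a connection is attempted. Note that keying on the IP alone also drops any other node sharing that public IP; keying on IP and port together would avoid that.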
