The connected peer count should be accurate and not fluctuate randomly.
When getting the peer count from the API, it misreports the number of connected peers.
This issue may be caused by the fact that state is not a native property of P2PPeerInfo (it doesn't come from the peer itself); it's something that our own node injects into the P2PPeerInfo instance for each peer. It's possible that nodes are sharing each other's P2PPeerInfo list through the discovery process and overriding each other's peer state properties.
The state property is needed for backwards compatibility with existing 1.x nodes but we should tyr to not rely on it to perform calculations. We should use a different approach for getting connected peers here: https://github.com/LiskHQ/lisk-sdk/blob/4797f8d7c4a4d245e69c8531139b00018da4fc20/framework/src/modules/http_api/controllers/peers.js#L72
^ Either that, or we should come up with a better mechanism for storing the peer state. Maybe like we discussed in the last P2P meeting and add a wrapper around the P2PPeerInfo object to store private properties (the ones that we do not want to share with peers); the state could go in there instead of inside the P2PPeerInfo.
On testnet, check the peer count as reported by the API. The one which comes from here: https://github.com/LiskHQ/lisk-sdk/blob/4797f8d7c4a4d245e69c8531139b00018da4fc20/framework/src/modules/http_api/controllers/peers.js#L81
2.0.0-rc.0
Further clarification on how to reproduce:
Keep querying /api/peers endpoint, from time to time you will get a window in which many disconnected or not reachable peers are reported as connected (have state: 2), like this (counted by script, testnet network):
Request with limit: 100, offset: 0 got 100 of 743 peers
Request with limit: 100, offset: 100 got 100 of 743 peers
Request with limit: 100, offset: 200 got 100 of 743 peers
Request with limit: 100, offset: 300 got 100 of 743 peers
Request with limit: 100, offset: 400 got 100 of 743 peers
Request with limit: 100, offset: 500 got 100 of 743 peers
Request with limit: 100, offset: 600 got 100 of 743 peers
Request with limit: 100, offset: 700 got 43 of 743 peers
Found 743 unique peers
PEERS - Total: 743, Connected: 651, Open API: 23, Duplicated: 25, Pingable: 275
It should update state value of each peer correctly because it checks with getNetworkStatus everytime and updates value of state based on whether it is in connectedPeers or not. Below is the function that updates it everytime we call network:getPeersCountByFilter I will revisit this part of code again but I'm also suspecting if its not something related to PeerPool that reports disconnected peers are connected ones.
As I suspected, the issue is coming from p2p-lib not related to state value, I logged connectedPeers, newPeers and triedPeers and I can see that the connected peers count fluctuates drastically.
tried peers count: 123
new peers count: 224
Connected peers count: 118
tried peers count: 123
new peers count: 224
Connected peers count: 725
tried peers count: 123
new peers count: 676
Connected peers count: 725
My guess is that when we go through discovery, we find these inactive peers and we try to connected with them again. While connecting to them we also add them meanwhile to PeerPool's _peerMap.
Later we disconnect with them but for the short time they reside in PeerPool. So the bug is in the way we share connectedPeers list through P2P lib, I think we should only add a peer to connectedPeers list when we make a successful connection.
This issue is probably also causing dead peers to be constantly propagated through the network. The list is growing, while connected peers stay at the same level:
PEERS - Total: 743, Connected: 122, Open API: 23, Duplicated: 20, Pingable: 277
after a few days:
PEERS - Total: 839, Connected: 121, Open API: 26, Duplicated: 30, Pingable: 300
Most helpful comment
As I suspected, the issue is coming from p2p-lib not related to
statevalue, I loggedconnectedPeers,newPeersandtriedPeersand I can see that the connected peers count fluctuates drastically.My guess is that when we go through discovery, we find these inactive peers and we try to connected with them again. While connecting to them we also add them meanwhile to
PeerPool's_peerMap.https://github.com/LiskHQ/lisk-sdk/blob/4797f8d7c4a4d245e69c8531139b00018da4fc20/elements/lisk-p2p/src/peer_pool.ts#L308
Later we disconnect with them but for the short time they reside in PeerPool. So the bug is in the way we share
connectedPeerslist through P2P lib, I think we should only add a peer to connectedPeers list when we make a successful connection.