Is your feature request related to a problem/context? Please describe if applicable.
Having just implemented #1610, I was able to observe how a node, after successfully bootstrapping from the official IOHK peers, becomes "stuck" for an excessive amount of time.
Shortly after bootstrapping I observed that 2 of the 3 "trusted" peers (the initial nodes used for gossip) were immediately quarantined:
```
MBP:jormungandr michaelfazio$ jcli rest v0 node stats get --host http://localhost:3100/api
---
blockRecvCnt: 0
lastBlockContentSize: 315
lastBlockDate: "37.40949"
lastBlockFees: 800000
lastBlockHash: eb322e2f7ac960cfa8b5c2c29afb853366c5092fd35594d1c6624b6c4ce85630
lastBlockHeight: "115750"
lastBlockSum: 93667164000
lastBlockTime: "2020-01-20T17:58:35+00:00"
lastBlockTx: 1
lastReceivedBlockTime: ~
peerAvailableCnt: "3"
peerQuarantinedCnt: "2"
peerUnreachableCnt: "0"
state: Running
txRecvCnt: 0
uptime: 54
version: jormungandr 0.8.6-546497fc+
```
After a few more seconds, all five official IOHK nodes were quarantined:
```
blockRecvCnt: 0
lastBlockContentSize: 315
lastBlockDate: "37.40949"
lastBlockFees: 800000
lastBlockHash: eb322e2f7ac960cfa8b5c2c29afb853366c5092fd35594d1c6624b6c4ce85630
lastBlockHeight: "115750"
lastBlockSum: 93667164000
lastBlockTime: "2020-01-20T17:58:35+00:00"
lastBlockTx: 1
lastReceivedBlockTime: ~
peerAvailableCnt: "0"
peerQuarantinedCnt: "5"
peerUnreachableCnt: "0"
state: Running
txRecvCnt: 0
uptime: 98
version: jormungandr 0.8.6-546497fc+
```
These nodes were quarantined because a connection could not be established to any of them. Presumably the official IOHK nodes are under significant load and refusing new connections. _I came across some anecdotal evidence that each official bootstrap node is set to a maximum of 600 peers._
On the face of it, this would not be a big problem if the bootstrap nodes could service a reasonable number of peers. They cannot, however, and because the default quarantine period in Jormungandr is 30 minutes, a node whose trusted peers are all quarantined immediately will never bootstrap. By the time the trusted peers are released from quarantine, and assuming at least one of them can then accept the new peer, the blockchain has certainly advanced too far for the node to catch up.
Describe the solution you'd like
Remove or reduce quarantine settings for trusted peers only; or,
Allow the quarantine settings for trusted peers to be configured differently from other peers.
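To make the two options above concrete, here is a minimal sketch of what a per-peer-kind quarantine policy could look like. It is only an illustration: `PeerKind`, `QuarantinePolicy` and `duration_for` are hypothetical names and are not taken from the Jormungandr code base. A value of `None` for trusted peers models "never quarantine"; a short duration models "reduce".

```rust
use std::time::Duration;

/// Hypothetical peer classification (not Jormungandr's actual type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum PeerKind {
    Trusted,
    Other,
}

/// Hypothetical policy holding one quarantine setting per peer kind.
/// `trusted == None` means trusted peers are never quarantined.
#[derive(Clone, Copy, Debug)]
struct QuarantinePolicy {
    trusted: Option<Duration>,
    other: Duration,
}

impl QuarantinePolicy {
    /// Returns how long a peer of the given kind should stay in quarantine
    /// after a failed connection, or `None` if it should not be quarantined.
    fn duration_for(&self, kind: PeerKind) -> Option<Duration> {
        match kind {
            PeerKind::Trusted => self.trusted,
            PeerKind::Other => Some(self.other),
        }
    }
}

fn main() {
    // Assumed values: 30 seconds for trusted peers, and the 30 minute
    // default mentioned in this issue for everyone else.
    let policy = QuarantinePolicy {
        trusted: Some(Duration::from_secs(30)),
        other: Duration::from_secs(30 * 60),
    };
    println!("trusted: {:?}", policy.duration_for(PeerKind::Trusted));
    println!("other:   {:?}", policy.duration_for(PeerKind::Other));
}
```

Keeping the trusted setting as an `Option` lets a single configuration value cover both proposals without a separate on/off flag.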
Additional context
Note that a viable workaround for this issue is to set the default quarantine period to a very low value (e.g. 30 seconds). I don't recommend this as a longer-term solution, however.
I think that "trusted" peers should remain "trusted". In other words, never quarantined. Oftentimes, trusted peers are your own passive nodes and might be bootstrapping or otherwise busy. You would not want them to be added to a quarantine list under any circumstances.
> I think that "trusted" peers should remain "trusted". In other words, never quarantined. Often-times, trusted peers are your own passive nodes and might be bootstrapping or otherwise busy. You would not want them to be added to a quarantined list in any circumstances.
I think this is a fair point.
Just found a flow-on bug that makes this even more critical; I've detailed it in #1617. Essentially, if none of the trusted peers gossip before the quarantine period elapses, they are removed entirely from both the quarantine list and the available peer list:
```
MBP:~ michaelfazio$ jcli rest v0 node stats get --host http://localhost:3100/api
---
blockRecvCnt: 0
lastBlockContentSize: 0
lastBlockDate: "38.42138"
lastBlockFees: 0
lastBlockHash: 17b672a21c258c52da086c892d9561439bb41b8ac01f427dfea39c7af880d307
lastBlockHeight: "118986"
lastBlockSum: 0
lastBlockTime: "2020-01-21T18:38:13+00:00"
lastBlockTx: 0
lastReceivedBlockTime: ~
peerAvailableCnt: "0"
peerQuarantinedCnt: "0"
peerUnreachableCnt: "0"
state: Running
txRecvCnt: 0
uptime: 629
version: jormungandr 0.8.6-be6b1d86
```
This is very easy to reproduce by setting the quarantine value quite low.
As per #1386 we will switch to only use trusted peers for discovery of p2p nodes.