Waiting for first block on node 2
-- App crashed --
E[10-19|12:41:45.443] Error dialing peer module=p2p err="duplicate CONN<127.0.0.1:26656>: %!s(<nil>)"
(node:248) PromiseRejectionHandledWarning: Promise rejection was handled asynchronously (rejection id: 2)
Node 2 is running
Too long with no output (exceeded 2m0s)
This looks like a Tendermint bug:
I[10-20|16:43:03.588] Inbound Peer rejected module=p2p err="auth failure: secrect conn failed: EOF" numPeers=1
I[10-20|16:42:04.778] Will dial address module=p2p [email protected]:26656
I[10-20|16:42:04.778] Dialing peer module=p2p [email protected]:26656
I[10-20|16:42:04.779] Starting Peer module=p2p peer=0xa5e8e0 impl="Peer{MConn{127.0.0.1:26656} 39b7b1e17e9fac30803366da9dc7e54b74cd5a40 out}"
I[10-20|16:42:04.779] Starting MConnection module=p2p peer=0xa5e8e0 impl=MConn{127.0.0.1:26656}
I[10-20|16:42:04.779] Added peer module=p2p peer="Peer{MConn{127.0.0.1:26656} 39b7b1e17e9fac30803366da9dc7e54b74cd5a40 out}"
I[10-20|16:42:05.089] Executed block module=state height=1 validTxs=0 invalidTxs=0
I[10-20|16:42:05.090] Committed state module=state height=1 txs=0 appHash=47FB9D1E979742CCA1179934162168316C70C91C
I[10-20|16:42:05.090] Recheck txs module=mempool numtxs=0 height=1
I[10-20|16:42:05.091] Indexed block module=txindex height=1
I[10-20|16:42:05.531] Dialing peer module=p2p [email protected]:26656
E[10-20|16:42:05.531] Error dialing peer module=p2p err="duplicate CONN<127.0.0.1:26656>: %!s(<nil>)"
I reproduced this as well but rerunning the tests resulted in a pass.
You need to set allow_duplicate_ip = true under [p2p] in the config.
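Here is a minimal sketch of applying that setting from a Node.js test harness. It assumes the harness generates each node's config.toml and can rewrite it before starting the node. The helper name enableDuplicateIp is hypothetical. It only flips an allow_duplicate_ip key that already exists in the [p2p] section. It does not add the key if it is missing.

```javascript
// Hypothetical helper (not part of Tendermint or the test suite):
// set allow_duplicate_ip = true inside the [p2p] section of a
// config.toml string. Assumes the key is already present in that
// section; keys with the same name in other sections are left alone.
function enableDuplicateIp(tomlText) {
  const lines = tomlText.split("\n");
  let section = "";
  for (let i = 0; i < lines.length; i++) {
    const trimmed = lines[i].trim();
    // Track which [section] we are currently in.
    const header = trimmed.match(/^\[([^\]]+)\]$/);
    if (header) {
      section = header[1];
      continue;
    }
    if (section === "p2p" && /^allow_duplicate_ip\s*=/.test(trimmed)) {
      lines[i] = "allow_duplicate_ip = true";
    }
  }
  return lines.join("\n");
}

// Example usage (path is illustrative):
//   const fs = require("fs");
//   const path = "node_home_2/config/config.toml";
//   fs.writeFileSync(path, enableDuplicateIp(fs.readFileSync(path, "utf8")));
```

Editing the file as text keeps the sketch dependency-free. A real TOML parser would be more robust if the config layout varies.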
thank you @ebuchman for fixing this
this is still an issue 😭
So there's a known bug in Tendermint that causes us to dial peers we're already connected to and fail to connect to them (https://github.com/tendermint/tendermint/issues/2716), but that shouldn't cause the chain to halt.
I'm not sure how to interpret the output we're seeing in CI, e.g. from the failure David linked:
Redirecting node 2 output to /home/circleci/project/testArtifacts/node_home_2/process.log
Waiting for first block on node 2
-- App crashed --
E[10-20|16:42:05.531] Error dialing peer module=p2p err="duplicate CONN<127.0.0.1:26656>: %!s(<nil>)"
(node:252) PromiseRejectionHandledWarning: Promise rejection was handled asynchronously (rejection id: 2)
Node 2 is running
Too long with no output (exceeded 2m0s)
Failing to dial the peer wouldn't crash the app. What does "App crashed" imply here, and why does it output just a single line of logs from Tendermint? Is it possible to see the full output (e.g. /home/circleci/project/testArtifacts/node_home_2/process.log in this case) when this happens?
Here you go: https://9475-99653950-gh.circle-artifacts.com/0/home/circleci/project/testArtifacts/node_home_2/process.log
We record all the logs and node folders in the e2e tests
So I'm not seeing any issues in that log.
Waiting for first block on node 2
Too long with no output (exceeded 2m0s)
It committed 240 blocks over the 2 minutes covered by that log, so I'm not sure what the test is saying actually went wrong.
I see the problem. We fail the script if the output contains an error. There are errors, but they are not breaking the node. My apologies. I will change this today.
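One way to make that check stricter is to key off Tendermint's log level prefix and ignore errors known to be benign. In the logs above, error lines start with "E[" and info lines with "I[". This is a minimal sketch under that assumption. The names isFatalLogLine, findFatalLines and BENIGN_ERROR_PATTERNS are hypothetical and not taken from the actual e2e script. The allowlist holds only the duplicate-dial error discussed in this thread.

```javascript
// Hypothetical allowlist of Tendermint errors that do not break the node.
// The duplicate-dial error is the known issue tendermint/tendermint#2716.
const BENIGN_ERROR_PATTERNS = [/Error dialing peer .*duplicate CONN/];

// Returns true only for error-level lines that are not on the allowlist.
// Assumes Tendermint's default log format, where each line begins with a
// level letter followed by "[timestamp]", e.g. "E[10-20|16:42:05.531]".
function isFatalLogLine(line) {
  if (!/^E\[/.test(line)) return false;
  return !BENIGN_ERROR_PATTERNS.some((re) => re.test(line));
}

// Collects the lines in a chunk of node output that should fail the test.
function findFatalLines(output) {
  return output.split("\n").filter(isFatalLogLine);
}
```

With this check, the "duplicate CONN" line from the CI run would no longer fail the test, while any other E-level line still would.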