On GoS6, I changed the values in gaiad.toml in two of my nodes (one validating node, one not-validating node). After stopping and restarting the node again immediately, I get the following:
I[2019-02-11|15:53:34.544] Starting ABCI with Tendermint module=main
E[2019-02-11|15:53:35.878] Error dialing peer module=p2p err="dial tcp X:X:X:X:12345: i/o timeout"
E[2019-02-11|15:53:36.032] Corrupted entry. Skipping... module=consensus wal=/home/ubuntu/.gaiad/data/cs.wal/wal err="DataCorruptionError[failed to read data: EOF]"
E[2019-02-11|15:53:36.202] data has been corrupted in last height of consensus WAL module=consensus err="DataCorruptionError[failed to read data: EOF]" height=1944
E[2019-02-11|15:53:36.202] Encountered corrupt WAL file module=consensus err="DataCorruptionError[failed to read data: EOF]"
E[2019-02-11|15:53:36.202] Please repair the WAL file before restarting module=consensus
You can attempt to repair the WAL as follows:
----
WALFILE=~/.tendermint/data/cs.wal/wal
cp $WALFILE ${WALFILE}.bak # backup the file
go run scripts/wal2json/main.go $WALFILE > wal.json # this will panic, but can be ignored
rm $WALFILE # remove the corrupt file
go run scripts/json2wal/main.go wal.json $WALFILE # rebuild the file without corruption
----
E[2019-02-11|15:53:36.202] Error starting conS module=consensus err="DataCorruptionError[failed to read data: EOF]"
E[2019-02-11|15:53:36.669] Error dialing peer module=p2p err="dial tcp X:X:X:X:12345: connect: connection refused"
E[2019-02-11|15:53:37.559] Error dialing peer module=p2p err="dial tcp X:X:X:X:12345: i/o timeout"
E[2019-02-11|15:53:37.813] Connection failed @ recvRoutine (reading byte) module=p2p peer=censoredcensored@X:X:X:X:12345 conn=MConn{X:X:X:X:12345} err=EOF
E[2019-02-11|15:53:37.813] Stopping peer for error module=p2p peer="Peer{MConn{X:X:X:X:12345} censoredcensored out}" err=EOF
E[2019-02-11|15:53:38.146] Connection failed @ recvRoutine (reading byte) module=p2p peer=censoredcensored@X:X:X:X:12345 conn=MConn{X:X:X:X:12345} err=EOF
E[2019-02-11|15:53:38.146] Stopping peer for error module=p2p peer="Peer{MConn{X:X:X:X:12345} censoredcensored out}" err=EOF
E[2019-02-11|15:53:38.146] MConnection flush failed module=p2p peer=censoredcensored@X:X:X:X:12345 err="write tcp X:X:X:X:59324->X:X:X:X:12345: use of closed network connection"
E[2019-02-11|15:53:38.479] Connection failed @ recvRoutine (reading byte) module=p2p peer=censoredcensored@X:X:X:X:12345 conn=MConn{X:X:X:X:12345} err=EOF
This caused me to miss a few blocks on Game of Stakes 6. The issue was solved by stopping the node again, waiting for a few seconds and then restarting it.
cosmos-sdk: 0.31.1
git commit: b9e523212ec47910a00db00be2f1b7935e201ee7
vendor hash: 85e6c5c7a700e822cccf169c97f4a3974312dfd1
go version go1.11.5 linux/amd64
gaiad startI've been able to reproduce this issue:
sudo systemctl stop gaiad && sudo systemctl start gaiad && journalctl -u gaiad -f
Feb 11 16:49:29 gaiaeleven gaiad[22266]: I[2019-02-11|16:49:29.917] Committed state module=state height=24937 txs=0 appHash=42DA9DE2D2A0B110652AEC2A8D0F063B9F67463C7DACBF9AE87F0F4B4C863859
Feb 11 16:49:35 gaiaeleven gaiad[22266]: I[2019-02-11|16:49:35.918] Executed block module=state height=24938 validTxs=1 invalidTxs=0
Feb 11 16:49:35 gaiaeleven gaiad[22266]: I[2019-02-11|16:49:35.933] Committed state module=state height=24938 txs=1 appHash=91FCE175DC0C3F44B30FDFB6F40F17C157984473E5A5FE3569142410F13B9FBC
Feb 11 16:49:46 gaiaeleven gaiad[22266]: I[2019-02-11|16:49:46.537] Executed block module=state height=24939 validTxs=0 invalidTxs=0
Feb 11 16:49:46 gaiaeleven gaiad[22266]: I[2019-02-11|16:49:46.552] Committed state module=state height=24939 txs=0 appHash=E376AF81A12BE4556D89786FFF3B6A479D3F36D77F479992960BC4B7738A21EB
Feb 11 16:49:52 gaiaeleven systemd[1]: Stopping Cosmos Gaia Node...
Feb 11 16:49:52 gaiaeleven systemd[1]: gaiad.service: Main process exited, code=exited, status=143/n/a
Feb 11 16:49:52 gaiaeleven systemd[1]: gaiad.service: Failed with result 'exit-code'.
Feb 11 16:49:52 gaiaeleven systemd[1]: Stopped Cosmos Gaia Node.
Feb 11 16:49:52 gaiaeleven systemd[1]: Started Cosmos Gaia Node.
Feb 11 16:49:52 gaiaeleven gaiad[6922]: I[2019-02-11|16:49:52.535] Starting ABCI with Tendermint module=main
Feb 11 16:49:53 gaiaeleven gaiad[6922]: E[2019-02-11|16:49:53.560] Dialing failed module=pex addr=4e013719b1a3769455ff858a20b771bcd1c08596@[2a05:d014:60:7302:d020:36fe:9d7c:cbce]:26656 err="dial tcp [2a05:d014:60:7302:d020:36fe:9d7c:cbce]:26656: connect: network is unreachable" attempts=0
Feb 11 16:49:53 gaiaeleven gaiad[6922]: E[2019-02-11|16:49:53.563] Dialing failed module=pex addr=ed840a7f45e998de1189088504c8bb7ab71be2b4@[2a05:d014:60:7302:84d8:f984:23e1:8077]:26656 err="dial tcp [2a05:d014:60:7302:84d8:f984:23e1:8077]:26656: connect: network is unreachable" attempts=0
Feb 11 16:49:53 gaiaeleven gaiad[6922]: E[2019-02-11|16:49:53.563] Dialing failed module=pex addr=8d00cb7315bccfbe02c4cd7415086a1b47718a6f@[2a05:d014:60:7302:d6ab:5d68:75c4:3bb0]:26656 err="dial tcp [2a05:d014:60:7302:d6ab:5d68:75c4:3bb0]:26656: connect: network is unreachable" attempts=0
Feb 11 16:49:53 gaiaeleven gaiad[6922]: E[2019-02-11|16:49:53.580] Dialing failed module=pex [email protected]:26656 err="dial tcp 35.247.46.115:26656: connect: connection refused" attempts=0
Feb 11 16:49:54 gaiaeleven gaiad[6922]: E[2019-02-11|16:49:54.063] Dialing failed module=pex [email protected]:26656 err="incompatible: Peer is on a different network. Got game_of_stakes_6, expected gaia-11001" attempts=0
Feb 11 16:49:54 gaiaeleven gaiad[6922]: E[2019-02-11|16:49:54.515] Corrupted entry. Skipping... module=consensus wal=/home/johnzampolin/.gaiad/data/cs.wal/wal err="DataCorruptionError[failed to read data: EOF]"
Feb 11 16:49:54 gaiaeleven gaiad[6922]: E[2019-02-11|16:49:54.597] Dialing failed module=pex [email protected]:26656 err="dial tcp 123.56.250.241:26656: i/o timeout" attempts=0
Feb 11 16:49:54 gaiaeleven gaiad[6922]: E[2019-02-11|16:49:54.598] Dialing failed module=pex [email protected]:32656 err="dial tcp 35.196.99.139:32656: i/o timeout" attempts=0
Feb 11 16:49:54 gaiaeleven gaiad[6922]: E[2019-02-11|16:49:54.598] Dialing failed module=pex [email protected]:26656 err="dial tcp 188.3.142.201:26656: i/o timeout" attempts=0
Feb 11 16:49:54 gaiaeleven gaiad[6922]: E[2019-02-11|16:49:54.598] Dialing failed module=pex [email protected]:26656 err="dial tcp 35.198.162.20:26656: i/o timeout" attempts=0
Feb 11 16:49:54 gaiaeleven gaiad[6922]: E[2019-02-11|16:49:54.797] Corrupted entry. Skipping... module=consensus wal=/home/johnzampolin/.gaiad/data/cs.wal/wal err="DataCorruptionError[failed to read data: EOF]"
Feb 11 16:49:54 gaiaeleven gaiad[6922]: E[2019-02-11|16:49:54.800] Corrupted entry. Skipping... module=consensus wal=/home/johnzampolin/.gaiad/data/cs.wal/wal err="DataCorruptionError[failed to read data: EOF]"
Feb 11 16:49:55 gaiaeleven gaiad[6922]: E[2019-02-11|16:49:55.084] data has been corrupted in last height of consensus WAL module=consensus err="DataCorruptionError[failed to read data: EOF]" height=24940
Feb 11 16:49:55 gaiaeleven gaiad[6922]: E[2019-02-11|16:49:55.085] Encountered corrupt WAL file module=consensus err="DataCorruptionError[failed to read data: EOF]"
Feb 11 16:49:55 gaiaeleven gaiad[6922]: E[2019-02-11|16:49:55.086] Please repair the WAL file before restarting module=consensus
Feb 11 16:49:55 gaiaeleven gaiad[6922]: You can attempt to repair the WAL as follows:
Feb 11 16:49:55 gaiaeleven gaiad[6922]: ----
Feb 11 16:49:55 gaiaeleven gaiad[6922]: WALFILE=~/.tendermint/data/cs.wal/wal
Feb 11 16:49:55 gaiaeleven gaiad[6922]: cp $WALFILE ${WALFILE}.bak # backup the file
Feb 11 16:49:55 gaiaeleven gaiad[6922]: go run scripts/wal2json/main.go $WALFILE > wal.json # this will panic, but can be ignored
Feb 11 16:49:55 gaiaeleven gaiad[6922]: rm $WALFILE # remove the corrupt file
Feb 11 16:49:55 gaiaeleven gaiad[6922]: go run scripts/json2wal/main.go wal.json $WALFILE # rebuild the file without corruption
Feb 11 16:49:55 gaiaeleven gaiad[6922]: ----
Feb 11 16:49:55 gaiaeleven gaiad[6922]: E[2019-02-11|16:49:55.086] Error starting conS module=consensus err="DataCorruptionError[failed to read data: EOF]"
It looks like this is a tendermint bug. I'm going to go ahead and re-open this issue over there.
Most helpful comment
It looks like this is a tendermint bug. I'm going to go ahead and re-open this issue over there.