I[07-02|03:03:13.313] Absent validator 747A72E570DD41F95687D3C2DD40128A9DBF798A at height 721617 module=x/slashing
I[07-02|03:03:13.314] Validator 747A72E570DD41F95687D3C2DD40128A9DBF798A past min height of 582143 and below signed blocks threshold of 50 module=x/slashing
E[07-02|03:03:13.333] CONSENSUS FAILURE!!! module=consensus err="Attempted to slash a nonexistent validator with address 747A72E570DD41F95687D3C2DD40128A9DBF798A" stack="goroutine 2537 [running]:\nruntime/debug.Stack(0xc49e5225b0, 0xbf46c0, 0xc42ce1a560)\n\t/usr/lib/go-1.10/src/runtime/debug/stack.go:24 +0xa7\ngithub.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).receiveRoutine.func1(0xc435b74000)\n\t/home/fish/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus/state.go:538 +0x57\npanic(0xbf46c0, 0xc42ce1a560)\n\t/usr/lib/go-1.10/src/runtime/panic.go:505 +0x229\ngithub.com/cosmos/cosmos-sdk/x/stake.Keeper.Slash(0xed9bc0, 0xc421136590, 0xc42007e480, 0xed9bc0, 0xc421136570, 0xee65a0, 0xc420bb4730, 0xc42007e480, 0x4, 0xee03a0, ...)\n\t/home/fish/go/src/github.com/cosmos/cosmos-sdk/x/stake/keeper.go:807 +0x8b6\ngithub.com/cosmos/cosmos-sdk/x/slashing.Keeper.handleValidatorSignature(0xed9bc0, 0xc4211365a0, 0xc42007e480, 0xee3e00, 0xc420bb4780, 0xa, 0xee03a0, 0xc433520a80, 0xc43886c380, 0x9, ...)\n\t/home/fish/go/src/github.com/cosmos/cosmos-sdk/x/slashing/keeper.go:87 +0x634\ngithub.com/cosmos/cosmos-sdk/x/slashing.BeginBlocker(0xee03a0, 0xc433520a80, 0xc43886c380, 0x9, 0xc43cfb18e0, 0x14, 0x20, 0xc499d9e750, 0x9, 0xb02d1, ...)\n\t/home/fish/go/src/github.com/cosmos/cosmos-sdk/x/slashing/tick.go:40 +0x5a2\ngithub.com/cosmos/cosmos-sdk/cmd/gaia/app.(*GaiaApp).BeginBlocker(0xc4200aca00, 0xee03a0, 0xc433520a80, 0xc43886c380, 0x9, 0xc43cfb18e0, 0x14, 0x20, 0xc499d9e750, 0x9, ...)\n\t/home/fish/go/src/github.com/cosmos/cosmos-sdk/cmd/gaia/app/app.go:117 +0xc3\ngithub.com/cosmos/cosmos-sdk/cmd/gaia/app.(*GaiaApp).BeginBlocker-fm(0xee03a0, 0xc433520a80, 0xc43886c380, 0x9, 0xc43cfb18e0, 0x14, 0x20, 0xc499d9e750, 0x9, 0xb02d1, ...)\n\t/home/fish/go/src/github.com/cosmos/cosmos-sdk/cmd/gaia/app/app.go:90 +0xa0\ngithub.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).BeginBlock(0xc4200da380, 0xc43cfb18e0, 0x14, 0x20, 0xc499d9e750, 0x9,
0xb02d1, 0x5b3995f0, 0x0, 0x6c74, ...)\n\t/home/fish/go/src/github.com/cosmos/cosmos-sdk/baseapp/baseapp.go:386 +0x157\ngithub.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/abci/client.(*localClient).BeginBlockSync(0xc4601dac60, 0xc43cfb18e0, 0x14, 0x20, 0xc499d9e750, 0x9, 0xb02d1, 0x5b3995f0, 0x0, 0x6c74, ...)\n\t/home/fish/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/abci/client/local_client.go:206 +0xab\ngithub.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/proxy.(*appConnConsensus).BeginBlockSync(0xc437e325a0, 0xc43cfb18e0, 0x14, 0x20, 0xc499d9e750, 0x9, 0xb02d1, 0x5b3995f0, 0x0, 0x6c74, ...)\n\t/home/fish/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/proxy/app_conn.go:69 +0x78\ngithub.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/state.execBlockOnProxyApp(0xee0aa0, 0xc449a1c7c0, 0xee38c0, 0xc437e325a0, 0xc43fefeb90, 0xc46c8ff6e0, 0xee6f20, 0xc42cb455c8, 0x1, 0xc4240a3280, ...)\n\t/home/fish/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/state/execution.go:190 +0x547\ngithub.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/state.(*BlockExecutor).ApplyBlock(0xc4601db440, 0xc4697ad990, 0x9, 0xb02d0, 0x6c74, 0xc45d45cfe0, 0x14, 0x20, 0x1, 0xc4240a3280, ...)\n\t/home/fish/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/state/execution.go:76 +0x12f\ngithub.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).finalizeCommit(0xc435b74000, 0xb02d1)\n\t/home/fish/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus/state.go:1267 +0xbb2\ngithub.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).tryFinalizeCommit(0xc435b74000, 0xb02d1)\n\t/home/fish/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus/state.go:1198 +0x468\ngithub.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).enterComm
it.func1(0xc435b74000, 0x0, 0xb02d1)\n\t/home/fish/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus/state.go:1146 +0x98\ngithub.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).enterCommit(0xc435b74000, 0xb02d1, 0x0)\n\t/home/fish/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus/state.go:1175 +0x802\ngithub.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).addVote(0xc435b74000, 0xc4758f08c0, 0xc45ef49560, 0x28, 0x0, 0x0, 0xc469841658)\n\t/home/fish/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus/state.go:1532 +0xbb4\ngithub.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).tryAddVote(0xc435b74000, 0xc4758f08c0, 0xc45ef49560, 0x28, 0xf6, 0xc47ade3e00)\n\t/home/fish/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus/state.go:1390 +0x56\ngithub.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).handleMsg(0xc435b74000, 0xbf6fa0, 0xc4269f4420, 0xc45ef49560, 0x28)\n\t/home/fish/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus/state.go:607 +0x643\ngithub.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).receiveRoutine(0xc435b74000, 0x0)\n\t/home/fish/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus/state.go:560 +0x6c0\ncreated by github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).OnStart\n\t/home/fish/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus/state.go:272 +0x140\n"
I[07-02|03:03:13.632] Ensure peers module=p2p numOutPeers=10 numInPeers=30 numDialing=0 numToDial=0
E[07-02|03:03:13.753] Send failed module=p2p peer=206.81.7.62:55370 channel=32 conn=MConn{206.81.7.62:55370} msgBytes=C96A6FAB0900000000000B02CD100018042000280004
This may be related to https://github.com/cosmos/cosmos-sdk/issues/1466 and https://github.com/cosmos/cosmos-sdk/issues/1484
Right before this consensus failure, I killed off a bunch of my ex-validators. They haven't been voting or bonded for a long time since I didn't have enough steak (you needed 3/4 steaks due to the issues referenced above)
This may be related: https://github.com/cosmos/cosmos-sdk/issues/1427
This is definitely related: https://github.com/cosmos/cosmos-sdk/issues/1470
The only missing validator in the latest height is the zero voting power validator 747A72E570DD41F95687D3C2DD40128A9DBF798A.
This validator - 747A72E570DD41F95687D3C2DD40128A9DBF798A / cosmosvaladdr1w3a89etsm4qlj45860pd6sqj32wm77v28s72rk (bech32) had zero power at this block (as returned by Tendermint through ABCI) and shoudn't have been in the validator set. Debugging further...
Possibly a mismatch in Tendermint & SDK-side validator set tracking - https://github.com/cosmos/cosmos-sdk/issues/1470#issuecomment-401990231.
Following up, I believe this is a symptom of the combination of #1519 and #1513 - validator 747A72E570DD41F95687D3C2DD40128A9DBF798A was mistakenly stored as the cliff validator and kept in the Tendermint validator set even after it has been unbonded (and its candidacy had been removed). Later, that validator eventually went offline, and we tried to slash it - but, of course, it wasn't in the application state since it shouldn't have been in the Tendermint validator set in the first place.
Checking that hypothesis by tracing store states.
At block 582090, validator 747A72E570DD41F95687D3C2DD40128A9DBF798A was unbonded and deleted from the store:
gaiadebug tx "1gHwYl3rD6U26qMKFHaNOFi6UxrMkOx+K4mpynX7VHKeEhR2jThYulMazJDsfiuJqcp1+1RynhoDTUFYBBMOAwERAAAAAAAAAAAEEQAAAAAAAw1ABB4DAQ8WJN5iIOs8d8HI0lIJO+4fGflR+3MT9KJCStTuu81E32eVTHfiFz2h2ypAbWL3f+v6lgW2RQzpnbD/0rHtT5D4Jeal8OPD/yGNbda7mcyUNSwqn5xuJSNp0bjuHqgcrqEGdvGjl5bAsjcAABkAAAAAAAAA5yEAAAAAAAAABAQE"
{
"type": "8EFE47F0625DE8",
"value": {
"msg": {
"type": "ECCB5CA536EAA0",
"value": {
"delegator_addr": "768D3858BA531ACC90EC7E2B89A9CA75FB54729E",
"validator_addr": "768D3858BA531ACC90EC7E2B89A9CA75FB54729E",
"shares": "MAX"
}
},
"fee": {
"amount": [
{
"denom": "",
"amount": 0
}
],
"gas": 200000
},
"signatures": [
{
"pub_key": {
"type": "AC26791624DE60",
"value": "6zx3wcjSUgk77h8Z+VH7cxP0okJK1O67zUTfZ5VMd+I="
},
"signature": {
"type": "6BF5903DA1DB28",
"value": "bWL3f+v6lgW2RQzpnbD/0rHtT5D4Jeal8OPD/yGNbda7mcyUNSwqn5xuJSNp0bjuHqgcrqEGdvGjl5bAsjcAAA=="
},
"account_number": 231,
"sequence": 4
}
]
}
}
At that block, this validator wasn't bonded in the application state - so the app did not instruct Tendermint to remove it (no validator set updates) - but it was in Tendermint's validator set, so it remained there (and kept signing blocks!) until later going offline just before the consensus fault, when we tried to slash the validator, couldn't find it in the app state, and panicked. Why was it in Tendermint's validator set? Continuing to trace store updates, we find:
At block 571445, an unrevoke transaction was sent for validator 631E2B3AF1F9FE4873D21B7B74CEF568E834F77D.
gaiadebug tx "uwHwYl3rD7A8uusKFGMeKzrx+f5Ic9Ibe3TO9WjoNPd9BBMOAwERAAAAAAAAAAAEEQAAAAAABhqABB4DAQ8WJN5iIGok5Vt9nXlgYBnpQxbHGQ236WtH17Cy/KBKoTJI4W49Fz2h2ypAZSbGwBzookjGj//4ncSsrzO6eXrWMpF0QqpGbR3gYkRatijiS5LpKYV7m+sq/Iw5V50ILZ3s3qVhhRXADQ41ChkAAAAAAAABLCEAAAAAAAAABAQE"
{
"type": "8EFE47F0625DE8",
"value": {
"msg": {
"type": "A73B89B03CBAE8",
"value": {
"address": "631E2B3AF1F9FE4873D21B7B74CEF568E834F77D"
}
},
"fee": {
"amount": [
{
"denom": "",
"amount": 0
}
],
"gas": 400000
},
"signatures": [
{
"pub_key": {
"type": "AC26791624DE60",
"value": "aiTlW32deWBgGelDFscZDbfpa0fXsLL8oEqhMkjhbj0="
},
"signature": {
"type": "6BF5903DA1DB28",
"value": "ZSbGwBzookjGj//4ncSsrzO6eXrWMpF0QqpGbR3gYkRatijiS5LpKYV7m+sq/Iw5V50ILZ3s3qVhhRXADQ41Cg=="
},
"account_number": 300,
"sequence": 4
}
]
}
}
This resulted in two validator set updates. The first was validator 631E2B3AF1F9FE4873D21B7B74CEF568E834F77D being unrevoked, as expected, but the second was validator 747A72E570DD41F95687D3C2DD40128A9DBF798A being passed with a power of zero - which was then added to Tendermint's validator set at the next block.
Debugging that transaction, logging bonded/unbonded validators, we see:
ID CommitID{[218 151 19 112 168 117 250 210 47 69 183 61 136 96 161 151 201 225 160 109]:B02D0}
LastBlockHeight 721616
Txn: BB01F0625DEB0FB03CBAEB0A14631E2B3AF1F9FE4873D21B7B74CEF568E834F77D04130E030111000000000000000004110000000000061A80041E03010F1624DE62206A24E55B7D9D79606019E94316C7190DB7E96B47D7B0B2FCA04AA13248E16E3D173DA1DB2A406526C6C01CE8A248C68FFFF89DC4ACAF33BA797AD632917442AA466D1DE062445AB628E24B92E929857B9BEB2AFC8C39579D082D9DECDEA5618515C00D0E350A19000000000000012C2100000000000000040404
bonded validator 5DE0434D4E5ACA408BB0823B7243B79B1B251717
bonded validator 747A72E570DD41F95687D3C2DD40128A9DBF798A
unbonded validator: 747A72E570DD41F95687D3C2DD40128A9DBF798A
I[07-05|23:00:41.255] Validator 5DE0434D4E5ACA408BB0823B7243B79B1B251717 unrevoked module=x/stake
This validator, stored with zero power - #1513 - was incorrectly set as the cliff validator - #1519 - then rebonded and unbonded through the logic in updateBondedValidators. When a validator is rebonded and unbonded, the SDK will send it with zero power back to Tendermint - which if it was not previously in Tendermint's validator set (as in this case), will cause Tendermint to add it with zero power!
Closing in favor of #1513 / #1519.
makes sense as far as I can tell
Most helpful comment
Following up, I believe this is a symptom of the combination of #1519 and #1513 - validator
747A72E570DD41F95687D3C2DD40128A9DBF798Awas mistakenly stored as the cliff validator and kept in the Tendermint validator set even after it has been unbonded (and its candidacy had been removed). Later, that validator eventually went offline, and we tried to slash it - but, of course, it wasn't in the application state since it shouldn't have been in the Tendermint validator set in the first place.Checking that hypothesis by tracing store states.