When someone double signed at the first block, the slashing module will not handle it correctly.
When generating the first block, someone may double sign and trigger the slashing module.
log:
Confirmed double sign from FDC8E91E2F361F4BFCCEFA80C1ABAD138E5175AE at height 1, age of 0 less than max age of 120 module=x/slashing
E[09-21|13:27:34.653] CONSENSUS FAILURE!!! module=consensus err="impossible attempt to slash future infraction at height 1 but we are at height 0" stack="goroutine 1193 [running]:\nruntime/debug.Stack(0xc424c4cea0, 0xd5e1c0, 0xc426777690)\n\t/usr/lib/go-
CC @cwgoes
That's unusual, can you replicate this in a testcase or describe more about the order in which you ran the daemons and double-signed? I don't understand why you would get evidence for a double-sign at height 1 before block 0 was committed.
@cwgoes we did some analysis and figured out double-sign evidence was included in the first block, then got broadcasted. However, there's mismatch to treat double-sign evidence. in tendermint the infractionHeight is 1 but when slashing module process the evidence, ctx.BlockHeight() is still 0.
We tried to solve the issue with this PR: https://github.com/irisnet/cosmos-sdk/pull/109
Hope to get more input from you.
@cwgoes
I think it is possible to include evidence at the same height block. For instance, if the network is poor and the network come to consensus of a block in multiple rounds,then then double sign evidence maybe happen to be included by a block proposal. Then in applying the block,the infractionHeight will equal to ctx.BlockHeight()+1.
In this situation, cosmos will throw panic error.
If a new validator is slashed at the first height after it becomes the real validator, cosmos may panic. Because currently the validator has no sign info which is necessary for figuring out jail period. I think this may be a bug too.
If a new validator is slashed at the first height after it becomes the real validator, cosmos may panic.
This definitely seems possible.
I'm still a bit lost on how evidence reaches the block height +1 except maybe the case of the first block?
@alexanderbez
Suppose the blockchain height is A. The block store height is A. ctx.blockHeight() in delieverTx is A. Now the consensus height is A+1. In the consensus process, due to poor network, the network may come to consensus of new block in multiple rounds. Suppose the new validator has done double sign in round N, but in this round, the consensus is failed. Then in next round, the double sign evidence may be included by a block proposal. Thus in applying block, evidence height euqals to ctx.blockHeight() + 1 .
How do you think about the above assumption?
Thus in applying block, evidence height equals to ctx.blockHeight() + 1 .
Yes, but wont the application also be onto the next block?
@alexanderbez
This is low probability thing. Only when the evidence is broadcasted to the block proposal and then it is included in the proposal block, then this will happed. In most case, the evidence will be included in next block.
Suppose the blockchain height is A. The block store height is A. ctx.blockHeight() in delieverTx is A. Now the consensus height is A+1. In the consensus process, due to poor network, the network may come to consensus of new block in multiple rounds. Suppose the new validator has done double sign in round N, but in this round, the consensus is failed. Then in next round, the double sign evidence may be included by a block proposal. Thus in applying block, evidence height euqals to ctx.blockHeight() + 1 .
We don't care about the consensus round, just the consensus height. If we're attempting to come to consensus at some height n, and I double-sign in round 1 but we don't come to consensus until round 3, the evidence passed to the ABCI app when the block is executed after round 3 should still have height n. Have you found otherwise?
If a new validator is slashed at the first height after it becomes the real validator, cosmos may panic. Because currently the validator has no sign info which is necessary for figuring out jail period. I think this may be a bug too.
This is why evidence handling is run after validator signature handling in the slashing tick function. Do you mean as a result of slashing from elsewhere (e.g. governance), or do you have a testcase which replicate this (which seems like a separate issue to evidence at a future height)?
We tried to solve the issue with this PR: irisnet#109
Although this will probably fix the symptom, I don't think this is the right approach. It's illogical to handle double-signature evidence for a block at a height we haven't started processing yet - if Tendermint is indeed sending such evidence, that's a bug in Tendermint.
Do you know how I might replicate either issue, either by running commands locally or with a test case? That's the next step to debug the root cause.
develop and build gaiadmake get_vendor_deps build-linux
I have synced my develop with the latest cosmos-sdk develop branch
develop-byzantine and build gaiad-byzantine. gaiad-byzantine will always vote nil. Here is my changegaiad, mark them as node0, node1, node2, node3 and node4gaiad-byzantine, make them as node-byz-0 and node-byz-1.node1 and node-byz-0 share one validator address. node2 and node-byz-1 share another one validator address. So there will be two groups conflict voting.node1 , node-byz-0, node2, node-byz-1 node0node4.E[10-10|16:26:26.205] CONSENSUS FAILURE!!! module=consensus err="Expected signing info for validator cosmosvalcons15ad46aj0u39n6grh7gjyk494d380jen6xp89w4 but not found" stack="goroutine 709 [running]:\nruntime/debug.Stack(0xc42302eae0, 0xd6c6e0, 0xc42300d600)
/usr/local/go/src/runtime/debug/stack.go:24 +0xa7\ngithub.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).receiveRoutine.func2(0xc4200f0900, 0x105f450)
/home/lhy/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus/state.go:585 +0x57\npanic(0xd6c6e0, 0xc42300d600)
/usr/local/go/src/runtime/panic.go:502 +0x229\ngithub.com/cosmos/cosmos-sdk/x/slashing.Keeper.handleDoubleSign(0x10dc4e0, 0xc420b55720, 0xc420a46cb0, 0x10eb700, 0xc4200dd1d0, 0xc420a46cb0, 0x10dc4e0, 0xc420b55750, 0xa, 0x10e4060, ...)
/home/lhy/go/src/github.com/cosmos/cosmos-sdk/x/slashing/keeper.go:84 +0xe75\ngithub.com/cosmos/cosmos-sdk/x/slashing.BeginBlocker(0x10e4060, 0xc422fbcde0, 0xc420413040, 0xc, 0xc42301b800, 0x14, 0x20, 0xc422e5e710, 0x5, 0x1, ...)
/home/lhy/go/src/github.com/cosmos/cosmos-sdk/x/slashing/tick.go:33 +0x6ac\ngithub.com/cosmos/cosmos-sdk/cmd/gaia/app.(*GaiaApp).BeginBlocker(0xc4200c8000, 0x10e4060, 0xc422fbcde0, 0xc420413040, 0xc, 0xc42301b800, 0x14, 0x20, 0xc422e5e710, 0x5, ...)
/home/lhy/go/src/github.com/cosmos/cosmos-sdk/cmd/gaia/app/app.go:140 +0xea\ngithub.com/cosmos/cosmos-sdk/cmd/gaia/app.(*GaiaApp).BeginBlocker-fm(0x10e4060, 0xc422fbcde0, 0xc420413040, 0xc, 0xc42301b800, 0x14, 0x20, 0xc422e5e710, 0x5, 0x1, ...)
/home/lhy/go/src/github.com/cosmos/cosmos-sdk/cmd/gaia/app/app.go:111 +0xba\ngithub.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).BeginBlock(0xc420f7f680, 0xc42301b800, 0x14, 0x20, 0xc422e5e710, 0x5, 0x1, 0x3aca512d, 0xed34faeaa, 0x0, ...)
/home/lhy/go/src/github.com/cosmos/cosmos-sdk/baseapp/baseapp.go:433 +0x23a\ngithub.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/abci/client.(*localClient).BeginBlockSync(0xc420b670e0, 0xc42301b800, 0x14, 0x20, 0xc422e5e710, 0x5, 0x1, 0x3aca512d, 0xed34faeaa, 0x0, ...)
/home/lhy/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/abci/client/local_client.go:206 +0x9e\ngithub.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/proxy.(*appConnConsensus).BeginBlockSync(0xc420480b80, 0xc42301b800, 0x14, 0x20, 0xc422e5e710, 0x5, 0x1, 0x3aca512d, 0xed34faeaa, 0x0, ...)
/home/lhy/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/proxy/app_conn.go:69 +0x6b\ngithub.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/state.execBlockOnProxyApp(0x10e4ce0, 0xc420b2b260, 0x10e9d80, 0xc420480b80, 0xc4228ec8c0, 0xc422fbc870, 0x10ee220, 0xc42000e918, 0x0, 0x0, ...)
/home/lhy/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/state/execution.go:209 +0x421\ngithub.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/state.(*BlockExecutor).ApplyBlock(0xc420b29860, 0xc420b388a0, 0x5, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/home/lhy/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/state/execution.go:77 +0x111\ngithub.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).finalizeCommit(0xc4200f0900, 0x1)
/home/lhy/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus/state.go:1334 +0xa7f\ngithub.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).tryFinalizeCommit(0xc4200f0900, 0x1)
/home/lhy/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus/state.go:1265 +0x462\ngithub.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).enterCommit.func1(0xc4200f0900, 0x5, 0x1)
/home/lhy/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus/state.go:1213 +0x98\ngithub.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).enterCommit(0xc4200f0900, 0x1, 0x5)
/home/lhy/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus/state.go:1242 +0x72c\ngithub.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).addVote(0xc4200f0900, 0xc422f8caa0, 0xc420a18360, 0x28, 0x1060690, 0xf5, 0x7fc4f3379000)
/home/lhy/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus/state.go:1649 +0xb7f\ngithub.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).tryAddVote(0xc4200f0900, 0xc422f8caa0, 0xc420a18360, 0x28, 0xc4208dc960, 0xc422f98200, 0xf5)
/home/lhy/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus/state.go:1507 +0x59\ngithub.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).handleMsg(0xc4200f0900, 0xdab4a0, 0xc422a4d888, 0xc420a18360, 0x28)
/home/lhy/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus/state.go:659 +0x667\ngithub.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).receiveRoutine(0xc4200f0900, 0x0)
/home/lhy/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus/state.go:616 +0x6a2\ncreated by github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).OnStart
/home/lhy/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus/state.go:310 +0x140\n"
node0 is the first proposer. We start byzantine nodes first, then when the proposer is started, it will include the received evidence into the first block.@cwgoes
Currently, tendermint does include evidence in blocks with the same height. If you think my fix is not the right approach, maybe we should change tendermint to ensure evidence will only be included in later blocks.
Thanks @HaoyangLiu - I'm trying to replicate this, working on a shell script at https://gist.github.com/cwgoes/3ab18585a72966bb9e68e8a9428a28b8 to make it easier.
So far I can replicate the Byzantine duplicate-voting but not the crash - the VoteInfos are processed before the evidence, as expected, and blocks continue onwards.
@cwgoes
Maybe it is much diffcult to reproduce this issue just by your script. Because, to reproduce this issue, we have to control the node starting order, as well as the validator order in genesis.json. It should be the same as the node names. Here the name order is node0, node1, node2, node3, node4. But the genesis.json which is generated by command gaiad init will make the order random.
Here I just created a repository to provide a easier way to reproduce.
https://github.com/HaoyangLiu/cosmos_issue_reproduction
Just clone this repository and run run.sh.
Thanks @HaoyangLiu, I can replicate the issue now. On the first block, it looks like the vote information sent by Tendermint is empty (which I suppose makes sense, since that's for the last commit) - but the SDK currently expects to have seen a validator signature in LastCommit before evidence can be submitted for the validator.
I wonder if this is also a problem in general - does Tendermint send evidence for e.g. double-signatures at the current height which the SDK will not necessarily have seen yet in the LastCommit vote info list?
Ah OK - we should set a SigningInfo for a validator whenever we instruct Tendermint to add that validator to the validator set, including at genesis. That should solve this issue, this was just subtly flawed design on the SDK end.
https://github.com/cosmos/cosmos-sdk/pull/2480 fixes already, trying to incorporate https://github.com/cosmos/cosmos-sdk/issues/1867#issuecomment-408957770 into that PR too.
Most helpful comment
@cwgoes we did some analysis and figured out double-sign evidence was included in the first block, then got broadcasted. However, there's mismatch to treat double-sign evidence. in tendermint the
infractionHeightis 1 but when slashing module process the evidence,ctx.BlockHeight()is still 0.We tried to solve the issue with this PR: https://github.com/irisnet/cosmos-sdk/pull/109
Hope to get more input from you.