When a validator is removed from the validator set because it was on the cliff and is later readded to the voting set the SigningInfo is not reset and the validator is slashed as if it was not signing in the period in which it was not in the validator set.
https://figment.network/cosmos/hubble/gaia-7003/validators/02C231DA2DFF636671D3789B5D651AED09B1B834
Our secondary validator was on the cliff in block 2943 and replaced by another validator with more voting power. As a result it was removed from the active validator-set.
On block 10004 the first slashing period ended and inactive validators (those who really missed blocks) were dropped from the validator-set. As a result our validator was readded to the validator-set.
However the SigningInfo was not reset so StartHeight and SignedBlocksCounter were still in the state in which they were in block <2943.
That later triggered the slashing code since we obviously did not sign blocks while we were not in the active validator-set but the code still thought we started signing on height 0 (StartHeight).
That way the following condition was true even though:
There are many ways to solve this of which some keep the number of missed blocks and others drop it.
The easiest would be resetting the SigningInfo but that would allow Byzantine validators to reset their SigningState by redelegating between multiple validators and cliffing themselves which we obviously not want.
Another way would be tracking the LastSignedHeight and adapting StartHeight (possible causing writes on every block).
Since this is only a fraction of possible solutions and this certainly requires some discussion in the SDK team the Certus.One team decided to not create a PR and rather start a discussion about what would be the best solution.
Thanks for this bug report! We should investigate soon. CC @cwgoes @alexanderbez
Talking with @alexanderbez bout this now
I think that this is a general situation which applies to all validators who have left the validator set and are entering back into the validator set - not just the cliff validators.
We probably need to implement some a custom way of updating the SigningInfo if this situation occurs
The easiest would be resetting the SigningInfo but that would allow Byzantine validators to reset their SigningState by redelegating between multiple validators and cliffing themselves which we obviously not want.
I kind of get the gist of what you're saying, but do you want to elaborate on this a bit more?
Another way would be tracking the LastSignedHeight and adapting StartHeight (possible causing writes on every block).
Interesting idea, but indeed, as you pointed out, this would not be very efficient. Perhaps like @rigelrozanski we could have a special case of updating SigningInfo where if we know the validator was once in the set and we also know the last block that validator signed, we can use this information to update SigningInfo accordingly when said validator is added back into the set.
I think this specific reported problem is only an issue when a full SignedBlocksWindow hasn't yet passed.
In the usual case, the validator has been signing for awhile and signInfo.SignedBlocksCounter >> MinSignedPerWindow - regardless of how they dip in and out of the bonded set, the counter will just track whether they've signed at least MinSignedPerWindow of the last SignedBlocksWindow blocks they should have signed, which may not be contiguous.
However, if a full SignedBlocksWindow hasn't yet passed and the validator is unbonded for awhile, when they come back in to the bonded set the counter will only reflect (at best) the number of blocks that they could have signed so far.
I think there are two ways to fix this that preserve the semantics we want:
SignedBlocksWindow - MinSignedPerWindow, or initially set the counter to SignedBlocksWindow (assume the validator has signed some imaginary past blocks).height for the comparision (couldHaveSigned >= SignedBlocksWindow).The latter requires extra store operations, so I think the former is preferable.
I think we also need to perform additional logic upon entering/leaving the bonded validator set later on after unrevoking, so that validators have at least SignedBlocksWindow blocks to update the counter - if they cycle in and out of the bonded set after unrevoking, that might not happen since the height >= minHeight comparision only tracks the block height. Instead, perhaps, we could reset the counter (and make sure the jail period is greater than SignedBlocksWindow blocks) along with tracking missed instead of signed.
The first thing we should do here (after deciding upon an approach) is update the spec.
I'm going to try to deal with this before Game of Stakes.
Latest approach outlined at https://github.com/cosmos/cosmos-sdk/pull/2480#issue-222275216
Most helpful comment
I think this specific reported problem is only an issue when a full
SignedBlocksWindowhasn't yet passed.In the usual case, the validator has been signing for awhile and
signInfo.SignedBlocksCounter >> MinSignedPerWindow- regardless of how they dip in and out of the bonded set, the counter will just track whether they've signed at leastMinSignedPerWindowof the lastSignedBlocksWindowblocks they should have signed, which may not be contiguous.However, if a full
SignedBlocksWindowhasn't yet passed and the validator is unbonded for awhile, when they come back in to the bonded set the counter will only reflect (at best) the number of blocks that they could have signed so far.I think there are two ways to fix this that preserve the semantics we want:
SignedBlocksWindow - MinSignedPerWindow, or initially set the counter toSignedBlocksWindow(assume the validator has signed some imaginary past blocks).heightfor the comparision (couldHaveSigned >= SignedBlocksWindow).The latter requires extra store operations, so I think the former is preferable.
I think we also need to perform additional logic upon entering/leaving the bonded validator set later on after unrevoking, so that validators have at least
SignedBlocksWindowblocks to update the counter - if they cycle in and out of the bonded set after unrevoking, that might not happen since theheight >= minHeightcomparision only tracks the block height. Instead, perhaps, we could reset the counter (and make sure the jail period is greater thanSignedBlocksWindowblocks) along with tracking missed instead of signed.The first thing we should do here (after deciding upon an approach) is update the spec.