Enable validators to enter "maintenance mode" to avoid missed blocks and/or slashing during planned downtime, effectively withdrawing from the validator set for a period of time.
This would be accomplished by submitting a Tx stating that the validator is entering maintenance mode and wishes to be removed from the validator set. A different Tx is submitted once the validator wishes to exit maintenance mode.
Today, validators have no way of signalling an intention to go offline. Being able to do so seems desirable, both to reduce block times (fewer timeouts waiting on absent validators) and to avoid accidental slashing during maintenance.
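A minimal sketch of the bookkeeping behind the two proposed transactions, assuming a simple per-validator flag. The names `Validator`, `MaintenanceRegistry`, `enter_maintenance` and `exit_maintenance` are hypothetical and not part of any Cosmos SDK module. Note that voting power (and therefore delegations) is left untouched; only eligibility for the active set changes.

```python
from dataclasses import dataclass


@dataclass
class Validator:
    operator: str
    power: int                    # bonded stake, including delegations
    in_maintenance: bool = False
    reason: str = ""              # optional note for delegators


class MaintenanceRegistry:
    """Hypothetical state handling for the "enter" and "exit" maintenance Txs."""

    def __init__(self, validators):
        self.validators = {v.operator: v for v in validators}

    def enter_maintenance(self, operator: str, reason: str = "") -> None:
        v = self.validators[operator]
        if v.in_maintenance:
            raise ValueError(f"{operator} is already in maintenance mode")
        # Delegations are not touched; the validator only becomes ineligible
        # for the active set until it submits the exit Tx.
        v.in_maintenance = True
        v.reason = reason

    def exit_maintenance(self, operator: str) -> None:
        v = self.validators[operator]
        if not v.in_maintenance:
            raise ValueError(f"{operator} is not in maintenance mode")
        v.in_maintenance = False
        v.reason = ""

    def eligible(self):
        """Operators that may be selected into the active validator set."""
        return [op for op, v in self.validators.items() if not v.in_maintenance]
```

Rejecting a repeated "enter" or an "exit" without a prior "enter" keeps the two Txs strictly paired, which makes the validator's state easy to reason about.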
Thanks @mdyring, interesting proposal! It's not immediately obvious to me whether there are any safety concerns with this feature, or how difficult it would be to implement. It almost sounds like you wish to just temporarily unbond and then rebond?
It almost sounds like you wish to just temporarily unbond and then rebond?
Yes, similar to validator unbonding and rebonding, but without losing delegations in the process. :-)
I think disabling downtime slashing for a defined period is probably safe (though perhaps some fee should be paid), but disabling slashing in general is not safe / not applicable (see https://github.com/tendermint/tendermint/issues/3244).
I don't entirely understand the benefit, though. @mdyring, can you elaborate? At the moment downtime slashing is pretty lenient: validators can be down for hours with no penalties. Are you intending to perform particularly time-intensive maintenance?
@cwgoes agree this should only affect downtime slashing. Some advantages I see:
1) Improves determinism re. block times. We won't be timing out waiting for validators in maintenance mode, which I'd expect to be much more common than outages.
2) Will allow "standby" validators not in the validator set to take over responsibility while maintenance is ongoing.
3) Although minor, it avoids slashing in case of extended maintenance.
4) Missed blocks in various explorers will become a better metric for reliability/availability. For instance, all our missed blocks so far were due to planned maintenance, which hurts our eyes. Calling Mr. Vain. ;-)
Seems reasonable. Another point, however, is that while a validator is in maintenance mode it would have to be removed from the "top 100" validator set and would stop collecting any rewards for itself or its delegators.
I don't think a fee is required; additionally, there's no need to be slashed, because you're no longer part of the validator set. You're really just artificially pausing your validator, and it could be for any reason whatsoever. Hence it would make sense to attach a description of why you paused your validator so your delegators don't worry; for example, you could include "maintenance for 2 hours" with your pause transaction.
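To illustrate how removal from the "top 100" set lets a standby validator take over, here is a sketch of active-set selection that simply skips validators in maintenance mode. The function `select_active_set` and the tie-breaking by operator name are assumptions made for this example, not the actual staking module logic; a validator outside the returned set would earn no rewards for that period.

```python
from typing import NamedTuple


class Validator(NamedTuple):
    operator: str
    power: int
    in_maintenance: bool = False


def select_active_set(validators, max_validators=100):
    """Hypothetical: pick the top-N validators by power, skipping paused ones."""
    candidates = [v for v in validators if not v.in_maintenance]
    # Highest power first; ties broken by operator name so every node
    # computes the same set.
    candidates.sort(key=lambda v: (-v.power, v.operator))
    return [v.operator for v in candidates[:max_validators]]
```

With a cap of three, pausing `b` lets the standby validator `d` into the set:

```python
vals = [Validator("a", 50), Validator("b", 40, True), Validator("c", 30), Validator("d", 10)]
select_active_set(vals, max_validators=3)  # ["a", "c", "d"]
```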
@rigelrozanski yep, idea is that a validator is removed from the active validator set temporarily (similar to jailing, just voluntary).
I think a validator should still be slashable for double signing, even in maintenance mode, since otherwise it could become a strategy to enter maintenance mode immediately after double signing (but before evidence is posted).
Oh, of course: during maintenance mode you're effectively unbonding... This is actually identical in protocol to a voluntary jail (then unjail), so the mechanism is already here. We just need to implement the voluntary jail Tx and change the name so people don't get the wrong idea.
Maintenance mode could be a very useful primitive when dealing with a validator compromise. Please check https://forum.cosmos.network/t/validator-node-compromize-analysis/2321
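The slashing rule discussed so far can be summarized in a short sketch: maintenance mode only waives downtime penalties, while double-sign evidence stays slashable no matter when it arrives. The `Infraction` enum and `is_slashable` function are hypothetical names used purely for illustration.

```python
from enum import Enum


class Infraction(Enum):
    DOWNTIME = "downtime"
    DOUBLE_SIGN = "double_sign"


def is_slashable(infraction: Infraction, in_maintenance: bool) -> bool:
    """Hypothetical rule: maintenance mode waives only downtime slashing."""
    if infraction is Infraction.DOUBLE_SIGN:
        # Evidence may be submitted after the validator has already entered
        # maintenance mode, so maintenance must never shield equivocation.
        return True
    # Downtime: a validator in maintenance has left the active set and is not
    # expected to sign, so missed blocks are not penalized.
    return not in_maintenance
```

Keeping the double-sign branch unconditional closes the loophole of pausing right after equivocating.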
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.