Hi, I noticed there were time gap between srml/staking and srml/session for validators query.
. However, session/Validators still stayed as total 4 validators
So there were time gap between above 3 and 4. That will result in direct slashing at the very beginning for the two new validators.
See logs below:
2019-09-05 10:35:36 //////on_before_session_ending function current elected length 6, keys length 4
2019-09-05 10:35:36 //////report_offence function offenders count 2, validator set 4
2019-09-05 10:35:36 /////on_offence function current era 6
2019-09-05 10:35:36 /////on offence slash_exposure9999999999327072
2019-09-05 10:35:36 /////slash validator total slash 374999999974765
2019-09-05 10:35:36 /////slash validator own_slash 374999999974765
2019-09-05 10:35:36 /////on offence slash_exposure9999999999327072
2019-09-05 10:35:36 /////slash validator total slash 374999999974765
2019-09-05 10:35:36 /////slash validator own_slash 374999999974765
I think it's unreasonable since the two new validators got slashing at the very beginning, but they did nothing wrong.
Is the time gap for validator count between staking and session module as expected?
Is the using previous im-online record for the new round of era at which new nodes may join reasonable?
Thanks.
I think this is a serious problem. Any new validators should not be slashed at the very beginning of their validating era, but it happens even with the latest commit.
@leahnoah I will look into this today.
With a moderate inspection I can tell:
Is the time gap for validator count between staking and session module as expected?
I am not aware of the latest changes in session but from the looks of it, yes. Look at the code in pub fn rotate_session() in session, only the QueuedKeys<T> is updated with the new validators (coming from T::OnSessionEnding::on_session_ending()), not the Validators<T>. Validators<T> is updated from the previous QueuedKeys<T>.
I think it's unreasonable since the two new validators got slashing at the very beginning, but they did nothing wrong.
Indeed, I couldn't figure out how this (Updating Validators<T> with one session delay) causes a slashing to happen but if true, then it is for sure wrong. This dooms every new validator to start its round with being slashed. cc @tomusdrw @cmichi
Is the using previous im-online record for the new round of era at which new nodes may join reasonable?
I can't understand the link to i'm online. Can you elaborate?
Would also be helpful to share more of your chain-spec and execution detail.
hmm @tomusdrw is this related?
https://github.com/paritytech/substrate/blob/5420de3face1349a97eb954ae71c5b0b940c31de/srml/im-online/src/lib.rs#L455-L458
@leahnoah @kianenigma Does removing the ImOnline module from your runtime lead to the issue being fixed?
@kianenigma
The link to im-online:
Actually, what I mean is:
/// The current set of keys that may issue a heartbeat.
Keys get(keys): Vec<T::AuthorityId>;
It's actually what you said "fn rotate_session() in session, only the QueuedKeysT::OnSessionEnding::on_session_ending()), not the Validators<T>. Validators<T> is updated from the previous QueuedKeys<T>."
I found this issue with my own chain, then I tried Substrate itself, the same too. The chain-spec is only modified with the initial_authorities.
The run cmd is:
$ nohup ./target/release/substrate --pruning archive --validator --keystore-path node1_keystore --chain=staging --rpc-port=9933 -d ./tmp/node1 --ws-external --name node1 --rpc-cors all --rpc-external > nohup1.out 2>&1 &
......
nohup ./target/release/substrate --validator --keystore-path node6'_keystore' --chain=staging -d ./tmp/node6 --bootnodes /ip4/127.0.0.1/tcp/30333/p2p/QmTk3WbKzScQjXe5SdXoPMkBNzwSom1GPr8Yw6i3scUxfJ --port 30338 --name node6 --rpc-cors all --ws-external --rpc-external --rpc-port=9938 > nohup6.out 2>&1 &
The execution details:
@leahnoah @kianenigma Does removing the
ImOnlinemodule from your runtime lead to the issue being fixed?
@rphmeier I tried removing all the storage/event/module parts of im-online module, only keeping the AuthorityId part. Yes, there's no slash at new validators' beginning round. However, this leaded to another problem: no slashing any more, even I killed the two two new validators, in which situation the two new validators should be slashed.
The issue of slashing validators was with im online and I believe @tomusdrw will soon open a fix PR. Thanks for the report :)