Describe the bug
Running NEO3 testnet for about 6 hours, CN3(index=3) node hadn't open wallet so that every 7 blocks when CN3 is primary, recovery will be sent and then CN2(index=2) will be the Primary to generate block and go on. However, there's a time when Primary is turn to CN2 from CN3, because of something unknown, CN2 is time out then view changed to CN3 again. After that, there're always recovery message received and view hasn't changed anymore. The next day, when we restart all consensus nodes, it returns null Payloads on neo\neo\Consensus\ConsensusService.cs:line 149
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Private net was broken...
Screenshots

Platform:
The issue block is on 1326
You guys are testing everything. hauaahah
Congratulations.
I think there are two issues.
The first is that the consensus stops. From the log point of view, the consensus node seems to be split into two groups, 0, 4, 5 and 1, 2, 6. It looks a bit like a fork.
The second is that restarting the consensus node will cause a NullReferenceException on recovery. This has been fixed in #1083.
@neo-project/core Please check the first issue.
@erikzhang, an interesting thing was that one of the recovery payloads had a changeview that was not valid, because there was a print saying 3/4.
I am having some difficulties for opening the logs. I just saw briefly on @superboyiii 's computer
@superboyiii @cloud8little @erikzhang,
First of all, thanks for you help and attention for providing the logs.
Perhaps one of the possible errors can be seen here:
[Info][17:47:20.753] send commit (C2)
[Info][17:47:20.787] OnCommitReceived: height=1326 view=2 index=2 nc=1 nf=1 (C3)
[Info][17:47:20.797] OnCommitReceived: height=1326 view=2 index=0 nc=2 nf=1 (C1)
[Info][17:47:20.825] OnCommitReceived: height=1326 view=2 index=6 nc=2 nf=1 (C7)
[Info][17:47:20.837] OnCommitReceived: height=1326 view=2 index=5 nc=3 nf=1 (C6)
[Info][17:47:20.925] OnCommitReceived: height=1326 view=2 index=0 nc=3 nf=1
[Info][17:47:21.040] OnCommitReceived: height=1326 view=2 index=5 nc=3 nf=1
[Info][17:47:21.041] OnPrepareResponseReceived: height=1326 view=2 index=4
[Info][17:47:21.092] OnCommitReceived: height=1326 view=2 index=4 nc=3 nf=1 (C5)
Take a look at this, we already have 6 commits! aheuahuea
Block should have been produced.
For some reason the checkcommits() is not being triggered or not passing.
There is something really strange going on, the logs also keep processing the commit payloads...
We should improve the logs.
Since the OnCommitReceived message is still hapenning, it makes sense that the following situation is not !=null:
ref ConsensusPayload existingCommitPayload = ref context.CommitPayloads[payload.ValidatorIndex];
if (existingCommitPayload != null)
{
if (existingCommitPayload.Hash != payload.Hash)
Log($"{nameof(OnCommitReceived)}: different commit from validator! height={payload.BlockIndex} index={payload.ValidatorIndex} view={commit.ViewNumber} existingView={existingCommitPayload.ConsensusMessage.ViewNumber}", LogLevel.Warning);
return;
}
For some reason the payloads are not being stored.
Something weird, in CN3's logs

But if is watchOnly, this couldn't happen...

Also i can't find any height=1326 view=0 index=1 role=Primary :S
@shargon Owen means index=3, which actually is this node 2019-09-03(c4).log.
How many seconds between blocks?
@shargon 15secs
@shargon This is already fixed in https://github.com/neo-project/neo/pull/1083. We're making more test on this PR to ensure no addtional issues appear.
@superboyiii both issues?
@superboyiii both issues?
Yes, both.
And do you know why this happens?
https://github.com/neo-project/neo/issues/1084#issuecomment-529792093
@superboyiii how can we ensure this has been solved? Are there steps we can follow to reproduce this error? Can it be reproduced using unit tests?
It has been fixed, I think. The cause of this bug has been found.
Most helpful comment
Yes, both.