Neo: Consensus. Limit the maximum time between view changes

Created on 10 Sep 2019 · 14 comments · Source: neo-project/neo

Summary
Limit the TimeSpan between view changes. Currently this timeout increases exponentially with each view change. Its purpose is to give the consensus nodes reasonable time to receive all pending transactions and messages.

The problem is that after a certain number of view changes this time becomes excessive and, instead of helping, it delays the consensus recovery process.

This is the current increase (times in seconds):

View: 0 - TimeSpan: 15  - Total delay: 15
View: 1 - TimeSpan: 30  - Total delay: 45
View: 2 - TimeSpan: 60  - Total delay: 105
View: 3 - TimeSpan: 120 - Total delay: 225
View: 4 - TimeSpan: 240 - Total delay: 465
View: 5 - TimeSpan: 480 - Total delay: 945
View: 6 - TimeSpan: 960 - Total delay: 1905
...
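The table can be reproduced with a short sketch. It assumes, as the table shows, a 15 s timeout in view 0 that doubles on every view change; the constant and function names are hypothetical and are not taken from Neo's source.

```python
# Sketch of the current schedule from the table above.
# Assumption: 15 s in view 0, doubling on every view change.
# BASE_TIMEOUT_S, exponential_timeout and schedule are hypothetical names.
BASE_TIMEOUT_S = 15


def exponential_timeout(view: int) -> int:
    """Seconds to wait in `view` before requesting the next view change."""
    return BASE_TIMEOUT_S * 2 ** view


def schedule(views: int):
    """Return (view, timespan, total delay) tuples for views 0..views-1."""
    rows = []
    total = 0
    for view in range(views):
        span = exponential_timeout(view)
        total += span
        rows.append((view, span, total))
    return rows


for view, span, total in schedule(7):
    print(f"View: {view} - TimeSpan: {span:<3} - Total delay: {total}")
```

The total delay after view v is 15 * (2^(v+1) - 1) seconds, so by the end of view 6 the nodes have already waited almost 32 minutes.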

Do you have any solution you want to propose?
I think we don't need to increase TimeSpan after the fifth ChangeView; we could set a maximum of 8 minutes between view changes.
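A minimal sketch of the proposed cap, assuming the same 15 s base as the table. The 8-minute maximum (480 s) is first reached in view 5, after which the timeout stays flat. The names are hypothetical.

```python
# Sketch of the proposal: keep doubling until the 8-minute cap, then stop.
# BASE_TIMEOUT_S comes from the table; MAX_TIMEOUT_S from the proposal.
# capped_timeout is a hypothetical helper, not Neo's code.
BASE_TIMEOUT_S = 15
MAX_TIMEOUT_S = 8 * 60  # 480 s, first reached in view 5


def capped_timeout(view: int) -> int:
    """Exponential timeout, but never longer than MAX_TIMEOUT_S."""
    return min(BASE_TIMEOUT_S * 2 ** view, MAX_TIMEOUT_S)


for view in range(9):
    print(f"View: {view} - TimeSpan: {capped_timeout(view)}")
```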

This small change can help a lot, but like any change to consensus, it should be treated carefully.

Where in the software does this update apply?

  • Consensus

All 14 comments

Agreed, there is no sense in waiting half an hour. This time must have a maximum.

I am not in favor of this right now.
This breaks liveness, a basic principle of our consensus.

In the future, as a consequence of other improvements that are on the way, we may be able to apply better change view strategies.

> In the future, as a consequence of other improvements that are on the way, we may be able to apply better change view strategies.

This is compatible with all the other improvements. What is the sense of waiting 2 days if the view number has increased too much?

I'm in favor of this. It does not make sense to wait an unlimited amount of time, but limiting it to 8 minutes may not be the best option either. Maybe after 8 minutes it should increase by one minute at a time? What do you think?
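The "cap, then one more minute per view" idea can be sketched as follows. The 15 s base, 480 s cap and 60 s step are assumptions taken from this thread, and the function name is hypothetical.

```python
# Sketch of the "cap, then +1 minute per view" variant.
# Parameter values and the function name are assumptions for illustration.
def cap_then_linear(view: int, base: int = 15, cap: int = 480, step: int = 60) -> int:
    """Double from `base` until `cap` is reached, then grow by `step` per view."""
    # Find the first view whose exponential timeout reaches the cap.
    cap_view = 0
    while base * 2 ** cap_view < cap:
        cap_view += 1
    if view <= cap_view:
        return min(base * 2 ** view, cap)
    return cap + step * (view - cap_view)


print([cap_then_linear(v) for v in range(9)])
```

Unlike a hard cap, this timeout still grows without bound, only linearly instead of exponentially after view 5.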

How many times so far has consensus gone past view 6 on Mainnet or Testnet?

I agree with @vncoelho. It breaks liveness.

But only if we impose a hard limit, right?
I'm not in favor of a hard limit, but we could use different math when the value gets too high. Or is there a reason why the math is done this way? Maybe make it a linear increase instead of exponential.

I don't see how it breaks liveness any more than increasing it does. Maybe we can increase TimeSpan in a way that is not exponential.
This matters most when the failing nodes are consecutive in the consensus order.
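To illustrate why consecutive failures matter, here is a sketch of round-robin primary selection. It assumes the primary for a view is (height - view) mod n, which is our reading of Neo's rule and should be treated as an assumption; the helper names are hypothetical. With this rule each extra consecutive offline primary costs one more view change, and therefore one more (doubled) timeout.

```python
# Sketch of why consecutive failing nodes hurt more.
# Assumption: primary index = (height - view) mod n. This is our reading of
# Neo's round-robin rule, used here for illustration only.
def primary_index(height: int, view: int, n: int) -> int:
    """Index of the primary (speaker) for `view` at block `height`."""
    return (height - view) % n  # Python's % is non-negative for n > 0


def views_until_online_primary(height: int, n: int, offline: set) -> int:
    """Number of view changes needed before an online node is primary."""
    if len(offline) >= n:
        raise ValueError("at least one validator must be online")
    view = 0
    while primary_index(height, view, n) in offline:
        view += 1
    return view


# 7 validators at height 100: views 0, 1, 2 map to primaries 2, 1, 0.
print(views_until_online_primary(100, 7, {2, 1}))  # consecutive: 2 view changes
print(views_until_online_primary(100, 7, {2, 5}))  # not consecutive: 1 view change
```

With the schedule from the table, the two consecutive failures cost 15 s + 30 s before view 2 begins, and every additional consecutive failure doubles the next wait.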

In the past we have seen some excessive numbers of view changes. I am trying to collect some examples, but I do not have the logs.

In order to ensure liveness, the nodes that change views later must be able to catch up with the nodes that change views first. Therefore, the nodes that change views first must change more slowly, and the nodes that change views later must change faster, so that catching up is possible. If there is a limit on the view-change timeout, a restarted node can never catch up with the other nodes. This breaks liveness.
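The catch-up argument can be checked with a simplified timing model (an assumption, not Neo's actual behavior): both nodes follow the same timeout schedule T(v) from view 0, the restarted node starts `lag` seconds later, and neither skips views. A node is in view v during [S(v), S(v) + T(v)), shifted by `lag` for the restarted node, so the two are in the same view at the same time exactly when T(v) > lag. Exponential timeouts eventually exceed any finite lag; capped ones never do once the lag reaches the cap. The function names are hypothetical.

```python
from typing import Callable, Optional

# Simplified model (assumption): same schedule for both nodes, the restarted
# node starts `lag` seconds later, and no node jumps views because of
# received messages. The nodes share view v at some moment iff T(v) > lag.


def first_shared_view(timeout: Callable[[int], int], lag: float,
                      max_view: int = 64) -> Optional[int]:
    """First view both nodes occupy at the same time, or None up to max_view."""
    for view in range(max_view + 1):
        if timeout(view) > lag:
            return view
    return None


def exponential(view: int) -> int:
    return 15 * 2 ** view


def capped(view: int) -> int:
    return min(15 * 2 ** view, 480)


print(first_shared_view(exponential, lag=600))  # 6, because 960 s > 600 s
print(first_shared_view(capped, lag=600))       # None: T(v) <= 480 < 600 always
```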

So the increase must be exponential?

After reading your comments, this may not be a good change, and I think this discussion does not contribute much at this time. We can close the issue and reopen it in the future if necessary.

> In order to ensure liveness, the nodes that change views later must be able to catch up with the nodes that change views first. Therefore, the nodes that change views first must change more slowly, and the nodes that change views later must change faster, so that catching up is possible. If there is a limit on the view-change timeout, a restarted node can never catch up with the other nodes. This breaks liveness.

I think this could happen within a 10-minute maximum just as well as with exponential growth. I still don't understand the problem with a maximum time. Could you provide an example where a maximum time is worse than exponential growth?

If there is a fixed timeout 16 minutes, let's see:

view 0: 30s
view 1: 1m
view 2: 2m
view 3: 4m
view 4: 8m
view 5: 16m
view 6: 16m
.......
view n: 16m

A view 0 node can never catch up with a view 5 node.

Mmm, now I understand the problem! 💃 You are right! Thanks for the explanation. We discard packets by view number, so nodes are required to be in the same view, not at the same "time". @belane you can close it.
