Describe the bug
var start_index = Math.Max(StateRootSentIndex, payload.StartIndex);
var count = Math.Min(payload.Count, StateRootsPayload.MaxStateRootsCount);
var end_index = payload.StartIndex + count;
if (end_index <= start_index) return;
It looks like StateRootSentIndex is updated after replying to "getroots". However, how will the node reply to other nodes that request roots?
Each RemoteNode has its own StateRootSentIndex; they are independent of each other.
I see, but what if messages are lost between peers due to shalldrop?
The node can connect to some new peers to re-request the state root. It does not seem to be a big problem.
It is mostly a problem on privatenets and shared privatenets, @Tommo-L, where we have a limited and fixed set of connections.
It also has an impact on testnet and mainnet.
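To make the lost-reply concern concrete, here is a minimal sketch of the range check quoted at the top of this issue, pulled into a standalone helper. It assumes that the sender sets StateRootSentIndex to end_index after replying, which this thread implies but does not show. The class name GetRootsWindow, the method Compute, and the cap of 2000 are hypothetical and are not taken from the Neo codebase.

```csharp
using System;

public static class GetRootsWindow
{
    // Assumed cap for illustration only; the real value of
    // StateRootsPayload.MaxStateRootsCount is not shown in this thread.
    public const uint MaxStateRootsCount = 2000;

    // Mirrors the quoted check: returns the (start_index, end_index) pair
    // that would be served, or null when the handler returns early.
    public static (uint Start, uint End)? Compute(uint sentIndex, uint requestStart, uint requestCount)
    {
        uint start = Math.Max(sentIndex, requestStart);
        uint count = Math.Min(requestCount, MaxStateRootsCount);
        uint end = requestStart + count;
        if (end <= start) return null; // nothing left to send for this peer
        return (start, end);
    }
}
```

With StateRootSentIndex at 0, a request for 50 roots starting at index 100 yields the pair (100, 150). If the sender then advances StateRootSentIndex to 150 and the reply is dropped, the same request sent again yields null, so the peer cannot get those roots from this connection.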
Got it. Maybe it would be better to let the receiver decide?
I think so, maybe we should have a timer.
If the timer expires, it resets StateRootSentIndex to the LastBlockIndex from that session.
What do you think? Do we have access to LastBlockIndex of the session on the ProtocolHandler?
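A rough sketch of the timer idea follows, not the fix that was eventually applied. Only StateRootSentIndex and LastBlockIndex are names from this discussion. The class PeerRootState, its methods, the timeout value, and the choice to start the sent index at LastBlockIndex are all assumptions made for illustration. The clock is passed in explicitly so the behaviour is easy to check without a real timer.

```csharp
using System;

// Hypothetical per-peer state holder, not a type from the Neo codebase.
public sealed class PeerRootState
{
    public uint StateRootSentIndex { get; private set; }
    public uint LastBlockIndex { get; }

    private readonly TimeSpan _timeout; // assumed value, not from the thread
    private DateTime _lastSentAt;

    public PeerRootState(uint lastBlockIndex, TimeSpan timeout, DateTime now)
    {
        LastBlockIndex = lastBlockIndex;
        StateRootSentIndex = lastBlockIndex; // assumption: start from the session height
        _timeout = timeout;
        _lastSentAt = now;
    }

    // Call after a "getroots" reply has been sent up to endIndex.
    public void MarkSent(uint endIndex, DateTime now)
    {
        StateRootSentIndex = endIndex;
        _lastSentAt = now;
    }

    // Rolls the sent index back to LastBlockIndex once the timeout has
    // elapsed since the last reply; returns true if a reset happened.
    public bool ResetIfExpired(DateTime now)
    {
        if (now - _lastSentAt < _timeout) return false;
        StateRootSentIndex = LastBlockIndex;
        _lastSentAt = now;
        return true;
    }
}
```

After a reset, a repeated "getroots" request from the same peer would pass the end_index check again, so a dropped reply no longer blocks that peer for good.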
Another setting is private readonly uint MaxRootCacheCount = 1000;. Maybe on privatenets we should be able to increase this limit, because we usually deal with faster block times. It could therefore be exposed through some simpler configuration scheme.
I think Neo3 will have a different synchronization mechanism for state root. 😄
Aehauehahuea, yes, maybe yes.
I was talking with @igormcoelho before opening this issue to try to fix this.
However, we still need support for Neo 2x for maybe 1 year. For that reason, we plan to leave NeoCompiler Eco V2 running until that happens.
This sync problem is forcing us to apply manual fixes to our infrastructure... aehauhuea
And I believe it will also improve mainnet behavior a little bit until 2x becomes inactive.
@KickSeason can help fix it later
Sounds good, count on me as well. Please also check with CN's maintainers whether any sync problems with state root have been happening. This would be another motivation for us to move this forward more quickly.
@Tommo-L, we conducted some experiments and created a modified client.
It looks like the problem of state root syncing was fixed for us.
However, we both modified StateRootSentIndex and increased MaxRootCacheCount.
We will analyze the results after 7 days of running the network at 1s, and then we will proceed with other tests without changing the cache size MaxRootCacheCount.
@Tommo-L, state root sync issues have been solved by following the aforementioned methodology.
We ran a shared privatenet for 4 days as a 1s network under load, and no state root sync problem was detected.
Thanks @vncoelho, we can close this issue now.