Neo: Seed nodes are still lagging behind

Created on 5 Mar 2019 · 20 Comments · Source: neo-project/neo

Even in local private networks, seed nodes are not able to keep up with the blocks relayed by consensus nodes (CNs).

This is a critical issue that needs to be tracked.
An effort should be made to improve the TaskManager and ProtocolHandler classes.

bug critical

vncoelho

👍1

All 20 comments

I am still planning to work on this during the next 3 weeks.

jsolman on 5 Mar 2019

❤1

This issue is probably a duplicate of #542

jsolman on 5 Mar 2019

Let's try to unify and list all the related open issues and ideas; there are also #522 and #366.

We should identify a line of future directions for improving the P2P layer.

vncoelho on 5 Mar 2019

Also, I have one simple change that improves it, which I may put out as a PR first. If you are observing seed nodes not following in a private network and you are running fast blocks, you will need to change the code that relays the last 2 blocks when receiving multiple blocks so that it relays something like the last 10 instead.

jsolman on 5 Mar 2019

👀1 👍1

https://github.com/neo-project/neo/blob/6f5d0314acfcf0cb6243638b89f437c8884d5bae/neo/Ledger/Blockchain.cs#L324

Just change this line to:

if (blocksPersisted++ < blocksToPersistList.Count - 10) continue;

jsolman on 5 Mar 2019

This doesn't really matter when running slower blocks or in a bigger network. In a really small private network that runs blocks at high speed, you do need it, though.

jsolman on 5 Mar 2019

👍1

Let's create a formula based on the block time in seconds, in order to make it more generic.

Another thing is StartHeight, as we were just discussing in that other thread.
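
As a rough illustration of the block-time formula proposed above, the sketch below derives the number of trailing blocks to relay from the seconds per block. The class `RelayTail`, its methods, and the formula itself (`2 + max(0, 15 - secondsPerBlock)`) are assumptions made for this example; they are not taken from the thread and may differ from what #621 ends up using.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of the Neo codebase.
public static class RelayTail
{
    // Assumed formula: keep the original tail of 2 blocks and add one block
    // for every second the block time is below 15 seconds.
    public static int BlocksToRelay(int secondsPerBlock)
    {
        return 2 + Math.Max(0, 15 - secondsPerBlock);
    }

    // Returns the indexes, within a persisted batch, of the blocks that
    // would be relayed under the assumed formula.
    public static List<int> RelayedIndexes(int batchSize, int secondsPerBlock)
    {
        int tail = BlocksToRelay(secondsPerBlock);
        var relayed = new List<int>();
        int blocksPersisted = 0;
        for (int i = 0; i < batchSize; i++)
        {
            // Same shape as the check quoted earlier in the thread:
            // skip everything except the last `tail` blocks of the batch.
            if (blocksPersisted++ < batchSize - tail) continue;
            relayed.Add(i);
        }
        return relayed;
    }
}
```

With a 15-second block time this keeps the current behavior of relaying the last 2 blocks, while a 1-second private net would relay the last 16 blocks of each batch.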

vncoelho on 5 Mar 2019

@vncoelho Sounds good. Maybe you could create the PR that uses a formula for this based on block time and I will review it.

jsolman on 5 Mar 2019

I'm pretty sure that will fix your seed node lagging issue, even without needing #522. I already tested it locally when I was running fast blocks and it fixed the issue for me.

jsolman on 5 Mar 2019

A draft was created here: https://github.com/neo-project/neo/pull/621
Take a look and feel free to adjust it.

vncoelho on 5 Mar 2019

Jeff, even with PR 621 we will not really solve the whole issue, because once a node has fallen behind we still have problems retrieving blocks that were previously lost.
The PR will help, but it will not really fix the block request communication procedure.

vncoelho on 5 Mar 2019

#621 should help. Have you observed that it still has a problem with #621? Maybe before I issue the PR with the improved `TaskManager`, I will propose a simpler modification first, to ensure that a node is able to request missed blocks from a node it has already been connected to for some time. We can adjust things so that a node knows the current height of all its connected peers instead of just their starting height. This wouldn’t be necessary if it were guaranteed that a node would never skip sending inv messages, but since the code doesn’t have that guarantee currently, it will help in that rare scenario.

jsolman on 6 Mar 2019

One way to do it would be to have the response to inv messages contain the responder's node height.
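
A minimal sketch of the idea above: track the last known height of each connected peer, rather than only the height advertised at connection time. `PeerHeightTracker` and its members are hypothetical names for this example, and the assumption that some later message carries the peer's height is exactly the proposal in this comment, not existing protocol behavior.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical tracker for illustration; names are not taken from Neo.
public sealed class PeerHeightTracker
{
    private readonly Dictionary<string, uint> lastKnownHeight = new Dictionary<string, uint>();

    // Called once at handshake with the peer's advertised starting height.
    public void OnConnected(string peerId, uint startHeight)
    {
        lastKnownHeight[peerId] = startHeight;
    }

    // Called when a later message from the peer reports its height
    // (assumed here; the thread only proposes adding this).
    public void OnHeightReported(string peerId, uint height)
    {
        if (lastKnownHeight.TryGetValue(peerId, out uint known) && height > known)
            lastKnownHeight[peerId] = height;
    }

    public void OnDisconnected(string peerId)
    {
        lastKnownHeight.Remove(peerId);
    }

    // Peers known to hold blocks beyond our own height, highest first.
    public List<string> PeersAhead(uint localHeight)
    {
        return lastKnownHeight.Where(p => p.Value > localHeight)
                              .OrderByDescending(p => p.Value)
                              .Select(p => p.Key)
                              .ToList();
    }
}
```

A lagging node could then ask any peer returned by `PeersAhead` for its missing blocks, even if that peer was at the same height when the connection was opened.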

jsolman on 6 Mar 2019

@jsolman did you have time to progress on this?
I made a PR for a small mechanism for updating heights: #673. What is equally useful is being able to request data a second time (in case it was corrupt, got lost, or whatever) without having to reconnect; that seems like a must to me. I think there should be a minimum interval between requests for the same data, not a hard deny. This timeout can be as simple as a periodic task that cleans up the list of historically requested hashes. This also prevents the historical hash list from growing potentially huge (if a client syncs from 0 and never disconnects).
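
The minimum-interval idea above could look roughly like the following sketch. `RequestHistory`, `TryRequest`, and `Cleanup` are hypothetical names, and the clock is passed in as a parameter to keep the example deterministic; this is not how Neo's `TaskManager` is implemented.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not the TaskManager in Neo.
public sealed class RequestHistory
{
    private readonly Dictionary<string, DateTime> lastRequested = new Dictionary<string, DateTime>();
    private readonly TimeSpan minInterval;

    public RequestHistory(TimeSpan minInterval)
    {
        this.minInterval = minInterval;
    }

    public int Count => lastRequested.Count;

    // Returns true and records the request if the hash was never requested,
    // or if the minimum interval has elapsed since the previous request.
    public bool TryRequest(string hash, DateTime now)
    {
        if (lastRequested.TryGetValue(hash, out DateTime previous) && now - previous < minInterval)
            return false;
        lastRequested[hash] = now;
        return true;
    }

    // Meant to be run by a periodic task: forgets entries older than the
    // interval, which also bounds the size of the history.
    public int Cleanup(DateTime now)
    {
        var expired = lastRequested.Where(p => now - p.Value >= minInterval)
                                   .Select(p => p.Key)
                                   .ToList();
        foreach (var hash in expired) lastRequested.Remove(hash);
        return expired.Count;
    }
}
```

A repeated request inside the interval is simply refused for now rather than denied for the lifetime of the connection, and the cleanup keeps the history small even for a client that syncs from height 0 without ever disconnecting.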

ixje on 1 Apr 2019

I have not started work for #542 yet.

jsolman on 3 Apr 2019

@vncoelho Have you observed nodes still falling behind in any of your private nets for reasons other than the default max connections of 3?

jsolman on 3 Apr 2019

Is this currently happening?

shargon on 10 May 2019

Yes, we still need to optimize connection/disconnection handling and block requests.

vncoelho on 10 May 2019

Is this already solved?

shargon on 12 Sep 2019

@shargon, perhaps it has been solved.
Over the last month we have monitored our nodes, and the behavior was resolved by the improvements to the P2P layer.

It still happens, but now the node catches up and syncs again.
We expect even better performance after #1397.

vncoelho on 9 Jan 2020
