Neo: Limit KnownHashes size

Created on 6 Dec 2018 · 13 Comments · Source: neo-project/neo

Currently, it looks like the HashSet<UInt256> knownHashes has no size limit.
https://github.com/neo-project/neo/blob/7883e587a9448330d21669ebe748402871da3a50/neo/Network/P2P/TaskManager.cs#L27

In my local experiment it quickly grows very large.
It could reach something around 1M+ entries in MainNet operation.
Is this a problem for the client, or is it expected to handle this smoothly?

If we decide to limit it, which kind of representation do you recommend?

discussion

vncoelho

👍1
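To put the 1M+ figure in perspective, here is a rough back-of-the-envelope estimate, written in Python for illustration rather than in the project's C#. It counts only the 32 bytes of each UInt256 value; the per-entry overhead of a real hash set comes on top of that and is not estimated here. The function name `raw_hash_bytes` is hypothetical.

```python
HASH_BYTES = 32  # a UInt256 is 256 bits = 32 bytes


def raw_hash_bytes(count: int) -> int:
    """Bytes needed for the hash values alone, ignoring container overhead."""
    return count * HASH_BYTES


for count in (100_000, 1_000_000, 10_000_000):
    megabytes = raw_hash_bytes(count) / 1e6
    print(f"{count:>10,} hashes -> {megabytes:.1f} MB of raw hash data")
```

So one million hashes cost at least 32 MB before any container overhead, and the set keeps growing for as long as the node runs.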

All 13 comments

Well observed, brother... I don't know exactly how this set is managed, but it looks like a typical application of Bloom filters. Using these, millions of tx could be represented with only a few bits each. If that's the case, the risk of false positives could be managed in two ways: (i) making the table transparent to the public so users can avoid such false positives (which is hard because no one knows the consensus nodes); (ii) attaching a quick expiry and the current timestamp to transactions, so tx hashes are constantly refreshed (even if one hit happens, it is very unlikely to happen multiple times).

igormcoelho on 7 Dec 2018
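As a rough guide to how small a Bloom filter for this workload could be, the standard sizing formulas are m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions, for n items and a target false-positive rate p. The sketch below (Python, with hypothetical function names, and an assumed 1% target rate that nobody in the thread proposed) evaluates them for one million hashes.

```python
import math


def bloom_size_bits(n: int, p: float) -> int:
    """Optimal number of bits for n items at false-positive rate p."""
    return math.ceil(-n * math.log(p) / (math.log(2) ** 2))


def bloom_hash_count(n: int, m: int) -> int:
    """Optimal number of hash functions for n items in m bits."""
    return max(1, round(m / n * math.log(2)))


n = 1_000_000   # number of known hashes, as guessed in the issue
p = 0.01        # assumed target false-positive rate
m = bloom_size_bits(n, p)
k = bloom_hash_count(n, m)
print(f"{m} bits (~{m / 8 / 1e6:.2f} MB), {k} hash functions")
```

Under these assumptions the filter needs roughly 10 bits per hash instead of 256, but the false-positive rate is never zero.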

False positives are not acceptable here, as they could block transactions erroneously, so a Bloom filter is not a good choice. An LRU cache structure could work. Imagine a false positive on a block hash: the node would never be able to receive that block until restarted...

jsolman on 7 Dec 2018

👍1
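A minimal sketch of the LRU idea jsolman mentions above, written in Python for brevity. It is not neo's implementation; the class name `LruHashSet` and its methods are hypothetical. The structure never reports a hash it has not seen (no false positives), and when it is full it forgets the least recently seen hash (a tolerable false negative).

```python
from collections import OrderedDict


class LruHashSet:
    """Bounded set of hashes that evicts the least recently seen entry."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._items: "OrderedDict[bytes, None]" = OrderedDict()

    def add(self, h: bytes) -> bool:
        """Record h. Returns True if it was new, False if already known."""
        if h in self._items:
            self._items.move_to_end(h)  # refresh recency on a repeat
            return False
        self._items[h] = None
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # drop the oldest entry
        return True

    def __contains__(self, h: bytes) -> bool:
        return h in self._items

    def __len__(self) -> int:
        return len(self._items)


known = LruHashSet(capacity=2)
known.add(b"a")
known.add(b"b")
known.add(b"a")      # seen again, so "a" becomes the most recent entry
known.add(b"c")      # capacity exceeded, "b" is evicted
print(b"a" in known, b"b" in known, b"c" in known)
```

Memory stays bounded by `capacity`, at the price of occasionally reprocessing a hash that was evicted.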

The best way is a stored list in the chain

shargon on 7 Dec 2018

It’s fine if the structure lets a duplicate through, because the node will check it against LevelDB later, see that it is a duplicate, and reject it. For instance, after a node starts, knownHashes is always empty, and it may receive a hash already in the chain and have to incur a disk read to discover it is already there. So it’s fine if an expiring cache is used here; the whole purpose of knownHashes, AFAIK, is just to improve performance (not wasting time when receiving the same hash again).

jsolman on 9 Dec 2018

I agree, @igor, @shargon, @jsolman.
All 3 points are exactly good solutions.

P.S.: perhaps some are less exact (deterministic). ahauahahaia

vncoelho on 9 Dec 2018

@vncoelho
I appreciate everyone’s ideas and input; however, I am still not seeing how a bloom filter would be a good solution. We need the opposite of a bloom filter, because we can tolerate false negatives but not false positives.

jsolman on 9 Dec 2018

Accidentally closed, heh; I really wish the mobile interface would ask ‘are you sure’ if you accidentally hit 'close and comment'.

jsolman on 9 Dec 2018

😄2

Man, that is chaos theory, ahauahhaa
I am kidding, Jeff. We need to think about this carefully; it is an idea we like to think about.

For a real implementation we need to check the non-dominated solutions and the trade-offs... ahauaha

vncoelho on 9 Dec 2018

I guess we could use a Bloom filter, but if it says a hash is already there, we would need to double-check that it really is there by checking whether the blockchain knows about the hash, either in the mempool, as a persisted block, or in unverified blocks.

jsolman on 10 Dec 2018

👍1
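The double-check described above can be sketched as follows. This is an illustration in Python under stated assumptions, not neo code: `BloomFilter`, `is_known`, and the `authoritative` set (standing in for the mempool, persisted blocks, and unverified blocks) are all hypothetical names. A "no" from the filter is final; a "maybe" is confirmed against the authoritative data before the hash is treated as known.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter using double hashing over a SHA-256 digest."""

    def __init__(self, size_bits: int, hash_count: int) -> None:
        self._m = size_bits
        self._k = hash_count
        self._bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: bytes):
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # keep the stride odd
        return [(h1 + i * h2) % self._m for i in range(self._k)]

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self._bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        return all(self._bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def is_known(h: bytes, bloom: BloomFilter, authoritative: set) -> bool:
    """True only if the filter says 'maybe' AND the real data confirms it."""
    if not bloom.might_contain(h):
        return False               # Bloom filters have no false negatives
    return h in authoritative      # rule out a false positive


bloom = BloomFilter(size_bits=1024, hash_count=3)
chain = set()                      # stand-in for mempool / block lookups
seen = b"\x01" * 32
bloom.add(seen)
chain.add(seen)
print(is_known(seen, bloom, chain), is_known(b"\x02" * 32, bloom, chain))
```

Note that this only works for hashes the authoritative lookup can actually confirm, and every positive answer costs a second lookup, which is the overhead raised in the following comment.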

I agree that a Bloom filter may not be ideal, but we can manage it. Anyway, we can try to find another data structure that is the opposite of a BF, because double-checking every positive is a big waste of computing power :)

igormcoelho on 11 Dec 2018

That is great, @igormcoelho.
Let's think about it.

A simple idea is just to limit the KnownHashes HashSet to a given size, because after some minutes old tasks will not arrive anymore.
Most clients are restarted from time to time, so even with the current KnownHashes they do not consider the whole history of tasks. I believe it is a tool designed more to speed up normal operation than to handle attack/spam cases.

vncoelho on 11 Dec 2018
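The simple size cap proposed above could look roughly like this. It is a Python sketch with hypothetical names (`BoundedHashSet`, `max_size`), not the change that was eventually made in neo: once the limit is reached, the oldest inserted hash is dropped first, on the assumption that old tasks no longer arrive.

```python
from collections import deque


class BoundedHashSet:
    """Hash set capped at max_size entries, evicting in insertion order."""

    def __init__(self, max_size: int) -> None:
        if max_size <= 0:
            raise ValueError("max_size must be positive")
        self._max_size = max_size
        self._seen = set()
        self._order = deque()  # oldest hash sits at the left end

    def add(self, h: bytes) -> bool:
        """Record h. Returns True if it was new, False if already known."""
        if h in self._seen:
            return False
        self._seen.add(h)
        self._order.append(h)
        if len(self._order) > self._max_size:
            self._seen.discard(self._order.popleft())  # forget the oldest
        return True

    def __contains__(self, h: bytes) -> bool:
        return h in self._seen

    def __len__(self) -> int:
        return len(self._seen)


known = BoundedHashSet(max_size=2)
for h in (b"a", b"b", b"c"):
    known.add(h)
print(b"a" in known, b"b" in known, b"c" in known, len(known))
```

Unlike an LRU structure, a repeat sighting does not extend an entry's lifetime here, which keeps the bookkeeping minimal.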

Well closed. Thanks to everyone who investigated and ran benchmarks to find the most adequate solution.

vncoelho on 8 Apr 2019
