Neo: Limit KnownHashes size

Created on 6 Dec 2018  Â·  13Comments  Â·  Source: neo-project/neo

Currently, it looks like that the HashSet<UInt256> knownHashes has no size limit.
https://github.com/neo-project/neo/blob/7883e587a9448330d21669ebe748402871da3a50/neo/Network/P2P/TaskManager.cs#L27

In my local experiment it quickly reaches huge sizes.
Maybe it can reach something around 1M+ in MainNet operation.
Is it a problem to the client or it is expected to handle it in a smooth manner?

If we decide to limit it, which kind of representation do you recommend?

discussion

Most helpful comment

Accidentally closed, heh; I really wish the mobile interface would ask ‘are you sure’ if you accidentally hit 'close and comment'.

All 13 comments

Well observed brother... I don't know exactly how is this Set managed, but it looks like a typical application of Bloom Filters. Using these, these millions of tx could be represented with few bytes. If that's the case, the risk of false positives could be managed in two ways: (i) giving transparency of the table to the public so users would avoid such false positives (which is hard because no one knows consensus nodes) (ii) by attaching quick expiring on transactions and also the current timestamp, so tx hashes will always be constantly refreshed (even if one hit happens, it's very hard to happen multiple times).

False positives are not acceptable for this as it could block transactions erroneously, so a bloom filter is not a good choice. An LRU cache structure could work. Imagine a false positive on a block hash; the node would never be able to receive that block until restarted...

The best way is a stored list in the chain

It’s fine if the structure allowed a duplicate because it will check it against leveldb later and see it as a dupicate and reject it. For instance, after a node starts, knownHashes is always empty, and it may receive a hash already in the chain and have to incur a disk read to discover it is already there. So it’s fine if an expiring cache is used here; the whole purpose to have knownHashes here AFAIK is just to improve performance (not waste time when receiving the same hash again).

I agree, @igor, @shargon, @jsolman .
All 3 points are exactly good solutions.

ps: perhaps some less exact (deterministic). ahauahahaia

@vncoelho
I appreciate everyone’s ideas and input; however, I am still not seeing how a bloom filter would be a good solution. We need the opposite of a bloom filter, because we can tolerate false negatives but not false positives.

@vncoelho
I appreciate everyone’s ideas and input; however, I am still not seeing how a bloom filter would be a good solution. We need the opposite of a bloom filter, because we can tolerate false negatives but not false positives.

Accidentally closed, heh; I really wish the mobile interface would ask ‘are you sure’ if you accidentally hit 'close and comment'.

Men, that is chaos theory, ahauahhaa
I am kidding, Jeff, we need to think about that carefully, it is an idea that we like to think.

For a real implementation we need to check non-dominated solutions and the trade-off.....ahauaha

I guess we could use a bloom filter, but if it says it is already there, would need to double check that it really is already there by checking if the blockchain knows about the hash either in the mempool or as a persisted block or in unverified blocks.

I agree that Bloom Filter may not be ideal,but we can manage it. Anyway, we can try to find another data structure that is the opposite of BF, because double checking every positive is lots of waste of computing power :)

That is great, @igormcoelho.
Let's think about it.

A simple idea is just to limit the KnowHashes HashSet to a given size, because after some minutes old tasks will not arrive anymore.
Since most of the clients are reestarted and, even with the current KnowHashses, they do not consider the whole history of tasks. I believe it is a tool more designed for speed up in normal operation and not for attack/spam cases.

Well closed, thanks to everyone that investigated and performed benchmarks for finding the most adequated solution.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

igormcoelho picture igormcoelho  Â·  4Comments

canesin picture canesin  Â·  3Comments

realloc picture realloc  Â·  4Comments

shargon picture shargon  Â·  3Comments

doubiliu picture doubiliu  Â·  3Comments