Nano-node: Multithread signature_checker

Created on 18 Jan 2019 · 9Comments · Source: nanocurrency/nano-node

Create hardware_concurrency() number of threads and divide each signature set between the threads.

We probably don't want to divide trivial number of verifications, so some good lower-level cutoff where we'd only use one thread.

enhancement

Source

zhyatt

Most helpful comment

8 cores results with #1614 : 2x from 1000 (single) to 1001 (16 threads). Of course with the 256 batch size it's not using all of them.

Modified the cut-off to 100k just for testing and it yields a 7.5x speedup.

guilhermelawless on 23 Jan 2019

🚀2

All 9 comments

Added a pull request for this #1611, it's threading the nano::signature_checker::verify function based on std::thread::hardware_concurrency() number of threads.

Not sure what should be set as a trivial number of verifications, I just used the number of the many test case. Added a new test case as well with +1 verifications to trigger the multi-threaded version of the code.

Seeing around a 2x speedup on my 4 core laptop.

./core_test --gtest_filter=signature*
Running main() from gtest_main.cc
Note: Google Test filter = signature*
[==========] Running 4 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 4 tests from signature_checker
[ RUN      ] signature_checker.empty
[       OK ] signature_checker.empty (0 ms)
[ RUN      ] signature_checker.many
[       OK ] signature_checker.many (35 ms)
[ RUN      ] signature_checker.many_threaded
[       OK ] signature_checker.many_threaded (17 ms)
[ RUN      ] signature_checker.one
[       OK ] signature_checker.one (0 ms)
[----------] 4 tests from signature_checker (52 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 1 test case ran. (52 ms total)
[  PASSED  ] 4 tests.

termoose on 21 Jan 2019

👍1

@termoose thanks for the effort! This is something we would like to get in soon, so let me know if you need any help with the PR remarks. In essence we want a thread pool, where each thread is operating on 256 batched chunks.

wezrule on 21 Jan 2019

Hi thanks for the feedback! Will run some benchmarks with 256 sized chunks and update the the branch after testing it out. What’s the most common case? >256 verifications or less? I want to put the most common case first so we can save some cycles on returning early from this function.

termoose on 21 Jan 2019

Hard to say. When bootstrapping from 0 there will be a lot, but in real-time probably not that many (at least yet). I think this task is to improve bootstrapping so I would give that priority.

wezrule on 21 Jan 2019

Closed the old PR, opened a new one #1614. It might be a bit faster than spawning threads each time, around 2.2x speedup on my 4 cores. We could experiment with the batch_size as well as the cut-off value for doing the single-threaded version. The current values seem fine from the tests I ran locally.

termoose on 22 Jan 2019

8 cores results with #1614 : 2x from 1000 (single) to 1001 (16 threads). Of course with the 256 batch size it's not using all of them.

Modified the cut-off to 100k just for testing and it yields a 7.5x speedup.

guilhermelawless on 23 Jan 2019

🚀2

@wezrule isn't this why we see only up to 2048 blocks at once?

guilhermelawless on 27 Jan 2019

@guilhermelawless yes probably, but I'm not changing it :)
Edit: It was changed in the end to 2048 * (num_threads + 1)

wezrule on 27 Jan 2019

😄1

Resolved with #1651

zhyatt on 29 Jan 2019