After the objective "Improve transaction processing efficiency" is completed, benchmark the processing time of all transaction types and compare the results with Lisk Core 1.6.0.
2.0
These tests were performed by running the `stress test transactions type 0` tests from the `lisk-core-qa` repository and changing the `stressAmount` variable, which corresponds to the Tx sent column, so that `tx_sent/tx_block` is always 20.
Machine: MacBook Pro 15", 2018, 16 GB RAM, 2.2 GHz Intel Core i7
Network: devnet (local)
Versions: `1.5`, `2.0`
I couldn't perform the tests on `1.6` because New Relic is not able to instrument it for some reason. I will look into this in the future.
Transaction Pool settings used during all tests (v2.0) to keep it consistent:
*MAX_TRANSACTIONS_PER_BLOCK corresponds to the values of the second column
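
The relation between the Tx sent, Tx/block and Tx/s columns can be sketched as follows. This is only an illustration of the arithmetic described above, not code from `lisk-core-qa`; the function name `stress_parameters` and the constant names are hypothetical.

```python
SLOT_TIME_S = 10      # slot time used in all runs (see the results tables)
BLOCKS_PER_RUN = 20   # tx_sent / tx_block is kept at 20


def stress_parameters(tx_per_block):
    """Return (tx_sent, tx_per_second) for a target block size.

    tx_sent is the value to assign to stressAmount; tx_per_second is the
    throughput needed to fill one block of that size in every slot.
    """
    tx_sent = tx_per_block * BLOCKS_PER_RUN
    tx_per_second = tx_per_block / SLOT_TIME_S
    return tx_sent, tx_per_second


for size in (25, 100, 250, 500, 1000, 5000):
    sent, tps = stress_parameters(size)
    print(f"{size:>5} tx/block -> stressAmount={sent:>6}, {tps:g} tx/s")
```

For example, 250 tx/block gives a `stressAmount` of 5000 and a required rate of 25 tx/s.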

Results on `2.0`:

| Tx sent | Tx/block | Tx/s | Slot time (s) | Version | #Slots missed | #Forged blocks | #Full blocks | delegatesNextForge min (ms) | max (ms) | avg (ms) | SD (ms) |
|---------|----------|------|---------------|---------|----------------|----------------|--------------|-------------------------|-------|------|------|
| 500 | 25 | 2.5 | 10 | 2.0 | 0 | 20 | 20 | 0.473 | 1360 | 58.2 | 179 |
| 2000 | 100 | 10 | 10 | 2.0 | 0 | 20 | 20 | 0.143 | 6020 | 411 | 968 |
| 5000 | 250 | 25 | 10 | 2.0 | 2 | 20 | 20 | 0.113 | 16900 | 794 | 3420 |
| 10000 | 500 | 50 | 10 | 2.0 | 20 | 20 | 19 | 0.118 | 35900 | 1030 | 4630 |
| 20000 | 1000 | 100 | 10 | 2.0 | 11 | 4 | 2 | 0.517 | 45000 | 266 | 3040 |
| 100000 | 5000 | 500 | 10 | 2.0 | 2 | 4 | 0 | 0.460 | 74000 | 414 | 4620 |
*Standard Deviation
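
As a rough sketch of how the four `delegatesNextForge` columns can be derived from raw per-call timings, assuming the timings are available as a list of milliseconds: the `summarize` helper and the sample values below are hypothetical, and the sample standard deviation is an assumption, since the thread does not say which variant was used.

```python
import statistics


def summarize(timings_ms):
    """Return min, max, average and sample standard deviation in ms."""
    return {
        "min": min(timings_ms),
        "max": max(timings_ms),
        "avg": statistics.mean(timings_ms),
        "sd": statistics.stdev(timings_ms),  # sample SD (assumption)
    }


# Hypothetical timings: mostly fast calls plus one large spike.
sample = [0.5, 0.7, 1.2, 0.6, 1360.0]
print(summarize(sample))
```

A single spike is enough to pull `max` and `sd` far above `avg`, which is the pattern visible in the rows with many transactions per block.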

Results on `1.5`:

| Tx sent | Tx/block | Tx/s | Slot time (s) | Version | #Slots missed | #Forged blocks | #Full blocks | delegatesNextForge min (ms) | max (ms) | avg (ms) | SD (ms) |
|---------|----------|------|---------------|---------|----------------|----------------|--------------|-------------------------|-------|------|------|
| 500 | 25 | 2.5 | 10 | 1.5 | 0 | 20 | 20 | 0.348 | 931 | 94.3 | 245 |
| 2000 | 100 | 10 | 10 | 1.5 | 0 | 20 | 20 | 0.319 | 8750 | 486 | 1220 |
| 5000 | 250 | 25 | 10 | 1.5 | 8 | 20 | 20 | 0.297 | 21900 | 754 | 1460 |
| 10000 | 500 | 50 | 10 | 1.5 | 20 | 11 | 11 | 0.371 | 44900 | 593 | 4050 |
| 20000 | 1000 | 100 | 10 | 1.5 | 4 | 5 | 4 | 0.35 | 62800 | 400 | 4470 |
| 100000 | 5000 | 500 | 10 | 1.5 | - | 0 | - | 0.342 | 53.5 | 6.93 | 8.04 |

On `1.5`, results were similar when transactions per block were `<= 250`. However, blocks with over 250 transactions performed much worse, missing slots more often and missing a greater number of transactions than on `2.0`. With 5000 tx/block, the application didn't manage to fill any block.

Here's a comparison of the `block` table between `1.5` and `2.0`:

`1.5`:

| version | timestamp | height | numberOfTransactions |
|---------|-----------|--------|----------------------|
| 1 | 94837200 | 11 | 250 |
| 1 | 94837230 | 12 | 250 |
| 1 | 94837250 | 13 | 250 |
| 1 | 94837270 | 14 | 250 |
| 1 | 94837280 | 15 | 250 |
| 1 | 94837300 | 16 | 250 |
| 1 | 94837310 | 17 | 250 |
| 1 | 94837320 | 18 | 250 |
| 1 | 94837330 | 19 | 250 |
| 1 | 94837350 | 20 | 250 |
| 1 | 94837360 | 21 | 250 |
| 1 | 94837370 | 22 | 250 |
| 1 | 94837380 | 23 | 250 |
| 1 | 94837400 | 24 | 250 |
| 1 | 94837410 | 25 | 250 |
| 1 | 94837420 | 26 | 250 |
| 1 | 94837430 | 27 | 250 |
| 1 | 94837450 | 28 | 250 |
| 1 | 94837460 | 29 | 250 |
| 1 | 94837470 | 30 | 250 |

`2.0`:

| version | timestamp | height | numberOfTransactions |
|---------|-----------|--------|----------------------|
| 1 | 94839580 | 31 | 250 |
| 1 | 94839590 | 32 | 250 |
| 1 | 94839610 | 33 | 250 |
| 1 | 94839630 | 34 | 250 |
| 1 | 94839640 | 35 | 250 |
| 1 | 94839650 | 36 | 250 |
| 1 | 94839660 | 37 | 250 |
| 1 | 94839670 | 38 | 250 |
| 1 | 94839680 | 39 | 250 |
| 1 | 94839690 | 40 | 250 |
| 1 | 94839700 | 41 | 250 |
| 1 | 94839710 | 42 | 250 |
| 1 | 94839720 | 43 | 250 |
| 1 | 94839730 | 44 | 250 |
| 1 | 94839740 | 45 | 250 |
| 1 | 94839750 | 46 | 250 |
| 1 | 94839760 | 47 | 250 |
| 1 | 94839770 | 48 | 250 |
| 1 | 94839780 | 49 | 250 |
| 1 | 94839790 | 50 | 250 |

You can see that `1.5` missed `8` slots, whereas `2.0` missed only `2`, which leaves room for further optimization to bring it consistently down to `0`. These results come only from this particular test, but repeated tests gave similar results, suggesting that `2.0` is more stable than `1.5` when forging blocks with 250 transactions, although it is still unreliable.
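
The missed-slot counts can be recomputed from the `timestamp` column of the two block tables. A minimal sketch, assuming timestamps are in seconds, the rows are consecutive heights, and every gap larger than one 10 s slot corresponds to missed slots (`count_missed_slots` is a hypothetical helper, not part of Lisk Core):

```python
SLOT_TIME_S = 10  # slot time from the results tables


def count_missed_slots(timestamps, slot_time=SLOT_TIME_S):
    """Count the empty slots between consecutive block timestamps."""
    missed = 0
    for prev, curr in zip(timestamps, timestamps[1:]):
        # A gap of exactly one slot means nothing was missed.
        missed += (curr - prev) // slot_time - 1
    return missed


# Timestamps copied from the block tables (heights 11-30 and 31-50).
blocks_1_5 = [
    94837200, 94837230, 94837250, 94837270, 94837280,
    94837300, 94837310, 94837320, 94837330, 94837350,
    94837360, 94837370, 94837380, 94837400, 94837410,
    94837420, 94837430, 94837450, 94837460, 94837470,
]
blocks_2_0 = [
    94839580, 94839590, 94839610, 94839630, 94839640,
    94839650, 94839660, 94839670, 94839680, 94839690,
    94839700, 94839710, 94839720, 94839730, 94839740,
    94839750, 94839760, 94839770, 94839780, 94839790,
]

print(count_missed_slots(blocks_1_5))  # 8
print(count_missed_slots(blocks_2_0))  # 2
```

This only counts gaps between the first and last block listed, so slots missed before height 11 or after height 50 are not included.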

I have observed the following behaviors when comparing the two versions:

- `delegatesNextForge` is faster on `2.0`, outperforming `1.5` in all four indicators.
- `tx/block <= 100`: `1.5` and `2.0` perform very similarly; blocks are forged and fully filled, and no slots are missed.
- `tx/block > 100`:
  - `2.0` performs better than `1.5` but is still unstable (it misses slots and blocks).
  - `2.0` is able to forge noticeably more blocks with 500 transactions than `1.5`.
  - `2.0` is even able to forge some blocks with 1000 transactions each.
- `tx/block == 100` ~ `10tx/s`.

While working with @Usman and reading the New Relic reports and manual logs, we concluded that the new Transaction Pool was not a bottleneck. However, we noticed that on full blocks with 250 transactions or more, Postgres activity was spiking. New Relic reported that a triple nested `SELECT` query on the `mem_accounts2delegates` table was being called repeatedly, causing the slowdown. This query is executed when the `extended` flag of the `AccountStore.cache` method is set to `true`.

Query:
SELECT "address", ENCODE("publicKey", 'hex') AS "publicKey", ENCODE("secondPublicKey", 'hex') AS "secondPublicKey", "username", "isDelegate"::int::boolean, "secondSignature"::int::boolean, "balance", "asset", "multimin" AS "multiMin", "multilifetime" AS "multiLifetime", "nameexist"::int::boolean AS "nameExist", "missedBlocks", "producedBlocks", "rank", "fees", "rewards", "vote", case when "producedBlocks" + "missedBlocks" = 0 then 0 else ROUND((("producedBlocks"::float / ("producedBlocks" + "missedBlocks")) * 100.0)::numeric, 2)::float end AS productivity, (SELECT array_agg("dependentId") FROM mem_accounts2delegates WHERE "accountId" = mem_accounts.address ) AS "votedDelegatesPublicKeys", (SELECT array_agg("dependentId") FROM mem_accounts2multisignatures WHERE "accountId" = mem_accounts.address ) AS "membersPublicKeys" FROM mem_accounts WHERE ("address" = '16313739661670634666L') OR ("address" = '13457048459696651162L') ORDER BY "balance" ASC, "address" ASC LIMIT 101 OFFSET 0
Setting the flag to `false` for `Type 0` transactions increases performance significantly.

Further todo:

- Find out why New Relic is not able to instrument `1.6`. It's doing it fine on `1.5`.
- Run the tests on `1.6` and `2.0` in an isolated machine, possibly in parallel to speed things up. Average the results and compute the Standard Deviation for each one to identify spikes.

@limiaspasdaniel It's great that you looked into this issue in such detail. I would suggest a few points in case you iterate on this again.
For benchmarking any indicator for the network.
And most importantly, mention the exact steps you followed to arrive at the final numbers, so that anyone in the team and the community can repeat them and compare for themselves.
You may think it's just an internal comparison, but once you put those numbers together, people in the team and in public will end up talking about them, as TPS is a vital indicator of any blockchain's performance. Many future decisions will also be influenced by these numbers, so it's better to measure them very accurately, in the right environment and with the right configuration.

Thanks for your input @nazarhussain. I will keep this in mind for the next iteration.