Lisk-sdk: Benchmark transaction processing times

Created on 15 May 2019 · 3 Comments · Source: LiskHQ/lisk-sdk

Expected behavior

After the objective "Improve transaction processing efficiency" is completed, benchmark the processing time of all transaction types and compare the results with Lisk Core 1.6.0.

Which version(s) does this affect? (Environment, OS, etc...)

2.0

test elementtransactions

Most helpful comment

These tests were performed by running the type 0 transaction stress tests from the lisk-core-qa repository, changing the stressAmount variable (the Tx sent column) so that the ratio of Tx sent to Tx/block is always 20.

Machine: MacBook Pro 15-inch, 2018, 16 GB RAM, 2.2 GHz Intel Core i7
Network: devnet (local)
Versions: 1.5, 2.0
I couldn't perform the tests on 1.6 because New Relic is not able to instrument it for some reason. I will take a look in the future.

Transaction Pool settings used during all tests (v2.0) to keep them consistent:

  • maxTransactionsPerQueue: 5000 (maximum)
  • receivedTransactionsLimitPerProcessing: MAX_TRANSACTIONS_PER_BLOCK*
  • validatedTransactionsLimitPerProcessing: MAX_TRANSACTIONS_PER_BLOCK
  • verifiedTransactionsLimitPerProcessing: MAX_TRANSACTIONS_PER_BLOCK
  • pendingTransactionsProcessingLimit: MAX_TRANSACTIONS_PER_BLOCK

*MAX_TRANSACTIONS_PER_BLOCK corresponds to the values of the second column
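The relationship between the table columns and the pool limits can be sketched as follows. This is only an illustration in Python, not the SDK's actual configuration format: the setting names come from the list above, while `run_parameters`, `BLOCKS_PER_RUN` and `SLOT_TIME_S` are hypothetical names introduced here. It assumes the 10 s slot time and the fixed ratio of 20 between Tx sent and Tx/block described in this comment.

```python
SLOT_TIME_S = 10     # slot time used in every run (from the results tables)
BLOCKS_PER_RUN = 20  # Tx sent / Tx per block is kept at 20 (hypothetical name)


def run_parameters(max_tx_per_block: int) -> dict:
    """Derive the workload and pool limits for one benchmark run."""
    return {
        # Value assigned to stressAmount (the "Tx sent" column)
        "tx_sent": max_tx_per_block * BLOCKS_PER_RUN,
        # The "Tx/s" column: one block per slot
        "tx_per_second": max_tx_per_block / SLOT_TIME_S,
        # Pool limits as listed above; the dict layout is an assumption
        "pool": {
            "maxTransactionsPerQueue": 5000,
            "receivedTransactionsLimitPerProcessing": max_tx_per_block,
            "validatedTransactionsLimitPerProcessing": max_tx_per_block,
            "verifiedTransactionsLimitPerProcessing": max_tx_per_block,
            "pendingTransactionsProcessingLimit": max_tx_per_block,
        },
    }


for size in (25, 100, 250, 500, 1000, 5000):
    params = run_parameters(size)
    print(size, params["tx_sent"], params["tx_per_second"])
```

For example, `run_parameters(250)` gives 5000 transactions sent at 25 tx/s, matching the third row of the results tables.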

Results on 2.0

| Tx sent | Tx/block | Tx/s | Slot time (s) | Version | #Slots missed | #Forged blocks | #Full blocks | delegatesNextForge min (ms) | delegatesNextForge max (ms) | delegatesNextForge avg (ms) | delegatesNextForge SD* (ms) |
|---------|----------|------|---------------|---------|----------------|----------------|--------------|-------------------------|-------|------|------|
| 500 | 25 | 2.5 | 10 | 2.0 | 0 | 20 | 20 | 0.473 | 1360 | 58.2 | 179 |
| 2000 | 100 | 10 | 10 | 2.0 | 0 | 20 | 20 | 0.143 | 6020 | 411 | 968 |
| 5000 | 250 | 25 | 10 | 2.0 | 2 | 20 | 20 | 0.113 | 16900 | 794 | 3420 |
| 10000 | 500 | 50 | 10 | 2.0 | 20 | 20 | 19 | 0.118 | 35900 | 1030 | 4630 |
| 20000 | 1000 | 100 | 10 | 2.0 | 11 | 4 | 2 | 0.517 | 45000 | 266 | 3040 |
| 100000 | 5000 | 500 | 10 | 2.0 | 2 | 4 | 0 | 0.460 | 74000 | 414 | 4620 |

*Standard Deviation

Results on 1.5

| Tx sent | Tx/block | Tx/s | Slot time (s) | Version | #Slots missed | #Forged blocks | #Full blocks | delegatesNextForge min (ms) | delegatesNextForge max (ms) | delegatesNextForge avg (ms) | delegatesNextForge SD* (ms) |
|---------|----------|------|---------------|---------|----------------|----------------|--------------|-------------------------|-------|------|------|
| 500 | 25 | 2.5 | 10 | 1.5 | 0 | 20 | 20 | 0.348 | 931 | 94.3 | 245 |
| 2000 | 100 | 10 | 10 | 1.5 | 0 | 20 | 20 | 0.319 | 8750 | 486 | 1220 |
| 5000 | 250 | 25 | 10 | 1.5 | 8 | 20 | 20 | 0.297 | 21900 | 754 | 1460 |
| 10000 | 500 | 50 | 10 | 1.5 | 20 | 11 | 11 | 0.371 | 44900 | 593 | 4050 |
| 20000 | 1000 | 100 | 10 | 1.5 | 4 | 5 | 4 | 0.35 | 62800 | 400 | 4470 |
| 100000 | 5000 | 500 | 10 | 1.5 | - | 0 | - | 0.342 | 53.5 | 6.93 | 8.04 |

On 1.5, results were similar when transactions per block were <= 250. However, blocks with more than 250 transactions performed much worse, missing slots more often and leaving more transactions out of blocks than 2.0. With 5000 tx/block the application didn't manage to fill any block.

Here's a comparison of the block table between 1.5 and 2.0:

1.5:

| version | timestamp | height | numberOfTransactions |
|---------|-----------|--------|----------------------|
| 1 | 94837200 | 11 | 250 |
| 1 | 94837230 | 12 | 250 |
| 1 | 94837250 | 13 | 250 |
| 1 | 94837270 | 14 | 250 |
| 1 | 94837280 | 15 | 250 |
| 1 | 94837300 | 16 | 250 |
| 1 | 94837310 | 17 | 250 |
| 1 | 94837320 | 18 | 250 |
| 1 | 94837330 | 19 | 250 |
| 1 | 94837350 | 20 | 250 |
| 1 | 94837360 | 21 | 250 |
| 1 | 94837370 | 22 | 250 |
| 1 | 94837380 | 23 | 250 |
| 1 | 94837400 | 24 | 250 |
| 1 | 94837410 | 25 | 250 |
| 1 | 94837420 | 26 | 250 |
| 1 | 94837430 | 27 | 250 |
| 1 | 94837450 | 28 | 250 |
| 1 | 94837460 | 29 | 250 |
| 1 | 94837470 | 30 | 250 |

2.0:

| version | timestamp | height | numberOfTransactions |
|---------|-----------|--------|----------------------|
| 1 | 94839580 | 31 | 250 |
| 1 | 94839590 | 32 | 250 |
| 1 | 94839610 | 33 | 250 |
| 1 | 94839630 | 34 | 250 |
| 1 | 94839640 | 35 | 250 |
| 1 | 94839650 | 36 | 250 |
| 1 | 94839660 | 37 | 250 |
| 1 | 94839670 | 38 | 250 |
| 1 | 94839680 | 39 | 250 |
| 1 | 94839690 | 40 | 250 |
| 1 | 94839700 | 41 | 250 |
| 1 | 94839710 | 42 | 250 |
| 1 | 94839720 | 43 | 250 |
| 1 | 94839730 | 44 | 250 |
| 1 | 94839740 | 45 | 250 |
| 1 | 94839750 | 46 | 250 |
| 1 | 94839760 | 47 | 250 |
| 1 | 94839770 | 48 | 250 |
| 1 | 94839780 | 49 | 250 |
| 1 | 94839790 | 50 | 250 |

In this run, 1.5 missed 8 slots while 2.0 missed only 2, which leaves room for further optimization to bring the number consistently down to 0. These results come from this particular test only, but repeated tests produced similar results, suggesting that 2.0 is more stable than 1.5 when forging blocks with 250 transactions, though it is still unreliable.
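The missed-slot counts can be recovered from the two block tables themselves. The sketch below assumes the 10 s slot time from the results tables and counts every 10 s gap without a block between consecutive heights as one missed slot. The function name `missed_slots` is introduced here for illustration, and the sketch only covers slots between the first and last listed block.

```python
SLOT_TIME_S = 10  # slot time from the results tables


def missed_slots(timestamps, slot_time=SLOT_TIME_S):
    """Count empty slots between consecutive block timestamps."""
    missed = 0
    for prev, curr in zip(timestamps, timestamps[1:]):
        # A gap of exactly one slot means nothing was missed
        missed += (curr - prev) // slot_time - 1
    return missed


# Timestamps copied from the 1.5 block table (heights 11 to 30)
TS_1_5 = [
    94837200, 94837230, 94837250, 94837270, 94837280,
    94837300, 94837310, 94837320, 94837330, 94837350,
    94837360, 94837370, 94837380, 94837400, 94837410,
    94837420, 94837430, 94837450, 94837460, 94837470,
]

# Timestamps copied from the 2.0 block table (heights 31 to 50)
TS_2_0 = [
    94839580, 94839590, 94839610, 94839630, 94839640,
    94839650, 94839660, 94839670, 94839680, 94839690,
    94839700, 94839710, 94839720, 94839730, 94839740,
    94839750, 94839760, 94839770, 94839780, 94839790,
]

print(missed_slots(TS_1_5))  # 8
print(missed_slots(TS_2_0))  # 2
```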

I have observed the following behaviors when comparing the two versions:

  • delegatesNextForge is faster on 2.0, outperforming 1.5 in all four indicators.
    • When tx/block <= 100, 1.5 and 2.0 perform very similarly: blocks are forged and fully filled, and no slots are missed.
    • When tx/block > 100:
      • 2.0 performs better than 1.5 but is still unstable (it misses slots and blocks).
      • 2.0 forges noticeably more blocks with 500 transactions than 1.5.
      • 2.0 is even able to forge some blocks with 1000 transactions each.
  • Both versions are stable when tx/block == 100 (~10 tx/s).

While working with @Usman and reading the New Relic reports and manual logs, we concluded that the new Transaction Pool was not a bottleneck. However, we noticed that on full blocks with 250 transactions or more, Postgres activity was spiking. New Relic reported that a triple nested SELECT query on the mem_accounts2delegates table was being called repeatedly, causing the slowdown. This query is executed when the extended flag of the AccountStore.cache method is set to true.

Query:

SELECT
  "address",
  ENCODE("publicKey", 'hex') AS "publicKey",
  ENCODE("secondPublicKey", 'hex') AS "secondPublicKey",
  "username",
  "isDelegate"::int::boolean,
  "secondSignature"::int::boolean,
  "balance",
  "asset",
  "multimin" AS "multiMin",
  "multilifetime" AS "multiLifetime",
  "nameexist"::int::boolean AS "nameExist",
  "missedBlocks",
  "producedBlocks",
  "rank",
  "fees",
  "rewards",
  "vote",
  CASE
    WHEN "producedBlocks" + "missedBlocks" = 0 THEN 0
    ELSE ROUND((("producedBlocks"::float / ("producedBlocks" + "missedBlocks")) * 100.0)::numeric, 2)::float
  END AS productivity,
  (SELECT array_agg("dependentId") FROM mem_accounts2delegates WHERE "accountId" = mem_accounts.address) AS "votedDelegatesPublicKeys",
  (SELECT array_agg("dependentId") FROM mem_accounts2multisignatures WHERE "accountId" = mem_accounts.address) AS "membersPublicKeys"
FROM mem_accounts
WHERE ("address" = '16313739661670634666L') OR ("address" = '13457048459696651162L')
ORDER BY "balance" ASC, "address" ASC
LIMIT 101 OFFSET 0

Setting the flag to false for type 0 transactions increases performance significantly.

Further todo:

  • See why New Relic is not able to track web and non-web transactions on 1.6. It's doing it fine on 1.5.
  • Run these tests multiple times for each block size and transaction type on 1.6 and 2.0 on an isolated machine, possibly in parallel to speed things up. Average the results and compute the standard deviation for each one to identify spikes.
  • Send transactions using RPC instead of the HTTP API and compare the results.
  • Try to run the HTTP API as a child process and compare the results.
  • Investigate Postgres activity on big blocks on the above query.

All 3 comments

@limiaspasdaniel It's great that you looked into this issue in such detail. I would suggest a few points if you iterate on this again.

For benchmarking any indicator for the network:

  1. Please use a Linux machine rather than a MacBook, as that's the environment most users are running.
  2. Always use an empty machine, where you know exactly what is running on the system. On a development system, you never know when Spotify ties up the CPU or when the system is indexing files, so your results could be misleading.
  3. Always use the binary builds and not the source code, because in the binary build we optimize PostgreSQL specifically for our blockchain use case. So if you face any PostgreSQL issue in development, first replicate it on a binary build.

And most importantly, mention the exact steps you took to arrive at the final numbers, so anyone in the team and community can repeat the steps and compare for themselves.

You may think it's just an internal comparison, but once you put those numbers together, people in the team and in public will unintentionally start talking about them, as TPS is a vital indicator of any blockchain's performance. Many future decisions will also be influenced by these numbers. So it's better to do it very accurately, in the right environment with the right configuration.
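One way to make the environment part of a report repeatable is to record it automatically next to the results. The sketch below is a minimal illustration using only Python's standard library. The function name `environment_report` and the chosen fields are assumptions; it does not capture the Lisk version, build type or PostgreSQL settings, which would need to be added by hand.

```python
import json
import os
import platform


def environment_report() -> dict:
    """Collect basic facts about the machine a benchmark ran on."""
    return {
        "os": platform.system(),           # e.g. "Linux" or "Darwin"
        "os_release": platform.release(),
        "machine": platform.machine(),     # CPU architecture
        "cpu_count": os.cpu_count(),
        "python": platform.python_version(),
    }


# Store this alongside the benchmark numbers
print(json.dumps(environment_report(), indent=2))
```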

Thanks for your input @nazarhussain. I will keep this in mind for the next iteration.
