Lisk-sdk: Fail to recover chain when syncing

Created on 29 Mar 2018  路  2Comments  路  Source: LiskHQ/lisk-sdk

Expected behavior

In case of common block failure, we should delete block from the blockchain.

Actual behavior

In certain scenarios, when there is a fork in the network and (Step 1) the node tries to get on the right chain by calling loadCommonBlock, (Step 2)it sends the list of block ids of last five rounds, if peer has any of those blocks available, (Step 3) it responds with that block id and height(it responds with most recent block information), after this step, (Step 4) we call loadBlockFromPeer with our latest block, if peer has that block, (Step 5) it returns a 33 list of blocks with height greater than height of the block it received. if it doesn't have that block in the database, it returns genesis block and next 33 blocks(which is an odd behavior). and (Step 6) the genesisBlock fails on verification step. Afterwards, sync process ends, and we continue without trying to recover the chain.

I see two issues with our current approach to the common block:
1- While syncing with a peer of greater height, after receiving the response of getCommonBlock (Step 3), we should check if the peer contains our latest block, if it doesn't we should try to recover the chain. (Because that means that peer has a different set of blocks, and we should delete the block and try to follow the longest chain)
2- In case we don't find the block id (Step 5), we should return empty response, as opposed to returning genesisBlock.

Steps to reproduce

Stated in the actual behavior.

Which version(s) does this affect? (Environment, OS, etc...)

bug

Most helpful comment

Did some research, looks like the cause is we unintentionally changed behavior of one query. In 0.9.12 we pass array and query get only first element (block ID) from it and return empty rows. In 1.0.0 we pass array wrapped in array, so query get all elements (block IDs) and return genesis block. After removing array wrapping (return this.db.query(sql.getBlocksForTransport, [ids]); -> return this.db.query(sql.getBlocksForTransport, ids);) behavior is fine, but node is still not able to rollback properly. Reason for that is broadhash consensus is always 100% because we not update our headers is some cases (or more like in most of them). We should update headers always when last block changes, which will be solved (partially) by https://github.com/LiskHQ/lisk/pull/1793.

0.9.12 behavior:
https://github.com/LiskHQ/lisk/blob/7bf54a05815049f31931ffaba379b6d3b5febe79/modules/transport.js#L568
https://github.com/LiskHQ/lisk/blob/7bf54a05815049f31931ffaba379b6d3b5febe79/sql/transport.js#L4

[inf] 2018-04-02 13:52:50 | blocksCommon -> BEGIN - {"ids":"1036344777379915118,3448755019703517033,6524861224470851795","peer":{"ip":"127.0.0.1","port":4001,"state":2,"string":"127.0.0.1:4001","os":"darwin16.7.0","version":"0.0.0a","broadhash":"782866854d71827ed2ffc4696a74ed34724fa50dd97467e40db80abb54ce9455","height":130,"nonce":"Awd7DXpONNnYIEn9","updated":1522677170102}}
[inf] 2018-04-02 13:52:50 | blocksCommon -> escapedIds - {"escapedIds":["1036344777379915118","3448755019703517033","6524861224470851795"]}
[log] 2018-04-02 13:52:50 | query - SELECT MAX("height") AS "height", "id", "previousBlock", "timestamp" FROM blocks WHERE "id" IN ('1036344777379915118') GROUP BY "id" ORDER BY "height" DESC
[inf] 2018-04-02 13:52:50 | blocksCommon -> sql.getCommonBlock - {"escapedIds":["1036344777379915118","3448755019703517033","6524861224470851795"],"rows":[]}

1.0.0 behavior:
https://github.com/LiskHQ/lisk/blob/bcf5d6b8733cc9b0fc2be0e368af6d592314eb3d/modules/transport.js#L503-L504
https://github.com/LiskHQ/lisk/blob/bcf5d6b8733cc9b0fc2be0e368af6d592314eb3d/db/repos/blocks.js#L284-L287
https://github.com/LiskHQ/lisk/blob/bcf5d6b8733cc9b0fc2be0e368af6d592314eb3d/db/sql/blocks/get_blocks_for_transport.sql#L22-L26
```
[inf] 2018-04-02 14:23:56 | blocksCommon -> BEGIN - {"query":{"ids":"5598965201864899168,3448755019703517033,6524861224470851795"}}
[inf] 2018-04-02 14:23:56 | blocksCommon -> escapedIds - {"escapedIds":["5598965201864899168","3448755019703517033","6524861224470851795"]}
[log] 2018-04-02 14:23:56 | query - SELECT max(height) AS height, id, "previousBlock", "timestamp" FROM blocks WHERE id IN ('5598965201864899168','3448755019703517033','6524861224470851795') GROUP BY id ORDER BY height DESC
[inf] 2018-04-02 14:23:56 | blocksCommon -> db.blocks.getBlocksForTransport - {"escapedIds":["5598965201864899168","3448755019703517033","6524861224470851795"],"rows":[{"height":1,"id":"6524861224470851795","previousBlock":null,"timestamp":0}]}

All 2 comments

Did some research, looks like the cause is we unintentionally changed behavior of one query. In 0.9.12 we pass array and query get only first element (block ID) from it and return empty rows. In 1.0.0 we pass array wrapped in array, so query get all elements (block IDs) and return genesis block. After removing array wrapping (return this.db.query(sql.getBlocksForTransport, [ids]); -> return this.db.query(sql.getBlocksForTransport, ids);) behavior is fine, but node is still not able to rollback properly. Reason for that is broadhash consensus is always 100% because we not update our headers is some cases (or more like in most of them). We should update headers always when last block changes, which will be solved (partially) by https://github.com/LiskHQ/lisk/pull/1793.

0.9.12 behavior:
https://github.com/LiskHQ/lisk/blob/7bf54a05815049f31931ffaba379b6d3b5febe79/modules/transport.js#L568
https://github.com/LiskHQ/lisk/blob/7bf54a05815049f31931ffaba379b6d3b5febe79/sql/transport.js#L4

[inf] 2018-04-02 13:52:50 | blocksCommon -> BEGIN - {"ids":"1036344777379915118,3448755019703517033,6524861224470851795","peer":{"ip":"127.0.0.1","port":4001,"state":2,"string":"127.0.0.1:4001","os":"darwin16.7.0","version":"0.0.0a","broadhash":"782866854d71827ed2ffc4696a74ed34724fa50dd97467e40db80abb54ce9455","height":130,"nonce":"Awd7DXpONNnYIEn9","updated":1522677170102}}
[inf] 2018-04-02 13:52:50 | blocksCommon -> escapedIds - {"escapedIds":["1036344777379915118","3448755019703517033","6524861224470851795"]}
[log] 2018-04-02 13:52:50 | query - SELECT MAX("height") AS "height", "id", "previousBlock", "timestamp" FROM blocks WHERE "id" IN ('1036344777379915118') GROUP BY "id" ORDER BY "height" DESC
[inf] 2018-04-02 13:52:50 | blocksCommon -> sql.getCommonBlock - {"escapedIds":["1036344777379915118","3448755019703517033","6524861224470851795"],"rows":[]}

1.0.0 behavior:
https://github.com/LiskHQ/lisk/blob/bcf5d6b8733cc9b0fc2be0e368af6d592314eb3d/modules/transport.js#L503-L504
https://github.com/LiskHQ/lisk/blob/bcf5d6b8733cc9b0fc2be0e368af6d592314eb3d/db/repos/blocks.js#L284-L287
https://github.com/LiskHQ/lisk/blob/bcf5d6b8733cc9b0fc2be0e368af6d592314eb3d/db/sql/blocks/get_blocks_for_transport.sql#L22-L26
```
[inf] 2018-04-02 14:23:56 | blocksCommon -> BEGIN - {"query":{"ids":"5598965201864899168,3448755019703517033,6524861224470851795"}}
[inf] 2018-04-02 14:23:56 | blocksCommon -> escapedIds - {"escapedIds":["5598965201864899168","3448755019703517033","6524861224470851795"]}
[log] 2018-04-02 14:23:56 | query - SELECT max(height) AS height, id, "previousBlock", "timestamp" FROM blocks WHERE id IN ('5598965201864899168','3448755019703517033','6524861224470851795') GROUP BY id ORDER BY height DESC
[inf] 2018-04-02 14:23:56 | blocksCommon -> db.blocks.getBlocksForTransport - {"escapedIds":["5598965201864899168","3448755019703517033","6524861224470851795"],"rows":[{"height":1,"id":"6524861224470851795","previousBlock":null,"timestamp":0}]}

Thanks @4miners for the help!

Was this page helpful?
0 / 5 - 0 ratings

Related issues

toschdev picture toschdev  路  3Comments

MaciejBaj picture MaciejBaj  路  3Comments

ManuGowda picture ManuGowda  路  3Comments

hendrikhofstadt picture hendrikhofstadt  路  4Comments

willclarktech picture willclarktech  路  4Comments