Lisk-sdk: Achieve atomic block save

Created on 29 Oct 2016  路  14Comments  路  Source: LiskHQ/lisk-sdk

Parent: #1169

If a node is interrupted during block processing i.e. due to reload, there is a high probability of partial writes to the database, leading to corruption of memory tables and a break in consensus with the rest of network.

Solution: Rewrite database logic so that transaction application, block save, and round tick are all encompassed within an atomic database transaction that can be easily rolled back upon interruption or error.

This should better ensure data integrity across nodes on the network.

Logic rewrite will also provide better use of shared connections from pool, and improve block processing capability greatly.

Current behavior

BEGIN DB TRANSACTION
  Apply unconfirmed transaction
COMMIT

BEGIN DB TRANSACTION
  Apply confirmed transaction
COMMIT

BEGIN DB TRANSACTION
  Insert block
  Insert transaction
  Insert transaction type
COMMIT

BEGIN DB TRANSACTION
  Round tick
COMMIT

Bad: In the event of rollback on error in later db transactions, memory tables will be inconsistent.

Proposed behavior

BEGIN DB TRANSACTION
  Apply unconfirmed transaction
  Apply confirmed transaction
  Insert block
  Insert transaction
  Insert transaction type
  Round tick
COMMIT

Good: In the event of rollback on error, memory tables will be consistent.

Code Target

An example where we are doing things correctly, but also at the heart of where our database writes start:

https://github.com/LiskHQ/lisk/blob/9598668a45e2f38f1f787b1bee393c39c2827066/modules/blocks/chain.js#L84-L109

Therefore, the actions: apply unconfirmed transaction, apply confirmed transaction and round ticks are those we need to consider.

Task List

  • [x] Add "transaction applications" (unconfirmed/confirmed) to the same promise chain as Chain.prototype.saveBlock #1181
  • [x] Add "round ticks" (forwards/backwards) to the same promise chain as Chain.prototype.saveBlock #1297

Most helpful comment

@karmacoma https://github.com/LiskHQ/lisk/issues/302#issuecomment-353664288 Yes that can be achieved. But that tx object needed to pass to every level module->logic->db. I am working on it to see how to refactor it more elegantly.

All 14 comments

Work in progress. Will likely be included in the next minor release 0.6.0.

Stuck with this? Need help? :wink:

The pattern suggested in pg-promise-demo can be achieved gradually, step by step :wink:

The first best step - moving all the SQL into the same kind of structure as in the demo.

Thanks @vitaly-t. Your help and advice is always appreciated 馃憤

Should largely be resolved after: https://github.com/LiskHQ/lisk/issues/543. We will then come back and review best practices.

Closed by #543

Reopened by #1169

See: https://github.com/vitaly-t/pg-promise/wiki/Chaining-Queries for reasoning, especially when speaking of poor connection usage.

The transaction opening library.db.tx probably needs to move further up the stack in order to encompass the full set of operations: https://github.com/LiskHQ/lisk/blob/9598668a45e2f38f1f787b1bee393c39c2827066/modules/blocks/chain.js#L84-L87

Yes I will go through all and see what's the best way to do. There are two different things we should keep in mind:

  1. Acquiring a real physical connection to database frequently
  2. Acquiring an existing connection from pool for executing a query

First one we should totally avoid, for second one we should improve as much we can.

  1. Acquiring an existing connection from pool for executing a query

I think can be achieved by downwards sharing of task and tx objects from a higher level in the call stack.

The main focal point is block processing where are database writes are.

However, in order to review the rest of application, mainly in terms of database reads, I scanned the 1.0.0 branch for existing database calls:

https://gist.github.com/karmacoma/80495431baa2a56c47298eeaa8cd2e8f

NOTE: This would change when we revert: #597 and our round logic is put back in the application layer by #1174.

Since a long time, we have https://github.com/vitaly-t/pg-monitor configured for db event monitoring.

https://github.com/LiskHQ/lisk/blob/9598668a45e2f38f1f787b1bee393c39c2827066/config.json#L23-L25

By adding this event: http://vitaly-t.github.io/pg-promise/global.html#event:connect
By adding this event: http://vitaly-t.github.io/pg-promise/global.html#event:query
By adding this event: http://vitaly-t.github.io/pg-promise/global.html#event:disconnect

We can reveal the problem we are speaking of more clearly:

  • Memory table update (transaction application - mem_accounts, mem_round):
22:16:10 connect(Oliver@lisk_test)
22:16:10 update "mem_accounts" set "balance" = "balance" + 100000000, "u_balance" = "u_balance" + 100000000, "blockId" = '1309855340732829944' where "address" = '16313739661670634666L';INSERT INTO mem_round ("address", "amount", "delegate", "blockId", "round") SELECT '16313739661670634666L', (100000000)::bigint, "dependentId", '1309855340732829944', 1 FROM mem_accounts2delegates WHERE "accountId" = '16313739661670634666L';
22:16:10 disconnect(Oliver@lisk_test)
  • Block + transaction insertion:
22:16:10 connect(Oliver@lisk_test)
22:16:10 tx: begin
22:16:10 tx: INSERT INTO "blocks"("id","version","timestamp","height","previousBlock","numberOfTransactions","totalAmount","totalFee","reward","payloadLength","payloadHash","generatorPublicKey","blockSignature") VALUES ('1309855340732829944',0,49868170,11,'8952117713527584814',1,100000000,10000000,0,117,'\x2d7ced9bd5961e2dbf74ac461f9902e3c24573e01efcc240ad1e53bf9e5ef10e','\x2f9b9a43b915bb8dcea45ea3b8552ebec202eb196a7889c2495d948e15f4a724','\xe53906b30a37ae61283ba041d96f0e6ad240bd29191c304c19267a7b9dd285d850ea01cf709f6e9fd795c34ac8888651e7a12dce3678f1847d167d55456d3a06')
22:16:10 tx: INSERT INTO "trs"("id","blockId","type","timestamp","senderPublicKey","requesterPublicKey","senderId","recipientId","amount","fee","signature","signSignature","signatures") VALUES ('3251201825196309549','1309855340732829944',0,49868167,'\xc094ebee7ec0c50ebee32918655e089f6e1a604b83bcaa760293c61e0f18ab6f',null,'16313739661670634666L','16313739661670634666L',100000000,10000000,'\xf3eba2756fbb30679acc52b6de2358276864ffa4e74d090e5a5b2620eb62b5fc6b8cfc4aa47eed7be47cba3e1efae8919ac0073a52b64f85f8e1bbe0a2d16106',null,null)
22:16:10 tx: commit
22:16:10 disconnect(Oliver@lisk_test)
  • Round tick:
22:16:10 tx(Tick): begin
22:16:10 disconnect(Oliver@lisk_test)
22:16:10 tx(Tick): update "mem_accounts" set "blockId" = '1309855340732829944', "producedblocks" = "producedblocks" + 1 where "address" = '5380829552614149409L';
22:16:10 tx(Tick): commit
22:16:10 disconnect(Oliver@lisk_test)
22:16:10 connect(Oliver@lisk_test)

Our parent (Crypti) only had limited support for db transactions, no db connection pooling or sharing, and no query chaining, therefore what you see here are the results of the first iteration of the work done thus far. If we look at it optimistically, the situation is only bad from point of view of no shared database transaction between these discrete actions. This issue represents the next step in adhering to the best practices more completely.

@karmacoma https://github.com/LiskHQ/lisk/issues/302#issuecomment-353664288 Yes that can be achieved. But that tx object needed to pass to every level module->logic->db. I am working on it to see how to refactor it more elegantly.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

diego-G picture diego-G  路  3Comments

willclarktech picture willclarktech  路  4Comments

slaweet picture slaweet  路  3Comments

karmacoma picture karmacoma  路  3Comments

hendrikhofstadt picture hendrikhofstadt  路  4Comments