Parent: #1169
If a node is interrupted during block processing i.e. due to reload, there is a high probability of partial writes to the database, leading to corruption of memory tables and a break in consensus with the rest of network.
Solution: Rewrite database logic so that transaction application, block save, and round tick are all encompassed within an atomic database transaction that can be easily rolled back upon interruption or error.
This should better ensure data integrity across nodes on the network.
Logic rewrite will also provide better use of shared connections from pool, and improve block processing capability greatly.
BEGIN DB TRANSACTION
Apply unconfirmed transaction
COMMIT
BEGIN DB TRANSACTION
Apply confirmed transaction
COMMIT
BEGIN DB TRANSACTION
Insert block
Insert transaction
Insert transaction type
COMMIT
BEGIN DB TRANSACTION
Round tick
COMMIT
Bad: In the event of rollback on error in later db transactions, memory tables will be inconsistent.
BEGIN DB TRANSACTION
Apply unconfirmed transaction
Apply confirmed transaction
Insert block
Insert transaction
Insert transaction type
Round tick
COMMIT
Good: In the event of rollback on error, memory tables will be consistent.
An example where we are doing things correctly, but also at the heart of where our database writes start:
Therefore, the actions: apply unconfirmed transaction, apply confirmed transaction and round ticks are those we need to consider.
Chain.prototype.saveBlock #1181Chain.prototype.saveBlock #1297Work in progress. Will likely be included in the next minor release 0.6.0.
Stuck with this? Need help? :wink:
The pattern suggested in pg-promise-demo can be achieved gradually, step by step :wink:
The first best step - moving all the SQL into the same kind of structure as in the demo.
Thanks @vitaly-t. Your help and advice is always appreciated 馃憤
Should largely be resolved after: https://github.com/LiskHQ/lisk/issues/543. We will then come back and review best practices.
Closed by #543
Reopened by #1169
See: https://github.com/vitaly-t/pg-promise/wiki/Chaining-Queries for reasoning, especially when speaking of poor connection usage.
The transaction opening library.db.tx probably needs to move further up the stack in order to encompass the full set of operations: https://github.com/LiskHQ/lisk/blob/9598668a45e2f38f1f787b1bee393c39c2827066/modules/blocks/chain.js#L84-L87
Yes I will go through all and see what's the best way to do. There are two different things we should keep in mind:
First one we should totally avoid, for second one we should improve as much we can.
- Acquiring an existing connection from pool for executing a query
I think can be achieved by downwards sharing of task and tx objects from a higher level in the call stack.
The main focal point is block processing where are database writes are.
However, in order to review the rest of application, mainly in terms of database reads, I scanned the 1.0.0 branch for existing database calls:
https://gist.github.com/karmacoma/80495431baa2a56c47298eeaa8cd2e8f
NOTE: This would change when we revert: #597 and our round logic is put back in the application layer by #1174.
Since a long time, we have https://github.com/vitaly-t/pg-monitor configured for db event monitoring.
https://github.com/LiskHQ/lisk/blob/9598668a45e2f38f1f787b1bee393c39c2827066/config.json#L23-L25
By adding this event: http://vitaly-t.github.io/pg-promise/global.html#event:connect
By adding this event: http://vitaly-t.github.io/pg-promise/global.html#event:query
By adding this event: http://vitaly-t.github.io/pg-promise/global.html#event:disconnect
We can reveal the problem we are speaking of more clearly:
22:16:10 connect(Oliver@lisk_test)
22:16:10 update "mem_accounts" set "balance" = "balance" + 100000000, "u_balance" = "u_balance" + 100000000, "blockId" = '1309855340732829944' where "address" = '16313739661670634666L';INSERT INTO mem_round ("address", "amount", "delegate", "blockId", "round") SELECT '16313739661670634666L', (100000000)::bigint, "dependentId", '1309855340732829944', 1 FROM mem_accounts2delegates WHERE "accountId" = '16313739661670634666L';
22:16:10 disconnect(Oliver@lisk_test)
22:16:10 connect(Oliver@lisk_test)
22:16:10 tx: begin
22:16:10 tx: INSERT INTO "blocks"("id","version","timestamp","height","previousBlock","numberOfTransactions","totalAmount","totalFee","reward","payloadLength","payloadHash","generatorPublicKey","blockSignature") VALUES ('1309855340732829944',0,49868170,11,'8952117713527584814',1,100000000,10000000,0,117,'\x2d7ced9bd5961e2dbf74ac461f9902e3c24573e01efcc240ad1e53bf9e5ef10e','\x2f9b9a43b915bb8dcea45ea3b8552ebec202eb196a7889c2495d948e15f4a724','\xe53906b30a37ae61283ba041d96f0e6ad240bd29191c304c19267a7b9dd285d850ea01cf709f6e9fd795c34ac8888651e7a12dce3678f1847d167d55456d3a06')
22:16:10 tx: INSERT INTO "trs"("id","blockId","type","timestamp","senderPublicKey","requesterPublicKey","senderId","recipientId","amount","fee","signature","signSignature","signatures") VALUES ('3251201825196309549','1309855340732829944',0,49868167,'\xc094ebee7ec0c50ebee32918655e089f6e1a604b83bcaa760293c61e0f18ab6f',null,'16313739661670634666L','16313739661670634666L',100000000,10000000,'\xf3eba2756fbb30679acc52b6de2358276864ffa4e74d090e5a5b2620eb62b5fc6b8cfc4aa47eed7be47cba3e1efae8919ac0073a52b64f85f8e1bbe0a2d16106',null,null)
22:16:10 tx: commit
22:16:10 disconnect(Oliver@lisk_test)
22:16:10 tx(Tick): begin
22:16:10 disconnect(Oliver@lisk_test)
22:16:10 tx(Tick): update "mem_accounts" set "blockId" = '1309855340732829944', "producedblocks" = "producedblocks" + 1 where "address" = '5380829552614149409L';
22:16:10 tx(Tick): commit
22:16:10 disconnect(Oliver@lisk_test)
22:16:10 connect(Oliver@lisk_test)
Our parent (Crypti) only had limited support for db transactions, no db connection pooling or sharing, and no query chaining, therefore what you see here are the results of the first iteration of the work done thus far. If we look at it optimistically, the situation is only bad from point of view of no shared database transaction between these discrete actions. This issue represents the next step in adhering to the best practices more completely.
@karmacoma https://github.com/LiskHQ/lisk/issues/302#issuecomment-353664288 Yes that can be achieved. But that tx object needed to pass to every level module->logic->db. I am working on it to see how to refactor it more elegantly.
Most helpful comment
@karmacoma https://github.com/LiskHQ/lisk/issues/302#issuecomment-353664288 Yes that can be achieved. But that
txobject needed to pass to every levelmodule->logic->db. I am working on it to see how to refactor it more elegantly.