Lisk-sdk: Multiple rows for one transaction ID in transaction asset tables in snapshots

Created on 24 May 2018 · 15Comments · Source: LiskHQ/lisk-sdk

Expected behavior

Since the table votes contains asset data for transactions, every transaction ID should be contained exactly once for every type 3 transaction
A UNIQUE constraint on the transactionId column should enforce this on a database level and help Postgres to optimize things

Actual behavior

In the Testnet snapshots of gr33ndragon, Bioly and Lisk HQ, there are multiple rows for one transaction ID in the votes table.

These are the > 3000 transaction IDs that are contained more than once in lisk_test_backup-5278664.gz from Lisk HQ: https://gist.github.com/webmaster128/2710ae3d3ffeb774e079c5231b451d7c

Steps to reproduce

Run SELECT "transactionId" FROM votes GROUP BY "transactionId" HAVING count("transactionId") > 1

Which version(s) does this affect? (Environment, OS, etc...)

0.9.x
Testnet (Mainnet not tested yet)

Even if this does no damage at the moment (I don't know), it's a matter of luck. As soon as you start JOINing votes to transactions, you run into big trouble.

framewornode

Source

webmaster128

Most helpful comment

Based on earlier discussion we concluded that the duplicate rows for transactions sub-tables were generated by some bug in past while there was no atomic block write, and carried forward in all snapshots.

Since after the atomic block write its not possible to have such scenario, where block processing failed, and re-processing same block caused duplicate rows. So we can just clean the db once manually before taking snapshot and will be good to go.

As @webmaster128 suggested we can add UNIQUE constraint to all transactionId columns in all transaction sub-tables e.g. transfer, vote etc just to have an additional safety check in place.

nazarhussain on 23 Aug 2018

👍3

All 15 comments

@webmaster128 Thank you for report. Issue is not present on mainnet (at least on my nodes). It can be related to inert exceptions that we have on testnet or broken snapshots.

4miners on 24 May 2018

👍2

@4miners my wild guess is that those extra rows come from and old bug or a situation where two node processes wrote to the database and are now copied from snapshot to snapshot building on top of each other.

I do not see the issue on my Testnet node syced from -0 for now (current height is 1578655).

webmaster128 on 24 May 2018

👍1

If you do not want to change the database schema, an alternative to solve this is to have this check for asset data of every transaction type.

webmaster128 on 28 May 2018

Confirming that those extra rows do not exist in snapshots from Testnet nodes that are synced from 0.

webmaster128 on 3 Jun 2018

👍2

For the sake of demonstration, the gr33ndra0n snapshot server was resynced from 0 and he provides public snapshots, that do not contain the extra data: https://testnet-snapshot.lisknode.io/

This leaves the bug untouched. Core should not accept snapshots with more than one asset row per transaction ID.

webmaster128 on 7 Jun 2018

👍1

@webmaster128 As far I debug this issue, I don't see any duplicate votes trs during block processing. So probably these duplicates appears back in some days, and carried forward with snapshot. For me its just a data cleansing and provide cleaned snapshot. I don't see any relevant code change in core.

Whereas it concerns validating the snapshot, I feel that should be done outside the core itself. As validating data integrity is an extensive task and should be performed outside the core. Core will work on provided data snapshot. Its user responsibility to provide a valid snapshot from a reliable source.

nazarhussain on 7 Jun 2018

@nazarhussain I agree with separation of concerns. Turns out someone created such a snapshot validation tool, which already covers this specific case ;)

However, if that is the way to go, this check is obsolete: https://github.com/LiskHQ/lisk/blob/d82e4c72d8bee37a1a6906ef7625823043ed54b7/modules/loader.js#L464

webmaster128 on 7 Jun 2018

If we have a seperate snapshot validator (an extensive one) we should update our code base to remove such checks.

@webmaster128 Added reference to this discussion on appropriate issue. https://github.com/LiskHQ/lisk-scripts/issues/80#issuecomment-395401786

nazarhussain on 7 Jun 2018

Why was an issue that basically requites to alter a database column to UNIQUE tagged as "hard"?

webmaster128 on 28 Jul 2018

There was another incident of inconsistent data in snapshots published by the community during the rc1->rc2 upgrade: non-unique transaction IDs in the transfer table.

See

[...]
Transaction 11725459368253410434 not unique in table transfer
Transaction 4301179488278456607 not unique in table transfer
Transaction 1120912335723190249 not unique in table transfer
Transaction 14656109628324352111 not unique in table transfer
Transaction 15008176273496315022 not unique in table transfer
Transaction 2941776323220027965 not unique in table transfer
[...]

https://travis-ci.com/prolina-foundation/snapshot-validator/jobs/139477183

webmaster128 on 15 Aug 2018

As @webmaster128 suggested we can add UNIQUE constraint to all transactionId columns in all transaction sub-tables e.g. transfer, vote etc just to have an additional safety check in place.

nazarhussain on 23 Aug 2018

👍3

Based on earlier discussion we concluded that the duplicate rows for transactions sub-tables were generated by some bug in past while there was no atomic block write, and carried forward in all snapshots.
Since after the atomic block write its not possible to have such scenario, where block processing failed, and re-processing same block caused duplicate rows.

I don't know what caused the invalid data I described in https://github.com/LiskHQ/lisk/issues/2054#issuecomment-413338673 but it cannot be "some old bug" as the snapshot provider published clean 1.x Testnet snapshots for weeks before the incident happened.

But a UNIQUE constraint will probably tell us which code is responsible for such problems.

webmaster128 on 23 Aug 2018

As part of this investigating this issue, I checked if the duplicate entries were produced in some specific timeframe/order. So, I ran the query on testnet data:

SELECT "transactionId",  count("transactionId"), blocks.height 
FROM votes 
LEFT JOIN trs ON (votes."transactionId" = trs.id) 
LEFT JOIN blocks ON (trs."blockId" = blocks.id)
GROUP BY "transactionId", blocks.height 
ORDER BY blocks.height;

Which revealed that all the vote transactions had a duplicate entry in votes table until the height 1543638. After this height, there were no duplicate transactions. Which points in the direction that there was some bug in the snapshotting (or applying block) process which no longer exists. I'll further investigate this issue to find out the actual cause of the problem.

SargeKhan on 28 Aug 2018

The same problem of duplicate records exists in the multisignatures table as well. The result of the following query shows that all multisignature transactions until height 1417013 have duplicate entries.

SELECT "transactionId",  count("transactionId"), blocks.height 
FROM multisignatures 
LEFT JOIN trs ON (multisignatures."transactionId" = trs.id) 
LEFT JOIN blocks ON (trs."blockId" = blocks.id)
GROUP BY "transactionId", blocks.height 
ORDER BY blocks.height;

Also, the dapp table entries are also duplicated. Running the following queries yields that transactions until height 781775 are duplicated.

SELECT "transactionId",  count("transactionId"), blocks.height 
FROM dapps 
LEFT JOIN trs ON (dapps."transactionId" = trs.id) 
LEFT JOIN blocks ON (trs."blockId" = blocks.id)
GROUP BY "transactionId", blocks.height 
ORDER BY blocks.height;

The issue does not exist for delegates and signatures tables. It's because "transactionId" is a primary key on both of these tables.

SargeKhan on 28 Aug 2018

Updated title of the issue so it correctly describes the real general issue we found after doing the research.

SargeKhan on 5 Sep 2018

👍1

Was this page helpful?

0 / 5 - 0 ratings