votes contains asset data for transactions, every transaction ID should be contained exactly once for every type 3 transactiontransactionId column should enforce this on a database level and help Postgres to optimize thingsIn the Testnet snapshots of gr33ndragon, Bioly and Lisk HQ, there are multiple rows for one transaction ID in the votes table.
These are the > 3000 transaction IDs that are contained more than once in lisk_test_backup-5278664.gz from Lisk HQ: https://gist.github.com/webmaster128/2710ae3d3ffeb774e079c5231b451d7c
Run SELECT "transactionId" FROM votes GROUP BY "transactionId" HAVING count("transactionId") > 1
Even if this does no damage at the moment (I don't know), it's a matter of luck. As soon as you start JOINing votes to transactions, you run into big trouble.
@webmaster128 Thank you for report. Issue is not present on mainnet (at least on my nodes). It can be related to inert exceptions that we have on testnet or broken snapshots.
@4miners my wild guess is that those extra rows come from and old bug or a situation where two node processes wrote to the database and are now copied from snapshot to snapshot building on top of each other.
I do not see the issue on my Testnet node syced from -0 for now (current height is 1578655).
If you do not want to change the database schema, an alternative to solve this is to have this check for asset data of every transaction type.
Confirming that those extra rows do not exist in snapshots from Testnet nodes that are synced from 0.
For the sake of demonstration, the gr33ndra0n snapshot server was resynced from 0 and he provides public snapshots, that do not contain the extra data: https://testnet-snapshot.lisknode.io/
This leaves the bug untouched. Core should not accept snapshots with more than one asset row per transaction ID.
@webmaster128 As far I debug this issue, I don't see any duplicate votes trs during block processing. So probably these duplicates appears back in some days, and carried forward with snapshot. For me its just a data cleansing and provide cleaned snapshot. I don't see any relevant code change in core.
Whereas it concerns validating the snapshot, I feel that should be done outside the core itself. As validating data integrity is an extensive task and should be performed outside the core. Core will work on provided data snapshot. Its user responsibility to provide a valid snapshot from a reliable source.
@nazarhussain I agree with separation of concerns. Turns out someone created such a snapshot validation tool, which already covers this specific case ;)
However, if that is the way to go, this check is obsolete: https://github.com/LiskHQ/lisk/blob/d82e4c72d8bee37a1a6906ef7625823043ed54b7/modules/loader.js#L464
If we have a seperate snapshot validator (an extensive one) we should update our code base to remove such checks.
@webmaster128 Added reference to this discussion on appropriate issue. https://github.com/LiskHQ/lisk-scripts/issues/80#issuecomment-395401786
Why was an issue that basically requites to alter a database column to UNIQUE tagged as "hard"?
There was another incident of inconsistent data in snapshots published by the community during the rc1->rc2 upgrade: non-unique transaction IDs in the transfer table.
See
[...]
Transaction 11725459368253410434 not unique in table transfer
Transaction 4301179488278456607 not unique in table transfer
Transaction 1120912335723190249 not unique in table transfer
Transaction 14656109628324352111 not unique in table transfer
Transaction 15008176273496315022 not unique in table transfer
Transaction 2941776323220027965 not unique in table transfer
[...]
https://travis-ci.com/prolina-foundation/snapshot-validator/jobs/139477183
Based on earlier discussion we concluded that the duplicate rows for transactions sub-tables were generated by some bug in past while there was no atomic block write, and carried forward in all snapshots.
Since after the atomic block write its not possible to have such scenario, where block processing failed, and re-processing same block caused duplicate rows. So we can just clean the db once manually before taking snapshot and will be good to go.
As @webmaster128 suggested we can add UNIQUE constraint to all transactionId columns in all transaction sub-tables e.g. transfer, vote etc just to have an additional safety check in place.
Based on earlier discussion we concluded that the duplicate rows for transactions sub-tables were generated by some bug in past while there was no atomic block write, and carried forward in all snapshots.
Since after the atomic block write its not possible to have such scenario, where block processing failed, and re-processing same block caused duplicate rows.
I don't know what caused the invalid data I described in https://github.com/LiskHQ/lisk/issues/2054#issuecomment-413338673 but it cannot be "some old bug" as the snapshot provider published clean 1.x Testnet snapshots for weeks before the incident happened.
But a UNIQUE constraint will probably tell us which code is responsible for such problems.
As part of this investigating this issue, I checked if the duplicate entries were produced in some specific timeframe/order. So, I ran the query on testnet data:
SELECT "transactionId", count("transactionId"), blocks.height
FROM votes
LEFT JOIN trs ON (votes."transactionId" = trs.id)
LEFT JOIN blocks ON (trs."blockId" = blocks.id)
GROUP BY "transactionId", blocks.height
ORDER BY blocks.height;
Which revealed that all the vote transactions had a duplicate entry in votes table until the height 1543638. After this height, there were no duplicate transactions. Which points in the direction that there was some bug in the snapshotting (or applying block) process which no longer exists. I'll further investigate this issue to find out the actual cause of the problem.
The same problem of duplicate records exists in the multisignatures table as well. The result of the following query shows that all multisignature transactions until height 1417013 have duplicate entries.
SELECT "transactionId", count("transactionId"), blocks.height
FROM multisignatures
LEFT JOIN trs ON (multisignatures."transactionId" = trs.id)
LEFT JOIN blocks ON (trs."blockId" = blocks.id)
GROUP BY "transactionId", blocks.height
ORDER BY blocks.height;
Also, the dapp table entries are also duplicated. Running the following queries yields that transactions until height 781775 are duplicated.
SELECT "transactionId", count("transactionId"), blocks.height
FROM dapps
LEFT JOIN trs ON (dapps."transactionId" = trs.id)
LEFT JOIN blocks ON (trs."blockId" = blocks.id)
GROUP BY "transactionId", blocks.height
ORDER BY blocks.height;
The issue does not exist for delegates and signatures tables. It's because "transactionId" is a primary key on both of these tables.
Updated title of the issue so it correctly describes the real general issue we found after doing the research.
Most helpful comment
Based on earlier discussion we concluded that the duplicate rows for transactions sub-tables were generated by some bug in past while there was no atomic block write, and carried forward in all snapshots.
Since after the atomic block write its not possible to have such scenario, where block processing failed, and re-processing same block caused duplicate rows. So we can just clean the db once manually before taking snapshot and will be good to go.
As @webmaster128 suggested we can add
UNIQUEconstraint to alltransactionIdcolumns in all transaction sub-tables e.g. transfer, vote etc just to have an additional safety check in place.