Parity-ethereum: Potential Database Corruption during sync

Created on 13 Oct 2016 · 11 Comments · Source: openethereum/parity-ethereum

2016-10-12 23:29:58  Syncing #2422970 b8be…d6b2      1 blk/s    6 tx/s   0 Mgas/s       0+ 7245 Qed   #2430219    1/46/50 peers      2 GiB db    7 MiB chain   40 MiB queue   11 MiB sync
2016-10-12 23:30:02  Block import failed for #2422985 (843d…5b07)
Error: Trie(IncompleteDatabase(11b9caba988cd1aeefcc20ca0595f051064c70e7149a5a0670366c322268c310))
2016-10-12 23:30:04  Bad header 2423110 (b8e4…a224) from 27: 27, state = ChainHead
2016-10-12 23:30:04  Bad header 2423110 (b8e4…a224) from 83: 83, state = ChainHead
2016-10-12 23:30:04  Bad header 2423110 (b8e4…a224) from 41: 41, state = ChainHead
2016-10-12 23:30:04  Bad header 2423110 (b8e4…a224) from 47: 47, state = ChainHead
2016-10-12 23:30:04  Bad header 2423110 (b8e4…a224) from 69: 69, state = ChainHead
2016-10-12 23:30:04  Bad header 2423110 (b8e4…a224) from 61: 61, state = ChainHead
2016-10-12 23:30:04  Bad header 2423110 (b8e4…a224) from 72: 72, state = ChainHead
2016-10-12 23:30:04  Bad header 2423110 (b8e4…a224) from 5: 5, state = ChainHead
2016-10-12 23:30:04  Bad header 2423110 (b8e4…a224) from 48: 48, state = ChainHead
2016-10-12 23:30:06  Bad header 2423110 (b8e4…a224) from 37: 37, state = ChainHead
2016-10-12 23:30:07  Bad header 2423110 (b8e4…a224) from 2: 2, state = ChainHead
2016-10-12 23:30:08  Bad header 2423110 (b8e4…a224) from 57: 57, state = ChainHead
2016-10-12 23:30:08  Syncing #2422984 969c…b7f0      1 blk/s   15 tx/s   0 Mgas/s       0+    0 Qed   #2422983    3/39/50 peers      2 GiB db    8 MiB chain    2 KiB queue   11 MiB sync
2016-10-12 23:30:11  Bad header 2423110 (b8e4…a224) from 23: 23, state = ChainHead
thread 'IO Worker #1' panicked at 'Potential DB corruption encountered: Database missing expected key: 1e34…d51d', ethcore/src/state/mod.rs:645
...
error: Process didn't exit successfully: `target/release/parity` (signal: 11, SIGSEGV: invalid memory reference)

Enough disk space (20GB)
4GB RAM node

Running latest master via:
$ cargo run --release --no-default-features --bin parity -- --relay-set strict --force-sealing
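
For anyone triaging a long sync log like the one above, a small filter can pull out just the failure markers. This is a minimal Python sketch, not part of Parity: the function name `scan_log` and the labels are hypothetical, and the patterns are copied from messages quoted in this thread, so other Parity versions may word them differently.

```python
import re

# Patterns taken from log lines quoted in this thread (assumption: the
# wording is stable enough to match; other versions may differ).
PATTERNS = {
    "import_failed": re.compile(r"Block import failed for #(\d+)"),
    "incomplete_db": re.compile(r"Trie\(IncompleteDatabase\(([0-9a-f]+)\)\)"),
    "missing_key": re.compile(r"Potential DB corruption encountered"),
    "checksum_mismatch": re.compile(r"Corruption: block checksum mismatch"),
}


def scan_log(lines):
    """Return (line_number, label, detail) for every line matching a pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for label, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                # Patterns with a capture group report it (block number or hash).
                detail = match.group(1) if match.groups() else ""
                hits.append((number, label, detail))
    return hits


# Sample lines abbreviated from the log in this report.
sample = [
    "2016-10-12 23:30:02  Block import failed for #2422985",
    "Error: Trie(IncompleteDatabase(11b9caba988cd1aeefcc20ca0595f051064c70e7149a5a0670366c322268c310))",
    "2016-10-12 23:30:04  Bad header 2423110 from 27: 27, state = ChainHead",
    "thread 'IO Worker #1' panicked at 'Potential DB corruption encountered: Database missing expected key'",
]

for number, label, detail in scan_log(sample):
    print(f"line {number}: {label} {detail}".rstrip())
```

The "Bad header" lines are deliberately not matched, since in this log they follow the first import failure rather than precede it.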

F2-bug 🐞 M4-core ⛓

tomusdrw

All 11 comments

That's probably a RocksDB OOM (out-of-memory) issue, judging by the SIGSEGV.

rphmeier on 13 Oct 2016

Could not reproduce on my local VM (Ubuntu 14.04).
Reproduced on the DO 4GB machine (Ubuntu 15), though.

arkpar on 14 Oct 2016

Adding some more info to this based on the suggestion from @keorn

This doesn't seem to be related _specifically_ to many days of runtime: the same issue occurs even after restarting parity or attempting to sync a new copy of the chain from the network. So even a brand new machine running the latest version of parity will be unable to sync to either network. Even the newer parity restore <snapshot> does not work (my earlier comment was in error). The only thing that has worked is downloading another user's full copy of the blockchain.

While this seems to be due to a heavy set of blocks to process (around block 2,420,000), possibly related to the recent exploit, it's important to note that a fresh sync from the network also failed on a VPS with 16GB of RAM and 8 CPUs (Digital Ocean $160 droplet option). As such, even for more-than-capable machines this is a DoS for new nodes attempting to enter the network, and it hints that the issue may not be tied solely to the intense computation required for the exploit blocks.

Also worth noting is that once the problem block has been reached, the panic/crash recurs immediately on restart. If I start parity to sync a fresh chain, let it crash at the problem block hours later, and then start it again, it will crash within about a second.

My output in particular differs a bit from the original commenter's so I've included it below:

thread 'IO Worker #2' panicked at 'Potential DB corruption encountered: Database missing expected key: 1348…1230', ethcore/src/state.rs:629
stack backtrace:
   1:     0x7f3f8de417b9 - <unknown>
   2:     0x7f3f8de4948c - <unknown>
   3:     0x7f3f8de48359 - <unknown>
   4:     0x7f3f8de48a48 - <unknown>
   5:     0x7f3f8de488a2 - <unknown>
   6:     0x7f3f8de48810 - <unknown>
   7:     0x7f3f8da7f5da - <unknown>
   8:     0x7f3f8da01a4f - <unknown>
   9:     0x7f3f8d9c3e50 - <unknown>
  10:     0x7f3f8da37461 - <unknown>
  11:     0x7f3f8da39837 - <unknown>
  12:     0x7f3f8d9ef69a - <unknown>
  13:     0x7f3f8d8ecab5 - <unknown>
  14:     0x7f3f8de50f76 - <unknown>
  15:     0x7f3f8d94da3e - <unknown>
  16:     0x7f3f8de46ff2 - <unknown>
  17:     0x7f3f8c5830a3 - start_thread
  18:     0x7f3f8cf9387c - clone
  19:                0x0 - <unknown>
2016-10-14 13:07:16  Finishing work, please wait...

I have a working copy of the blockchain here (courtesy of another user) if it can be of any use debugging: full parity copy

This copy includes the problem blocks but parity doesn't need to process them so the remainder of blocks sync as normal.

pyskell on 18 Oct 2016

I am also affected by this issue as soon as I run the executable.
Running the implementation on Ubuntu 16.04.1 LTS.

Stage 3 block verification failed for #2422712 (a1b3…1ce4)
Error: Block(UnknownParent(1ec2be8ab88022c770b1e76ba0147c6e16e28d88e274947f038fdc1b54552f81))

Is there a workaround for this issue, or an ETA for a fix? Thanks.

inmathwetrust on 19 Oct 2016

@inmathwetrust

You can download my copy at the "full parity node" link and copy the DB to your .parity folder.

Just two things to keep in mind:

- This is for the ETC network, not Ethereum.
- Make sure you don't overwrite any keys you might have stored in your .parity folder.
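
The two cautions above can be turned into a small script. This is only a sketch under assumptions: `replace_db_keep_keys` and all four path arguments are hypothetical names, and the actual location of the database and key directories inside .parity is not stated in this thread, so check your own folder layout before running anything like it.

```python
import shutil
import time
from pathlib import Path


def replace_db_keep_keys(source_db, target_db, keys_dir, backup_root):
    """Install a downloaded chain DB without losing existing keys.

    All paths are supplied by the caller; nothing here assumes a
    particular Parity directory layout.
    """
    source_db, target_db = Path(source_db), Path(target_db)
    keys_dir, backup_root = Path(keys_dir), Path(backup_root)
    if not source_db.is_dir():
        raise FileNotFoundError(f"no database directory at {source_db}")

    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = backup_root / f"parity-backup-{stamp}"
    backup.mkdir(parents=True)

    # 1. Copy the keys out first, before touching anything else.
    if keys_dir.is_dir():
        shutil.copytree(keys_dir, backup / "keys")

    # 2. Move the existing (possibly corrupted) DB aside instead of deleting it.
    if target_db.exists():
        shutil.move(str(target_db), str(backup / "old-db"))

    # 3. Put the downloaded DB in place.
    target_db.parent.mkdir(parents=True, exist_ok=True)
    shutil.copytree(source_db, target_db)

    # 4. If the keys lived inside the old DB folder, step 2 moved them away:
    #    restore them from the backup copy.
    if (backup / "keys").is_dir() and not keys_dir.exists():
        shutil.copytree(backup / "keys", keys_dir)

    return backup
```

Stop parity before running something like this, and keep the returned backup folder until the node has synced successfully.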

pyskell on 19 Oct 2016

This should be fixed in 1.3.9. Please let us know if you see it again.

arkpar on 21 Oct 2016

Hi.

I was using Parity 1.3.9. Everything was going well, although syncing was slow, until it encountered this issue; now it won't sync past block #2451318. Every time I restart Parity, it crashes. This is the first time I have encountered such an issue since I started using 1.3.0, all the way to 1.3.9.

Please let me know what I should do. I am now behind the latest block because of the recent slow syncing.

2016-10-23 19:14:29  Starting Parity/v1.3.9-beta-e9987c4-20161021/x86_64-windows-msvc/rustc1.12.0
2016-10-23 19:14:29  Using state DB journalling strategy fast
2016-10-23 19:14:29  Configured for Frontier/Homestead using Ethash engine
2016-10-23 19:14:42  NAT mapped to external address 112.201.176.90:58848
2016-10-23 19:14:42  Public node URL: enode://fd8891a24d019c70283d26f53ada8ae04309f42c1478777a733d5061428216f788ed2783297da0328127445f2dd308c1122e307fae67e1613241c707eff8e172@112.201.176.90:58848+60778
2016-10-23 19:14:50  Syncing #2451318 dd33…ffe9      0 blk/s    0 tx/s   0 Mgas/s       0+    0 Qed   #2451318    5/ 5/25 peers     18 MiB db    8 KiB chain  0 bytes queue   11 KiB sync
2016-10-23 19:15:04  Syncing #2451318 dd33…ffe9      0 blk/s    0 tx/s   0 Mgas/s       0+    0 Qed   #2451318    1/ 3/25 peers     18 MiB db    8 KiB chain  0 bytes queue   19 KiB sync
2016-10-23 19:15:04  Syncing #2451318 dd33…ffe9      0 blk/s    0 tx/s   0 Mgas/s       0+    0 Qed   #2451318    1/ 3/25 peers     18 MiB db    8 KiB chain  0 bytes queue   19 KiB sync
2016-10-23 19:15:04  Syncing #2451318 dd33…ffe9      0 blk/s    0 tx/s   0 Mgas/s       0+    0 Qed   #2451318    1/ 3/25 peers     18 MiB db    8 KiB chain  0 bytes queue   19 KiB sync
2016-10-23 19:15:12  Syncing #2451318 dd33…ffe9      0 blk/s    0 tx/s   0 Mgas/s       0+    0 Qed   #2451318    4/ 5/25 peers     18 MiB db    8 KiB chain  0 bytes queue  130 KiB sync
thread 'Verifier #0' panicked at 'DB flush failed.: "Corruption: block checksum mismatch"', ../src/libcore\result.rs:788
stack backtrace:
   0:     0x7ff67bb1346e - <unknown>
   1:     0x7ff67bb11363 - <unknown>
   2:     0x7ff67bb11e2d - <unknown>
   3:     0x7ff67bb11c76 - <unknown>
   4:     0x7ff67bb11bd4 - <unknown>
   5:     0x7ff67bb11b6b - <unknown>
   6:     0x7ff67bb1edb5 - <unknown>
   7:     0x7ff67ba2419a - <unknown>
   8:     0x7ff67b768069 - <unknown>
   9:     0x7ff67b5c037f - <unknown>
  10:     0x7ff67b62039a - <unknown>
  11:     0x7ff67bb15631 - <unknown>
  12:     0x7ff67b6818cb - <unknown>
  13:     0x7ff67bb0f15e - <unknown>
  14:     0x7ffd1dc48363 - BaseThreadInitThunk
2016-10-23 19:15:19  Finishing work, please wait...
thread 'Verifier #1' panicked at 'DB flush failed.: "Corruption: block checksum mismatch"', ../src/libcore\result.rs:788
stack backtrace:
   0:     0x7ff67bb1346e - <unknown>
   1:     0x7ff67bb11363 - <unknown>
   2:     0x7ff67bb11e2d - <unknown>
   3:     0x7ff67bb11c76 - <unknown>
   4:     0x7ff67bb11bd4 - <unknown>
   5:     0x7ff67bb11b6b - <unknown>
   6:     0x7ff67bb1edb5 - <unknown>
   7:     0x7ff67ba2419a - <unknown>
   8:     0x7ff67b768069 - <unknown>
   9:     0x7ff67b5c037f - <unknown>
  10:     0x7ff67b62039a - <unknown>
  11:     0x7ff67bb15631 - <unknown>
  12:     0x7ff67b6818cb - <unknown>
  13:     0x7ff67bb0f15e - <unknown>
  14:     0x7ffd1dc48363 - BaseThreadInitThunk

kenzaka07 on 23 Oct 2016

This is fixed in master (#2832) and will be fixed in the 1.3.10 stable release. Please test when those are released and reopen if the issue reappears.

gavofyork on 27 Oct 2016

A user reported this issue with the latest beta, 1.6.8. Is this the very same issue?

5chdn on 10 Jul 2017

@5chdn probably not. Was there an out-of-memory or out-of-disk error on a prior run?
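
The out-of-disk half of that question can at least be checked before another sync attempt. A minimal sketch using Python's standard `shutil.disk_usage`: the function name, the example path, and the 20 GiB threshold are illustrative assumptions (the figure echoes the disk size mentioned in the original report, not a documented Parity requirement). It does not detect a past out-of-memory kill.

```python
import shutil

GIB = 1024 ** 3


def free_space_report(path, min_free_bytes):
    """Report free space on the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    return {
        "total_bytes": usage.total,
        "free_bytes": usage.free,
        "enough": usage.free >= min_free_bytes,
    }


# Example: check the current directory; point this at the volume that
# holds the node's database instead.
report = free_space_report(".", 20 * GIB)
print(f"free: {report['free_bytes'] / GIB:.1f} GiB, enough: {report['enough']}")
```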

arkpar on 10 Jul 2017

@arkpar can't tell. I was guiding him on how to access the node logs, and this was the first time he had looked at them. We have now reset the db and it works.

5chdn on 11 Jul 2017
