Parity-ethereum: Potential Database Corruption during sync

Created on 13 Oct 2016  ·  11 Comments  ·  Source: openethereum/parity-ethereum

2016-10-12 23:29:58  Syncing #2422970 b8be…d6b2      1 blk/s    6 tx/s   0 Mgas/s       0+ 7245 Qed   #2430219    1/46/50 peers      2 GiB db    7 MiB chain   40 MiB queue   11 MiB sync
2016-10-12 23:30:02  Block import failed for #2422985 (843d…5b07)
Error: Trie(IncompleteDatabase(11b9caba988cd1aeefcc20ca0595f051064c70e7149a5a0670366c322268c310))
2016-10-12 23:30:04  Bad header 2423110 (b8e4…a224) from 27: 27, state = ChainHead
2016-10-12 23:30:04  Bad header 2423110 (b8e4…a224) from 83: 83, state = ChainHead
2016-10-12 23:30:04  Bad header 2423110 (b8e4…a224) from 41: 41, state = ChainHead
2016-10-12 23:30:04  Bad header 2423110 (b8e4…a224) from 47: 47, state = ChainHead
2016-10-12 23:30:04  Bad header 2423110 (b8e4…a224) from 69: 69, state = ChainHead
2016-10-12 23:30:04  Bad header 2423110 (b8e4…a224) from 61: 61, state = ChainHead
2016-10-12 23:30:04  Bad header 2423110 (b8e4…a224) from 72: 72, state = ChainHead
2016-10-12 23:30:04  Bad header 2423110 (b8e4…a224) from 5: 5, state = ChainHead
2016-10-12 23:30:04  Bad header 2423110 (b8e4…a224) from 48: 48, state = ChainHead
2016-10-12 23:30:06  Bad header 2423110 (b8e4…a224) from 37: 37, state = ChainHead
2016-10-12 23:30:07  Bad header 2423110 (b8e4…a224) from 2: 2, state = ChainHead
2016-10-12 23:30:08  Bad header 2423110 (b8e4…a224) from 57: 57, state = ChainHead
2016-10-12 23:30:08  Syncing #2422984 969c…b7f0      1 blk/s   15 tx/s   0 Mgas/s       0+    0 Qed   #2422983    3/39/50 peers      2 GiB db    8 MiB chain    2 KiB queue   11 MiB sync
2016-10-12 23:30:11  Bad header 2423110 (b8e4…a224) from 23: 23, state = ChainHead
thread 'IO Worker #1' panicked at 'Potential DB corruption encountered: Database missing expected key: 1e34…d51d', ethcore/src/state/mod.rs:645
...
error: Process didn't exit successfully: `target/release/parity` (signal: 11, SIGSEGV: invalid memory reference)

Enough disk space (20GB)
4GB RAM node

Running latest master via:
$ cargo run --release --no-default-features --bin parity -- --relay-set strict --force-sealing

F2-bug 🐞 M4-core ⛓

All 11 comments

That's probably a rocksdb OOM issue, judging by the SIGSEGV.

Could not reproduce on my local VM (Ubuntu 14.04).
Reproduced on the DO 4GB machine (Ubuntu 15) though.

Adding some more info to this based on the suggestion from @keorn

This doesn't seem to be related _specifically_ to many days of runtime: the same issue occurs after restarting parity, and also when attempting to sync a new copy of the chain from the network. So even a brand new machine running the latest version of parity will be unable to sync to either network. Even the newer parity restore <snapshot> does not work (my earlier comment was in error). The only thing that has worked is downloading another user's full copy of the blockchain.

While this seems to be due to a heavy set of blocks to process (around 2,420,000), possibly related to the recent exploit, it's important to note that a fresh sync from the network failed even on a VPS with 16GB of RAM and 8 CPUs (Digital Ocean $160 droplet option). As such, even for more than capable machines this is a DoS for new nodes attempting to enter the network, which hints that the issue may not be tied to the intense computation required for the exploit blocks.

Also worth noting is that the panic/crash is immediate. So if I start parity to sync a fresh chain, let it crash at the problem block hours later, and then start it again, it will crash within about a second.

My output in particular differs a bit from the original commenter's so I've included it below:

thread 'IO Worker #2' panicked at 'Potential DB corruption encountered: Database missing expected key: 1348…1230', ethcore/src/state.rs:629
stack backtrace:
   1:     0x7f3f8de417b9 - <unknown>
   2:     0x7f3f8de4948c - <unknown>
   3:     0x7f3f8de48359 - <unknown>
   4:     0x7f3f8de48a48 - <unknown>
   5:     0x7f3f8de488a2 - <unknown>
   6:     0x7f3f8de48810 - <unknown>
   7:     0x7f3f8da7f5da - <unknown>
   8:     0x7f3f8da01a4f - <unknown>
   9:     0x7f3f8d9c3e50 - <unknown>
  10:     0x7f3f8da37461 - <unknown>
  11:     0x7f3f8da39837 - <unknown>
  12:     0x7f3f8d9ef69a - <unknown>
  13:     0x7f3f8d8ecab5 - <unknown>
  14:     0x7f3f8de50f76 - <unknown>
  15:     0x7f3f8d94da3e - <unknown>
  16:     0x7f3f8de46ff2 - <unknown>
  17:     0x7f3f8c5830a3 - start_thread
  18:     0x7f3f8cf9387c - clone
  19:                0x0 - <unknown>
2016-10-14 13:07:16  Finishing work, please wait...

I have a working copy of the blockchain here (courtesy of another user) if it can be of any use debugging: full parity copy

This copy includes the problem blocks, but parity doesn't need to process them, so the remaining blocks sync as normal.

I am also affected by this issue as soon as I run the executable.
Running on Ubuntu 16.04.1 LTS.

Stage 3 block verification failed for #2422712 (a1b3…1ce4)
Error: Block(UnknownParent(1ec2be8ab88022c770b1e76ba0147c6e16e28d88e274947f038fdc1b54552f81))

Is there a workaround for this issue, or an ETA for a fix? Thanks.

@inmathwetrust

You can download my copy at the "full parity node" link and copy the DB to your .parity folder.

Just two things to keep in mind:

  • This is for the ETC network, not Ethereum
  • Make sure you don't overwrite any keys you might have stored in your .parity folder
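
For anyone who would rather script the copy than do it by hand, here is a minimal sketch of the idea. It assumes the keys live in a top-level folder called keys inside .parity; that name, the function name and the paths in the usage note are assumptions for illustration, so check your own folder layout first. Anything in the protected list is skipped, so whatever is already in the destination stays untouched.

```python
import shutil
from pathlib import Path


def copy_db_preserving_keys(src, dst, protected=("keys",)):
    """Copy a downloaded data folder into dst, skipping protected top-level entries.

    The default protected name "keys" is an assumption about the folder layout.
    """
    src = Path(src).resolve()
    dst = Path(dst)

    def skip_protected(directory, names):
        # Only filter entries that sit directly under the source root.
        if Path(directory).resolve() == src:
            return [name for name in names if name in protected]
        return []

    # dirs_exist_ok=True (Python 3.8+) merges into an existing destination folder.
    shutil.copytree(src, dst, ignore=skip_protected, dirs_exist_ok=True)
```

Usage would look like copy_db_preserving_keys("/path/to/downloaded/copy", "/home/you/.parity"), with both paths replaced by your own. Stop parity before copying, and keep a separate backup of your keys anyway.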

This should be fixed in 1.3.9. Please let us know if you see it again.

Hi.

I was using Parity 1.3.9. Everything was going well, although syncing was slow, until it hit this issue and would not sync past block #2451318. Every time I restart Parity, it crashes. This is the first time I have encountered such an issue in all my time using 1.3.0 through 1.3.9.

Please let me know what I should do. I am now behind the latest block because of the recent slow syncing.

2016-10-23 19:14:29  Starting Parity/v1.3.9-beta-e9987c4-20161021/x86_64-windows-msvc/rustc1.12.0
2016-10-23 19:14:29  Using state DB journalling strategy fast
2016-10-23 19:14:29  Configured for Frontier/Homestead using Ethash engine
2016-10-23 19:14:42  NAT mapped to external address 112.201.176.90:58848
2016-10-23 19:14:42  Public node URL: enode://fd8891a24d019c70283d26f53ada8ae04309f42c1478777a733d5061428216f788ed2783297da0328127445f2dd308c1122e307fae67e1613241c707eff8e172@112.201.176.90:58848+60778
2016-10-23 19:14:50  Syncing #2451318 dd33…ffe9      0 blk/s    0 tx/s   0 Mgas/s       0+    0 Qed   #2451318    5/ 5/25 peers     18 MiB db    8 KiB chain  0 bytes queue   11 KiB sync
2016-10-23 19:15:04  Syncing #2451318 dd33…ffe9      0 blk/s    0 tx/s   0 Mgas/s       0+    0 Qed   #2451318    1/ 3/25 peers     18 MiB db    8 KiB chain  0 bytes queue   19 KiB sync
2016-10-23 19:15:04  Syncing #2451318 dd33…ffe9      0 blk/s    0 tx/s   0 Mgas/s       0+    0 Qed   #2451318    1/ 3/25 peers     18 MiB db    8 KiB chain  0 bytes queue   19 KiB sync
2016-10-23 19:15:04  Syncing #2451318 dd33…ffe9      0 blk/s    0 tx/s   0 Mgas/s       0+    0 Qed   #2451318    1/ 3/25 peers     18 MiB db    8 KiB chain  0 bytes queue   19 KiB sync
2016-10-23 19:15:12  Syncing #2451318 dd33…ffe9      0 blk/s    0 tx/s   0 Mgas/s       0+    0 Qed   #2451318    4/ 5/25 peers     18 MiB db    8 KiB chain  0 bytes queue  130 KiB sync
thread 'Verifier #0' panicked at 'DB flush failed.: "Corruption: block checksum mismatch"', ../src/libcore\result.rs:788
stack backtrace:
   0:     0x7ff67bb1346e - <unknown>
   1:     0x7ff67bb11363 - <unknown>
   2:     0x7ff67bb11e2d - <unknown>
   3:     0x7ff67bb11c76 - <unknown>
   4:     0x7ff67bb11bd4 - <unknown>
   5:     0x7ff67bb11b6b - <unknown>
   6:     0x7ff67bb1edb5 - <unknown>
   7:     0x7ff67ba2419a - <unknown>
   8:     0x7ff67b768069 - <unknown>
   9:     0x7ff67b5c037f - <unknown>
  10:     0x7ff67b62039a - <unknown>
  11:     0x7ff67bb15631 - <unknown>
  12:     0x7ff67b6818cb - <unknown>
  13:     0x7ff67bb0f15e - <unknown>
  14:     0x7ffd1dc48363 - BaseThreadInitThunk
2016-10-23 19:15:19  Finishing work, please wait...
thread 'Verifier #1' panicked at 'DB flush failed.: "Corruption: block checksum mismatch"', ../src/libcore\result.rs:788
stack backtrace:
   0:     0x7ff67bb1346e - <unknown>
   1:     0x7ff67bb11363 - <unknown>
   2:     0x7ff67bb11e2d - <unknown>
   3:     0x7ff67bb11c76 - <unknown>
   4:     0x7ff67bb11bd4 - <unknown>
   5:     0x7ff67bb11b6b - <unknown>
   6:     0x7ff67bb1edb5 - <unknown>
   7:     0x7ff67ba2419a - <unknown>
   8:     0x7ff67b768069 - <unknown>
   9:     0x7ff67b5c037f - <unknown>
  10:     0x7ff67b62039a - <unknown>
  11:     0x7ff67bb15631 - <unknown>
  12:     0x7ff67b6818cb - <unknown>
  13:     0x7ff67bb0f15e - <unknown>
  14:     0x7ffd1dc48363 - BaseThreadInitThunk

This is fixed in master (#2832) and will be fixed in the 1.3.10 stable release. Please test when those are released and reopen if the issue reappears.

A user reported this issue with the latest beta, 1.6.8. Is this the very same issue?

@5chdn probably not. Was there an out-of-memory or out-of-disk error on a prior run?

@arkpar can't tell. I was showing him how to access the node logs, and this was the first time he had looked at them. We have now reset the db and it works.
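
In case it helps others triage their own logs, below is a rough sketch that scans log lines for the failure messages quoted in this thread. The marker list is taken only from the output pasted above; the function name is made up for illustration, and it does not detect out-of-memory or out-of-disk conditions, since no such message appears in this thread.

```python
# Failure messages copied from the log output quoted in this thread.
MARKERS = (
    "Potential DB corruption encountered",
    "DB flush failed",
    "Block import failed",
    "IncompleteDatabase",
    "UnknownParent",
)


def find_failure_lines(lines):
    """Return (line_number, marker) for each line containing a known marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for marker in MARKERS:
            if marker in line:
                # Record only the first matching marker per line.
                hits.append((number, marker))
                break
    return hits
```

To run it against a saved log, open the file and pass the file object in, for example find_failure_lines(open("parity.log", encoding="utf-8")), where parity.log is a placeholder for wherever you redirected the node output.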
