I'm running:
- Which Parity version?: 1.8.7 / 1.9.0
- Which operating system?: Linux
- How installed?: Installer
- Are you fully synchronized?: no
- Which network are you connected to?: ethereum
- Did you try to restart the node?: yes
Starting from a clean slate with the latest stable and unstable versions (1.8.7 / 1.9.0), the following error occurs at different block heights. Could this be faulty hardware?
```
2018-01-31 00:37:01 Syncing #2430235 b741…ad5f 4 blk/s 24 tx/s 1 Mgas/s 0+ 6715 Qed #2436953 25/25 peers 5 MiB chain 281 MiB db 41 MiB queue 9 MiB sync RPC: 0 conn, 0 req/s, 0 µs
2018-01-31 00:37:09 DB corrupted: Corruption: block checksum mismatch: expected 253734433, got 2018439782 in /home/balaa/.local/share/io.parity.ethereum/chains/ethereum/db/906a34e69aec8c0d/overlayrecent/db/282633.sst offset 11034697 size 596098. Repair will be triggered on next restart
====================
stack backtrace:
0: 0x5617bafb2e0c -
Thread 'IO Worker #0' panicked at 'DB flush failed.: "Corruption: block checksum mismatch: expected 253734433, got 2018439782 in /home/balaa/.local/share/io.parity.ethereum/chains/ethereum/db/906a34e69aec8c0d/overlayrecent/db/282633.sst offset 11034697 size 596098"', /checkout/src/libcore/result.rs:906
This is a bug. Please report it at:
https://github.com/paritytech/parity/issues/new
Aborted (core dumped)
```
cc @andresilva this happens _during_ sync
follow-up on #7334 cc @DeviateFish-2
also #7748
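(If it helps with triage: the flagged `.sst` file can apparently be checked directly with RocksDB's standalone `sst_dump` tool. This assumes the RocksDB tools are installed separately; they don't ship with Parity.)

```
# check the .sst file named in the panic for corrupt blocks (per RocksDB's sst_dump utility)
$ sst_dump --file=/home/balaa/.local/share/io.parity.ethereum/chains/ethereum/db/906a34e69aec8c0d/overlayrecent/db/282633.sst --command=check
```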
Another sample for the pile (running v1.9.0):
```
...
2018-01-29 22:38:23 Syncing #1464212 2180…d05a 319 blk/s 1816 tx/s 57 Mgas/s 142+ 4947 Qed #1469311 22/25 peers 77 MiB chain 54 MiB db 42 MiB queue 8 MiB sync RPC: 0 conn, 0 req/s, 0 µs
2018-01-29 22:38:33 Syncing #1465618 ad8b…9110 140 blk/s 1268 tx/s 83 Mgas/s 1126+ 5452 Qed #1472200 22/25 peers 74 MiB chain 54 MiB db 43 MiB queue 9 MiB sync RPC: 0 conn, 0 req/s, 0 µs
2018-01-29 22:38:43 Syncing #1468641 5d43…260c 302 blk/s 1736 tx/s 59 Mgas/s 0+ 3749 Qed #1472391 22/25 peers 53 MiB chain 54 MiB db 28 MiB queue 13 MiB sync RPC: 0 conn, 0 req/s, 0 µs
2018-01-29 22:38:53 Syncing #1470915 640a…7b17 228 blk/s 1617 tx/s 45 Mgas/s 780+ 5759 Qed #1477472 21/25 peers 73 MiB chain 54 MiB db 41 MiB queue 8 MiB sync RPC: 0 conn, 0 req/s, 0 µs
====================
stack backtrace:
0: 0x55b5b84ff95c - backtrace::backtrace::trace::h88dff4dc401d81d6
1: 0x55b5b84ff992 - backtrace::capture::Backtrace::new::hc1bdbce336b16eca
2: 0x55b5b799fb49 - panic_hook::panic_hook::ha4f6f84d07d9cbbd
Thread 'IO Worker #2' panicked at 'DB flush failed.: Error(Msg("Corruption: block checksum mismatch: expected 3482696050, got 3888739091 in parity/chains/ethereum/db/906a34e69aec8c0d/archive/db/011705.sst offset 35210665 size 16261"), State { next_error: None, backtrace: None })', /checkout/src/libcore/result.rs:906
This is a bug. Please report it at:
https://github.com/paritytech/parity/issues/new
====================
stack backtrace:
```
```
$ parity
Loading config file from /etc/parity/config.toml
2018-01-31 20:32:10 Starting Parity/v1.9.0-unstable-53ec114-20180125/x86_64-linux-gnu/rustc1.23.0
2018-01-31 20:32:10 Keys path parity/keys/Foundation
2018-01-31 20:32:10 DB path parity/chains/ethereum/db/906a34e69aec8c0d
2018-01-31 20:32:10 Path to dapps parity/dapps
2018-01-31 20:32:10 State DB configuration: archive +Fat +Trace
2018-01-31 20:32:10 Operating mode: active
2018-01-31 20:32:10 Configured for Foundation using Ethash engine
2018-01-31 20:32:10 Updated conversion rate to Ξ1 = US$1131.33 (105228024 wei/gas)
====================
stack backtrace:
0: 0x5594fc78295c - backtrace::backtrace::trace::h88dff4dc401d81d6
1: 0x5594fc782992 - backtrace::capture::Backtrace::new::hc1bdbce336b16eca
2: 0x5594fbc22b49 - panic_hook::panic_hook::ha4f6f84d07d9cbbd
Thread 'main' panicked at 'failed to update version: Error(Msg("Corruption: block checksum mismatch: expected 3482696050, got 3888739091 in parity/chains/ethereum/db/906a34e69aec8c0d/archive/db/011705.sst offset 35210665 size 16261"), State { next_error: None, backtrace: None })', /checkout/src/libcore/result.rs:906
This is a bug. Please report it at:
https://github.com/paritytech/parity/issues/new
$
```
(for the record, the lack of a second stack trace for the initial crash is not a mistake; no stack trace was produced)
To reiterate: this is happening during a sync (a full archive sync in my case, as seen in the output when restarting), and without any input at all. This is not the result of shutting down parity while it is syncing (inadvertently or otherwise). The corruption is happening during the sync process, and it is what causes parity to exit.
I'm capturing a full log right now, and will update this comment with it when it crashes.
As an aside... why does parity log to stderr?
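(Since the status output goes to stderr, capturing it means redirecting that stream; something along these lines works, with the file name being just an example:)

```
$ parity 2>&1 | tee sync005.log
```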
[Edit] Here's a full log:
sync005.log
It is possible that not closing RocksDB properly on shutdown could lead to some silent corruption: if there's no crash on shutdown, you'll only see that corruption whenever RocksDB has to write to that block in the future (which might be the case here). This is my best explanation so far. We have a fix for RocksDB not being properly closed on shutdown, which will be out in the next release, and I'd like to see if these corruption issues disappear or become less frequent.
Why would that be the case here? The corruption is what's causing the shutdown of parity in these cases, not the other way around. Look at the logs: I'm not stopping parity and then encountering corruption on restart. Parity is crashing due to corruption.
I've done what I can to rule out hardware issues, but of course cannot completely rule them out. However, this seems to be a relatively frequent occurrence: #7334 was originally opened as a report of this behavior, and many of the issues closed as duplicates of it are also instances of crashes during initial sync.
These aren't cases where someone or something is forcibly terminating parity, and thus causing corruption due to an unclean shutdown. These are cases where parity itself is crashing, presumably due to corruption.
Maybe I didn't explain myself properly. Every single shutdown of parity until #7695 was an unclean shutdown, regardless of whether you would see a crash or not; RocksDB would not be properly closed. The error you're seeing doesn't mean the database is being corrupted during sync, it means you're finding corrupted data during the sync; the corruption could have happened at any other time.
I'm not saying that there isn't any other cause for the corruption, but this is currently my best explanation, since not closing the database properly was a violation of the RocksDB API, and assuming you use the RocksDB API properly it shouldn't lead to data corruption (short of hardware faults or RocksDB bugs). If you're willing to help, please do a db kill, update to 1.9.1, and report back if you hit this issue again.
How would it have happened at "another time" if this is a fresh sync (e.g. empty parity data directory)?
You can look at the logs I've provided. Every one of these samples came after a `parity db kill` plus removing the cache and network folders.
I've said in literally every report that this is a clean sync, from scratch, with no pre-existing data.
Please fucking read a little better.
After running `parity db kill` (and removing the cache and network folders), I tried to sync again this morning:
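Concretely, the reset amounts to something like this (the cache and network folders live under the base path, which is `parity` in my setup per the startup log; the default location would be `~/.local/share/io.parity.ethereum`):

```
# pass the same config/base-path flags used for syncing so the right chain DB is removed
$ parity db kill
# then clear the cache and network folders under the base path
$ rm -rf parity/cache parity/network
```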
I'm attempting to run a full archive sync from scratch, with transaction tracing enabled. Relevant section of config.toml that reflects the current setup:
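Something along these lines under `[footprint]` gives the `archive +Fat +Trace` state DB configuration shown in the startup log (illustrative values, not a verbatim copy of my file):

```
$ cat /etc/parity/config.toml
...
[footprint]
# archive pruning with fat DB and transaction tracing enabled
pruning = "archive"
tracing = "on"
fat_db = "on"
```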
@DeviateFish-2 Sorry, I wasn't aware of that, disregard what I said in that case.
Inside the db folder there should be a LOG file for RocksDB (chains/ethereum/db/906a34e69aec8c0d/overlayrecent/db/LOG or chains/ethereum/db/906a34e69aec8c0d/archive/db/LOG). This file is rewritten every time parity is started, so could you share that LOG file right after you see a corruption crash? I'll try to raise the issue with the RocksDB developers to see if they can point us to something. I haven't been able to reproduce this locally, so it's hard for me to debug.
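For example, copy it aside right after a corruption crash and before restarting, so it doesn't get overwritten (placeholder base path; use overlayrecent or archive depending on the pruning mode):

```
# the LOG file is rewritten on every start, so grab it before restarting parity
$ cp <base-path>/chains/ethereum/db/906a34e69aec8c0d/archive/db/LOG rocksdb-crash.LOG
```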
Here's the LOG file (renamed so Github will accept it) associated with the above parity log (sync005):
@5chdn Could you re-open this issue?
Yep. Thanks for the logs.
Is this issue going to be resolved anytime soon? I haven't been able to sync for MONTHS because of this issue, and I can confirm it's not a hardware issue. I've had the same issue happen with 2 different SSDs and 6 different HDDs. Same problem no matter where the Parity database is stored.
Can provide more logs if needed.
@Emperornero which version? On startup or during sync?
This has been happening since 1.7.6; no DB clears seem to fix the problem. I'm currently on 1.9.2.
Ditto.
Version: Parity/v1.9.2-beta-0feb0bb-20180201/x86_64-linux-gnu/rustc1.23.0
I tried a full sync of mainnet/foundation. It stalled out about 12 hours in (2.4M blocks). I issued a clean shutdown (Ctrl-C) and shut the VM down until this morning.
Attempting to restart Parity this morning, I received the same corrupted-database messages as everyone else.
```
parallels@ubuntu:~$ parity
2018-02-14 11:00:32 Starting Parity/v1.9.2-beta-0feb0bb-20180201/x86_64-linux-gnu/rustc1.23.0
2018-02-14 11:00:32 Keys path /home/parallels/.local/share/io.parity.ethereum/keys/Foundation
2018-02-14 11:00:32 DB path /home/parallels/.local/share/io.parity.ethereum/chains/ethereum/db/906a34e69aec8c0d
2018-02-14 11:00:32 Path to dapps /home/parallels/.local/share/io.parity.ethereum/dapps
2018-02-14 11:00:32 State DB configuration: fast
2018-02-14 11:00:32 Operating mode: active
2018-02-14 11:00:32 Configured for Foundation using Ethash engine
2018-02-14 11:00:32 DB corrupted: Invalid argument: You have to open all column families. Column families not opened: col4, col5, col6, col1, col3, col0, col2, attempting repair
2018-02-14 11:00:32 Updated conversion rate to Ξ1 = US$905.79 (131429600 wei/gas)
Client service error: Client(Database(Error(Msg("Received null column family handle from DB."), State { next_error: None, backtrace: None })))
```
I have created an issue in RocksDB with the logs that @DeviateFish-2 provided (https://github.com/facebook/rocksdb/issues/3509).
@DeviateFish-2 I understand that you've tried to rule out hardware issues by switching hard drives and memory.
@Emperornero did you try to rule out faulty memory? Could you run a memtest?
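For example, boot into memtest86+ from a USB stick, or stress-test from within the running system with memtester (package name and test size here are just an example):

```
$ sudo apt-get install memtester
# lock and test 4 GiB of RAM for 3 passes; adjust the size to fit the machine
$ sudo memtester 4G 3
```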
I don't know if it matters, but I'm running Parity in a Parallels 12 VM (Ubuntu 16.04) on a MacBook Pro running macOS 10.13.1.
I've allocated 8 GB of RAM to the VM (Parity appears to be a memory hog), with a 60 GB VHD on the laptop's internal SSD.
Duplicate of #7748
I had this same issue for days. I fixed it by removing one stick of my RAM. Now my laptop has only one 8 GB DDR3 stick installed, and Parity syncs without an issue. Hope this helps!