Parity-ethereum: Sudden massive CPU/Disk usage and stuck syncing

Created on 24 Apr 2019 · 9 comments · Source: openethereum/parity-ethereum

  • Parity Ethereum version: 2.3.8
  • Operating system: Linux
  • Installation: one-line installer
  • Fully synchronized: yes
  • Network: ethereum
  • Restarted: yes

This happens on multiple servers and seems to be completely random. After syncing fine for a while, sometimes a few days, there is a massive spike in CPU and disk usage (https://i.imgur.com/7ichLLG.png) and parity just gets stuck. This can last around 6-12 hours and then it seems to fix itself, or if I catch it early, a quick restart of the instance fixes it. The only other thing running on these instances is nginx. Here is an example of the parity logs when it starts:

2019-04-23 20:53:53 UTC Imported #7626040 0xa0be…787e (50 txs, 8.00 Mgas, 271 ms, 7.78 KiB)
2019-04-23 20:53:58 UTC Imported #7626041 0xa7cf…7d53 (88 txs, 7.69 Mgas, 441 ms, 15.62 KiB)
2019-04-23 20:54:06 UTC Imported #7626042 0x4b0f…61f2 (86 txs, 3.74 Mgas, 315 ms, 15.14 KiB)
2019-04-23 20:54:12 UTC Imported #7626043 0x0065…0144 (3 txs, 0.07 Mgas, 44 ms, 1.10 KiB)
2019-04-23 20:54:16 UTC 25/25 peers 136 MiB chain 389 MiB db 0 bytes queue 43 KiB sync RPC: 2 conn, 11 req/s, 3565 µs
2019-04-23 20:56:19 UTC 24/25 peers 136 MiB chain 389 MiB db 68 KiB queue 43 KiB sync RPC: 2 conn, 2 req/s, 272789 µs
2019-04-23 20:56:19 UTC 24/25 peers 136 MiB chain 389 MiB db 68 KiB queue 43 KiB sync RPC: 2 conn, 2 req/s, 272789 µs
2019-04-23 20:56:20 UTC Syncing #7626043 0x0065…0144 0.00 blk/s 0.0 tx/s 0.0 Mgas/s 2+ 1 Qed #7626048 24/25 peers 136 MiB c$
2019-04-23 20:56:24 UTC Syncing #7626043 0x0065…0144 0.00 blk/s 0.0 tx/s 0.0 Mgas/s 2+ 1 Qed #7626048 24/25 peers 136 MiB c$
2019-04-23 20:56:48 UTC Syncing #7626043 0x0065…0144 0.00 blk/s 0.0 tx/s 0.0 Mgas/s 3+ 1 Qed #7626050 21/25 peers 136 MiB c$
2019-04-23 20:59:24 UTC Syncing #7626043 0x0065…0144 0.00 blk/s 0.0 tx/s 0.0 Mgas/s 8+ 1 Qed #7626054 0/25 peers 136 MiB c$
2019-04-23 21:00:08 UTC Syncing #7626043 0x0065…0144 0.00 blk/s 0.0 tx/s 0.0 Mgas/s 8+ 1 Qed #7626054 0/25 peers 136 MiB c$
2019-04-23 21:00:52 UTC Syncing #7626043 0x0065…0144 0.00 blk/s 0.0 tx/s 0.0 Mgas/s 8+ 1 Qed #7626054 0/25 peers 136 MiB c$
2019-04-23 21:01:16 UTC Syncing #7626043 0x0065…0144 0.00 blk/s 0.0 tx/s 0.0 Mgas/s 8+ 1 Qed #7626054 0/25 peers 136 MiB c$
2019-04-23 21:01:19 UTC Syncing #7626043 0x0065…0144 0.00 blk/s 0.0 tx/s 0.0 Mgas/s 8+ 1 Qed #7626054 0/25 peers 136 MiB c$
2019-04-23 21:01:21 UTC Syncing #7626043 0x0065…0144 0.00 blk/s 0.0 tx/s 0.0 Mgas/s 8+ 1 Qed #7626054 0/25 peers 136 MiB c$
2019-04-23 21:01:35 UTC Syncing #7626043 0x0065…0144 0.00 blk/s 0.0 tx/s 0.0 Mgas/s 8+ 1 Qed #7626054 0/25 peers 136 MiB c$
2019-04-23 21:01:48 UTC Syncing #7626043 0x0065…0144 0.00 blk/s 0.0 tx/s 0.0 Mgas/s 8+ 1 Qed #7626054 0/25 peers 136 MiB c$
2019-04-23 21:01:48 UTC Syncing #7626043 0x0065…0144 0.00 blk/s 0.0 tx/s 0.0 Mgas/s 8+ 1 Qed #7626054 0/25 peers 136 MiB c$
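Since a restart clears the stall when it is caught early, one stopgap while the bug is open is a small watchdog that polls the node and restarts it whenever the head block stops advancing. Below is a minimal sketch using only the standard eth_blockNumber JSON-RPC call; the RPC URL, polling interval, and the systemctl unit name are assumptions and should be adapted to however the instance actually runs parity.

```python
#!/usr/bin/env python3
"""Minimal watchdog sketch: restart parity if the head block stops advancing.

Assumptions: the HTTP JSON-RPC server is enabled on localhost:8545 and the
node runs under systemd as a unit called "parity". Adjust both as needed.
"""
import json
import subprocess
import time
import urllib.request

RPC_URL = "http://127.0.0.1:8545"                  # assumed JSON-RPC endpoint
CHECK_INTERVAL = 600                               # seconds between checks
RESTART_CMD = ["systemctl", "restart", "parity"]   # assumed service name


def head_block() -> int:
    """Return the current head block number via eth_blockNumber."""
    payload = json.dumps({
        "jsonrpc": "2.0", "method": "eth_blockNumber", "params": [], "id": 1,
    }).encode()
    req = urllib.request.Request(
        RPC_URL, data=payload, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return int(json.load(resp)["result"], 16)


def main() -> None:
    last = head_block()
    while True:
        time.sleep(CHECK_INTERVAL)
        current = head_block()
        if current <= last:
            # No new blocks imported since the last check: assume the node is
            # stuck in the "0.00 blk/s" state and restart it.
            subprocess.run(RESTART_CMD, check=False)
        last = current


if __name__ == "__main__":
    main()
```

This only papers over the symptom, of course, but it keeps an affected node from sitting stuck for the 6-12 hours described above.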

F3-annoyance 💩 M4-core ⛓

All 9 comments

I seem to have a similar issue with versions >2.2.9 myself, using an archive node pointed at Kovan. It syncs correctly, switches to importing and is fine for a while, then it drops back to syncing and gets stuck in this "0.00 blk/s 0.0 tx/s 0.0 Mgas/s" state.

I have seen the issue with 2.3.9 and 2.4.5.

I have to drop back to 2.2.9 to get it working again.

Edit: My node is on a Windows 10 machine.

I can also confirm this behavior on v2.2.9, v2.4.5 and v2.4.6 on Linux. I'm running multiple full nodes with tracing on mainnet.

It seems to happen at random blocks, and I've never seen two nodes get stuck on the same block. I never saw it fix itself either; only a restart fixes it, and not always: I had cases where it didn't manage to continue syncing for 6-7 restarts.

I ruled out an upgrade as the cause, because we've had nodes running on the same version for a long time and this only started happening recently.

Example of logs:

2019-05-22 10:01:09 UTC Syncing #7807258 0x94e7…cc80     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed      #286    5/100 peers      5 MiB chain  0 bytes db  0 bytes queue  759 KiB sync  RPC:  0 conn,    4 req/s,   67 µs
2019-05-22 10:01:14 UTC Syncing #7807258 0x94e7…cc80     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed      #747    5/100 peers      5 MiB chain  0 bytes db  0 bytes queue  763 KiB sync  RPC:  0 conn,    1 req/s, 3700 µs
2019-05-22 10:01:18 UTC Syncing #7807258 0x94e7…cc80     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed     #1144    5/100 peers      5 MiB chain  0 bytes db  0 bytes queue    1 MiB sync  RPC:  0 conn,    2 req/s, 98904 µs
2019-05-22 10:01:23 UTC Syncing #7807258 0x94e7…cc80     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed     #1527    5/100 peers      5 MiB chain  0 bytes db  0 bytes queue    1 MiB sync  RPC:  0 conn,    1 req/s, 133653 µs
2019-05-22 10:01:30 UTC Syncing #7807258 0x94e7…cc80     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed     #1906    5/100 peers      5 MiB chain  0 bytes db  0 bytes queue    1 MiB sync  RPC:  0 conn,    3 req/s, 98904 µs
2019-05-22 10:01:34 UTC Syncing #7807258 0x94e7…cc80     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed     #2166    4/100 peers      5 MiB chain  0 bytes db  0 bytes queue    1 MiB sync  RPC:  0 conn,    3 req/s, 99861 µs
2019-05-22 10:01:41 UTC Syncing #7807258 0x94e7…cc80     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed     #2670    4/100 peers      5 MiB chain  0 bytes db  0 bytes queue    1 MiB sync  RPC:  0 conn,    3 req/s, 99861 µs
2019-05-22 10:01:44 UTC Syncing #7807258 0x94e7…cc80     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed     #3175    3/100 peers      5 MiB chain  0 bytes db  0 bytes queue    1 MiB sync  RPC:  0 conn,    3 req/s, 99861 µs
2019-05-22 10:01:49 UTC Syncing #7807258 0x94e7…cc80     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed     #3810    3/100 peers      5 MiB chain  0 bytes db  0 bytes queue    2 MiB sync  RPC:  0 conn,    2 req/s, 141991 µs
2019-05-22 10:01:53 UTC Syncing #7807258 0x94e7…cc80     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed     #4832    3/100 peers      5 MiB chain  0 bytes db  0 bytes queue    1 MiB sync  RPC:  0 conn,    2 req/s, 183841 µs
2019-05-22 10:01:58 UTC Syncing #7807258 0x94e7…cc80     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed     #5850    3/100 peers      5 MiB chain  0 bytes db  0 bytes queue    1 MiB sync  RPC:  0 conn,    2 req/s, 183841 µs
2019-05-22 10:02:03 UTC Syncing #7807258 0x94e7…cc80     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed     #6604    3/100 peers      5 MiB chain  0 bytes db  0 bytes queue    2 MiB sync  RPC:  0 conn,    3 req/s, 113243 µs

Can you share the cache and pruning settings you're using on your node? What you're describing sounds like the cache being flushed to disk.

@joshua-mir Sure! Here's the configuration:

--auto-update=none
--mode=active
--tracing=on
--pruning=archive
--db-compaction=ssd
--scale-verifiers
--num-verifiers=6
--jsonrpc-server-threads=5
--jsonrpc-threads=5
--cache-size=22000

Yeah, that should be more than sufficient for what you're trying to do (minor note: --scale-verifiers overrides --num-verifiers).

I can confirm what AC0DEM0NK3Y said about it being versions >2.2.9.

I downgraded to 2.2.9 and haven't had any issues since (27 days).

I'm now in the tricky position of 2.2.9 seemingly not being usable on Kovan because of

Error: Error(Engine(NotProposer(Mismatch { expected: 0x00d6cc1ba9cf89bd2e58009741f4f7325badc0ed, found: 0xfaadface3fbd81ce37b0e19c0b65ff4234148132 })), State { next_error: None, backtrace: InternalBacktrace { backtrace: None } })

as I don't seem to be able to pull data from peers on that version (I tried deleting the nodes file). The above error seems to be fixed in 2.3.6 (which takes it to a stage-1 block error, TimestampOverflow), but that means I'm now getting stuck on this particular sync issue.

@AC0DEM0NK3Y syncing with Kovan won't work with those versions unless you use the updated chain specification file from master. There was a hard fork on Kovan; support for it was introduced in 2.4.6/2.5.1.
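For reference, a rough sketch of applying that advice: fetch the Kovan chain spec from the repository's master branch and point parity at it with the --chain flag. The raw GitHub URL and file path below are assumptions about the repository layout at the time and should be verified; --chain itself is a standard parity option.

```python
#!/usr/bin/env python3
"""Download an up-to-date Kovan chain spec and show how to start parity with it.

The raw URL below is an assumption about where the spec lived in the master
branch at the time; verify the path before relying on it.
"""
import urllib.request

SPEC_URL = ("https://raw.githubusercontent.com/openethereum/parity-ethereum/"
            "master/ethcore/res/ethereum/kovan.json")  # assumed location
SPEC_PATH = "kovan.json"

# Save the spec locally next to the node.
urllib.request.urlretrieve(SPEC_URL, SPEC_PATH)

# The node is then started against the downloaded spec instead of the
# built-in one, e.g.:
#   parity --chain ./kovan.json
print(f"Saved spec to {SPEC_PATH}; start the node with: parity --chain ./{SPEC_PATH}")
```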

Closing issue due to its stale state.
