This is happening on multiple servers and seems to be completely random. After syncing for a while, sometimes a few days, there will be a massive spike in CPU and disk usage (https://i.imgur.com/7ichLLG.png) and Parity just gets stuck. This can last for around 6-12 hours and then it seems to fix itself, or if I catch it early enough, a quick restart of the instance fixes it. The only other thing running on these instances is nginx. Here is an example of the Parity logs when it starts:
2019-04-23 20:53:53 UTC Imported #7626040 0xa0be…787e (50 txs, 8.00 Mgas, 271 ms, 7.78 KiB)
2019-04-23 20:53:58 UTC Imported #7626041 0xa7cf…7d53 (88 txs, 7.69 Mgas, 441 ms, 15.62 KiB)
2019-04-23 20:54:06 UTC Imported #7626042 0x4b0f…61f2 (86 txs, 3.74 Mgas, 315 ms, 15.14 KiB)
2019-04-23 20:54:12 UTC Imported #7626043 0x0065…0144 (3 txs, 0.07 Mgas, 44 ms, 1.10 KiB)
2019-04-23 20:54:16 UTC 25/25 peers 136 MiB chain 389 MiB db 0 bytes queue 43 KiB sync RPC: 2 conn, 11 req/s, 3565 µs
2019-04-23 20:56:19 UTC 24/25 peers 136 MiB chain 389 MiB db 68 KiB queue 43 KiB sync RPC: 2 conn, 2 req/s, 272789 µs
2019-04-23 20:56:19 UTC 24/25 peers 136 MiB chain 389 MiB db 68 KiB queue 43 KiB sync RPC: 2 conn, 2 req/s, 272789 µs
2019-04-23 20:56:20 UTC Syncing #7626043 0x0065…0144 0.00 blk/s 0.0 tx/s 0.0 Mgas/s 2+ 1 Qed #7626048 24/25 peers 136 MiB c$
2019-04-23 20:56:24 UTC Syncing #7626043 0x0065…0144 0.00 blk/s 0.0 tx/s 0.0 Mgas/s 2+ 1 Qed #7626048 24/25 peers 136 MiB c$
2019-04-23 20:56:48 UTC Syncing #7626043 0x0065…0144 0.00 blk/s 0.0 tx/s 0.0 Mgas/s 3+ 1 Qed #7626050 21/25 peers 136 MiB c$
2019-04-23 20:59:24 UTC Syncing #7626043 0x0065…0144 0.00 blk/s 0.0 tx/s 0.0 Mgas/s 8+ 1 Qed #7626054 0/25 peers 136 MiB c$
2019-04-23 21:00:08 UTC Syncing #7626043 0x0065…0144 0.00 blk/s 0.0 tx/s 0.0 Mgas/s 8+ 1 Qed #7626054 0/25 peers 136 MiB c$
2019-04-23 21:00:52 UTC Syncing #7626043 0x0065…0144 0.00 blk/s 0.0 tx/s 0.0 Mgas/s 8+ 1 Qed #7626054 0/25 peers 136 MiB c$
2019-04-23 21:01:16 UTC Syncing #7626043 0x0065…0144 0.00 blk/s 0.0 tx/s 0.0 Mgas/s 8+ 1 Qed #7626054 0/25 peers 136 MiB c$
2019-04-23 21:01:19 UTC Syncing #7626043 0x0065…0144 0.00 blk/s 0.0 tx/s 0.0 Mgas/s 8+ 1 Qed #7626054 0/25 peers 136 MiB c$
2019-04-23 21:01:21 UTC Syncing #7626043 0x0065…0144 0.00 blk/s 0.0 tx/s 0.0 Mgas/s 8+ 1 Qed #7626054 0/25 peers 136 MiB c$
2019-04-23 21:01:35 UTC Syncing #7626043 0x0065…0144 0.00 blk/s 0.0 tx/s 0.0 Mgas/s 8+ 1 Qed #7626054 0/25 peers 136 MiB c$
2019-04-23 21:01:48 UTC Syncing #7626043 0x0065…0144 0.00 blk/s 0.0 tx/s 0.0 Mgas/s 8+ 1 Qed #7626054 0/25 peers 136 MiB c$
2019-04-23 21:01:48 UTC Syncing #7626043 0x0065…0144 0.00 blk/s 0.0 tx/s 0.0 Mgas/s 8+ 1 Qed #7626054 0/25 peers 136 MiB c$
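Since a quick restart clears the stall when it is caught early, one lightweight mitigation is a watchdog that polls the node over JSON-RPC and restarts it once the chain head stops advancing. Below is a minimal sketch in Python; the RPC endpoint, polling interval, stall threshold, and restart command are all assumptions about the deployment rather than details from this report.

#!/usr/bin/env python3
"""Stall-watchdog sketch: poll eth_blockNumber over JSON-RPC and restart the
node if the chain head stops advancing. Endpoint, interval, threshold and
restart command are assumptions -- adjust them to the actual deployment."""
import json
import subprocess
import time
import urllib.request

RPC_URL = "http://127.0.0.1:8545"                  # assumed local JSON-RPC endpoint
CHECK_EVERY = 60                                   # seconds between polls
STALL_AFTER = 600                                  # treat 10 min without a new block as stuck
RESTART_CMD = ["systemctl", "restart", "parity"]   # hypothetical service name

def block_number() -> int:
    """Return the node's current head block number."""
    payload = json.dumps({"jsonrpc": "2.0", "method": "eth_blockNumber",
                          "params": [], "id": 1}).encode()
    req = urllib.request.Request(RPC_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return int(json.load(resp)["result"], 16)

def main() -> None:
    last_block = block_number()
    last_change = time.monotonic()
    while True:
        time.sleep(CHECK_EVERY)
        try:
            current = block_number()
        except Exception as exc:   # the RPC itself can get slow while the node is stuck
            print(f"RPC error: {exc}")
            continue
        if current > last_block:
            last_block, last_change = current, time.monotonic()
        elif time.monotonic() - last_change > STALL_AFTER:
            print(f"head stuck at #{last_block}, restarting node")
            subprocess.run(RESTART_CMD, check=False)
            last_change = time.monotonic()

if __name__ == "__main__":
    main()

Note from the logs above that the reported RPC latency jumps from a few thousand µs to 272789 µs in the stuck state, so the watchdog keys on the head block number rather than on RPC responsiveness alone.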
I seem to have a similar issue myself with versions >2.2.9, using an archive node pointed at Kovan. It syncs correctly, switches to importing, and is fine for a while; then it drops back to syncing and gets stuck in this "0.00 blk/s 0.0 tx/s 0.0 Mgas/s" state.
I have seen the issue with 2.3.9 and 2.4.5.
I have to drop back to 2.2.9 to get it working again.
Edit: My node is on a Windows 10 machine.
I can also confirm this behavior on v2.2.9, v2.4.5 and v2.4.6 on Linux, running multiple full nodes with tracing on mainnet.
It seems to happen at random blocks, and I've never seen two nodes get stuck on the same block. I've also never seen it fix itself; only a restart fixes it, and not always. I've had cases where a node didn't manage to continue syncing for 6-7 restarts.
I've ruled out an upgrade as the cause, because we've had nodes running on the same version for a long time and this only started happening recently.
Example of logs:
2019-05-22 10:01:09 UTC Syncing #7807258 0x94e7…cc80 0.00 blk/s 0.0 tx/s 0.0 Mgas/s 0+ 0 Qed #286 5/100 peers 5 MiB chain 0 bytes db 0 bytes queue 759 KiB sync RPC: 0 conn, 4 req/s, 67 µs
2019-05-22 10:01:14 UTC Syncing #7807258 0x94e7…cc80 0.00 blk/s 0.0 tx/s 0.0 Mgas/s 0+ 0 Qed #747 5/100 peers 5 MiB chain 0 bytes db 0 bytes queue 763 KiB sync RPC: 0 conn, 1 req/s, 3700 µs
2019-05-22 10:01:18 UTC Syncing #7807258 0x94e7…cc80 0.00 blk/s 0.0 tx/s 0.0 Mgas/s 0+ 0 Qed #1144 5/100 peers 5 MiB chain 0 bytes db 0 bytes queue 1 MiB sync RPC: 0 conn, 2 req/s, 98904 µs
2019-05-22 10:01:23 UTC Syncing #7807258 0x94e7…cc80 0.00 blk/s 0.0 tx/s 0.0 Mgas/s 0+ 0 Qed #1527 5/100 peers 5 MiB chain 0 bytes db 0 bytes queue 1 MiB sync RPC: 0 conn, 1 req/s, 133653 µs
2019-05-22 10:01:30 UTC Syncing #7807258 0x94e7…cc80 0.00 blk/s 0.0 tx/s 0.0 Mgas/s 0+ 0 Qed #1906 5/100 peers 5 MiB chain 0 bytes db 0 bytes queue 1 MiB sync RPC: 0 conn, 3 req/s, 98904 µs
2019-05-22 10:01:34 UTC Syncing #7807258 0x94e7…cc80 0.00 blk/s 0.0 tx/s 0.0 Mgas/s 0+ 0 Qed #2166 4/100 peers 5 MiB chain 0 bytes db 0 bytes queue 1 MiB sync RPC: 0 conn, 3 req/s, 99861 µs
2019-05-22 10:01:41 UTC Syncing #7807258 0x94e7…cc80 0.00 blk/s 0.0 tx/s 0.0 Mgas/s 0+ 0 Qed #2670 4/100 peers 5 MiB chain 0 bytes db 0 bytes queue 1 MiB sync RPC: 0 conn, 3 req/s, 99861 µs
2019-05-22 10:01:44 UTC Syncing #7807258 0x94e7…cc80 0.00 blk/s 0.0 tx/s 0.0 Mgas/s 0+ 0 Qed #3175 3/100 peers 5 MiB chain 0 bytes db 0 bytes queue 1 MiB sync RPC: 0 conn, 3 req/s, 99861 µs
2019-05-22 10:01:49 UTC Syncing #7807258 0x94e7…cc80 0.00 blk/s 0.0 tx/s 0.0 Mgas/s 0+ 0 Qed #3810 3/100 peers 5 MiB chain 0 bytes db 0 bytes queue 2 MiB sync RPC: 0 conn, 2 req/s, 141991 µs
2019-05-22 10:01:53 UTC Syncing #7807258 0x94e7…cc80 0.00 blk/s 0.0 tx/s 0.0 Mgas/s 0+ 0 Qed #4832 3/100 peers 5 MiB chain 0 bytes db 0 bytes queue 1 MiB sync RPC: 0 conn, 2 req/s, 183841 µs
2019-05-22 10:01:58 UTC Syncing #7807258 0x94e7…cc80 0.00 blk/s 0.0 tx/s 0.0 Mgas/s 0+ 0 Qed #5850 3/100 peers 5 MiB chain 0 bytes db 0 bytes queue 1 MiB sync RPC: 0 conn, 2 req/s, 183841 µs
2019-05-22 10:02:03 UTC Syncing #7807258 0x94e7…cc80 0.00 blk/s 0.0 tx/s 0.0 Mgas/s 0+ 0 Qed #6604 3/100 peers 5 MiB chain 0 bytes db 0 bytes queue 2 MiB sync RPC: 0 conn, 3 req/s, 113243 µs
Can you share the cache and pruning settings you're using on your node? What you're describing sounds like the cache being flushed to disk.
@joshua-mir Sure! Here's the configuration:
--auto-update=none
--mode=active
--tracing=on
--pruning=archive
--db-compaction=ssd
--scale-verifiers
--num-verifiers=6
--jsonrpc-server-threads=5
--jsonrpc-threads=5
--cache-size=22000
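For completeness, the same settings can also live in a config file passed with --config instead of individual flags. The TOML below is a sketch of that mapping; the section and key names are my assumption based on Parity's config file format and should be checked against the config reference for the version you run.

# Sketch of the flags above as a Parity config.toml (section/key names unverified)
[parity]
mode = "active"
auto_update = "none"

[rpc]
server_threads = 5
processing_threads = 5

[footprint]
tracing = "on"
pruning = "archive"
db_compaction = "ssd"
scale_verifiers = true
num_verifiers = 6
cache_size = 22000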
Yeah, that should be more than sufficient for what you're trying to do. (Minor note: --scale-verifiers overrides --num-verifiers.)
I can confirm what AC0DEM0NK3Y said about it affecting versions >2.2.9.
I downgraded to 2.2.9 and haven't had any issues since (27 days).
I'm now in the tricky position of 2.2.9 seemingly not being usable on Kovan because of
Error: Error(Engine(NotProposer(Mismatch { expected: 0x00d6cc1ba9cf89bd2e58009741f4f7325badc0ed, found: 0xfaadface3fbd81ce37b0e19c0b65ff4234148132 })), State { next_error: None, backtrace: InternalBacktrace { backtrace: None } })
since I don't seem to be able to pull data from peers on that version (I tried deleting the nodes file). The above error seems to be fixed in 2.3.6 (which takes it to a stage 1 block error, TimestampOverflow), but that means I'm back to getting stuck on this particular sync issue.
@AC0DEM0NK3Y Syncing with Kovan won't work with these versions unless you use the updated chain specification file from master; there was a hard fork on Kovan whose support was introduced in 2.4.6/2.5.1.
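For reference, --chain accepts a path to a chain specification JSON as well as a preset name, so an updated Kovan spec taken from master can in principle be pointed at directly, in the same style as the flags above:

--chain=/path/to/kovan.json

The path is a placeholder, and there is no guarantee an older binary understands every field of a newer spec, so upgrading to 2.4.6/2.5.1 or later is the more reliable fix.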
Closing issue due to its stale state.