Stopping parity sometimes results in the hang described here: https://github.com/paritytech/parity-ethereum/issues/9101#issuecomment-454746413
At other times, even with shutdown tracing turned on, the process exits immediately and nothing about a shutdown is logged at all.
So we have three outcomes when stopping parity: a normal clean shutdown, the hang described in the linked issue, and an immediate exit with no shutdown logging.
How can we debug this further? Here is an example of the hang with shutdown tracing enabled:
2019-02-15 12:25:03 Verifier #1 INFO import Imported #7223288 0xe31c…3dcf (84 txs, 7.99 Mgas, 675 ms, 14.60 KiB)
2019-02-15 12:25:08 IO Worker #3 INFO import 5/ 5 peers 18 MiB chain 115 MiB db 0 bytes queue 43 KiB sync RPC: 0 conn, 0 req/s, 3244 µs
2019-02-15 12:25:23 Verifier #1 INFO import Imported #7223289 0x70d4…8380 (103 txs, 7.99 Mgas, 1599 ms, 16.89 KiB)
2019-02-15 12:25:38 IO Worker #2 INFO import 5/ 5 peers 18 MiB chain 115 MiB db 0 bytes queue 43 KiB sync RPC: 0 conn, 0 req/s, 3244 µs
2019-02-15 12:25:57 main INFO parity_ethereum::run Finishing work, please wait...
2019-02-15 12:25:57 main TRACE shutdown [IoService] Closing...
2019-02-15 12:25:57 TRACE shutdown [IoWorker] Closing...
2019-02-15 12:25:57 TRACE shutdown [IoWorker] Closed
2019-02-15 12:25:57 TRACE shutdown [IoWorker] Closing...
2019-02-15 12:25:57 TRACE shutdown [IoWorker] Closed
2019-02-15 12:25:57 TRACE shutdown [IoWorker] Closing...
2019-02-15 12:25:57 TRACE shutdown [IoWorker] Closed
2019-02-15 12:25:57 TRACE shutdown [IoWorker] Closing...
2019-02-15 12:25:57 TRACE shutdown [IoWorker] Closed
2019-02-15 12:25:57 main TRACE shutdown [IoService] Closed.
2019-02-15 12:26:57 main WARN parity_ethereum::run Shutdown is taking longer than expected.
2019-02-15 12:30:57 main WARN parity_ethereum::run Shutdown timeout reached, exiting uncleanly.
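For reference, the two WARN lines above fire at +60 s and +5 min after "Finishing work, please wait...". A minimal sketch of that kind of watchdog thread (hypothetical names and durations, not Parity's actual code) could look like this:

```rust
use std::process;
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Hypothetical sketch of a shutdown watchdog matching the WARN lines above:
// complain after `warn_after`, then force-exit after a further `kill_after`.
// `done` is signalled by the main thread when the clean shutdown path finishes.
fn spawn_shutdown_watchdog(
    done: mpsc::Receiver<()>,
    warn_after: Duration,
    kill_after: Duration,
) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        if done.recv_timeout(warn_after).is_ok() {
            return; // shutdown completed in time
        }
        eprintln!("WARN Shutdown is taking longer than expected.");
        if done.recv_timeout(kill_after).is_ok() {
            return; // slow, but it did finish
        }
        eprintln!("WARN Shutdown timeout reached, exiting uncleanly.");
        process::exit(102);
    })
}
```

In real use the shutdown path would send on the channel (or keep the sender alive until done) so the watchdog returns instead of force-exiting; dropping the sender early makes `recv_timeout` return `Err(Disconnected)` immediately, which this sketch treats the same as a timeout.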
People are still seeing this issue regularly in recent versions; unclean shutdowns are leading to many more reports of db corruption, so bumping priority here 😥
I can confirm this happens regularly on at least 4 parity archive instances we run with the following config:
--auto-update=none
--base-path=/paritydb
--mode=active
--tracing=on
--pruning=archive
--db-compaction=ssd
--scale-verifiers
--num-verifiers=6
--jsonrpc-server-threads=5
--jsonrpc-threads=5
--cache-size=22000
--min-peers=100
--max-peers=1000
--jsonrpc-hosts=all
--jsonrpc-interface=all
--ws-interface=all
--tx-queue-mem-limit=2048
--tx-queue-size=2000000
This is probably fixed by https://github.com/paritytech/parity-ethereum/pull/10689
There is probably one more of these bugs to root out. I've seen this happen with --chain kovan when the snapshotting service is running. With some extra logging added it looks like this:
2019-06-12 14:04:10 main TRACE shutdown [IoService] Closed.
2019-06-12 14:04:10 main TRACE shutdown ClientService dropped
2019-06-12 14:04:10 main TRACE shutdown RPC dropped
2019-06-12 14:04:10 main TRACE shutdown KeepAlive dropped
2019-06-12 14:04:10 main TRACE shutdown Informant shut down
2019-06-12 14:04:10 main TRACE shutdown Informant dropped
2019-06-12 14:04:10 main TRACE shutdown Client dropped
2019-06-12 14:04:10 main TRACE shutdown Waiting for refs to Client to shutdown, strong_count=19, weak_count=Some(13)
2019-06-12 14:04:10 jsonrpc-eventloop-1 TRACE shutdown [IoService] Closing...
2019-06-12 14:04:10 TRACE shutdown [IoWorker] Closing...
2019-06-12 14:04:10 TRACE shutdown [IoWorker] Closed
2019-06-12 14:04:10 TRACE shutdown [IoWorker] Closing...
2019-06-12 14:04:10 TRACE shutdown [IoWorker] Closed
2019-06-12 14:04:10 TRACE shutdown [IoWorker] Closing...
2019-06-12 14:04:10 TRACE shutdown [IoWorker] Closed
2019-06-12 14:04:10 TRACE shutdown [IoWorker] Closing...
2019-06-12 14:04:10 TRACE shutdown [IoWorker] Closed
2019-06-12 14:04:10 jsonrpc-eventloop-1 TRACE shutdown [IoService] Closed.
2019-06-12 14:04:11 main TRACE shutdown Waiting for client to drop, strong_count=2, weak_count=Some(5)
2019-06-12 14:04:12 main TRACE shutdown Waiting for client to drop, strong_count=2, weak_count=Some(5)
…
2019-06-12 14:05:10 main WARN parity_ethereum::run Shutdown is taking longer than expected.
…
2019-06-12 14:05:11 main TRACE shutdown Waiting for client to drop, strong_count=2, weak_count=Some(5)
2019-06-12 14:05:12 main TRACE shutdown Waiting for client to drop, strong_count=2, weak_count=Some(5)
2019-06-12 14:05:13 main TRACE shutdown Waiting for client to drop, strong_count=2, weak_count=Some(5)
…
2019-06-12 14:09:10 main WARN parity_ethereum::run Shutdown timeout reached, exiting uncleanly.
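The "Waiting for client to drop, strong_count=2" lines suggest shutdown polls the `Arc<Client>` reference count until every other holder releases its clone; if some thread (a snapshot worker, an RPC handler) never drops its clone, the count never reaches zero and the timeout fires. A minimal sketch of that polling pattern (hypothetical names, not Parity's actual code):

```rust
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

struct Client; // stand-in for parity's Client type

// Hypothetical sketch of the loop behind the "Waiting for client to drop"
// trace lines: downgrade our handle, release our own strong reference,
// then poll until every other strong reference is gone.
fn wait_for_drop(client: Arc<Client>, timeout: Duration) -> bool {
    let weak = Arc::downgrade(&client);
    drop(client); // release our own strong reference
    let start = Instant::now();
    while weak.upgrade().is_some() {
        if start.elapsed() > timeout {
            return false; // "Shutdown timeout reached, exiting uncleanly."
        }
        println!(
            "Waiting for client to drop, strong_count={}, weak_count={}",
            weak.strong_count(),
            weak.weak_count()
        );
        thread::sleep(Duration::from_millis(100));
    }
    true
}
```

A thread that parks while still owning an `Arc<Client>` clone is enough to pin `strong_count` at 2 forever, exactly as in the trace above.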
The problem is still present in the current stable release. When will it be merged?
@zet-tech It should be resolved in 2.4.8 and 2.5.3; the fix was merged into those releases. If you still have issues with shutdowns, the source of the problem may be different.
It is present in 2.4.8, and the issue is for sure related to RPC. When I bind only to localhost and there are no RPC calls, restarts work correctly. But when I bind parity to a remote IP and it gets requests from our other software (even one second of traffic is enough, which means 5-10 requests, only eth_getWork and eth_getBlockByNumber), a restart is not possible and the process has to be killed. This results in DB corruption much more often than it should (even once per 10 restarts), and we were forced to move to Geth in production due to this problem.
One thing worth noting about any of these shutdown problems is that different bugs can cause the same symptom. We recently fixed one instance where shutdown would fail while the node was taking a snapshot.
RPC usage causing deadlock during shutdown is quite possibly a distinct bug.
@zet-tech That sounds really bad. I have tried to reproduce the shutdown problem after RPC usage on the latest master and could not see a problem. I'd need your assistance to debug this further.
Thanks!
Can anyone confirm this issue has been fixed ?
As mentioned above, there are possibly several other causes with the same symptom. We have fixed a few but there might be others. FWIW we have not experienced, or afaik received reports of, shutdown issues for several months now.
So, after my server automatically restarted this weekend, I can confirm that the parity server restarts normally with pm2. No error.
Thx
I just installed 2.6.4. Problem still occurs, exactly as before.
Answering previous questions:
1. outdated
2. eth.txt (I removed the IP address because it is a public IP.)
3. By binding only to localhost, I meant that if there are no RPC calls to parity, the error does not occur. But even one RPC call means parity cannot be shut down.
2.5.10 solves our restart problem.
Sorry, forgot to mention that I didn't observe ungraceful shutdowns with v2.6.5, now running v2.6.6.
@zet-tech Don't forget to upgrade to at least v2.5.11 before Istanbul fork at the weekend: https://github.com/paritytech/parity-ethereum/releases/tag/v2.5.11
Already updated, but the bug was fixed in 2.5.10.