Parity-ethereum: RPC stops working after a while on parity v2.5.10

Created on 14 Nov 2019  ·  52Comments  ·  Source: openethereum/parity-ethereum

  • Parity Ethereum version: Parity-Ethereum/v2.5.10-stable-9f94473-20191112/x86_64-linux-gnu/rustc1.39.0
  • Operating system: Linux
  • Installation: built from source
  • Fully synchronized: yes
  • Network: ethereum
  • Restarted: yes

We have been running a parity node for ethereum mainnet for a few years now. Today we tried updating our software to v2.5.10 (from v2.5.6). We attempted this upgrade twice today but both times, the RPC API stopped working after a while. This is a behavior we had previously never seen (the RPC is just unresponsive).

Here you can see that the RPC was working for block 8929535 but at some point between that block and block 8929769 the RPC stopped responding. A restart fixed it. This has been reproduced twice in production:

ubuntu@parity:~/parity_builds/parity-ethereum$ curl --data '{"method":"eth_blockNumber","params":[],"id":1,"jsonrpc":"2.0"}' -H "Content-Type: application/json" -X POST localhost:8545 --silent 
{"jsonrpc":"2.0","result":"0x8840ff","id":1}


ubuntu@parity:~/parity_builds/parity-ethereum$ time curl --data '{"method":"eth_blockNumber","params":[],"id":1,"jsonrpc":"2.0"}' -H "Content-Type: application/json" -X POST localhost:8545 --silent 
^C

real    0m17.480s
user    0m0.000s
sys 0m0.004s
ubuntu@parity:~/parity_builds/parity-ethereum$ time curl --data '{"method":"eth_blockNumber","params":[],"id":1,"jsonrpc":"2.0"}' -H "Content-Type: application/json" -X POST localhost:8545 --silent 
^C

real    0m57.002s
user    0m0.004s
sys 0m0.000s
ubuntu@parity:~/parity_builds/parity-ethereum$ sudo systemctl restart parity
ubuntu@parity:~/parity_builds/parity-ethereum$ time curl --data '{"method":"eth_blockNumber","params":[],"id":1,"jsonrpc":"2.0"}' -H "Content-Type: application/json" -X POST localhost:8545 --silent 
{"jsonrpc":"2.0","result":"0x8841e9","id":1}

real    0m0.020s
user    0m0.000s
sys 0m0.004s

I couldn't find anything relevant on parity.log but in the following graph you can see that on November 13, 2019 at around 19:06:00 Mexico City time, the RPC became unresponsive (Nov 14, 2019 @ 01:06:00 UTC):

image

Here's the parity.log file from that time:

2019-11-14 01:02:01 UTC Verifier #4 INFO import  Imported #8929579 0x6762…b4e6 (48 txs, 2.20 Mgas, 297 ms, 9.26 KiB)
2019-11-14 01:02:21 UTC IO Worker #1 INFO import    37/50 peers      4 MiB chain    7 MiB db  0 bytes queue   34 KiB sync  RPC:  0 conn,   40 req/s,  167 µs
2019-11-14 01:02:32 UTC Verifier #5 INFO import  Imported #8929580 0xeae1…5abb (15 txs, 0.57 Mgas, 79 ms, 2.86 KiB)
2019-11-14 01:02:40 UTC Verifier #0 INFO import  Imported #8929581 0x6924…f17f (221 txs, 9.90 Mgas, 1793 ms, 41.93 KiB)
2019-11-14 01:02:53 UTC Verifier #2 INFO import  Imported #8929582 0xd434…b28a (78 txs, 9.52 Mgas, 1448 ms, 17.77 KiB)
2019-11-14 01:03:03 UTC IO Worker #3 INFO import    39/50 peers      5 MiB chain    7 MiB db  0 bytes queue   34 KiB sync  RPC:  0 conn,    0 req/s,  183 µs
2019-11-14 01:03:03 UTC Verifier #4 INFO import  Imported #8929583 0xe60c…5b58 (37 txs, 1.97 Mgas, 296 ms, 6.98 KiB)
2019-11-14 01:03:09 UTC Verifier #0 INFO import  Imported #8929584 0xa760…9224 (155 txs, 9.61 Mgas, 1373 ms, 25.70 KiB)
2019-11-14 01:03:17 UTC Verifier #5 INFO import  Imported #8929585 0xd138…a59e (52 txs, 6.89 Mgas, 316 ms, 9.09 KiB)
2019-11-14 01:03:30 UTC Verifier #0 INFO import  Imported #8929587 0x97bf…0e57 (5 txs, 6.62 Mgas, 18 ms, 1.91 KiB)
2019-11-14 01:03:38 UTC IO Worker #3 INFO import    38/50 peers      5 MiB chain    7 MiB db  0 bytes queue   34 KiB sync  RPC:  0 conn,    0 req/s,  183 µs
2019-11-14 01:03:53 UTC Verifier #2 INFO import  Imported #8929588 0x6306…6bfa (87 txs, 4.49 Mgas, 696 ms, 18.67 KiB)
2019-11-14 01:04:04 UTC Verifier #3 INFO import  Imported #8929589 0x117f…5f1b (51 txs, 4.28 Mgas, 1298 ms, 9.14 KiB)
2019-11-14 01:04:14 UTC IO Worker #3 INFO import    41/50 peers      5 MiB chain    8 MiB db  0 bytes queue   34 KiB sync  RPC:  0 conn,    0 req/s,  183 µs
2019-11-14 01:04:42 UTC Verifier #4 INFO import  Imported #8929590 0xba69…3933 (125 txs, 9.93 Mgas, 1750 ms, 25.23 KiB)
2019-11-14 01:04:54 UTC IO Worker #0 INFO import    41/50 peers      5 MiB chain    8 MiB db  0 bytes queue   34 KiB sync  RPC:  0 conn,    0 req/s,  183 µs
2019-11-14 01:04:55 UTC Verifier #6 INFO import  Imported #8929592 0xfd99…7bef (241 txs, 9.81 Mgas, 1279 ms, 45.39 KiB)
2019-11-14 01:05:20 UTC Verifier #1 INFO import  Imported #8929593 0x4f91…00fe (32 txs, 8.93 Mgas, 234 ms, 6.42 KiB)
2019-11-14 01:05:32 UTC IO Worker #0 INFO import    41/50 peers      5 MiB chain    8 MiB db  0 bytes queue   34 KiB sync  RPC:  0 conn,    0 req/s,  183 µs
2019-11-14 01:05:34 UTC Verifier #3 INFO import  Imported #8929594 0xc958…79e7 (160 txs, 9.89 Mgas, 1323 ms, 28.15 KiB)
2019-11-14 01:06:06 UTC IO Worker #2 INFO import    41/50 peers      5 MiB chain    8 MiB db  0 bytes queue   34 KiB sync  RPC:  0 conn,    0 req/s,  183 µs
2019-11-14 01:06:16 UTC Verifier #1 INFO import  Imported #8929596 0xcc46…ebd6 (32 txs, 1.31 Mgas, 970 ms, 5.50 KiB)
2019-11-14 01:06:36 UTC IO Worker #2 INFO import    41/50 peers      6 MiB chain    8 MiB db  0 bytes queue   34 KiB sync  RPC:  0 conn,    0 req/s,  183 µs
2019-11-14 01:07:06 UTC IO Worker #0 INFO import    41/50 peers      6 MiB chain    8 MiB db  0 bytes queue   34 KiB sync  RPC:  0 conn,    0 req/s,  183 µs
2019-11-14 01:07:19 UTC Verifier #2 INFO import  Imported #8929597 0xe220…30ed (216 txs, 9.77 Mgas, 1239 ms, 42.55 KiB)
2019-11-14 01:07:40 UTC Verifier #3 INFO import  Imported #8929598 0xff62…482c (124 txs, 9.91 Mgas, 2876 ms, 34.78 KiB)
2019-11-14 01:07:50 UTC IO Worker #3 INFO import    41/50 peers      6 MiB chain    8 MiB db  0 bytes queue   34 KiB sync  RPC:  0 conn,    0 req/s,  183 µs
2019-11-14 01:07:59 UTC Verifier #7 INFO import  Imported #8929599 0xced4…f2b9 (60 txs, 9.89 Mgas, 1406 ms, 8.70 KiB)
2019-11-14 01:08:11 UTC Verifier #1 INFO import  Imported #8929600 0xf8f5…e13c (201 txs, 9.89 Mgas, 1593 ms, 35.78 KiB)
2019-11-14 01:08:22 UTC Verifier #2 INFO import  Imported #8929601 0xf95b…2457 (185 txs, 9.90 Mgas, 1763 ms, 30.34 KiB)

This is currently preventing us from supporting the Istanbul hard fork.

All 52 comments

Notice that the first gap in the graph I pasted was the time at which I was upgrading from 2.5.6 to 2.5.10. I'm going to go to 2.5.7 and see if I can find where the problem starts.

Running version 2.5.7 now. Will report the results in a couple of hours:

2019-11-14 02:54:53 UTC main INFO parity_ethereum::run  Starting Parity-Ethereum/v2.5.7-stable-6bd7db9-20190829/x86_64-linux-gnu/rustc1.39.0

2.5.7 has been running ok for 2+ hours.

Looking at the release notes, it seems like there was major work done on RPC for 2.5.10. We're going to update to 2.5.9 (and skip testing of 2.5.8).

2019-11-14 04:37:09 UTC main INFO parity_ethereum::run  Starting Parity-Ethereum/v2.5.9-stable-06c7096-20190926/x86_64-linux-gnu/rustc1.39.0

This is our /etc/systemd/system/parity.service:

[Unit]
Description=Parity Daemon
After=network.target

[Service]
# run as root, set base_path in config.toml
ExecStart=/usr/bin/parity --config /etc/parity/config.toml --jsonrpc-cors all
# To run as user, comment out above and uncomment below, fill in user and group
# picks up users default config.toml in $HOME/.local/share/io.parity.ethereum/
# User=username
# Group=groupname
# ExecStart=/usr/bin/parity
Restart=on-failure

# Specifies which signal to use when killing a service. Defaults to SIGTERM.
# SIGHUP gives parity time to exit cleanly before SIGKILL (default 90s)
KillSignal=SIGHUP

[Install]
WantedBy=default.target

And this is our /etc/parity/config.toml:

[parity]
mode = "last"
auto_update = "none"

[footprint]
tracing = "on"
pruning = "archive"
cache_size = 2048
db_compaction = "ssd"

[misc]
logging = "own_tx=trace"
log_file = "/var/log/parity.log"
color = true

[rpc]
disable = false
port = 8545
interface = "all"

[network]
min_peers = 25
max_peers = 50

I can confirm that we're running 2.5.9 without issues so far.

The issue appears with parity 2.5.10.

Happy to provide any logging/debugging/etc... Obviously this is mission critical for us in order to safely upgrade to Instanbul, so would appreciate any help.

Thank you for the report, this is worrying. You built from source I believe? Can you double check that you picked up jsonrpc at version 14.0.3 exactly?

yes, I built from source, as always.

I always do a cargo clean before I build with cargo build --release --features final

How do i see if I picked up jsonrpc at version 14.0.3 exactly?

The next thing I'd ask you to do is to try 2.6.5-beta.

I can roll back to 2.5.9 from 2.6.5-beta without an issue, right?

Oh, one more thing, can you enable rpc logging with logging = "own_tx=trace,rpc=debug"?

I can roll back to 2.5.9 from 2.6.5-beta without an issue, right?

No you cannot, it's one way. :/

Yikes, that upgrade would be difficult to attempt now...

I've changed the config to:
[misc] logging = "own_tx=trace,rpc=debug" log_file = "/var/log/parity.log" color = true

Do you want me to run 2.5.10 with that config to see if we get something else?

How do i see if I picked up jsonrpc at version 14.0.3 exactly?

You can check in the Cargo.lock e.g. with cat Cargo.lock | grep jsonrpc

Seems like I did pick up 14.0.3 exactly:

ubuntu@parity:~/parity_builds/current$ cat Cargo.lock | grep jsonrpc
 "jsonrpc-server-utils 14.0.3 (registry+https://github.com/rust-lang/crates.io-index)",
 "jsonrpc-core 14.0.3 (registry+https://github.com/rust-lang/crates.io-index)",
 "jsonrpc-tcp-server 14.0.3 (registry+https://github.com/rust-lang/crates.io-index)",
name = "jsonrpc-core"
name = "jsonrpc-derive"
name = "jsonrpc-http-server"
 "jsonrpc-core 14.0.3 (registry+https://github.com/rust-lang/crates.io-index)",
 "jsonrpc-server-utils 14.0.3 (registry+https://github.com/rust-lang/crates.io-index)",
name = "jsonrpc-ipc-server"
 "jsonrpc-core 14.0.3 (registry+https://github.com/rust-lang/crates.io-index)",
 "jsonrpc-server-utils 14.0.3 (registry+https://github.com/rust-lang/crates.io-index)",
name = "jsonrpc-pubsub"
 "jsonrpc-core 14.0.3 (registry+https://github.com/rust-lang/crates.io-index)",
name = "jsonrpc-server-utils"
 "jsonrpc-core 14.0.3 (registry+https://github.com/rust-lang/crates.io-index)",
name = "jsonrpc-tcp-server"
 "jsonrpc-core 14.0.3 (registry+https://github.com/rust-lang/crates.io-index)",
 "jsonrpc-server-utils 14.0.3 (registry+https://github.com/rust-lang/crates.io-index)",
name = "jsonrpc-ws-server"
 "jsonrpc-core 14.0.3 (registry+https://github.com/rust-lang/crates.io-index)",
 "jsonrpc-server-utils 14.0.3 (registry+https://github.com/rust-lang/crates.io-index)",
 "jsonrpc-core 14.0.3 (registry+https://github.com/rust-lang/crates.io-index)",
 "jsonrpc-core 14.0.3 (registry+https://github.com/rust-lang/crates.io-index)",
 "jsonrpc-http-server 14.0.3 (registry+https://github.com/rust-lang/crates.io-index)",
 "jsonrpc-core 14.0.3 (registry+https://github.com/rust-lang/crates.io-index)",
 "jsonrpc-derive 14.0.3 (registry+https://github.com/rust-lang/crates.io-index)",
 "jsonrpc-http-server 14.0.3 (registry+https://github.com/rust-lang/crates.io-index)",
 "jsonrpc-ipc-server 14.0.3 (registry+https://github.com/rust-lang/crates.io-index)",
 "jsonrpc-pubsub 14.0.3 (registry+https://github.com/rust-lang/crates.io-index)",
 "jsonrpc-ws-server 14.0.3 (registry+https://github.com/rust-lang/crates.io-index)",
 "jsonrpc-core 14.0.3 (registry+https://github.com/rust-lang/crates.io-index)",
 "jsonrpc-ws-server 14.0.3 (registry+https://github.com/rust-lang/crates.io-index)",
 "jsonrpc-core 14.0.3 (registry+https://github.com/rust-lang/crates.io-index)",
 "jsonrpc-derive 14.0.3 (registry+https://github.com/rust-lang/crates.io-index)",
 "jsonrpc-pubsub 14.0.3 (registry+https://github.com/rust-lang/crates.io-index)",
 "jsonrpc-core 14.0.3 (registry+https://github.com/rust-lang/crates.io-index)",
 "jsonrpc-http-server 14.0.3 (registry+https://github.com/rust-lang/crates.io-index)",
 "jsonrpc-pubsub 14.0.3 (registry+https://github.com/rust-lang/crates.io-index)",
"checksum jsonrpc-core 14.0.3 (registry+https://github.com/rust-lang/crates.io-index)" = "34651edf3417637cc45e70ed0182ecfa9ced0b7e8131805fccf7400d989845ca"
"checksum jsonrpc-derive 14.0.3 (registry+https://github.com/rust-lang/crates.io-index)" = "d5d5c31575cc70a8b21542599028472c80a9248394aeea4d8918a045a0ab08a3"
"checksum jsonrpc-http-server 14.0.3 (registry+https://github.com/rust-lang/crates.io-index)" = "aa54c4c2d88cb5e04b251a5031ba0f2ee8c6ef30970e31228955b89a80c3b611"
"checksum jsonrpc-ipc-server 14.0.3 (registry+https://github.com/rust-lang/crates.io-index)" = "b579cd0840d7db3ebaadf52f6f31ec601a260e78d610e44f68634f919e34497a"
"checksum jsonrpc-pubsub 14.0.3 (registry+https://github.com/rust-lang/crates.io-index)" = "3ee1b8da0b9219a231c4b7cbc7110bfdb457cbcd8d90a6224d0b3cab8aae8443"
"checksum jsonrpc-server-utils 14.0.3 (registry+https://github.com/rust-lang/crates.io-index)" = "87bc3c0a9a282211b2ec14abb3e977de33016bbec495332e9f7be858de7c5117"
"checksum jsonrpc-tcp-server 14.0.3 (registry+https://github.com/rust-lang/crates.io-index)" = "9c7807563cd721401285b59b54358f5b2325b4de6ff6f1de5494a5879e890fc1"
"checksum jsonrpc-ws-server 14.0.3 (registry+https://github.com/rust-lang/crates.io-index)" = "af36a129cef77a9db8028ac7552d927e1bb7b6928cd96b23dd25cc38bff974ab"

OK, restarted the node to 2.5.10. I can see the rpc logs.

It usually takes ~10 minutes for the RPC interface to crash, but so far it's working.

OK, it just crashed.

$ while : ; do date; time curl --data '{"method":"eth_blockNumber","params":[],"id":1,"jsonrpc":"2.0"}' -H "Content-Type: application/json" -X POST localhost:8545 --silent ; sleep 60; done
Thu Nov 14 06:43:32 UTC 2019
{"jsonrpc":"2.0","result":"0x8846ae","id":1}

real    0m0.051s
user    0m0.000s
sys 0m0.004s
Thu Nov 14 06:44:32 UTC 2019
{"jsonrpc":"2.0","result":"0x8846b0","id":1}

real    0m0.010s
user    0m0.004s
sys 0m0.000s
Thu Nov 14 06:45:32 UTC 2019
^C
real    0m33.015s
user    0m0.004s
sys 0m0.000s

log file:
2019-11-14 06:45:01 UTC http.worker00 DEBUG rpc Response: {"jsonrpc":"2.0","result":[{"action":{"callType":"call","from":"0xea674fdde714fd979de3edf0f56aa9716b898ec8","gas":"0x7148","input":"0x","to":"0x5a7119ed2d837d742ff37c5c1d68f6007a26a484","value":"0x32423e0cd0583db"},"blockHash":"0x6b79736ef41e8735baad64f5269ac952713046db1ae1273695357ff356106ae4","blockNumber":8930994,"result":{"gasUsed":"0x0","output":"0x"},"subtraces":0,"traceAddress":[],"transactionHash":"0xb1db4320c9711bb0acacc5f73f60b29c73e5c57b8101026c2fc8e167508b9b5c","transactionPosition":10,"type":"call"}],"id":134}. 2019-11-14 06:45:01 UTC http.worker00 DEBUG rpc Response: {"jsonrpc":"2.0","result":[{"action":{"callType":"call","from":"0xea674fdde714fd979de3edf0f56aa9716b898ec8","gas":"0x7148","input":"0x","to":"0x4493b1605563d179f06410079704524d6388a508","value":"0xb1a93c4bf35264"},"blockHash":"0x6b79736ef41e8735baad64f5269ac952713046db1ae1273695357ff356106ae4","blockNumber":8930994,"result":{"gasUsed":"0x0","output":"0x"},"subtraces":0,"traceAddress":[],"transactionHash":"0xa63ae7b736695597b1ccb6a7f13fe9d28dfa8d2a74cb545cb25ce6cc350f7372","transactionPosition":13,"type":"call"}],"id":134}. 2019-11-14 06:45:01 UTC http.worker00 DEBUG rpc Response: {"jsonrpc":"2.0","result":[{"action":{"callType":"call","from":"0xea674fdde714fd979de3edf0f56aa9716b898ec8","gas":"0x7148","input":"0x","to":"0x8c3875f0f570a7b20d4bd702e879643864381e1d","value":"0xb1f387132e80c2"},"blockHash":"0x6b79736ef41e8735baad64f5269ac952713046db1ae1273695357ff356106ae4","blockNumber":8930994,"result":{"gasUsed":"0x0","output":"0x"},"subtraces":0,"traceAddress":[],"transactionHash":"0xff5e61829017babbaca903e3919a3adb632afbd6fd2931f30fe004aba7635a36","transactionPosition":14,"type":"call"}],"id":134}. 2019-11-14 06:45:01 UTC http.worker00 DEBUG rpc Response: {"jsonrpc":"2.0","result":[{"action":{"callType":"call","from":"0xea674fdde714fd979de3edf0f56aa9716b898ec8","gas":"0x7148","input":"0x","to":"0x625911baf876f487195449e8bed1111ed1787602","value":"0x16387895eb3f28b"},"blockHash":"0x6b79736ef41e8735baad64f5269ac952713046db1ae1273695357ff356106ae4","blockNumber":8930994,"result":{"gasUsed":"0x0","output":"0x"},"subtraces":0,"traceAddress":[],"transactionHash":"0xd30dc9ed6a8c6cfd65439516a9b4e5ef6edc33be4769622d1a46effc88878e01","transactionPosition":11,"type":"call"}],"id":134}. 2019-11-14 06:45:01 UTC http.worker00 DEBUG rpc Response: {"jsonrpc":"2.0","result":[{"action":{"callType":"call","from":"0xea674fdde714fd979de3edf0f56aa9716b898ec8","gas":"0x7148","input":"0x","to":"0x0e35df18f82a94c5189e5cad4b8b4659107e9aa1","value":"0x163a9a8dc01930d"},"blockHash":"0x6b79736ef41e8735baad64f5269ac952713046db1ae1273695357ff356106ae4","blockNumber":8930994,"result":{"gasUsed":"0x0","output":"0x"},"subtraces":0,"traceAddress":[],"transactionHash":"0x38f0e1a2e4fe47f65627cca17946d303e31ba826000f92e317267423314817cc","transactionPosition":12,"type":"call"}],"id":134}. 2019-11-14 06:45:01 UTC http.worker00 DEBUG rpc Response: {"jsonrpc":"2.0","result":[{"action":{"callType":"call","from":"0xea674fdde714fd979de3edf0f56aa9716b898ec8","gas":"0x7148","input":"0x","to":"0xaab14f45e6a3f85f65da583d668257e25513e37b","value":"0xb2000ff5e72b66"},"blockHash":"0x6b79736ef41e8735baad64f5269ac952713046db1ae1273695357ff356106ae4","blockNumber":8930994,"result":{"gasUsed":"0x0","output":"0x"},"subtraces":0,"traceAddress":[],"transactionHash":"0xf311fc960e85696dd0e6396609483284bbae46ee6d6ccac94539e729ddbb8d2f","transactionPosition":15,"type":"call"}],"id":134}. 2019-11-14 06:45:01 UTC Verifier #6 INFO import Imported #8930995 0x35b5…17ad (8 txs, 0.66 Mgas, 102 ms, 1.94 KiB) 2019-11-14 06:45:13 UTC Verifier #3 INFO import Imported #8930996 0x2fb4…0b62 (180 txs, 9.88 Mgas, 1990 ms, 30.70 KiB)

The node is running, it's sync'ing, but the RPC interface is not responding. Is there a way to take a trace from parity that could help you debug?

The node is running, it's sync'ing, but the RPC interface is not responding. Is there a way to take a trace from parity that could help you debug?

I'll try to repeat locally. Will let you know.

Thanks @dvdplm - I'm about to hit the sack, but will check in the morning. Please let me know if I can provide additional help

I have not been able to repeat the problem locally yet. :/

What Linux distro are you using? If you didn't try yet, do you mind trying with the linux build from here?

Also, it'd be great to see the hyper logs so you need to set the RUST_LOG environment variable to RUST_LOG=hyper=trace (warning: pretty verbose).

In your issue description and comments above you use the eth_blockNum RPC but I'm guessing that's mostly to test if the server is responding or not? Can you list all the RPCs that are hitting the server, or even better try to reproduce with just a single RPC?

Yours is an archive node and I don't have access to one unfortunately. Do you have a way to try to reproduce this with a non-archive, non-tracing node?

Sorry for the random questions here, but without a way to reproduce locally it's tricky!

Could you also try setting --jsonrpc-threads=0 and --jsonrpc-server-threads=1 and publish a full output of -lrpc=trace?

@vogelito I prepared a branch with v.2.5.10 using jsonrpc at version 13.2.0: https://github.com/paritytech/parity-ethereum/tree/dp/fix/v2.5.10-jsonrpc-13.2.
Can you try to build that and run with the same log settings please?

Thank you for your support with this.

What Linux distro are you using?

ubuntu@parity:~$ cat /etc/*-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.5 LTS"
NAME="Ubuntu"
VERSION="16.04.5 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.5 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
ubuntu@parity:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.5 LTS
Release:    16.04
Codename:   xenial
ubuntu@parity:~$ uname -a
Linux parity 4.4.0-1074-aws #84-Ubuntu SMP Thu Dec 6 08:57:58 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
ubuntu@parity:~$ cat /proc/version
Linux version 4.4.0-1074-aws (buildd@lcy01-amd64-026) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.10) ) #84-Ubuntu SMP Thu Dec 6 08:57:58 UTC 2018

Also, it'd be great to see the hyper logs so you need to set the RUST_LOG environment variable to RUST_LOG=hyper=trace (warning: pretty verbose).

Where will these logs go? I can prepend that to my /etc/systemd/system/parity.service in the ExecStart section.

Can you list all the RPCs that are hitting the server, or even better try to reproduce with just a single RPC?

The RPC calls we're making are:
parityClient.blockNumber();
parityClient.getBlockByNumber(block);
parityClient.getTransactionByHash(blockHash);
parityClient.traceTransaction(txHash);
parityClient.getTransactionReceipt(txHash);

As context, we have software that inspect the contents of the blockchain on our side, block by block, transaction by transaction, and trace by trace.

I prepared a branch with v.2.5.10 using jsonrpc at version 13.2.0

Thank you. I'm currently compiling this and will report back as soon as it's finished building and I have a chance to test it. I will try this without the RUST_LOG=hyper=trace change and also without the changes suggested by @tomusdrw

publish a full output of -lrpc=trace

@tomusdrw - how do I do that? do I just add -lrpc=trace to my parity startup command and will the output go to /var/log/parity.log?

@dvdplm - I've successfully built your https://github.com/paritytech/parity-ethereum/tree/dp/fix/v2.5.10-jsonrpc-13.2 branch with:

cargo clean
cargo build --release --features final

Startup line was:

2019-11-14 12:33:26 UTC main INFO parity_ethereum::run  Starting Parity-Ethereum/v2.5.10-stable-18efe53-20191114/x86_64-linux-gnu/rustc1.39.0

I've also verified that I picked up jsonrpc at version 13.20.0 exactly:
```ubuntu@parity:~/parity_builds/current$ cat Cargo.lock | grep jsonrpc
"jsonrpc-server-utils 13.2.0 (registry+https://github.com/rust-lang/crates.io-index)",
"jsonrpc-core 13.2.0 (registry+https://github.com/rust-lang/crates.io-index)",
"jsonrpc-tcp-server 13.2.0 (registry+https://github.com/rust-lang/crates.io-index)",
name = "jsonrpc-core"
name = "jsonrpc-derive"
name = "jsonrpc-http-server"
"jsonrpc-core 13.2.0 (registry+https://github.com/rust-lang/crates.io-index)",
"jsonrpc-server-utils 13.2.0 (registry+https://github.com/rust-lang/crates.io-index)",
name = "jsonrpc-ipc-server"
"jsonrpc-core 13.2.0 (registry+https://github.com/rust-lang/crates.io-index)",
"jsonrpc-server-utils 13.2.0 (registry+https://github.com/rust-lang/crates.io-index)",
name = "jsonrpc-pubsub"
"jsonrpc-core 13.2.0 (registry+https://github.com/rust-lang/crates.io-index)",
name = "jsonrpc-server-utils"
"jsonrpc-core 13.2.0 (registry+https://github.com/rust-lang/crates.io-index)",
name = "jsonrpc-tcp-server"
"jsonrpc-core 13.2.0 (registry+https://github.com/rust-lang/crates.io-index)",
"jsonrpc-server-utils 13.2.0 (registry+https://github.com/rust-lang/crates.io-index)",
name = "jsonrpc-ws-server"
"jsonrpc-core 13.2.0 (registry+https://github.com/rust-lang/crates.io-index)",
"jsonrpc-server-utils 13.2.0 (registry+https://github.com/rust-lang/crates.io-index)",
"jsonrpc-core 13.2.0 (registry+https://github.com/rust-lang/crates.io-index)",
"jsonrpc-core 13.2.0 (registry+https://github.com/rust-lang/crates.io-index)",
"jsonrpc-http-server 13.2.0 (registry+https://github.com/rust-lang/crates.io-index)",
"jsonrpc-core 13.2.0 (registry+https://github.com/rust-lang/crates.io-index)",
"jsonrpc-derive 13.2.0 (registry+https://github.com/rust-lang/crates.io-index)",
"jsonrpc-http-server 13.2.0 (registry+https://github.com/rust-lang/crates.io-index)",
"jsonrpc-ipc-server 13.2.0 (registry+https://github.com/rust-lang/crates.io-index)",
"jsonrpc-pubsub 13.2.0 (registry+https://github.com/rust-lang/crates.io-index)",
"jsonrpc-ws-server 13.2.0 (registry+https://github.com/rust-lang/crates.io-index)",
"jsonrpc-core 13.2.0 (registry+https://github.com/rust-lang/crates.io-index)",
"jsonrpc-ws-server 13.2.0 (registry+https://github.com/rust-lang/crates.io-index)",
"jsonrpc-core 13.2.0 (registry+https://github.com/rust-lang/crates.io-index)",
"jsonrpc-derive 13.2.0 (registry+https://github.com/rust-lang/crates.io-index)",
"jsonrpc-pubsub 13.2.0 (registry+https://github.com/rust-lang/crates.io-index)",
"jsonrpc-core 13.2.0 (registry+https://github.com/rust-lang/crates.io-index)",
"jsonrpc-http-server 13.2.0 (registry+https://github.com/rust-lang/crates.io-index)",
"jsonrpc-pubsub 13.2.0 (registry+https://github.com/rust-lang/crates.io-index)",
"checksum jsonrpc-core 13.2.0 (registry+https://github.com/rust-lang/crates.io-index)" = "91d767c183a7e58618a609499d359ce3820700b3ebb4823a18c343b4a2a41a0d"
"checksum jsonrpc-derive 13.2.0 (registry+https://github.com/rust-lang/crates.io-index)" = "4a76285ebba4515680fbfe4b62498ccb2a932384c8732eed68351b02fb7ae475"
"checksum jsonrpc-http-server 13.2.0 (registry+https://github.com/rust-lang/crates.io-index)" = "601fcc7bec888c7cbc7fd124d3d6744d72c0ebb540eca6fe2261b71f9cff6320"
"checksum jsonrpc-ipc-server 13.2.0 (registry+https://github.com/rust-lang/crates.io-index)" = "8756c080a6dfafd4c87fcedc454c1d8dc069b66c891152d98e87c65a15887ca2"
"checksum jsonrpc-pubsub 13.2.0 (registry+https://github.com/rust-lang/crates.io-index)" = "64e0fb0664d8ce287e826940dafbb45379443c595bdd71d93655f3c8f25fd992"
"checksum jsonrpc-server-utils 13.2.0 (registry+https://github.com/rust-lang/crates.io-index)" = "4d415f51d016a4682878e19dd03e8c0b61cd4394912d7cd3dc48d4f19f061a4e"
"checksum jsonrpc-tcp-server 13.2.0 (registry+https://github.com/rust-lang/crates.io-index)" = "733eb41b67c98b6818a6249de22295984a54237fcd2ba99bddf9495b268f810f"
"checksum jsonrpc-ws-server 13.2.0 (registry+https://github.com/rust-lang/crates.io-index)" = "4699433c1ac006d7df178b4c29c191e5bb6d81e2dca18c5c804a094592900101"

I don't want to call it an early victory, but it's been working for the last 10 minutes without an issue (and on prior v2.5.10 attempts it usually died within the first 5 minutes).

My easy "is RPC alive?" command continues to run without an issue:

$ while : ; do date; time curl --data '{"method":"eth_blockNumber","params":[],"id":1,"jsonrpc":"2.0"}' -H "Content-Type: application/json" -X POST localhost:8545 --silent ; sleep 30; done
Thu Nov 14 12:37:34 UTC 2019
{"jsonrpc":"2.0","result":"0x884c73","id":1}

real 0m0.009s
user 0m0.004s
sys 0m0.000s
Thu Nov 14 12:38:04 UTC 2019
{"jsonrpc":"2.0","result":"0x884c73","id":1}

real 0m0.009s
user 0m0.004s
sys 0m0.004s
Thu Nov 14 12:38:34 UTC 2019
{"jsonrpc":"2.0","result":"0x884c76","id":1}

real 0m0.009s
user 0m0.004s
sys 0m0.000s
Thu Nov 14 12:39:04 UTC 2019
{"jsonrpc":"2.0","result":"0x884c7a","id":1}

real 0m0.009s
user 0m0.004s
sys 0m0.000s
Thu Nov 14 12:39:34 UTC 2019
{"jsonrpc":"2.0","result":"0x884c7c","id":1}

real 0m0.009s
user 0m0.000s
sys 0m0.004s
Thu Nov 14 12:40:04 UTC 2019
{"jsonrpc":"2.0","result":"0x884c7d","id":1}

real 0m0.009s
user 0m0.000s
sys 0m0.004s
Thu Nov 14 12:40:34 UTC 2019
{"jsonrpc":"2.0","result":"0x884c82","id":1}

real 0m0.009s
user 0m0.004s
sys 0m0.000s
Thu Nov 14 12:41:04 UTC 2019
{"jsonrpc":"2.0","result":"0x884c88","id":1}

real 0m0.009s
user 0m0.000s
sys 0m0.004s
Thu Nov 14 12:41:34 UTC 2019
{"jsonrpc":"2.0","result":"0x884c89","id":1}

real 0m0.009s
user 0m0.004s
sys 0m0.000s
Thu Nov 14 12:42:04 UTC 2019
{"jsonrpc":"2.0","result":"0x884c8a","id":1}

real 0m0.009s
user 0m0.000s
sys 0m0.004s
Thu Nov 14 12:42:34 UTC 2019
{"jsonrpc":"2.0","result":"0x884c8a","id":1}

real 0m0.009s
user 0m0.004s
sys 0m0.000s
Thu Nov 14 12:43:04 UTC 2019
{"jsonrpc":"2.0","result":"0x884c8b","id":1}

real 0m0.009s
user 0m0.000s
sys 0m0.004s
Thu Nov 14 12:43:34 UTC 2019
{"jsonrpc":"2.0","result":"0x884c8b","id":1}

real 0m0.009s
user 0m0.000s
sys 0m0.004s
```

@vogelito ok, let's wait a few hours but if it stays stable we have a rough ballpark on where to look (or rather, a confirmation of our suspicions). I am very keen to figure out what is wrong with the released 2.5.10 though so if you can, it'd be great if we could continue debugging that.

Wrt your questions about logging, you can add add new modules to be logged in your parity configuration file or use command line switches independently, so using a snippet from your config file from above, for instance this should work:

[misc]
logging = "own_tx=trace,rpc=trace,hyper=trace"
log_file = "/var/log/parity.log"
color = true

…and for the jsonrpc settings, the equivalent to --jsonrpc-threads=0 and --jsonrpc-server-threads=1 would be:

[rpc]
processing_threads = 0
server_threads = 1

Thank you, very clear.

Your branch continues to run smoothly.

I will be happy to help you debug the latest release branch, just let me know what would you like me to try.

I will be happy to help you debug the latest release branch, just let me know what would you like me to try.

  1. leave the custom build running for another hour or two to convince us the issue is not present there, while collecting the trace-level logs outlined above
  2. run the released 2.5.10 with the same log&settings
  3. if at all possible, try 2.6.5-beta (but keep a copy of your v.2.5 DB as it's a one-way migration)

And thank you for your patience!

Oh, one more thing: can you share the output of ulimit -a when ran as the user that runs parity?

@tomusdrw anything you want to add to add to my list of things to try?

Great. I’m traveling for the next 5 hours, but so far so good.

Apologies for the delay, I got really delayed with my travels today.

Parity is running as root:

root@parity:/home/ubuntu# ps aux | grep parity
root     24684 86.6 39.4 9210732 6245720 ?     Ssl  Nov14 755:16 /usr/bin/parity --config /etc/parity/config.toml --jsonrpc-cors all

Here's the output of ulimit -a:

root@parity:/home/ubuntu# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 61822
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 61822
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Your branch has run smoothly for the last 15 hours.

I have updated my config from:

[parity]
mode = "last"
auto_update = "none"

[footprint]
tracing = "on"
pruning = "archive"
cache_size = 2048
db_compaction = "ssd"

[misc]
logging = "own_tx=trace"
log_file = "/var/log/parity.log"
color = true

[rpc]
disable = false
port = 8545
interface = "all"

[network]
min_peers = 25
max_peers = 50

to:

[parity]
mode = "last"
auto_update = "none"

[footprint]
tracing = "on"
pruning = "archive"
cache_size = 2048
db_compaction = "ssd"

[misc]
logging = "own_tx=trace,rpc=trace,hyper=trace"
log_file = "/var/log/parity_tests.log"
color = true

[rpc]
disable = false
port = 8545
interface = "all"
processing_threads = 0
server_threads = 1

[network]
min_peers = 25
max_peers = 50

I will first attempt the configuration above with the dp/fix/v2.5.10-jsonrpc-13.2 branch

Alright, dp/fix/v2.5.10-jsonrpc-13.2 worked with the new settings (you weren't kidding about the verbose output!)

I will now be running v2.5.10-stable. Will report back when it crashes.

As expected, v2.5.10-stable crashed within a few minutes.

I'm attaching the full log of both binaries (~12minutes of dp/fix/v2.5.10-jsonrpc-13.2 and ~4 minutes of v2.5.10-stable). I had to gzip it because uncompressed it was over the github file size limit.

parity_tests.log.gz

The log file starts with dp/fix/v2.5.10-jsonrpc-13.2 running:

2019-11-15 03:22:11 UTC main INFO parity_ethereum::run  Starting Parity-Ethereum/v2.5.10-stable-18efe53-20191114/x86_64-linux-gnu/rustc1.39.0

It runs for ~12 minutes without issues. I then restart the node to v2.5.10-stable:

2019-11-15 03:34:08 UTC main INFO parity_ethereum::run  Starting Parity-Ethereum/v2.5.10-stable-9f94473-20191112/x86_64-linux-gnu/rustc1.39.0

Within 3 minutes I notice the RPC is no longer responding.

From my little test:

ubuntu@parity:~$ while : ; do date; time curl --data '{"method":"eth_blockNumber","params":[],"id":1,"jsonrpc":"2.0"}' -H "Content-Type: application/json" -X POST localhost:8545 --silent ; sleep 30; done
Fri Nov 15 03:35:47 UTC 2019
{"jsonrpc":"2.0","result":"0x885afe","id":1}

real    0m0.366s
user    0m0.004s
sys 0m0.000s
Fri Nov 15 03:36:17 UTC 2019
^C
real    0m52.614s
user    0m0.004s
sys 0m0.000s

For testing purposes, please run your archive node with the following parameters (or just edit your config.toml):
```
--cache-size-db 4096 --interface=local --no-discovery --reserved-only --reserved-peers /tmp/FusionSolutionsNode

Then put this to `/tmp/FusionSolutionsNode` :

enode://e6c4debe071b3355279cb0bb0e36debb8b771d08e54e85039ae322259d299d921e1e1d068529e936e9a087f38fa186ed125044a7148ad63caddff6240a901e23@94.199.178.17:30306
```
I hope you have at least 8gb free ram, I think it will run flawless.

For testing purposes, please run your archive node with the following parameters

How is this relevant to the issue discussed here? I don't get it.

And setting a node to use only a single peer means putting a lot of trust in that peer. I don't think that is good advice.

For testing purposes, please run your archive node with the following parameters

How is this relevant to the issue discussed here? I don't get it.

And setting a node to use only a single peer means putting a lot of trust in that peer. I don't think that is good advice.

I would like to try that there is some bug in the jsonrpc, or simple the node is too busy for answer rpc requests.

@vogelito our best hypothesis so fare is that the RPC server stalls while a long running heavy request is being processed. If this is correct, your node should eventually recover and come back online (it'll just take a really long time). Do prove this we'd need you to start your node, wait until it stalls and then stop sending any new transactions to it, wait for say 5 minutes and then start pinging it with some "light" rpc such as e.g. eth_blockNumevery 30 seconds or so: it should start responding again at some point.
Let us know how that goes! Thanks!

Ok, let me try that. I will try to keep a detailed timing log.

Timing log:

2019-11-15 13:20:31 UTC: node restarted to v2.5.10-stable
2019-11-15 13:22:03 UTC: I notice stopped responding to RPC calls
2019-11-15 13:22:16 UTC: I stop "complex" RPC calls
2019-11-15 13:25:47 UTC: I notice simple RPC requests are being responded to again
2019-11-15 13:27:42 UTC: I turn on "complex" requests again
2019-11-15 13:37:50 UTC: node stopped responding again

I'm attaching the full log
parity_tests_2.log.gz

2019-11-15 13:37:50 UTC: node stopped responding again

Ok, great, so that's progress! Thank you for the help so far. Does changing the server_threads impact the amount of time the server is unresponsive at all? E.g. in the example above it took ~2 minutes for the server to clog up; does using more server threads increase that time at all?

What would you like me to set that value to?

What would you like me to set that value to?

ah, dunno, depends on your hardware ofc, but start with 4, 8, 16 etc?

I restarted the node with server_threads = 4 and it's been running for ~14 minutes.

Some RPC requests are slow (e.g. eth_blockNumber taking > 2 seconds), but it recovers and there hasn't been longer carnage so far.

Good. Sounds like you might want to test different settings, probably increase both server threads and processing threads until you find the sweet spot for your workload.

This has continued to work through the last ~10 hours.

Are there any docs on server_threads and processing_threads I can refer to?

“This” as in with 4 server threads? Good news.

The parity wiki documents all settings but admittedly without a lot of detail.

Yes, this as in 4 server_threads and 0 processing_threads

Reading the docs i can’t find references to server_threads and processing_threads. I find docs on jsonrpc-threads and jsonrpc-server-threads. ~Are these interchangeable?~ (Edit) yes, you already clarified this.

Looking here: https://wiki.parity.io/Configuring-Parity-Ethereum.html

Running this with server_threads = 4 and processing_threads = 4

Seems to be behaving really well.

I find docs on jsonrpc-threads and jsonrpc-server-threads

You're right, my bad. So --jsonrpc-threads is "processing threads" and `--jsonrpc-server-threads" are "server threads". We really need to make this much clearer in the docs. :/

Continues to work OK with 4 server threads and 4 processing threads.

Ok, so the processing_threads is actually not used anymore, it's a left-over from an old refactoring. With the next release we'll remove it and make the default server_threads value be 4 so what you're running with now should keep you in business. Should the RPCs stall again, let us know!

Thanks, I appreciate your help debugging this.

We’ll be on the lookout for the next release with the cleanup.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

dukei picture dukei  ·  3Comments

BillSantos picture BillSantos  ·  3Comments

famfamfam picture famfamfam  ·  3Comments

tzapu picture tzapu  ·  3Comments

m-thomson picture m-thomson  ·  3Comments