Greetings, sadly my Parity-Ethereum/v2.4.6-stable-94164e1-20190514/x86_64-linux-gnu/rustc1.34.1 node consumes an unreasonably large amount of memory.
Node log and process statistics in CSV: https://www.fusionsolutions.io/doc/memlog.tar.gz
Start parameters are:
--ipc-apis all --reserved-peers /own/config/archiveEthNode.txt --no-serve-light --no-periodic-snapshot --jsonrpc-allow-missing-blocks --no-persistent-txqueue --jsonrpc-server-threads 8 --ipc-path=/own/sockets/ethNode.ipc --min-gas-price=10000000 --tx-queue-mem-limit=4096 --tx-queue-size=256000 --reseal-on-txs=all --force-sealing --base-path "/mnt/node-1/eth" --rpcport 8548 --port 30306 --no-ws --no-secretstore --cache-size 4096 --log-file "/own/log/nodes/eth/parity_eth_$DATE.log"
The memory usage should not go higher than 12 GB.
At 16:20:20 I killed the process with SIGKILL; this is the only way I can shut down the process.
I'm glad to help with any trace parameters or statistics.
How did you collect the memory stats shown in the CSV?
Like in Python:
import psutil
process = psutil.Process(PID)  # PID is the parity process id
print(process.memory_info().rss)  # resident set size (RSS) in bytes
This gives the same value as htop:

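For completeness, the sampling loop is roughly like this (a simplified sketch, not the exact script; the PID and output path are placeholders):
import csv
import time
import psutil

PID = 12345           # placeholder: the parity process id
OUT = "memlog.csv"    # placeholder: output file

proc = psutil.Process(PID)
with open(OUT, "a", newline="") as f:
    writer = csv.writer(f)
    while True:
        # one row per sample: unix timestamp, resident set size in bytes
        writer.writerow([int(time.time()), proc.memory_info().rss])
        f.flush()
        time.sleep(60)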
So it's RSS, perfect! :)
The number of pending txs is pretty high, is that a normal amount in your setup?
Yeah, I'm parsing pending transactions into my DB. Check my start parameters :)
Other than staying in sync, what is the node doing? I.e. what kind of RPC traffic is it used for?
For every new block I'm using these RPCs: trace_block, eth_getBlockByNumber, eth_getUncleByBlockHashAndIndex, eth_blockNumber, eth_getTransactionReceipt.
And for pending transactions, every minute: parity_allTransactionHashes, eth_getTransactionByHash.
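Roughly, the polling looks like this (a simplified sketch, not my exact code; it assumes the node's HTTP RPC is reachable on 127.0.0.1:8548, matching --rpcport 8548 from the start parameters):
import time
import requests

RPC_URL = "http://127.0.0.1:8548"  # assumed HTTP endpoint, matching --rpcport 8548

def rpc(method, params=None):
    # minimal JSON-RPC 2.0 call over HTTP
    payload = {"jsonrpc": "2.0", "method": method, "params": params or [], "id": 1}
    return requests.post(RPC_URL, json=payload, timeout=30).json()["result"]

last_block = 0
last_pending_poll = 0.0
while True:
    head = int(rpc("eth_blockNumber"), 16)
    if head > last_block:
        # per-block calls: full block, traces and receipts
        block = rpc("eth_getBlockByNumber", [hex(head), True])
        traces = rpc("trace_block", [hex(head)])
        receipts = [rpc("eth_getTransactionReceipt", [tx["hash"]]) for tx in block["transactions"]]
        last_block = head
    if time.time() - last_pending_poll > 60:
        # once a minute: walk the pending-transaction queue
        for h in rpc("parity_allTransactionHashes"):
            pending = rpc("eth_getTransactionByHash", [h])
        last_pending_poll = time.time()
    time.sleep(1)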
I've been running a recent master build with your params now for ~6h and memory usage seems stable. While it's possible that this has been fixed in master, it is more probable that the leak is somewhere in the RPC layer. I need to set up some kind of load testing script to debug this further.
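Something along these lines would probably do as a starting point (a rough sketch, assuming the HTTP RPC endpoint on 127.0.0.1:8548; the exact mix of calls would need tuning):
import threading
import requests

RPC_URL = "http://127.0.0.1:8548"  # assumed HTTP RPC endpoint

def hammer(method, params, n=10000):
    # fire the same call repeatedly to put sustained pressure on the RPC layer
    payload = {"jsonrpc": "2.0", "method": method, "params": params, "id": 1}
    for _ in range(n):
        requests.post(RPC_URL, json=payload, timeout=30)

threads = [
    threading.Thread(target=hammer, args=("eth_blockNumber", [])),
    threading.Thread(target=hammer, args=("eth_getBlockByNumber", ["latest", True])),
    threading.Thread(target=hammer, args=("eth_getTransactionByHash", ["0x" + "00" * 32])),
]
for t in threads:
    t.start()
for t in threads:
    t.join()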
@iFA88 Do you have the possibility to confirm my findings by running a node without RPC traffic, just to check that it is indeed the RPC layer causing issues? Also, if you have a load testing script or something similar already written, that'd be helpful too ofc. Thanks!
In the log I see you must be running with --tracing on but that's not present in the startup params from the original ticket. Are you using a config.toml file too? Can you post the full config please?
I have run the node without any RPC calls, but the memory has increased continuously. Here is the log, but please ignore the peer and pending TX values:
https://www.fusionsolutions.io/doc/memlog2.tar.gz
Without any RPC calls the shutdown works very fast.
In the log I see you must be running with --tracing on but that's not present in the startup params from the original ticket. Are you using a config.toml file too? Can you post the full config please?
You are right, tracing was ON while I synced from scratch, and after that it is automatically enabled.
No, I don't use any configuration file, only the parameters that I gave in the first ticket.
Ok, not a problem. It explains why I couldn't reproduce it. I'd have to slow-sync the whole chain to reproduce it now, I think, so I'm going to try the Goerli testnet and see if I can see the issue there. If you have the means to do so, it would be great if you could try on Goerli as well using 2.4.x.
Thanks!
@dvdplm Sadly not, but if you wish I can set some trace parameters.
Is your node synched?
Is your node synched?
Ofc, and you have seen that in the logs.
Yeah, still synching Kovan here with traces. Goerli is synched and after 12+ hours shows no signs of memory leaks.
@dvdplm are you testing this on macOS? The problem could be related to heapsize, which uses jemallocator only on macOS.
@iFA88 could you test it with a recent master build? We've removed heapsize in #10432.
@ordian yes, and yes it is possible that this is a platform issue, but we'll see. For now I'm trying to rule out the obvious stuff. I'm not sure how long it takes to slow-sync mainnet with tracing on, but judging by how long it takes on Kovan I think it could take weeks, so I was hoping to find an easier way to reproduce this.
@ordian I will upgrade my parity to https://github.com/paritytech/parity-ethereum/releases/tag/v2.4.9 I see that this build has the commit.
I need to SIGKILL the process, because it doesn't shut down.
I have another node on the Classic chain which is not affected by the issue (same Parity version, but it is an archive node with tracing):

@iFA88 I don't think so, #10432 wasn't backported to stable and beta.
@ordian Isn't that the commit?:
https://github.com/paritytech/parity-ethereum/compare/v2.4.9...master

Sorry if I'm wrong.
@iFA88 you're comparing v2.4.9 with master, so it shows you the difference, i.e. the commits that are in master and not in 2.4.9.
@ordian Yes, I was wrong! If you can build the current master branch for Linux, then I can use that; sadly I don't have any build tools right now.
@iFA88 I __think__ you can download a recent nightly from here (click the "Download" button on the right). It would be great if you could repeat the problem using that.
An update on my end: Goerli is synched and does not leak any memory. Kovan is still synching (and has been really stable, but that is irrelevant here).
@dvdplm Alright, I'm now running that binary. Idk why, but the Classic chain works flawlessly.
I have a trace about the shutdown, please look at it:
https://www.fusionsolutions.io/doc/shutdownerror.tar.gz
@dvdplm Alright, I'm now running that binary. Idk why, but the Classic chain works flawlessly.
You mean running with --chain classic using the master build does not leak memory? Or using stable?
I have a trace about the shutdown, please look at it: https://www.fusionsolutions.io/doc/shutdownerror.tar.gz
That is 2.4.6 so the latest fixes for shutdown problems are not included. Best would be to debug this further using the latest releases (or master builds). For shutdown issues it'd be good to enable shutdown=trace level logging. I don't think logging is going to provide enough info here, but best keep it on.
@dvdplm Yes, I have a Classic node which runs in archive mode with tracing, and its RES usage does not go above ~1.3 GB, neither with Parity-Ethereum/v2.4.6-stable-94164e1-20190514/x86_64-linux-gnu/rustc1.34.1 nor with Parity-Ethereum/v2.4.9-stable-691580c-20190701/x86_64-linux-gnu/rustc1.35.0.
I have left the shutdown trace parameter on for now and am running Parity-Ethereum/v2.6.0-nightly-b4af8df-20190702/x86_64-linux-gnu/rustc1.35.0.
Sadly the new Parity (Parity-Ethereum/v2.6.0-nightly-b4af8df-20190702/x86_64-linux-gnu/rustc1.35.0) didn't solve the memory issue:
https://www.fusionsolutions.io/doc/memlog3.tar.gz

Parity-Ethereum/v2.6.0-nightly
Ok, and just to be clear: you ran it on mainnet with tracing on, just like before, with the same settings except for the shutdown logging?
Did you also experience shutdown problems with Parity-Ethereum/v2.6.0-nightly?
@dvdplm yes and yes :(
Ok, so @ordian, this tells us that this is not related to jemalloc, do you agree?
Parity-Ethereum/v2.6.0-nightly-b4af8df-20190702/x86_64-linux-gnu/rustc1.35.0 crashed during the night: the process is still running and I can communicate with it through RPC, but the current block height is 8080446, so syncing has stopped. There was no incident in the kernel log or syslog, and free space was more than enough. I switched back to Parity-Ethereum/v2.4.9-stable-691580c-20190701/x86_64-linux-gnu/rustc1.35.0.
Last log:
2019-07-04 00:42:30 Verifier #7 INFO import Imported #8081174 0xc041…70bb (92 txs, 7.64 Mgas, 78 ms, 24.82 KiB)
2019-07-04 00:42:33 Verifier #8 INFO import Imported #8081175 0x38ca…d29f (50 txs, 7.98 Mgas, 73 ms, 17.97 KiB)
2019-07-04 00:42:43 IO Worker #0 INFO import 35/50 peers 208 MiB chain 145 MiB db 0 bytes queue 7 MiB sync RPC: 0 conn, 0 req/s, 0 µs
2019-07-04 00:42:43 Verifier #6 INFO import Import
Ouch that doesn't sound good. When you say "crashed" do you mean that the process hung in some way or did it actually crash? I mean, you write that you could still query the node over RPC right?
I am still synching mainnet; I'm about half-way through, but I anticipate it'll take a long while still.
I wonder if there's any way you could share your database with us to speed up the investigation?
Ouch that doesn't sound good. When you say "crashed" do you mean that the process hung in some way or did it actually crash? I mean, you write that you could still query the node over RPC right?
I called it crashed because the logging and the syncing have stopped. Maybe the main thread has hung?! Yeah, I queried the block number to check whether the sync still works or not.
I wonder if there's any way you could share your database with us to speed up the investigation?
I would be glad to help, but I don't see how we can speed this up. If you wish I can set some parameters for the Parity node. If you have any ideas, share them.
Is there anything I can do? The two Parity nodes which run on the main network eat all of my RAM after 1-2 days. A daily restart is not the best solution :(
I am facing similar issues with the latest Parity releases. I used to be able to sync easily and run other applications; however, now after an hour or two of syncing it consumes all my RAM and running other applications is not possible, with even Parity alone causing the computer to lock up.
Parity used to be faster to sync and lighter on RAM than Geth, but since I can control the RAM usage in Geth, I am looking to switch back.
I suspect the shutdown problem occurs when I send a shutdown signal to the node but the node still accepts RPC calls, and that prevents the shutdown process from completing.
I have discovered that when I don't use the --cache-size parameter, the Parity RES usage doesn't go above ~2 GB. When I use that parameter with ANY number, the memory usage goes up to 14 GB (probably more, but I don't have more free) within 24 hours.
Hey @dvdplm! Can you please check my last comment about the --cache-size issue? Thank you!
@iFA88 apologies for the late answer. I have not been able to reproduce the problem with RAM usage and --cache-size, and I have tried many different versions and chains. On my machine, running macOS with 32 GB of RAM, memory usage is very stable. I know this is kind of useless and it's much more interesting to see what happens on a machine with less RAM.
What happens on your end if you run with the other caching-related switches, i.e. this is what I am running currently: --cache-size-db=32096 --cache-size-blocks=2048 --cache-size-queue=32512 --cache-size-state=16096 (don't read too much into the specific numbers, I mostly picked them at random tbh). Do you still see RES ballooning after a while?
@dvdplm Do we have any command to get cache statuses (usable/limit) or any debug level/trace?
No, not that I know of. It would be quite useful.
I'm now running with the --cache-size-blocks=128 --cache-size-db=2048 parameters. I don't use --cache-size now.
The node now uses 9150 MB RES after 2 days with the above parameters.
So I think I'm seeing something similar here: omitting the --cache* parameters seems to keep memory usage within limits. What I see here is that the sync speed slows down significantly as memory usage goes up (after a restart the sync speed goes back up). So until we fix the bug, I'd say the best work-around is to avoid using those params.
I cannot measure the import speed because every block has very different EVM calls. I will now try using --cache-size again to check the issue.
Ok, it seems the issue is somehow solved. When I'm using --cache-size the process RES usage doesn't go above 7-9 GB (with --cache-size 2048). If I face this issue again I will reopen the thread. Thanks for the support!