I'm running:
- Which Parity version?: 1.11.1
- Which operating system?: Linux, Ubuntu 16.04
- How installed?: from source
- Are you fully synchronized?: no
- Which network are you connected to?: ethereum
- Did you try to restart the node?: yes
While running an analysis program over IPC, Parity started crashing due to the Linux OOM killer (like in https://github.com/paritytech/parity/issues/8618). I looked around and, based on https://github.com/paritytech/parity/issues/6516, ran lsof on the Parity process, which shows a huge number of open sockets:
# lsof -p 11486 | wc -l
238699
<some time passes>
# lsof -p 11486 | wc -l
308174
The program doing the analysis isn't optimized and opens and closes an IPC socket for each request. But even though it works inefficiently, it does close its sockets correctly (lsof on the analysis program shows only one IPC socket open at a time, i.e. the one currently in use).
Thus it seems that Parity is not properly closing IPC sockets after they have been closed by the remote end. I killed the analysis program, so there is no IPC/RPC activity, and the number of open files on Parity did not decrease (but didn't increase either).
(Memory use goes up simultaneously, probably due to resources such as buffers associated with the open sockets. At the end of the test Parity was hogging 8 GB of memory.)
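For illustration, the per-request pattern the analysis program follows looks roughly like the sketch below. This is hypothetical client code, not the actual analysis program; the socket path and the eth_blockNumber call are only examples:

use std::io::{Read, Write};
use std::os::unix::net::UnixStream;

fn main() -> std::io::Result<()> {
    // Each request opens a fresh connection to the IPC socket and closes it
    // again when the stream is dropped at the end of the loop body.
    for id in 0..1000 {
        let mut stream = UnixStream::connect("/srv/parity/parity.ipc")?;
        let request = format!(
            "{{\"jsonrpc\":\"2.0\",\"method\":\"eth_blockNumber\",\"params\":[],\"id\":{}}}\n",
            id
        );
        stream.write_all(request.as_bytes())?;

        let mut buf = [0u8; 4096];
        let n = stream.read(&mut buf)?;
        println!("{}", String::from_utf8_lossy(&buf[..n]));
        // The client-side socket is closed here when `stream` goes out of
        // scope; the observation above is that Parity's side stays open.
    }
    Ok(())
}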
The files left open on parity are IPC sockets:
parity 11486 parity 494u unix 0xffff88031bc82000 0t0 4077212 /srv/parity/parity.ipc type=STREAM
parity 11486 parity 495u unix 0xffff880485db1c00 0t0 4077428 /srv/parity/parity.ipc type=STREAM
parity 11486 parity 496u unix 0xffff8803b8283800 0t0 4080870 /srv/parity/parity.ipc type=STREAM
parity 11486 parity 497u unix 0xffff88028cbff000 0t0 4077072 /srv/parity/parity.ipc type=STREAM
parity 11486 parity 498u unix 0xffff8803c6fa1400 0t0 4081311 /srv/parity/parity.ipc type=STREAM
parity 11486 parity 499u unix 0xffff8804897d8c00 0t0 4083075 /srv/parity/parity.ipc type=STREAM
parity 11486 parity 500u unix 0xffff8803afe16800 0t0 4081886 /srv/parity/parity.ipc type=STREAM
parity 11486 parity 501u unix 0xffff88048b5fa800 0t0 4083255 /srv/parity/parity.ipc type=STREAM
I can reproduce this reliably just by opening and closing the IPC socket, even with a shell script (pass the socket path as an argument):
#!/bin/sh -e
# Repeatedly connect to the IPC socket (path given as the first argument),
# keep the connection open for 0.1 s, then kill the client.
for i in `seq 0 1000`
do
echo $i
nc -U "$1" </dev/null >/dev/null &
pid=$!
sleep 0.1
kill $pid
done
Monitoring the number of open files before starting the script shows a fairly stable count of about 650:
# while lsof -n -p 18186 | wc -l; do sleep 5; done
630
651
647
652
648
656
When the script is started, the numbers start to go up and do not come down even after the script terminates:
753
820
860
917
944
1007
1051
1116
1157
1215
1254
1315
1369
1409
1467
1517
1536
1592
1635
1631
1630
So it seems quite clear that Parity either is not detecting that the remote end has closed the IPC socket, or is not closing its own side afterwards.
I also observed the same problem with 1.10.1. The issue seems to affect only IPC sockets; using HTTP instead does not cause a file descriptor leak.
Thanks for the report!
This sounds very much like it might be the underlying cause of the errors we've seen and discussed in #6791.
Thanks @santtu for the specific details, I'll try to reproduce this.
I can confirm that Parity (on master) is leaking IPC socket connections when they are closed unexpectedly, as per the script provided. I am currently trying to create a minimal reproducible example to see whether this is in the parity/jsonrpc crate or within the local code base.
The issue seems to be on the side of Parity, or the specific way we use jsonrpc. With the script against a minimal RPC socket (gist), I see that every connection is properly closed:
TRACE 2018-06-04T18:28:29Z: jsonrpc_ipc_server::server: Accepted incoming IPC connection: 1
TRACE 2018-06-04T18:28:29Z: jsonrpc_ipc_server::server: Peer: service finished
TRACE 2018-06-04T18:28:29Z: jsonrpc_ipc_server::server: Accepted incoming IPC connection: 2
TRACE 2018-06-04T18:28:29Z: jsonrpc_ipc_server::server: Peer: service finished
TRACE 2018-06-04T18:28:29Z: jsonrpc_ipc_server::server: Accepted incoming IPC connection: 3
TRACE 2018-06-04T18:28:29Z: jsonrpc_ipc_server::server: Peer: service finished
TRACE 2018-06-04T18:28:29Z: jsonrpc_ipc_server::server: Accepted incoming IPC connection: 4
TRACE 2018-06-04T18:28:30Z: jsonrpc_ipc_server::server: Peer: service finished
That message is missing when running against Parity master (RUST_LOG=ipc=trace cargo run):
2018-06-04 20:36:03 TRACE ipc Accepted incoming IPC connection: 1035
2018-06-04 20:36:03 TRACE ipc Accepted incoming IPC connection: 1036
2018-06-04 20:36:03 TRACE ipc Accepted incoming IPC connection: 1037
2018-06-04 20:36:03 TRACE ipc Accepted incoming IPC connection: 1038
2018-06-04 20:36:03 TRACE ipc Accepted incoming IPC connection: 1039
2018-06-04 20:36:03 TRACE ipc Accepted incoming IPC connection: 1040
2018-06-04 20:36:03 TRACE ipc Accepted incoming IPC connection: 1041
2018-06-04 20:36:04 TRACE ipc Accepted incoming IPC connection: 1042
2018-06-04 20:36:04 TRACE ipc Accepted incoming IPC connection: 1043
2018-06-04 20:36:04 TRACE ipc Accepted incoming IPC connection: 1044
2018-06-04 20:36:04 TRACE ipc Accepted incoming IPC connection: 1045
2018-06-04 20:36:04 TRACE ipc Accepted incoming IPC connection: 1046
2018-06-04 20:36:04 TRACE ipc Accepted incoming IPC connection: 1047
2018-06-04 20:36:04 TRACE ipc Accepted incoming IPC connection: 1048
2018-06-04 20:36:04 TRACE ipc Accepted incoming IPC connection: 1049
2018-06-04 20:36:04 TRACE ipc Accepted incoming IPC connection: 1050
2018-06-04 20:36:05 TRACE ipc Accepted incoming IPC connection: 1051
2018-06-04 20:36:05 TRACE ipc Accepted incoming IPC connection: 1052
We are leaking those connections within parity.
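For context, a minimal standalone IPC server of the kind used for the comparison above can be sketched roughly as follows. This is not the gist referenced earlier; it assumes the 2018-era jsonrpc-ipc-server API, and the socket path and method name are placeholders:

extern crate jsonrpc_ipc_server;

use jsonrpc_ipc_server::ServerBuilder;
use jsonrpc_ipc_server::jsonrpc_core::{IoHandler, Value};

fn main() {
    // A single trivial RPC method so the server has something to answer.
    let mut io = IoHandler::new();
    io.add_method("say_hello", |_params| Ok(Value::String("hello".to_string())));

    // Listen on a Unix domain socket and block until shutdown.
    let server = ServerBuilder::new(io)
        .start("/tmp/test.ipc")
        .expect("couldn't open socket");
    server.wait();
}

Running the nc script against a server like this is what produces the "Peer: service finished" lines above; against Parity master only the "Accepted incoming IPC connection" lines appear.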
Small update: I figured out the root Arc causing the leak. Our extractor, which runs when every IPC connection is established, also creates a Session for the PubSub facilities. This Arc is not being released properly somewhere (or we might even have a reference cycle) and stays around even after the connection has closed. The clean-up code is supposed to run during drop, but because someone still holds a strong reference to that Arc (I haven't yet figured out where), drop never happens and the clean-up never takes place.
A simple way to show this is the cause: replace this Some(...) with None and you'll see the leak is gone. The leak happens within, or in interaction with, that Session Arc.
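To illustrate the mechanism, here is a minimal, hypothetical Rust sketch (unrelated to the actual jsonrpc types) of how a lingering strong Arc reference keeps Drop, and therefore the clean-up, from ever running:

use std::sync::Arc;

// Stand-in for the PubSub session: clean-up only happens in Drop.
struct Session(u32);

impl Drop for Session {
    fn drop(&mut self) {
        println!("session {} cleaned up", self.0);
    }
}

fn main() {
    // Something else (here just a Vec) keeps a strong clone of every session.
    let mut registry: Vec<Arc<Session>> = Vec::new();

    for id in 0..3 {
        let session = Arc::new(Session(id));
        registry.push(session.clone());
        // The connection's own handle is dropped here when `session` goes out
        // of scope, but the strong count is still 1 because of `registry`,
        // so Drop::drop never runs and no clean-up happens.
    }

    println!("connections closed, sessions still alive: {}", registry.len());
    // Only when the last strong reference disappears (the registry is dropped
    // at the end of main) do the Drop impls finally run.
}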
I was able to reproduce the problem with jsonrpc alone, structuring the objects the way they are structured there. Therefore I consider this a jsonrpc bug (as it must be fixed there) and not strictly a Parity one (the underlying crate causes the issue even though it is being used properly by Parity).
working on a solution now