Describe the bug
The operating system killed the jormungandr process with an out-of-memory error after it had run overnight (on a Raspberry Pi Zero).
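To confirm that the kernel's OOM killer ended the process, you can search the kernel log. Below is a minimal sketch. It assumes a Linux kernel whose OOM messages contain "Killed process <pid> (<name>)". That is the usual wording, but the exact format varies between kernel versions. oom_kills is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin (e.g. from dmesg or
# journalctl -k) and keep only OOM-killer kills of the named process.
# Assumes the common "Killed process <pid> (<name>)" message wording.
oom_kills() {
  grep -i "killed process [0-9]* ($1)"
}
```

For example: dmesg | oom_kills jormungandr. On some systems, reading dmesg requires root.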
Mandatory Information
$ jcli --full-version
jcli 0.5.6 (HEAD-7ab929e+, release, linux [arm]) - [rustc 1.38.0 (625451e37 2019-09-23)]
$ jormungandr --full-version
jormungandr 0.5.6 (HEAD-7ab929e+, release, linux [arm]) - [rustc 1.38.0 (625451e37 2019-09-23)]
Additional context
Raspberry Pi Zero Rev 1.3, 512MB RAM.
Linux version 4.4.21+ (dc4@dc4-XPS13-9333) (gcc version 4.9.3 (crosstool-NG crosstool-ng-1.22.0-88-g8460611) ) _redacted_ Thu Sep 15 14:17:52 BST 2016
(Raspbian Jessie: I've not got round to updating this yet - this is a spare Pi that I dug out :))
jormungandr is using up all available memory on the Linux x86_64 rustc build too. Memory usage jumped along with the block info messages, followed by a peer connection error:
Oct 09 21:37:12.000 INFO update current branch tip, date: 229.22567, parent: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, hash: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, task: block
...
Oct 09 21:37:24.000 INFO error connecting to peer, reason: Too many open files (os error 24), peer_addr: x.x.x.x:xxx, task: network
@lednersin Which jormungandr version were you using? What is the limit on the number of file descriptors (as reported by ulimit -n) for the process?
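To answer the file descriptor question for a running node on Linux, you can read both the limit and the current usage from /proc. Below is a minimal sketch. fd_limit and fd_count are hypothetical helper names. Inspecting another user's process generally requires root.

```shell
#!/bin/sh
# Linux-only sketch: inspect a running process through /proc.

# Soft "Max open files" limit of process $1 (the value ulimit -n reports).
fd_limit() {
  awk '/^Max open files/ {print $4}' "/proc/$1/limits"
}

# Number of file descriptors process $1 currently has open.
fd_count() {
  ls "/proc/$1/fd" | wc -l
}
```

For example: pid=$(pgrep -o jormungandr); echo "$(fd_count "$pid") of $(fd_limit "$pid") descriptors in use". A count close to the limit would be consistent with the "Too many open files (os error 24)" message above.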
Mine seem to stabilise at ~500 MB (584 MB on my Mac laptop), but they are not receiving blocks. Mac details below. I've passed more info on to Nicolas and will keep them running.
Darwin Mercury4 18.7.0 Darwin Kernel Version 18.7.0: Tue Aug 20 16:57:14 PDT 2019; root:xnu-4903.271.2~2/RELEASE_X86_64 x86_64
jcli 0.5.6+lock (master-f0e869c2, release, macos [x86_64]) - [rustc 1.38.0 (625451e37 2019-09-23)]
jormungandr 0.5.6+lock (master-f0e869c2, release, macos [x86_64]) - [rustc 1.38.0 (625451e37 2019-09-23)]
On 10 Oct 2019, at 07:07, Mikhail Zabaluev notifications@github.com wrote:
@lednersin Which jormungandr version were you using? What is the limit on the number of file descriptors (as reported by ulimit -n) for the process?
Best Wishes,
Kevin
E: kevinhammond.net, W: http://www.kevinhammond.net
One thing to experiment with quickly is to use a recently added configuration setting p2p.max_connections to limit the number of open peer connections. If you still see file descriptors hogged far beyond the limit (check on Linux with lsof -i -n -c jormungandr), then we didn't do our homework correctly and I must investigate.
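Below is a Linux-only sketch of the check described above. It counts socket descriptors through /proc instead of using lsof. socket_count and check_connections are hypothetical helper names. Listening and other non-peer sockets are counted too, so a small excess over p2p.max_connections is expected. Only a count far beyond it would point at the problem. If you use lsof instead, note that lsof ORs its selection options unless -a is given. To restrict the listing to jormungandr's network files, use lsof -a -i -n -c jormungandr.

```shell
#!/bin/sh
# Linux-only sketch: count sockets held by a process via /proc/<pid>/fd.
# Each socket descriptor is a symlink whose target looks like "socket:[inode]".
socket_count() {
  find "/proc/$1/fd" -maxdepth 1 -lname 'socket:*' 2>/dev/null | wc -l
}

# Compare the socket count of process $1 with a connection limit $2
# (e.g. the value configured as p2p.max_connections). Returns 1 if over.
check_connections() {
  n=$(socket_count "$1")
  if [ "$n" -gt "$2" ]; then
    echo "over: $n sockets open, limit $2"
    return 1
  fi
  echo "ok: $n sockets open, limit $2"
}
```

For example: check_connections "$(pgrep -o jormungandr)" 64, where 64 stands in for whatever value p2p.max_connections is set to.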
The above does not exclude the possibility of a memory leak, though we haven't noticed this issue much yet on less memory-constrained platforms.
Will try. As I said to @nicolas, this is a stress-test system (I have larger ones) - I was just using all my spare systems as test nodes. I won’t be heartbroken if it is just too small to use in practice! One issue on the Pi is that virtual memory performs horribly, since swap will normally be on an SD card, so you really have to keep everything in physical memory. My Pi 3 seems to have stabilised at 476 MB RSS and 543 MB virtual (it was up to 494 MB RSS), but it has 1 GB available. It has run for a couple of days but has not seen any blocks yet.
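You can read the two figures quoted above (resident and virtual size) for any process on Linux without extra tools. Below is a minimal sketch. mem_kb is a hypothetical helper name. It reads VmRSS and VmSize, which correspond to the RES and VIRT columns in top.

```shell
#!/bin/sh
# Print the resident (VmRSS) and virtual (VmSize) sizes of process $1 in KiB,
# read from /proc/<pid>/status.
mem_kb() {
  awk '/^VmRSS:/ { rss = $2 } /^VmSize:/ { vsz = $2 } END { print rss, vsz }' \
    "/proc/$1/status"
}
```

For example: mem_kb "$(pgrep -o jormungandr)".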
I don’t think that is a port reuse issue as in #860, since I have generally been starting up on new port numbers.
On 11 Oct 2019, at 10:23, Mikhail Zabaluev notifications@github.com wrote:
One thing to experiment with quickly is to use a recently added configuration setting p2p.max_connections to limit the number of open peer connections. If you still see file descriptors hogged far beyond the limit (check on Linux with lsof -i -n -c jormungandr), then we didn't do our homework correctly and I must investigate.
The above does not exclude the possibility of a memory leak, though we haven't noticed this issue much yet on less memory-constrained platforms.
@mzabaluev I was using the latest version, 0.5.6, and the file limit was 1024 at the time (the default soft limit on Debian, Fedora and many derivatives).
However, I then tried it with the file limit increased to 16k. The block error I mentioned may have been correlation rather than causation: even with the 16k limit, jormungandr's memory usage still jumped up to the maximum during the last day.
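Raising the limit as described has to happen in the shell or service manager that starts the node, because child processes inherit it. Below is a minimal sketch. set_nofile is a hypothetical helper name, and 16384 simply mirrors the 16k figure above. An unprivileged process can raise its soft limit only up to the hard limit, so the helper caps the request there.

```shell
#!/bin/sh
# Set the soft open-files limit of the current shell to $1, capped at the
# hard limit (an unprivileged process cannot exceed it). Processes started
# afterwards, such as the node, inherit the new limit.
set_nofile() {
  want=$1
  hard=$(ulimit -H -n)
  if [ "$hard" != "unlimited" ] && [ "$want" -gt "$hard" ]; then
    want=$hard
  fi
  ulimit -S -n "$want"
}
```

For example, a wrapper script could call set_nofile 16384 and then exec jormungandr with its usual arguments.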
If you want to correlate this with any network activity, I noticed three jumps:
from 540 MB to 670 MB on 2019-10-10, between 10:30Z and 11:00Z
from 670 MB to 870 MB on 2019-10-10, between 14:30Z and 15:00Z
variations between 870 and 960 MB
from 960 MB to 2 GB (the maximum I allowed for the process) on 2019-10-11, between 06:00Z and 07:00Z
(sorry, couldn't reply with the shorter name)
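To pin memory jumps like these to a time window, you can sample the process's RSS with UTC timestamps and compare the samples against the node's log. Below is a minimal Linux-only sketch. log_rss is a hypothetical helper name.

```shell
#!/bin/sh
# Print "<UTC timestamp> <RSS in KiB>" for process $1, taking $2 samples
# $3 seconds apart, so memory jumps can be lined up against log timestamps.
log_rss() {
  i=0
  while [ "$i" -lt "$2" ]; do
    rss=$(awk '/^VmRSS:/ {print $2}' "/proc/$1/status")
    printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$rss"
    i=$((i + 1))
    # Sleep only between samples, not after the last one.
    if [ "$i" -lt "$2" ]; then
      sleep "$3"
    fi
  done
}
```

For example, log_rss "$(pgrep -o jormungandr)" 1440 60 >> rss.log records one sample per minute for a day. The file name is only an illustration.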
I've got the same issue, but for me it slowly uses more and more memory (a few MB every hour), then suddenly jumps from about 350 MB to 2 GB in a matter of minutes. I've been running the process with the log level set to debug; hopefully you can use the logs to help find the cause of the problem. I've uploaded them to my Google Drive here.
Cheers!
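If you record RSS samples as "<timestamp> <RSS in KiB>" lines, however they are collected, you can scan them for the kind of sudden jump described above. Below is a minimal sketch. rss_jumps and its threshold argument are hypothetical.

```shell
#!/bin/sh
# Read "<timestamp> <rss_kb>" lines on stdin and print every sample whose RSS
# grew by more than $1 KiB since the previous sample, as
# "<timestamp> <previous_kb> -> <current_kb>".
rss_jumps() {
  awk -v thr="$1" 'NR > 1 && $2 - prev > thr { print $1, prev, "->", $2 }
                   { prev = $2 }'
}
```

For example, rss_jumps 102400 < rss.log lists every step of more than 100 MiB between consecutive samples.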
Thanks, it seems the memory issue was due to a bug in the p2p poldercast module: before the network policy was implemented, we didn't dispose of other peers fast enough.