Jormungandr: Suspected file descriptor leak (ERRO Error while accepting connection on 0.0.0.0:3000: Os)

Created on 11 Oct 2019 · 12 Comments · Source: input-output-hk/jormungandr

Describe the bug
The first error in the logs is "ERRO Error while accepting connection on 0.0.0.0:3000: Os { code: 24, kind: Other, message: "Too many open files" }, task: network". After a while it starts spamming "ERRO cannot send PullHeaders request to network: send failed because receiver is gone, task: block" and the node stops receiving blocks.
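
For context, OS error code 24 on Linux is EMFILE: the process has reached its per-process limit on open file descriptors. The following is a minimal sketch for watching a running node's descriptor usage, assuming a Linux /proc filesystem. The helper names are illustrative and are not part of jormungandr or jcli.

```python
import errno
import os

# On Linux, errno 24 is EMFILE ("Too many open files").
EMFILE_CODE = errno.EMFILE


def parse_soft_nofile_limit(limits_text):
    """Return the soft 'Max open files' limit from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft = line[len("Max open files"):].split()[0]
            return None if soft == "unlimited" else int(soft)
    return None


def count_sockets(fd_targets):
    """Count symlink targets of the form 'socket:[inode]'."""
    return sum(1 for target in fd_targets if target.startswith("socket:["))


def fd_report(pid):
    """Return (open_fds, open_sockets, soft_limit) for a process (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    targets = []
    for name in os.listdir(fd_dir):
        try:
            targets.append(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    with open(f"/proc/{pid}/limits") as f:
        soft_limit = parse_soft_nofile_limit(f.read())
    return len(targets), count_sockets(targets), soft_limit
```

Calling fd_report(pid) periodically with the node's PID (as the same user or as root) would show whether the descriptor count climbs toward the soft limit and whether sockets account for the growth.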

Mandatory Information

  1. jcli 0.5.6 (HEAD-7ab929e+, release, linux [x86_64]) - [rustc 1.38.0 (625451e37 2019-09-23)]
  2. jormungandr 0.5.6 (HEAD-7ab929e+, release, linux [x86_64]) - [rustc 1.38.0 (625451e37 2019-09-23)]

To Reproduce
Steps to reproduce the behavior:
I don't know how to reproduce it; it happens after the node has run for a few hours.

Expected behavior
No error

Additional context
See the attachment for output logs and strace logs. I have four examples, as it happens every time I try to run my node.

errPullHeader.zip

bug question Priority - High

All 12 comments

Thank you for the captures, will investigate.

Massively duplicate connections are still present; working on a better fix.
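
One way to check for duplicate connections from outside the node is to count established TCP connections per remote address. The following is a rough sketch, assuming Linux, IPv4, and a little-endian host such as the x86_64 machines in this report. It parses the text of /proc/net/tcp, which covers the whole network namespace rather than only the node process, and it only considers connections accepted on the given local port. The function names are illustrative.

```python
import socket
import struct
from collections import Counter

TCP_ESTABLISHED = 0x01  # state value used in /proc/net/tcp


def decode_ipv4(hex_addr):
    """Decode 'AABBCCDD:PPPP' from /proc/net/tcp into (ip, port)."""
    ip_hex, port_hex = hex_addr.split(":")
    # Assumption: the 32-bit address is printed in little-endian host order.
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def connections_per_peer(proc_net_tcp_text, local_port):
    """Count established connections on local_port, grouped by remote IP."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        _, lport = decode_ipv4(fields[1])    # local_address column
        peer_ip, _ = decode_ipv4(fields[2])  # rem_address column
        if lport == local_port and int(fields[3], 16) == TCP_ESTABLISHED:
            counts[peer_ip] += 1
    return counts
```

For a node listening on port 3000, passing the contents of /proc/net/tcp and 3000 to connections_per_peer and looking at the largest counts would show whether a few peers hold many simultaneous connections.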

I'm running 0.6.0 with strace, will serve you fresh logs when(/if?) it crashes :)

The error just happened on my other server, which runs another setup with an old-style mechanical HDD; I don't know if that is relevant. Running v0.6.1.

Here are the logs:
err_24.zip

It just happened on my other server, so it's not hardware related, as this one uses an SSD. Still on v0.6.1.

Logs:
err_24_2.zip

@bjarnekvae it's not storage related; it is a network subsystem issue, as indicated in the log: "task: network".

One probable cause has been fixed in #981 (and in the follow-up fixes to that fix).
Please retry with version 0.6.5 or later.

Running 0.6.5 with strace now.

This error just came back in v0.8.0-rc2

node_log.log

Strace log:
tooManyFiles.zip

@mzabaluev This is still relevant for v0.8.0-rc5

ERRO Error while accepting connection on 0.0.0.0:3001: Os { code: 24, kind: Other, message: "Too many open files" }, task: network

version: jormungandr 0.8.5-3db06807
