Js-ipfs: HTTP daemon performance is low

Created on 29 Dec 2020 · 5 comments · Source: ipfs/js-ipfs

  • Version:
    ipfs-http-server v0.1.4

  • Platform:

    • Darwin feather 20.2.0 Darwin Kernel Version 20.2.0: Wed Dec 2 20:39:59 PST 2020; root:xnu-7195.60.75~1/RELEASE_X86_64 x86_64
    • Node.js v14.15.1
  • Subsystem:
    ipfs-http-daemon

Severity:

Medium: Performance Issues

Description:

The js-IPFS HTTP daemon refuses new HTTP connections beyond a certain threshold, which on my machine is 20. I believe Node.js is capable of rather more than this.

As far as I understand, the js-ipfs HTTP daemon is built on top of Hapi, which is reportedly one of the slower HTTP frameworks for Node.js. I believe this explains the inability to handle more than 20 simultaneous connections. As an experiment, I implemented the dag.put HTTP endpoint on top of Fastify, with the same IPFS instance configuration; it handles 1000 simultaneously initiated connections just fine.

I do understand that one could put something like HAProxy in front, and that js-IPFS may never have been oriented towards performance. Still, it seems like something you should be aware of.

Steps to reproduce the error:

Adjust the TIMES variable in https://gist.github.com/ukstv/25f77d94113f32c0b2d200f8f1e0c3a1 to 20 and run it against a local js-ipfs instance: it works fine and reports no refused connections. Set it back to 1000 and a large proportion of the connections are refused.

kind/question

Most helpful comment

It's important to set a baseline for performance expectations. In this gist I set up an HTTP server and use a client to make requests to it, using only node core - no HTTP framework and no HTTP client abstractions.

Running on Mac OS X 10.15.6 I see the following:

$ node max-concurrent-requests.js
testing 100 concurrent requests
max in flight 100
testing 111 concurrent requests
max in flight 111
testing 123 concurrent requests
max in flight 123
testing 136 concurrent requests
request 125 failed
max in flight 0
Error: connect ECONNRESET 127.0.0.1:51658
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1137:16) {
  errno: 'ECONNRESET',
  code: 'ECONNRESET',
  syscall: 'connect',
  address: '127.0.0.1',
  port: 51658
}

For me, after a few runs it starts to fail somewhere between 120-140 concurrent requests on average.

The same code run on Linux is much better:

$ node max-concurrent-requests.js
testing 100 concurrent requests
max in flight 100
testing 111 concurrent requests
max in flight 111
testing 123 concurrent requests
max in flight 123
... output omitted
testing 2027 concurrent requests
max in flight 2027
testing 2230 concurrent requests
request 2048 failed
... output omitted
max in flight 0
Error: connect EMFILE 127.0.0.1:38212 - Local (undefined:undefined)
    at internalConnect (net.js:923:16)
    at defaultTriggerAsyncIdScope (internal/async_hooks.js:323:12)
    at net.js:1011:9
    at processTicksAndRejections (internal/process/task_queues.js:79:11) {
  errno: 'EMFILE',
  code: 'EMFILE',
  syscall: 'connect',
  address: '127.0.0.1',
  port: 38212
}

Over 2000 concurrent requests before it gets an EMFILE, likely because it's hit a limit on how many files a process can have open.

Why the ECONNRESETs, though? A cursory Google search turns up plenty of "why does my networking code work on Linux but ECONNRESET on OS X?" questions.

There are two interesting parameters: one is the TCP connection backlog (511 by default, set as a parameter to server.listen), and the other is somaxconn. On OS X it's set to 128 by default, which is probably why I see so few concurrent requests compared to Linux. From what I understand, the value is 128 to give some protection against SYN flood attacks.

It's also set to 128 on Linux, but there the max concurrent connections seems to be limited by the process's ulimit -n, so there may be something else at play on that platform.

Anyway, for OS X we can increase this until the next reboot with:

$ sudo sysctl kern.ipc.somaxconn=2048
kern.ipc.somaxconn: 128 -> 2048

Now run the test again, also increasing the connection backlog to at least 2048:

$ node max-concurrent-requests.js
testing 100 concurrent requests
max in flight 100
testing 111 concurrent requests
max in flight 111
testing 123 concurrent requests
max in flight 123
... output omitted
testing 2027 concurrent requests
max in flight 2027
testing 2230 concurrent requests
request 2047 failed
max in flight 0
Error: connect ECONNRESET 127.0.0.1:52661
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1137:16) {
  errno: 'ECONNRESET',
  code: 'ECONNRESET',
  syscall: 'connect',
  address: '127.0.0.1',
  port: 52661
}

Great! Now we have way more concurrent requests, similar to Linux.

On to the benchmarks. I've taken your test and modified it slightly to:

a) create the random data before we start making connections, since that's not free, and
b) incorporate the changes from #3474, which give a bit more control over the behaviour of the HTTP client.

It also needs this .diff applied to node_modules/ipfs-utils/src/http.js to tell us the number of requests in flight at any one time.

On MacOS X with 1000 requests, maxSockets: 100 and keepAlive: true on the agent I see:

$ node index.js 
Set {}
0/1000
max in flight 1
took 7324 ms

So no errors and it completed in 7.3 seconds.

With 150 sockets I see:

$ node index.js 
Set {}
0/1000
max in flight 1
took 6860 ms

With 200 sockets I see:

$ node index.js 
Set {}
0/1000
max in flight 1
took 6752 ms

Something to note here is that max in flight is only ever 1 - the API server is responding too quickly for us to have multiple requests open at once, so we need to make another change to validate what's going on.

Apply this diff to node_modules/ipfs-http-server/src/api/resources/dag.js to add 100ms latency to every ipfs.dag.put request and we see max in flight start to increase in line with maxSockets:

maxSockets: 100

$ node concurrent-requests.js
Set {}
0/1000
max in flight 100
took 7304 ms

maxSockets: 150

$ node concurrent-requests.js
Set {}
0/1000
max in flight 150
took 6847 ms

maxSockets: 200

$ node concurrent-requests.js
165/1000 FetchError: request to http://localhost:5002/api/v0/dag/put?format=dag-cbor&input-enc=raw&hash=sha2-256 failed, reason: connect ECONNRESET 127.0.0.1:5002
    at ClientRequest.<anonymous> (/Users/alex/test/http/node_modules/node-fetch/lib/index.js:1461:11)
    at ClientRequest.emit (events.js:323:22)
    at Socket.socketErrorListener (_http_client.js:426:9)
    at Socket.emit (events.js:311:20)
    at emitErrorNT (internal/streams/destroy.js:92:8)
    at emitErrorAndCloseNT (internal/streams/destroy.js:60:3)
    at processTicksAndRejections (internal/process/task_queues.js:84:21) {
  message: 'request to http://localhost:5002/api/v0/dag/put?format=dag-cbor&input-enc=raw&hash=sha2-256 failed, reason: connect ECONNRESET 127.0.0.1:5002',
  type: 'system',
  errno: 'ECONNRESET',
  code: 'ECONNRESET'
}
Set { 165 }
1/1000
max in flight 200
took 6850 ms

Let's increase kern.ipc.somaxconn=2048, maxSockets: Infinity and keepAlive: false:

$ node concurrent-requests.js
Set {}
0/1000
max in flight 1000
took 7135 ms

kern.ipc.somaxconn=2048, maxSockets: Infinity, keepAlive: true:

$ node concurrent-requests.js 
Set {}
0/1000
max in flight 1000
took 6301 ms

So, we can increase the number of incoming connections, but only by tweaking system parameters, which seems a little unreasonable. A better solution would be to limit the number of concurrent connections used by the HTTP client in Node through the use of an http.Agent, which is the purpose of #3474.

All 5 comments

Thank you for submitting your first issue to this repository! A maintainer will be here shortly to triage and review.
In the meantime, please double-check that you have provided all the necessary information to make this process easy! Any information that can help save additional round trips is useful! We currently aim to give initial feedback within two business days. If this does not happen, feel free to leave a comment.
Please keep an eye on how this issue will be labeled, as labels give an overview of priorities, assignments and additional actions requested by the maintainers:

  • "Priority" labels will show how urgent this is for the team.
  • "Status" labels will show if this is ready to be worked on, blocked, or in progress.
  • "Need" labels will indicate if additional input or analysis is required.

Finally, remember to use https://discuss.ipfs.io if you just need general support.

We're looking into this.

May be related to https://github.com/ipfs/js-ipfs/issues/3469


@achingbrain that's a very rigorous analysis, thank you. So there are two ways of dealing with the refused connections now: either increase the OS-level parameters, or wait until #3474 is released. Ideally both should be applied, as they sit on different sides of the data flow. When is the release, then? :)

As soon as node 14 stops making my life interesting and the build passes 😉

Tomorrow, all things going well.
