Version:
ipfs-http-server v0.1.4
Platform:
Subsystem:
ipfs-http-daemon
Medium: Performance Issues
The JS-IPFS HTTP daemon refuses any new HTTP connection over a certain threshold, which on my machine is 20. I believe Node.js is a bit more performant than this.
As far as I understand, the js-ipfs HTTP daemon is built on top of Hapi, which is apparently one of the slower HTTP frameworks for Node.js. I believe this explains the inability to handle more than 20 simultaneous connections. As an experiment, I implemented the dag.put HTTP endpoint on top of Fastify, with the same IPFS instance configuration. It handles 1000 simultaneously initiated connections just fine.
I understand that one could put something like HAProxy in front, and that JS-IPFS may never have been oriented towards performance. Still, it feels like you should be aware of the issue.
Adjust the TIMES variable in https://gist.github.com/ukstv/25f77d94113f32c0b2d200f8f1e0c3a1 to 20 and run it against a local js-ipfs instance. It works all right and reports no refused connections. Set it back to 1000 and a large proportion of the connections are refused.
Thank you for submitting your first issue to this repository! A maintainer will be here shortly to triage and review.
In the meantime, please double-check that you have provided all the necessary information to make this process easy! Any information that can help save additional round trips is useful! We currently aim to give initial feedback within two business days. If this does not happen, feel free to leave a comment.
Please keep an eye on how this issue is labeled, as labels give an overview of priorities, assignments and additional actions requested by the maintainers.
Finally, remember to use https://discuss.ipfs.io if you just need general support.
We're looking into this.
May be related to https://github.com/ipfs/js-ipfs/issues/3469
It's important to set a baseline for performance expectations. In this gist I set up an HTTP server and use a client to make requests to it, using only node core - no HTTP framework and no HTTP client abstractions.
Running on Mac OS X 10.15.6 I see the following:
$ node max-concurrent-requests.js
testing 100 concurrent requests
max in flight 100
testing 111 concurrent requests
max in flight 111
testing 123 concurrent requests
max in flight 123
testing 136 concurrent requests
request 125 failed
max in flight 0
Error: connect ECONNRESET 127.0.0.1:51658
at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1137:16) {
errno: 'ECONNRESET',
code: 'ECONNRESET',
syscall: 'connect',
address: '127.0.0.1',
port: 51658
}
For me, after a few runs it starts to fail somewhere between 120-140 concurrent requests on average.
The same code run on Linux is much better:
$ node max-concurrent-requests.js
testing 100 concurrent requests
max in flight 100
testing 111 concurrent requests
max in flight 111
testing 123 concurrent requests
max in flight 123
... output omitted
testing 2027 concurrent requests
max in flight 2027
testing 2230 concurrent requests
request 2048 failed
... output omitted
max in flight 0
Error: connect EMFILE 127.0.0.1:38212 - Local (undefined:undefined)
at internalConnect (net.js:923:16)
at defaultTriggerAsyncIdScope (internal/async_hooks.js:323:12)
at net.js:1011:9
at processTicksAndRejections (internal/process/task_queues.js:79:11) {
errno: 'EMFILE',
code: 'EMFILE',
syscall: 'connect',
address: '127.0.0.1',
port: 38212
}
Over 2000 concurrent requests before it gets an EMFILE, likely because it's hit a limit on how many files a process can have open.
Why the ECONNRESETs though? A cursory Google search reveals lots of 'Why does my networking code work on Linux but ECONNRESET on OS X?'
There are two interesting parameters: one is the TCP connection backlog (511 by default, set as a parameter to server.listen) and the other is somaxconn. On OS X it's set to 128 by default, which is probably why I get so few concurrent requests compared to Linux. From what I understand, the value is 128 to give you some sort of protection against SYN flood attacks.
It's also set to 128 on Linux but the max concurrent connections seems to be limited by the process ulimit -n so there may be something else at play on that platform.
Anyway, for OS X we can increase this until the next reboot with:
$ sudo sysctl kern.ipc.somaxconn=2048
kern.ipc.somaxconn: 128 -> 2048
Now run the test again, also increasing the connection backlog to at least 2048:
$ node max-concurrent-requests.js
testing 100 concurrent requests
max in flight 100
testing 111 concurrent requests
max in flight 111
testing 123 concurrent requests
max in flight 123
... output omitted
testing 2027 concurrent requests
max in flight 2027
testing 2230 concurrent requests
request 2047 failed
max in flight 0
Error: connect ECONNRESET 127.0.0.1:52661
at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1137:16) {
errno: 'ECONNRESET',
code: 'ECONNRESET',
syscall: 'connect',
address: '127.0.0.1',
port: 52661
}
Great! Now we have way more concurrent requests, similar to Linux.
On to the benchmarks. I've taken your test and modified it slightly to
a) create the random data before we start making connections, as it's not free
b) incorporate the changes from #3474, which give a bit more control over the behaviour of the HTTP client.
It also needs this .diff applied to node_modules/ipfs-utils/src/http.js to tell us the number of requests in flight at any one time.
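The .diff itself isn't reproduced in this thread; the general shape of that kind of instrumentation is a wrapper around a promise-returning request function that counts overlapping calls (the `instrument` helper and `fakeRequest` stand-in below are illustrative, not the actual patch):

```javascript
// Track how many wrapped calls are in flight at any one time
let inFlight = 0
let maxInFlight = 0

function instrument (fn) {
  return async (...args) => {
    inFlight++
    maxInFlight = Math.max(maxInFlight, inFlight)
    try {
      return await fn(...args)
    } finally {
      inFlight--
    }
  }
}

// demo with a stand-in for an HTTP request
const fakeRequest = () => new Promise((resolve) => setTimeout(resolve, 10))
const tracked = instrument(fakeRequest)

const all = Promise.all([tracked(), tracked(), tracked()])
all.then(() => console.log('max in flight', maxInFlight)) // max in flight 3
```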
On MacOS X with 1000 requests, maxSockets: 100 and keepAlive: true on the agent I see:
$ node index.js
Set {}
0/1000
max in flight 1
took 7324 ms
So no errors and it completed in 7.3 seconds.
With 150 sockets I see:
$ node index.js
Set {}
0/1000
max in flight 1
took 6860 ms
With 200 sockets I see:
$ node index.js
Set {}
0/1000
max in flight 1
took 6752 ms
Something to note here is that max in flight is only ever 1 - the API server is responding too quickly for multiple requests to be open at once, so we need to make another change to validate what's going on.
Apply this diff to node_modules/ipfs-http-server/src/api/resources/dag.js to add 100ms latency to every ipfs.dag.put request and we see max in flight start to increase in line with maxSockets:
maxSockets: 100
$ node concurrent-requests.js
Set {}
0/1000
max in flight 100
took 7304 ms
maxSockets: 150
$ node concurrent-requests.js
Set {}
0/1000
max in flight 150
took 6847 ms
maxSockets: 200
$ node concurrent-requests.js
165/1000 FetchError: request to http://localhost:5002/api/v0/dag/put?format=dag-cbor&input-enc=raw&hash=sha2-256 failed, reason: connect ECONNRESET 127.0.0.1:5002
at ClientRequest.<anonymous> (/Users/alex/test/http/node_modules/node-fetch/lib/index.js:1461:11)
at ClientRequest.emit (events.js:323:22)
at Socket.socketErrorListener (_http_client.js:426:9)
at Socket.emit (events.js:311:20)
at emitErrorNT (internal/streams/destroy.js:92:8)
at emitErrorAndCloseNT (internal/streams/destroy.js:60:3)
at processTicksAndRejections (internal/process/task_queues.js:84:21) {
message: 'request to http://localhost:5002/api/v0/dag/put?format=dag-cbor&input-enc=raw&hash=sha2-256 failed, reason: connect ECONNRESET 127.0.0.1:5002',
type: 'system',
errno: 'ECONNRESET',
code: 'ECONNRESET'
}
Set { 165 }
1/1000
max in flight 200
took 6850 ms
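The latency diff isn't included above either; roughly, it amounts to awaiting a delay at the top of the handler before the real work runs (the `handlerWithLatency` name and the `return 'ok'` placeholder are mine, not the actual dag.js code):

```javascript
// Illustrative shape of the change: add 100ms of artificial latency
// so requests stay open long enough to overlap
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

async function handlerWithLatency (request, h) {
  await delay(100) // artificial 100ms latency
  // ...the original ipfs.dag.put handling would continue here...
  return 'ok'
}

const start = Date.now()
handlerWithLatency().then((res) => {
  console.log(res, 'after', Date.now() - start, 'ms')
})
```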
Now let's try kern.ipc.somaxconn=2048 with maxSockets: Infinity and keepAlive: false:
$ node concurrent-requests.js
Set {}
0/1000
max in flight 1000
took 7135 ms
kern.ipc.somaxconn=2048, maxSockets: Infinity, keepAlive: true:
$ node concurrent-requests.js
Set {}
0/1000
max in flight 1000
took 6301 ms
So, we can increase the number of incoming connections, but only by tweaking system parameters, which seems a little unreasonable. A better solution would be to limit the number of concurrent connections used by the HTTP client in node through the use of an http.Agent, which is the purpose of #3474.
@achingbrain that's a very rigorous analysis, thank you. So there are now two ways of dealing with the refused connections: either increase OS-level parameters, or wait until #3474 is released. Ideally, both should be applied, as they address different sides of the data flow. When is the release, then? :)
As soon as node 14 stops making my life interesting and the build passes 😉
Tomorrow, all things going well.