Node: Performance Regression: tls.createSecurePair() Not Buffering Connections in v10

Created on 24 Apr 2018 · 21 comments · Source: nodejs/node

  • Version: v10.0.0-pre
  • Platform: Linux newtons7 4.9.0-6-rt-amd64 #1 SMP PREEMPT RT Debian 4.9.82-1+deb9u3 (2018-03-02) x86_64 GNU/Linux
  • Subsystem: TLS

With the upcoming Node.js v10.0.0, performance of sockets created with createSecurePair() is degraded, with several effects: an unresponsive event loop, crashing connections and general sluggishness.

Steps To Reproduce

Create a proxy that receives encrypted TLS connections and resends them to a plain socket server. It should be a regular socket server that uses tls.createSecurePair() to decrypt data and send it unencrypted to the backend server. Bombard it with a lot of connections and relatively small packets; soon enough the proxy will bog down and become unresponsive.

client -> proxy -> server

It is best to create multiple workers to act as clients, since otherwise the main event loop will be too busy sending data.

Sample Script

Run the script combined-tls.js:

git clone https://github.com/logtrust/tls-server-demo
cd tls-server-demo
npm install
# raise the number of available file descriptors to ~10k
ulimit -n 10240
node lib/combined-tls.js

Running the last line under Node.js v8 or v9 will produce a responsive server that prints a trace every five seconds; under Node.js v10.0.0-pre it will yield an unresponsive server that prints traces erratically, generates strange errors and takes ages to shut down after ^C'ing it.

See the README.md for runtime options such as modifying the number of incoming connections. The script uses the stdio module only for parsing options, and strictly Node.js core modules after that.

Possible Explanation

After the great work by @addaleax in https://github.com/nodejs/node/pull/17882, the unholy mess that was lib/_tls_legacy.js was greatly simplified to use new tls.TLSSocket() underneath. A crucial bit was not ported, though: the buffering capabilities of tls.SecurePair().

As a consequence, the new code in lib/internal/streams/duplexpair.js emits events (either data or readable, depending on the stream mode) as soon as data is received and decrypted. When many small messages are sent to the TLSSocket it bogs down. In contrast, the old code behaved like TCP: when heavily loaded it started buffering data and sending bigger and bigger packets to the cleartext stream.

The sample code shows an average of 32 bytes per event with TLSSocket (v10), exactly the value sent by the client per packet; the original createSecurePair() (v8 and v9) shows a bigger rate that depends on system load: I have seen anything from 60 to 300 bytes/event. The number of events with v10 is so high that attempting to resend everything from the proxy makes the event loop unresponsive.

A possible workaround is to buffer packets in the proxy server, but this is suboptimal as it means second-guessing what the stream is doing. Also, buffering in the proxy would minimize packet resend time, but we would still be processing a lot of unnecessary events.

Importance

This issue is critical for us since it affects our ability to ingest data using both secure and insecure connections to the same port.

Labels: performance, regression, tls

All 21 comments

@nodejs/crypto

Thanks for the extensive work prior to reporting! I’ll definitely take a look as soon as I have the time.

tls.createSecurePair() has been deprecated for a while now, though. Have you looked in some way into migrating to TLSSocket directly? I get that that’s a major, non-trivial API shift, but it would be nice to have the comparison if that’s possible.

I’m assigning this to myself because I agree that the patch identified by @alexfernandez is by far the most likely cause here, but if somebody else wants to investigate I don’t want to stand in the way. :)

Thanks for looking into it, @addaleax! Sure, we have considered migrating to new tls.TLSSocket() as recommended by the docs; that is where our investigation began. It behaves very similarly to tls.createSecurePair() in v10, and the underlying cause is probably the same: lack of buffering. Then we tried v10 to see how the new code behaved, and to our dismay not even the old tls.createSecurePair() gives good performance.

If you run the sample script under Node.js v8 or v9 with --new or -n it will attempt to use new TLSSocket(), and the results will be very similar to v10 with tls.createSecurePair().

Note: the class Pipe in the script is just a way of easily switching between readable and data events using the --readable option, which has no noticeable effect on performance. A simple first.pipe(second) yields the exact same results.

Excellent bug report @alexfernandez - just saying it out loud in case it's not obvious.

If you'd like to work on the PR and get guidance/help from @addaleax if/when you're having issues with it that's also something we're very much in favor of. The report itself shows an understanding of the code and indicates a capability to do so.

Of course, this is not a request but an offer - you are in under no obligation or pressure to work on a PR. I'm just saying clearly that it's welcome :)

Hmm... I'm also seeing a massive regression somewhere along the way in v9.x in lib/simple-tls-new.js, from 2.4mb/s to 1.4mb/s. That's troubling... I'm going to try bisecting when I have a moment, or at least find which exact version saw the regression.

Edit: The regression landed in 9.7.0. Will report when I have more info.

@addaleax Reverting https://github.com/nodejs/node/commit/82c43aed16cf9b69e2fd44f2e9797a8c56f6fb6d fixes the regression I experienced above (unrelated to the other one) and actually yields a performance improvement to 3.2mb/s. Any thoughts?

Edit: After testing some more, I think we need a C++ nextTick... 😆 When using env()->isolate()->EnqueueMicrotask instead of SetImmediate the performance recovers. Of course, I do not think we should interleave with promises like that. Or we could make it possible for those calls to be sync.
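At the JavaScript level, the scheduling difference looks roughly like this: microtasks (the queue that EnqueueMicrotask feeds) drain as soon as the current stack unwinds, while setImmediate callbacks wait for the check phase of the event loop.

```javascript
// Microtasks run when the current JS stack unwinds; setImmediate
// callbacks wait for the check phase of the next event-loop turn.
const order = [];
setImmediate(() => order.push('immediate'));
queueMicrotask(() => order.push('microtask'));
Promise.resolve().then(() => order.push('promise'));
// order ends up as ['microtask', 'promise', 'immediate']
```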

Thanks, @benjamingr! I will try to understand the code at a low level and maybe propose something.

Of course, I do not think we should interleave with promises like that.

Can you elaborate on that a little? Promises don't actually make any guarantee about when they run except that they do when only platform code does (which the C++ qualifies as).

We make no official guarantees on the current nextTick (and new ticks) -> promise queue (and chained promises) -> other stuff order and I don't think that we should (as long as we never run promise thens in a way that violates the spec).

Can you elaborate on that a little? Promises don't actually make any guarantee about when they run except that they do when only platform code does (which the C++ qualifies as).

Yeah, it's probably not a big deal to interleave. Also the way EnqueueMicrotask is implemented is pretty efficient so I don't really want to re-implement it on our end. We can't get anywhere near the same perf with std::vector, at least based on a quick test.

I opened a PR in https://github.com/nodejs/node/pull/20287

Fwiw, the reproduction in https://github.com/logtrust/tls-server-demo does not seem to work very reliably on Linux… :/

@addaleax Yes, it is not simple to reproduce and my script is not very polished. I think I have reduced it to a benchmark similar to benchmark/tls/throughput.js, I will send a PR presently to include it in Node.js.

@alexfernandez can you also try running the benchmark with https://github.com/nodejs/node/pull/20287 applied?

(curl -L https://github.com/nodejs/node/pull/20287.patch | git am in your node folder)

@benjamingr I don't recommend it. It won't affect the SecurePair piece at all. (Hence only Refs not Fixes.)

More importantly, I'm re-working that PR to make sync writes to TLSWrap possible instead of just shifting around the type of async scheduler. After further exploration, I think async writes are actually good and we need to potentially disable DoTryWrite for TLS. I'm seeing very sizeable gains from that type of a change.

@benjamingr Sure enough! Thanks for the curl line, it makes it much simpler :+1: .

@addaleax Now that https://github.com/nodejs/node/commit/c346cb6929b0ec1140e88234ab0185b67519186c has landed, we finally have a reliable way to reproduce this issue! :smile: The procedure is explained in the PR https://github.com/nodejs/node/pull/20344: basically, just run node benchmark/tls/secure-pair.js, which will show the result in megabits (over 5 seconds). On my laptop it goes from ~16 to ~3 with 2-byte packets, and from ~2k to ~1k with 1 KB packets. Performance for larger packets improves slightly, from 3.2M to 3.7M.

Quick update now that Node.js 10.x is becoming LTS next week: 10.13.0 is still painfully slow when using tls.createSecurePair(). So is 11.1.0, while 8.12.0 and 9.11.2 are still fast. From within the repo:

node benchmark/tls/secure-pair.js

The results for packets of 2, 1024 and 1048576 bytes using tls.createSecurePair() are shown below. Results for new TLSSocket() are quite similar across versions. All durations are 5 seconds.

| Node.js version | Packet size (bytes) | MB transferred |
|-----------------|---------------------|----------------|
| 8.12.0          | 2                   | 15.3           |
| 9.11.2          | 2                   | 15.5           |
| 10.13.0         | 2                   | 3.5            |
| 11.1.0          | 2                   | 4.4            |
| 8.12.0          | 1024                | 2050           |
| 9.11.2          | 1024                | 1796           |
| 10.13.0         | 1024                | 1069           |
| 11.1.0          | 1024                | 1180           |
| 8.12.0          | 1048576             | 2985           |
| 9.11.2          | 1048576             | 3109           |
| 10.13.0         | 1048576             | 3582           |
| 11.1.0          | 1048576             | 3561           |

In summary, anyone using tls.createSecurePair() is heavily penalized unless they are transferring upwards of 1 MB in 5 seconds.

cc @mcollina

Sorry for replying after such a long time. I did not forget about this; the last month has just been very hectic.

@addaleax maybe you can take another shot at this? Or at least, can you take a look and point us in the right direction for a fix?

Hi there!! I am taking over @alexfernandez's work on node and TLS and I was wondering if there's been any progress on this issue, @addaleax. This regression in performance is blocking us from upgrading to node 10 in production, and we would love to move up and enjoy the rest of improvements made in this version.

Thanks in advance for your excellent work!!

@alexfernandez @mcollina @tufosa I think https://github.com/nodejs/node/pull/27861 has addressed this issue – feel free to double-check, though.

@addaleax @mcollina @alexfernandez I've just run the benchmark in the repo and some others of my own and the results are spectacular: performs 4 times better than the old _tls_legacy with chunks of 2 bytes, 2.5 times better with chunks of 100 bytes and 1.1 times better with chunks of 1024 bytes. Great, great great job!!! While doing the experiments I've noticed that clear sockets perform way worse than TLS sockets with small chunks of data (2 and 100 bytes). But it happened the same in node 8, so there is no regression. I'll open a different issue for this. Thanks again for the excellent work!!
