The code:
~ js
const https = require('https');

let req = https.request("https://example.org");
req.end(Buffer.from("hello\n"));
~
On Node >= 9.6.0, this sends two SSL segments (one containing the request line and headers, the other containing the body hello\n). Node <= 9.5.0 sends a single SSL segment with everything, as expected.
If instead of passing a Buffer, we pass a string:
~ js
req.end( "hello\n" );
~
This works correctly in all Node.js versions.
I can confirm this happens as described, with TLS1.2 or 1.3, on master/12.x, but it's not clear to me that there is any negative effect to it. I wiresharked the exchange, and it looks like both TLS records are sent in a single TCP segment, so that isn't a problem.
Since it is odd, I'll try to see why it does this when I have time (the first few layers of functions called by .end() don't behave differently for String vs Buffer).
Why is this a concern to you?
Oh, you're right, I genuinely thought they were separate TCP segments :sweat_smile: In that case it probably won't make a difference (unless you're making like, a ton of requests). For debugging it'd still be nice to have full control on the SSL segments, though.
It seems the distinction between string and buffer is being done here:
So, replacing lines 222-237 with:
~ js
var header = Buffer.from(this._header, 'latin1');
data = Buffer.concat([header, data]);
~
would make the behaviour symmetrical between strings and buffers, I think. Would you accept a PR?
We do accept PRs, of course, but changes in core can be tricky.
Concatenating the buffers seems like it would add an unnecessary memory copy into a hot path; the current code looks pretty deliberate. I'm more surprised that using a string causes concatenation than that a data buffer is a different write from the header values (which are probably strings).
I think this can be improved, most sockets have a ._writeGeneric() method that can write out a mix of strings and buffers.
OutgoingMessage#_send() calls OutgoingMessage#_writeRaw() which calls this.connection.write() to actually write the data. The "write generic" logic would look something like this:
~ js
if (conn._writeGeneric) {
  const chunks = [/* ... */]; // derive from this.outputData
  chunks.allBuffers = false; // if it's a mix of strings and buffers
  // explicit check needed because it returns undefined on success
  return false !== conn._writeGeneric(true, chunks, '', callback);
}
// ...
return conn.write(data, encoding, callback);
~
Chunks have this format:
~ js
const chunk = {
  chunk: /* buffer or string */,
  encoding: /* string, set to '' for buffer */,
};
~
You could go even further by calling conn._handle.writev() directly because ._writeGeneric() does a lot of unnecessary work - but it also does some necessary work so you'll have to read the source carefully. :-)
Honestly, I'd love if someone goes over the net, http and streams code with a fine-tooth comb. There are a lot of accrued inefficiencies.
Thanks for the info! I'm new to Node internals, but I'll try to work on this
@bnoordhuis Just to make sure, you meant conn._handle._writev()?
@jmendeth The method should be named ._handle.writev() (e.g. process.stdout._handle.writev when process.stdout refers to a pipe/TTY).
That being said … Ben’s suggestion only really works if we can actually skip through the streams layer, i.e. there are no other chunks currently being written and no other interaction with the socket as a JS stream takes place.
Thanks! I have investigated further and _writeGeneric() (and thus handle.writev) is already called with both chunks. So if we modify TLSWrap::DoWrite() and TLSWrap::ClearIn() to emit a single SSL segment, that's all that's needed for the code I posted.
Of course, copying into a temporary buffer may decrease performance for large chunks, but it's faster than doing it in JS, and given the considerable overhead this avoids for small requests/responses, I think we should at least allow users to enable concatenation.
(Unrelated question: I've seen that SSL_MODE_ENABLE_PARTIAL_WRITE isn't used (and the code assumes all writes are complete) is that intentional?)
The writes all go into a BIO buffer, so incomplete writes aren't possible, unlike writing to an OS socket descriptor.
I also found out this is more significant in http2. My gf had a homework task where she had to push 49 resources to the browser:
~ js
for (var i = 1; i < 50; i++) {
var path = "/testStreamShort" + i;
stream.pushStream({":path": path}, (err, pushStream) => {
if (err) throw err;
pushStream.respond({ ':status': 200, 'content-type': 'text/plain' });
pushStream.end("a");
});
}
stream.respond({ ':status': 200, 'content-type': 'text/plain' });
stream.end("a");
~
This emits 249 segments: 49 PUSH_PROMISE (31 bytes), then 50 HEADERS (12 bytes), then 50 DATA (9 bytes), then 50 segments with the actual body (1 byte), then 50 empty DATA (9 bytes) to indicate end of stream. This results in 8562 bytes written to the TCP socket.
With the change I mentioned, everything is emitted in only 2 segments, totalling 3128 bytes. That's a 63% saving. I don't know how to properly benchmark this, but a quick attempt shows a 28% decrease in CPU time.
@sam-github Thanks!