Version: 10.16.3
Platform: 4.15.0-1044-aws #46-Ubuntu SMP Thu Jul 4 13:38:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Subsystem: http2
I'm reusing a session for performance reasons and opening 10 new requests per second to a server. On Node 10.16.2 it runs forever with no problems. On 10.16.3 it starts throwing "Error [ERR_HTTP2_STREAM_ERROR]: Stream closed with error code NGHTTP2_ENHANCE_YOUR_CALM" after a minute or two (around 1,000 requests).
Is this a newly introduced bug, or is there a way to prevent it?
Can you share any more information? Is the Node.js program the server, the client, or both? What is the value of the maxSessionMemory option on either side (if it is set)? Does running node with --security-reverts=CVE-2019-9514 help?
(See also https://github.com/nodejs/node/pull/29122#issuecomment-522642986 for some background.)
This is most likely related to the new denial of service attack mitigations. The revert workaround @addaleax suggests should make it work but this is something we're going to need to revisit.
@addaleax I'm not aware of the server-side implementation. You can use this URL to reproduce:
https://api.binance.com/api/v1/depth?symbol=BTCUSDT. maxSessionMemory was not set explicitly. Running node with --security-reverts=CVE-2019-9514 does not solve the issue.
It was probably a different commit that introduced this issue.
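For reference, a minimal sketch of the access pattern being described (one long-lived session, roughly ten requests per second against the URL above); the interval and logging are illustrative rather than the exact repro code:

```js
// Sketch of the reported workload: a single reused HTTP/2 session issuing
// ~10 GET requests per second until a stream fails with ENHANCE_YOUR_CALM.
const http2 = require('http2');

const session = http2.connect('https://api.binance.com');
session.on('error', (err) => console.error('session error:', err));

let count = 0;
setInterval(() => {
  const req = session.request({ ':path': '/api/v1/depth?symbol=BTCUSDT' });
  req.setEncoding('utf8');
  req.on('data', () => {});        // drain the response body
  req.on('end', () => {
    if (++count % 100 === 0) console.log(`${count} requests completed`);
  });
  req.on('error', (err) => {
    console.error(`request failed after ${count} completions:`, err.code, err.message);
    process.exit(1);
  });
  req.end();
}, 100); // 10 requests per second
```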
We are experiencing a similar issue in the Google Cloud Firestore SDK. We have 10 or so customers reporting their production apps hitting this error where our http/2 stream is closed with NGHTTP2_ENHANCE_YOUR_CALM.
I can reproduce the issue with node v10.16.3 but not v10.16.2 so it seems to be a regression. Running node with --security-reverts=CVE-2019-9514 _does not_ help.
I've created a repro project here: https://github.com/mikelehen/node-http2-repro
Unfortunately it is _not_ a minimal repro, but it is what I was able to piece together based on our customer reports. Is it enough to debug the issue and isolate a root cause? We would very much appreciate a workaround and/or a fix to node so that we can unblock our customers. Thank you!
We managed to get a reproducible version of the bug we are experiencing. I can confirm that including the flags --security-reverts=CVE-2019-9514 --security-reverts=CVE-2019-9512 --security-reverts=CVE-2019-9516 --security-reverts=CVE-2019-9518 does not fix the issue.
I was able to bisect and identify this commit as the one causing the failure... @addaleax any thoughts?
/cc @nodejs/http2
A couple of quick tests: can you try setting the maxSessionMemory option significantly higher and see if the problem persists? There are a few other knobs that can be turned as well, but let's try that one first. The default value for maxSessionMemory is 10, i.e. a maximum of ten megabytes per session, which could be eaten away quickly by a large number of streams.
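For anyone trying this, a sketch of where the knob goes; maxSessionMemory is accepted as an option by both http2.connect() and http2.createServer() / http2.createSecureServer(), and the value is in megabytes (the URL and the value 100 below are just examples, not recommendations):

```js
const http2 = require('http2');

// Client side: raise the per-session memory cap from the default of 10 MB.
const session = http2.connect('https://example.com', {
  maxSessionMemory: 100, // megabytes; illustrative value only
});

// Server side: the same option can be passed when creating the server.
const server = http2.createServer({
  maxSessionMemory: 100,
});
```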
> I was able to bisect and identify this commit as the one causing the failure... @addaleax any thoughts?
@MylesBorins Which commit?
The commit @MylesBorins meant to link is https://github.com/nodejs/node/pull/29122/commits/164ac5b241b96089e6bad5bb83ea416966b3245f . 😄
Okay, that’s interesting – I’ll take a look. It still would be good to know if turning the maxSessionMemory knob makes a difference here, as @jasnell suggested.
Based on that commit message, specifically "If we are waiting for the ability to send more output, we should not process more input", I think the overall goal of that commit is a bad idea. With that logic implemented on both sides of a connection, if one side ever pushes back on flow control for any reason, the other side will stop reading and start pushing back the other way, which will result in a deadlock where neither side will clear out their read buffer because they're waiting for the other side to do so.
And if just the client implements that logic, it's easy to imagine a scenario where the client doesn't read enough, resulting in the server being unable to write while still receiving more data. It wouldn't surprise me if some servers send ENHANCE_YOUR_CALM when that happens.
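To make the second scenario concrete, here is a minimal, purely illustrative local sketch (not taken from the issue): a client that never reads its response lets the flow-control window and local buffers fill up, at which point the server's writes start reporting backpressure.

```js
const http2 = require('http2');

const server = http2.createServer();
server.on('stream', (stream) => {
  stream.respond({ ':status': 200 });
  const chunk = Buffer.alloc(16 * 1024);
  const writeUntilBlocked = () => {
    // Keep writing until write() returns false, which happens once the client
    // stops consuming and the flow-control window plus local buffers fill up.
    while (stream.writable && stream.write(chunk)) {}
    if (stream.writable) {
      console.log('server: backpressure, waiting for drain');
      stream.once('drain', writeUntilBlocked);
    }
  };
  writeUntilBlocked();
  stream.on('error', () => {});
});

server.listen(0, () => {
  const client = http2.connect(`http://localhost:${server.address().port}`);
  const req = client.request({ ':path': '/' });
  // Deliberately never read the response: the received data stays buffered
  // until the flow-control window is exhausted and the peer can no longer send.
  req.pause();
  setTimeout(() => {
    req.close();
    client.close();
    server.close();
  }, 2000);
});
```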
@addaleax @jasnell It looks like increasing maxSessionMemory _does_ solve the problem. If I set it to 100, my repro works. If I set it to 1 then it fails much faster.
A couple of notes:
- The repro uses http2.connect(), not http2.createServer().
- FYI: if you need to run my repro in https://github.com/mikelehen/node-http2-repro, you'll need access to my project, which I'll have to grant you out-of-band. But hopefully the info above is enough to resolve the issue? Thanks!
@murgatroid99 ... the scenario you describe is entirely possible and worth exploring to see how much of an issue it is in our implementation. I'm thinking, however, that this specific problem is more of a memory management issue. Specifically: we need to get better at it. The current maxSessionMemory limit was really just a finger-in-the-air guesstimate, and it's looking like we either need to (a) increase it or (b) find other ways of reducing the memory load per Http2Session.
@jasnell Any guesses why https://github.com/nodejs/node/commit/164ac5b241b96089e6bad5bb83ea416966b3245f would have exacerbated the memory management issue?
FWIW, the more I increase maxSessionMemory, the longer my repro runs before failing, even though we are doing fixed-size requests and responses and only have one outstanding at a time, so I would expect the necessary session memory to be pretty much flat. That makes me wonder whether https://github.com/nodejs/node/commit/164ac5b241b96089e6bad5bb83ea416966b3245f introduced a mismatch between the IncrementCurrentSessionMemory() / DecrementCurrentSessionMemory() calls.
Like maybe https://github.com/nodejs/node/commit/164ac5b241b96089e6bad5bb83ea416966b3245f#diff-a2de252da810db89f574ecafa0f8e95aR1901 should instead be:
```cpp
DecrementCurrentSessionMemory(stream_buf_.len - stream_buf_offset_);
```
(this is a wild guess. I don't really understand the code; I'm just trying to apply some pattern matching...)
That definitely sounds like a memory accounting issue then... or a straight-up memory leak. It gives us a good place to start the investigation: the first step is to rule out a memory leak, and if that looks good, we can look at the accounting.
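One simple way to start that check (a sketch, assuming clientSession is the Http2Session created by whatever repro is being run): log process memory alongside the session's public state while the workload runs. The internal maxSessionMemory counter itself isn't exposed, so this only shows whether resident memory actually grows; flat memory plus an eventual ENHANCE_YOUR_CALM would point at accounting drift rather than a real leak.

```js
// Periodically log process memory and HTTP/2 session state while the repro
// workload runs. `clientSession` is assumed to come from the surrounding
// repro code; it is not defined here.
setInterval(() => {
  const { rss, external } = process.memoryUsage();
  console.log({
    rssMB: Math.round(rss / 1024 / 1024),
    externalMB: Math.round(external / 1024 / 1024),
    sessionState: clientSession.state, // window sizes, outbound queue size, etc.
  });
}, 5000).unref();
```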
@jasnell Thanks for all your investigations so far.
I work with @mikelehen on Firestore and was wondering if you could let us know if you had an estimated timeline for a fix? We have a couple of users that are blocked on this. Thank you so much!
With my current workload, it won't be until later this week at the earliest, but definitely haven't forgotten!
In case it helps, I actually tried out my guess above (https://github.com/nodejs/node/issues/29223#issuecomment-537721898) last week and with that tweak, the error seems to go away. That said, I don't know how to verify that all the accounting is actually coming out correct. It's possible I just messed up the accounting in a different way.
Thank you for the update @jasnell! Let us know if you need help verifying a fix.
Hi, is there any news on this issue? Also, could the fix be backported to Node.js 12, since that version recently became the LTS release? Thanks for your efforts!
Any update here? Would it help if I sent a PR based on my guess in https://github.com/nodejs/node/issues/29223#issuecomment-537721898 ? I don't know the code well enough to know if it's correct or how to definitively validate / test it, though. :-)
I've opened a PR (https://github.com/nodejs/node/pull/30684) that addresses the issue. 🤞 we can get it merged? 😄