One concern around llhttp is it uses https://github.com/indutny/llparse and a bunch of TypeScript. I don't want to go down the language war path, but it's probably worth considering the increased build complexity (Bazelifying llparser and dependencies) and developer impedance matching when making the call. All things being equal, I'm a fan of higher level solutions to parsing, but switching to this is not totally obvious.
Well, the auto-generated llhttp was just added to node.js as the experimental-http-parser, and node.js is/was the only thing keeping the hand-written http-parser on a life-support, so the decision might be forced upon us sooner rather than later... I don't have a strong opinion until then.
As for the extra build-time dependencies, we just need the generated C code, see: https://github.com/indutny/llhttp/issues/10.
If I could be of any help, please do not hesitate to ask me directly.
@indutny would you mind filing a tracking bug for the migration over on the node.js so interested envoy developers can follow it? If we opt to migrate we'd probably swap shortly after you, and if we don't it's still important to know when node.js stops using http_parser from a security perspective, so it's good to know either way :-)
My opinion on this is that we should take a wait and see approach. TBH, HTTP/1.1 at this layer is basically done. I don't really see what changes we would need to make.
@alyssawilk https://github.com/nodejs/node/issues/24730 - This might do.
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted". Thank you for your contributions.
Reopening in light of https://github.com/envoyproxy/envoy/issues/6434. Since we didn't get any response from Node.JS security WG when reporting, it seems that http-parser should be considered unsupported. The community did drive a fix when we went public, see https://github.com/nodejs/http-parser/issues/468.
Since http-parser is a codec and hence exposed to untrusted traffic from clients and backends on the data plane, it has a higher standard to live up to than other external dependencies. I would like us to consider only including dependencies that have active security teams that are at least doing CVE management. This is further motivation for moving away from http-parser.
@htuch I'm sorry about the negative experience that you had. I wish we would have seen your security report on our channels before you submitted that github issue. What channel did you use? Was it HackerOne or the mailing list ([email protected])? I'm having a hard time finding your report...
I am part of Node.js community and one of the maintainers of http-parser. I believe that in our shared effort we managed to fix the issue very promptly after it was disclosed. Sorry again if this caused you any frustration!
As a matter of fact, I'm a maintainer of llhttp too :joy:
@indutny we sent two e-mails, one pre and one post disclosure to [email protected]. Happy to forward them again. Is HackerOne preferred? The title was "Vulnerability in http-parser & embedded NULL header handling".
Thanks again for the prompt fix once this went public, the comment above reflects a concern that we didn't have any effective method to coordinate prior to public disclosure.
Oh no! [email protected] should be forwarding reports to HackerOne. Apparently, something is not working as expected. Again, I'm terribly sorry for this. Will look into it momentarily.
@indutny Will https://hackerone.com/nodejs/hacktivity and https://groups.google.com/group/nodejs-sec be updated retroactively with the CVE?
@htuch it seems like the reporting system would have been or is broken for both http-parser and lhttp (independent of their health status as sub-projects).
@moderation that's a good question. We can definitely request a CVE for the issue. Let me see what I can do.
cc @rvagg @jasnell, perhaps?
@indutny we sent two e-mails, one pre and one post disclosure to [email protected]. Happy to forward them again. Is HackerOne preferred? The title was "Vulnerability in http-parser & embedded NULL header handling".
Hi, I work for @Hacker0x01 and act as our main liaison to the open source community. Hoping I can assist you.
When you e-mailed [email protected], you should have received an e-mail back from HackerOne stating "ACTION REQUIRED: Submit Vulnerability Report". Can you check to see if you received that? If so, you'll need to click that link in order to finalize your report submission.
In any case, you can always submit directly to https://hackerone.com/nodejs.
@reedloden as discussed in https://github.com/nodejs/security-wg/issues/454, this was not received. I'm thinking that this is a process flaw if folks can opt to use e-mail or HackerOne and the e-mail path is not as reliable or requires steps that might result in a dropped report. Would it be better to just drop the e-mail option entirely and have folks use HackerOne directly to avoid potential lost reports?
I haven't been as active in the security triage recently but I don't recall this coming through. My apologies for having missed it.
@htuch I figured out what happened, but until it's fixed (either on our side or on Node.js's), I would recommend that https://hackerone.com/nodejs be utilized. Even if e-mail was working, the web interface is preferred, as e-mail is not secure.
@reedloden thanks for the analysis of the issue. @indutny are you going to followup with the NodeJS security WG on (1) CVE-2019-9900 and (2) how to deal with the e-mail redirect issues? I can do (1) if that helps by filing a HackerOne report.
Is picohttpparser still in the running as a replacement as well? It was mentioned in #2880 & #5536 as a potential perf enhancement for http1.1 .
May be time to revisit this. llhttp now the default in Node 12 and the team are looking to deprecate http-parser https://twitter.com/addaleax/status/1173966000175366144
Should we do an analysis between llhttp, picohttpparser, and home-brewed Ragel (another of @brian-pane's ideas) to decide what replacement to use?
Yep, it would be great to see a comparison, including feature support, standard compliance, performance, licenses, security process/maturity and so on. http-parser is basically EOLed.
Happy to see an analysis done, but IMO we should probably just use llhttp since it's roughly a drop in API replacement for http_parser.
I've used picohttp in the past and we would have to implement a substantial amount of stuff on top that http_parser/llhttp already implements.
The only thing that slightly sketches me out about llhtttp is the way it pushes us towards having to live in Typescript land. I'm a fan of higher level languages for parsers and codecs, but not such a huge Javascript fan. We can just use the released C implementations though, so build integration shouldn't be an issue. FWIW, the concerns over appropriate handling of vulnerabilities in the Node.js community remains, it took a long time to resolve the http-parser issues I reported.
Started playing around with llhttp (diff so far), the parser API is pretty much the exact same which is great, but it doesn't expose the URL parsing library like http-parser which we use in a few places as a URL utility.
@derekargueta can we move away from relying on these URL parsing utils and switch to the Chromium URL library as per https://github.com/envoyproxy/envoy/issues/6588? This would allow us to get behind a single source of URL parsing truth, which can only be a good thing for consistency; we've seen quite a few issues with different parts of Envoy not agreeing on things like URL semantics historically.
@htuch sure, I was unaware of sunsetting the Node URL parser.
Also, should the new parser should initially be gated behind a feature flag to allow opt-in before forcing everywhere, similar to when the buffer implementation changed?
@derekargueta I'd recommend a feature flag to allow folks to fall back as needed, but I'd advocate we flip it on eagerly, there isn't a ton of value in remaining with the extant URL parser.
So far I have 73% of http1 codec tests passing, working through them. llhttp introduces slight deviations - stricter header parsing (i.e. this block is no longer required), slightly different handling of upgrade, etc.
Also, I'm going to do the URL utility library as a separate change if no one beats me to it. My updated branch leaves that as is for now.
@derekargueta please let me know if I can be of any help in this endeavor.
FWIW I highly suggest moving from http_parser to llhttp. There was even discussion about changing the defaults in 10.x during LTS due to lack of ability to continue to support the deprecated parser
I will update the title of this issue. We are switching. @derekargueta is working on it.
This change is about ready, however there's one missing feature in llhttp which is the lenient_http_headers option (used here in http-parser). This option toggles strict validation of HTTP header values by ensuring they are printable ASCII characters. We currently run http-parser in lenient mode (which is the default) then provide a runtime switch to enable strict header validation. However, llhttp defaults to strict validation and doesn't provide an option to disable it.
To move forward, we either need to
envoy.reloadable_features.strict_header_validation and have all traffic use strict header value checking. Based on some conversation on the PR that added strict_header_validation, it seems that we eventually wanted to move this way? If so, I think this option would be better.https://github.com/envoyproxy/envoy/pull/8849 flips strict validation on by default and the option will be removed entirely (unless folks yell) at the next Envoy release. I assume we're going to support both parsers in parallel for a bit so I think going 100% strict is probably fine (we can revisit if changing the default causes problems in upcoming weeks)
Awesome stuff. @derekargueta is there any way to support both parsers without a recompile? IMO this should be feature flagged and flippable at startup like we did with the buffer implementation?
@derekargueta any update on status here?
@htuch I've done my part to encourage @derekargueta :wink: https://twitter.com/moderat10n/status/1230125590428053509
No status update at the moment - I have started catching my stale branches up to master. There's been some refactorings of the HTTP codecs that I need to work through
Started re-investigating this. @indutny one issue that came up is that the llhttp release artifact is that it's not built with strict mode enabled, and this option cannot be enabled from the C API since it is a variable in producing the C code. Are there plans to publish release builds with strict mode enabled, or be able to set strict mode through the C-API perhaps using -D like http-parser? This is causing quite a few tests to fail but I really don't want to introduce Typescript tooling in the Envoy build (for running npx to generate the C code with strict mode)
There is a bit of optimizing that has to be done on llhttp's code before compiling it down to C. I don't think that it'd be possible to introduce the configuration defines the same way as they are in http-parser. Runtime checks would likely affect performance so I'd avoid this too.
There are two things that could be done, however:
.c files from strict and loose mode builds into one guarded by a huge top-level #ifdefI'll look into the latter to see how much churn this would involve in llhttp. Could you consider the former meanwhile?
FWIW, that Pull Request has landed and was released in llhttp 2.1.0.
@derekargueta any update on progress here?
Now @derekargueta 's presentation at KubeConEU is done he should have some cycles :smile:
Yes I had a working implementation with @indutny's last release that fixed the strict-mode issue but was waiting for the exception-removal work in theHTTP codecs to settle. Currently working through some git conflicts :) should be done soon.
This commit on 2020-10-02 confirms that the last remaining maintainer for http-parser has moved on and http-parser is no longer maintained - https://github.com/nodejs/http-parser/commit/ec8b5ee63f0e51191ea43bb0c6eac7bfbff3141d.
From an @envoyproxy/dependency-shepherds POV it would be great if we could get https://github.com/envoyproxy/envoy/pull/9033 over the line /cc @derekargueta @htuch
I've tagged this as both priority and security in response to the above comment.
I have a "drop-in replacement" branch with everything passing _except_ the new functionality that was introduced via http-parser for allowing transfer-encoding: chunked + content-length: N. There's an llhttp issue for this where I've started a small investigation and the fix seems pretty straightforward: https://github.com/nodejs/llhttp/issues/69
FYI, nodejs/llhttp#69 was just closed with a fix in v3.0.0
I added a link to the release at the now stale WIP PR at https://github.com/envoyproxy/envoy/pull/9033#discussion_r532955677
Confirmed all tests pass with the new llhttp update 馃帀 reviving the abstractions I initially had for supporting both parsers through a command line flag
Woohoo
Most helpful comment
So far I have 73% of http1 codec tests passing, working through them. llhttp introduces slight deviations - stricter header parsing (i.e. this block is no longer required), slightly different handling of upgrade, etc.
Also, I'm going to do the URL utility library as a separate change if no one beats me to it. My updated branch leaves that as is for now.