Node: url.parse would be better to not decode authority?

Created on 2 Apr 2017  路  12Comments  路  Source: nodejs/node

  • Version:
    N/A

  • Platform:
    N/A

  • Subsystem:

I found this article.
According to the article, Java and Python has ftp protocol injection to decode CRLF in the url.

ftp://foo:bar%0d%[email protected]/file.png

The url has the newline(CRLF) in authority part. Java and Python ftp server recognize following injected code.

USER foo
PASS bar
INJECTED
TYPE I
EPSV ALL
PASV
...

And our url module has the same issue for CRLF.

> url.parse('ftp://foo:bar%0d%[email protected]/file.png')
Url {
  protocol: 'ftp:',
  slashes: true,
  auth: 'foo:bar\r\nINJECTED',
  host: 'example.net',
  port: null,
  hostname: 'example.net',
  hash: null,
  search: null,
  query: null,
  pathname: '/file.png',
  path: '/file.png',
  href: 'ftp://foo:bar%0D%[email protected]/file.png' }

I tried WHATWG URL, the new url parser does not decode the authority part.

new URL('ftp://foo:bar%0d%[email protected]/file.png')
// Chrome
{
  hash: "",
  host: "example.net",
  hostname: "example.net",
  href: "ftp://foo:bar%0d%[email protected]/file.png",
  origin: "ftp://example.net",
  password: "bar%0d%0aINJECTED",
  pathname: "/file.png",
  port: "",
  protocol: "ftp:",
  search: "",
  username: "foo"
}

// Node v7.8 REPL
URL {
  href: ftp://foo:bar%0d%[email protected]/file.png
  protocol: ftp:
  username: foo
  password: --------
  hostname: example.net
  pathname: /file.png
}

Question

Why our url.parse decode authority by default?
And should we fix this CRLF problem?

I tried to fix this problem to sanitize CRLF url. but this change breaks compatibility. so I would like to hear some opinions.

I tried some npm modules related to ftp, but I cannot find vulnerabilities using this problem.

related urls

security semver-major url whatwg-url

All 12 comments

If this should be addressed, beware 8.x rc deadline will be closing in 2 days.

I'd +1 on this to put the patch into 8.0.0, and I think it would be nice if you could open a pull request either.

cc @nodejs/url

My fear is this change is too drastic to be included in 8.0.0 without a prolonged period of testing. In fact, I am not certain the URL parser behavior should be changed _at all_, for compatibility with implementations in Java, Python, and older versions of Node.js, and for the fact that the bug only manifests itself with FTP, and is not a deficiency _per se_ of the URL parser.

On the other hand, I support additional error conditions in userland FTP client modules, such as @mscdex's ftp module.

/cc @nodejs/security

While I've never been fond of the behavior of url.parse() I'm afraid that this would be too drastic of a breaking change to get into 8.0.0 with so little time left. The ecosystem is incredibly sensitive to breaking changes in the url.parse() functionality (as we've seen before).

Given that there is a functional alternative in core (the WHATWG parser), I would rather see efforts focused there than on changing the existing parser. But that's just me. I'm interested in hearing what @nodejs/ctc folks have to say :-)

not a deficiency per se of the URL parser

I feel the same way. The problem is with (some) users of the module, not the module itself.

This probably can't be changed anyway because of backwards compatibility. You wouldn't be able to tell if foo:bar%baz contains an URL-encoded octet or the characters %, b and a without looking at the node.js version.

I don't target v8.0 to fix this change.
But I would like to know our url module stance.

not a deficiency per se of the URL parser

Ya, I think so. but even though we would be better not to decode any url parts by default.
At the least, we would be better to provide more secure / useful option like { decodeAuth: false }.

In my humble opinion,

  1. we would be better to stop decodeUriComponent by default (or provide secure option)
  2. recommend WHATWG URL parser and deprecate url.parse gradually.

Item 2 (recommending WHATWG URL Parser and slowly deprecating url.parse() is the track we are already on. My preference would be to document the deficiencies in url.parse() but leave the behavior as is without any modification.

Item 2 (recommending WHATWG URL Parser and slowly deprecating url.parse() is the track we are already on.

Speaking of which, a guide on how to migrate from the legacy API to WHATWG URL API would help, but I am not sure if this should be in doc/api/url.md, or in the website repo?

At the least, we would be better to provide more secure / useful option like { decodeAuth: false }.

That requires people to opt in, though. If someone is security-savvy enough to do that, they probably are already careful to encodeURIComponent() their inputs.

Should this remain open?

I don't think so.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

Brekmister picture Brekmister  路  3Comments

willnwhite picture willnwhite  路  3Comments

sandeepks1 picture sandeepks1  路  3Comments

srl295 picture srl295  路  3Comments

danielstaleiny picture danielstaleiny  路  3Comments