Node: url: parse() fails if auth contain a colon or ampersand

Created on 26 May 2015  路  17Comments  路  Source: nodejs/node

_I originally reported this at joyent/nodejs. At @jasnell suggestion, I'm reporting it here, so that the right people notice it. https://github.com/joyent/node/issues/25353_

The Problem

If the user or the password component of auth contains a colon or ampersand, url.parse() fails.

var url = require('url');

var user = encodeURIComponent('us:er');
var password = encodeURIComponent('pass:word');
var uri = 'http://' + user + ':' + password + '@localhost/';
console.log(uri); // http://us%3Aer:pass%3Aword@localhost/

var parsed = url.parse(uri);
console.log(parsed.auth); // us:er:pass:word

The output of parse() is not understood by format().

console.log(url.format(parsed)); // http://us:er%3Apass%3Aword@localhost/

Browsers handled it this way:

var parser = document.createElement('a');
parser.href = "http://us%3Aer:pass%[email protected]:3000/path%3Ename/?search=some%20phrase#hash";

console.log(parser.hash); // #hash
console.log(parser.host); // example.com:3000
console.log(parser.hostname); // example.com
console.log(parser.href); // http://us%3Aer:pass%[email protected]:3000/path%3Ename/?search=some%20phrase#hash
console.log(parser.origin); // http://example.com:3000
console.log(parser.password); // pass%3Aword
console.log(parser.pathname); // /path%3Ename/
console.log(parser.port); // 3000
console.log(parser.protocol); // http:
console.log(parser.search); // ?search=some%20phrase
console.log(parser.username); // us%3Aer

I think the best solution is (like the browser) to split auth into two fields.
I created a PR that does this: https://github.com/joyent/node/pull/25359

confirmed-bug url

Most helpful comment

@DavidTPate Why create another PR redoing the work already did, especially if you don't have time to complete it? My PR updates parse(), format(), the docs, and the tests. If you are aware of problems with my PR, let me know. I'd be happy to address them so this issue can be closed.

All 17 comments

@vkurchatkin @nodejs/tsc ... this has impact on the URL module API. Please take a look when you get the chance.

Based on RFC-3986 a colon or ampersand are not valid characters within a URI unless encoded (which is what this example shows).

url.parse()

The reason that url.parse() is returning the decoded values for the authority part is because of this line which decodes them.

Here's the code for url.parse() that I'm talking about:

   if (atSign !== -1) {
      auth = rest.slice(0, atSign);
      rest = rest.slice(atSign + 1);
      this.auth = decodeURIComponent(auth);
    }

Fixing this part is pretty trivial, but obviously there are larger implications due the usage of this module. Maybe for now the old auth property is kept (but called out as deprecated...somehow) while this part is split out into username/password.

It should be as simple as:

this.auth = decodeURIComponent(auth);
var authParts = auth.split(':');
this.username = authParts[0];

// A username can be provided without a password (no `:`) or with a blank password (`:` followed directly by `@`)
if (authParts.length === 2) {
  this.password = authParts[1];
}

url.format()

As for the url.format() piece this is because of these two lines.

Here's the code for url.format() that I'm talking about:

  var auth = this.auth || '';
  if (auth) {
    auth = encodeURIComponent(auth);
    auth = auth.replace(/%3A/i, ':');
    auth += '@';
  }

The reason why the output console.log(url.format(parsed)); // http://us:er%3Apass%3Aword@localhost/ is being received from this call in the example from OP is that it's just matching the first instance of the percent encoded :. This one should be simpler to fix, since it should just involve a simple update such as:

var auth = '';
if (this.username) {
  auth += encodeURIComponent(this.username);

  // Password is optional and can be left blank if desired.
  if (this.password) {
    auth += ':' + encodeURIComponent(this.password);
  }
  auth += '@';
}

Since url.format() operates on the prototype of Url I don't think this change would require deprecation like the change to url.parse().

@DavidTPate Agreed. That's basically what my PR does. It leaves auth alone to maintain backward compatibility and adds fields for username and password. And url.format() is changed to use these field preferentially over auth.

Marking as a bug, like jasnell did.

@brendanashworth @jasnell What can I do to get this pulled?

I originally created PR in May before the Node/IO merger. I rebased in September for Node 4.x. Now, I've rebased it again for Node 5.4.x. I really don't want to to rebase again.

@ben-page I started adding the functionality here: https://github.com/nodejs/node/pull/1813 but there's impacts to URL parsing and other unforseen consequences and I just haven't had the time to mess with it as of late.

@DavidTPate Why create another PR redoing the work already did, especially if you don't have time to complete it? My PR updates parse(), format(), the docs, and the tests. If you are aware of problems with my PR, let me know. I'd be happy to address them so this issue can be closed.

Should be addressed by the alternative WHATWG URL impl here: https://github.com/nodejs/node/pull/7448

  // Password is optional and can be left blank if desired.
  if (this.password) {
    auth += ':' + encodeURIComponent(this.password);
  }

@DavidTPate are you sure that the separator should be dropped in that case?
e.g. username:@example.com vs [email protected]

cf RFC 1738 3.1:

has a user name of "foo" and an empty password

@Mouvedia It all depends on which spec is being utilized. The one that you refer to is the old URL specification. There's the RFC-3986 specification which requires that the colon not be included when the password is blank. Though the new WhatWG URL spec doesn't really care.

@DavidTPate indeed both should be valid. But I am not finding anything about [email protected] in RFC-3986. Could you quote the relevant passage?

@Mouvedia I think I got confused between the port piece and authority piece with my previous comment on an empty password not being allowed.

Here's the relevant part from RFC-3986 around the password being empty:

Applications should not render as clear text any data
after the first colon (":") character found within a userinfo
subcomponent unless the data after the colon is the empty string
(indicating no password)

Here's the one I was thinking about from RFC-3986 in regards to the port:

URI producers and normalizers should omit the ":" delimiter that
separates host from port if the port component is empty.

This appears to have been fixed in master.

Same thing happens when the password contains "/". This is still not fixed.

@stefanotto Please open a new issue and include a simple test case.

@stefanotto please open a separate issue, if you believe it's an actual issue.

@stefanotto ... The legacy url.parse() has multiple such issues and is unlikely to be fixed. The URL object follows the WHATWG URL Standard and, while it's not entirely clear, makes the assumption that / characters in userinfo (usernames and passwords) will be percent-encoded. See https://url.spec.whatwg.org/#userinfo-percent-encode-set

If you run the following in Chrome and Firefox, you'll see that they also do not handle / characters in the password well: new URL('https://abc:1/2/[email protected]')

It is recommended that special characters like / in the password need to be pct-encoded.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

loretoparisi picture loretoparisi  路  3Comments

srl295 picture srl295  路  3Comments

danialkhansari picture danialkhansari  路  3Comments

danielstaleiny picture danielstaleiny  路  3Comments

dfahlander picture dfahlander  路  3Comments