Node: url library behaves unexpectedly when attempting to set a url's port to a large number

Created on 25 Mar 2018  ·  32Comments  ·  Source: nodejs/node

  • Version: 9.8.0
  • Platform: Linux 4.4.0-31-generic #50~14.04.1-Ubuntu SMP Wed Jul 13 01:07:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
  • Subsystem: url


const {URL} = require('url');
let url = new URL('http://127.0.0.1');
url.port // -> ''
url.port = 2.12323232e88;
url.port // -> '2'

confirmed-bug whatwg-url

Most helpful comment

@nodeav yeah, you're missing how IDL works. It specifies the ToString behavior Node.js (and browsers) have.

All 32 comments

I have been able to reproduce it and @nodeav has offered to try and tackle this with a PR :)

I tagged this as a "confirmed bug" since this is not what our .docs say, a solution could be to either fix the docs or to change the URL implementation to reject incorrect numbers for ports.

+1 on this being a bug, but one that also needs fixing in the spec:

Let port be the mathematical integer value that is represented by buffer in radix-10 using ASCII digits for digits with values 0 through 9.

Chromium and Firefox exhibit the same behaviour as Node.

/cc @nodejs/url

I believe the property is operating as specified and documented. The documentation says:

The port value may be set as either a number or as a String containing a number in the range 0 to 65535 (inclusive)…

If an invalid string is assigned to the port property, but it begins with a number, the leading number is assigned to port.

However I do agree the first sentence quoted can be misleading, since it’s actually meant to be “(number or string containing a number) in the range …” rather than “number or (string containing a number in the range…)”.

Whether or not there is a bug in the spec, the odd parts of the URL Standard (like this one) are usually written for more reasons than just being buggy (compatibility is the most common reason). I don’t know the details on this property but @annevk might.

@TimothyGu What led be to believe that this was a bug in the spec was that the spec does contain an explicit range check when setting port, and that that range check 'works' for all number values up until the limit where Number#toString() returns exponential notation.

@addaleax I’d recommend filing a bug at https://github.com/whatwg/url for further verification if you are convinced that this is indeed a spec bug.

Most browsers had a limit of sorts and this was the most common one. The port attribute is a little weird in that the setter first converts to a string, and then uses the URL parser to get an integer out of it. So that's why you get "2" above I'm pretty sure (haven't looked), because the URL parser's port bits bail out on the dot.

I'll post a PR in an hour or two, with a range check in the beginning of the port setter. Do you think it is the right solution? I feel that this is unexpected behavior, regardless of the spec.

1) https://notapattern.net/2014/10/14/ways-men-in-tech-are-unintentionally-sexist/
2) Violating the specification is almost certainly not the way forward. If you want to change the specification somehow, follow the suggestion from @TimothyGu above and please consider https://whatwg.org/working-mode#changes and https://whatwg.org/faq.

@nodeav what @annevk means in point number 1 is that we don't use "guys" for "people" in English in Node.

@annevk worth noting that in @nodeav's native tongue the distinction (using guys for people) doesn't really exist since you would use a gender neutral term (אנשים or חבר׳ה rather than בחורים) which roughly translates to guys but without the gendered part. Thanks for pointing it out - I'm just giving some random context trivia.


About point #2 do you think that's a reasonable change for the spec?

I'm also -1 on changing Node's behavior without changing the spec - the whole point of adding WHATWG URLs to Node was to make browser interoperability easier and settle on a single specification for URLs in both platforms.

It's not really clear to me what the proposed change is. To make port accept a Number as well as a String and throw if it's out-of-bound? That seems somewhat backwards incompatible. It's also not entirely clear to me it's worth it. It'd help to know when you'd write the kind of code in OP.

@annevk let's say you read the port from a parsed file containing URLS or another third party source:

var port = readPort(); // attacker returns 30 ** 30
// our validation: don't allow opening a connection to a lower-than 1024 port to the server.
if (typeof port !== 'number' || port < 1024) { 
  return false; 
}
serverUrl.port = port;
download(serverUrl); // user connected to port we didn't mean them to be able to connect to

Sorry, did not mean to offend anyone, I have no intentions to exclude any gender from the discussion (@benjamingr's anecdote is correct, I meant it as a gender-neutral term). I've since edited my comment to exclude the offending word.

Regarding the issue:
This is a snippet from the node v9.9.0 documentation:

The port value may be set as either a number or as a String containing a number in the range 0 to 65535 (inclusive). Setting the value to the default port of the URL objects given protocol will result in the port value becoming the empty string ('').

The proposed change is, thus, to make sure the number is in the valid range specified in the documentation. Another possibility is to change the documentation to match the current behaviour. However, In my opinion at least, something is clearly slightly broken here.

@benjamingr that's fairly compelling. I'd support trying to change it given the security angle.

Opened issue in whatwg repo https://github.com/whatwg/url/issues/377

@benjamingr Great. Does that mean I shouldn't open a PR, and await their fix instead?

@nodeav well, ideally you'd take a stab at making a PR to the spec although I get that this might be more intimidating. Alternatively you can wait for the spec to change and then PR into Node.

I'll look into that, thanks. The docs are still wrong currently; Should I open a PR to fix the docs instead?
(As a user I'd expect the port parameter to accept a number, which conflicts with the docs changes I'm talking about)

@nodeav yes please! :)

I'm still working on the doc change,
however it looks like since the whatwg spec talks specifically about an input String, as opposed to an input Number, a bug still exists here, either in the doc or in the code; The NodeJS docs say the port can be set as a number, which is then range-checked.

a snippet from the whatwg spec:
Let port be the mathematical integer value that is represented by buffer in radix-10 using ASCII digits for digits with values 0 through 9.

Correct me if I'm wrong, but it seems to me that the root of this bug is the inconsistency when converting a number to a string in node (actually, the ECMA standard), not passing numbers in scientific notation to whatwg.
Accepting a number as an input is an addition to the standard, and the inconsistency should be handled.
For example, once the string "1e32" is passed to whatwg for checking, it acts as defined in the spec and returns 1.
Passing 1e32 to node's url.port setter, however, results in unexpected behavior for the user.

@trivikr I just wanted to make sure the argument I have presented has been considered prior to closing this issue, and that the final decision has been made in regards to changing the code. Thanks.

@nodeav This issue was closed automatically by commit message. Reopening

Should I create a unit test for this case, Or is it covered by whatwg somehow?

Well, the next step would be to follow up with whatwg

@benjamingr Not sure how I can help with the whatwg thing, and I also happen to still feel this is a Node-specific bug.
Converting numbers to scientific notation is an ECMA thing, not a whatwg one.
Node should, IMO, handle numbers being passed to the setter instead of strings - it already converts Numbers to Strings, but doesn't handle large numbers correctly in respect to what whatwg specifies as valid input.x

@nodeav Again, it's not a Node.js bug when it acts according to the WHATWG and W3C specifications, and also browsers. You could try filing an issue against whatwg/url.

As @TimothyGu said - this is required if we want to keep compatibility with browsers - I think we should give the standards way a chance and to proceed in that route.

But the whatwg standard doesn't say anything about accepting numbers as an input, only strings. Nodejs chooses to accept numbers as an input, and does the conversion to a String according to the ECMA specification - however, this conversion conflicts with whatwg's allowed input, which doesn't accept numbers in scientific notation represented in a string, nor numbers (integers) at all, only strings representing decimal numbers in a straightforward manner.

Node does not provide this. A Python implementation would not encounter this problem. Am I missing something?

from the spec:

> A URL-port string must be zero or more ASCII digits.
> An ASCII digit is a code point in the range U+0030 (0) to U+0039 (9), inclusive.

And the full 'algorithm':

port state
> If c is an ASCII digit, append c to buffer.

Otherwise, if one of the following is true

c is the EOF code point, U+002F (/), U+003F (?), or U+0023 (#)

url is special and c is U+005C ()

state override is given

then:

If buffer is not the empty string, then:

Let port be the mathematical integer value that is represented by buffer in radix-10 using ASCII digits for digits with values 0 through 9.

If port is greater than 2^16 − 1, validation error, return failure.

Set url’s port to null, if port is url’s scheme’s default port, and to port otherwise.

Set buffer to the empty string.

If state override is given, then return.

Set state to path start state, and decrease pointer by one.

Otherwise, validation error, return failure.

@nodeav yeah, you're missing how IDL works. It specifies the ToString behavior Node.js (and browsers) have.

@annevk Thanks, I'll go bug the whatwg people instead then!

@nodeav I think you meant to ping Anne and not Anna :)

FWIW Anne _is_ "whatwg people" - we can pursue the issue in https://github.com/whatwg/url/issues/377

@benjamingr haha, indeed. Let's continue there. Thanks for the patience, everyone

Was this page helpful?
0 / 5 - 0 ratings