IMPACT: Before I get into the specifics, I want to be clear about what the impact is for npm. It's actually pretty limited. As best as we can tell this only impacts when _saving_ a file-type specifier to a package.json with npm, not when installing from one. As such, it can be worked around by editing the package.json.
#7234 introduced a change to how url.format printed invalid file: type URLs. Specifically if someone gave it the invalid file:/absolute/path/to/file.txt it would translate that into the valid file:///absolute/path/to/file.txt. Previously invalid URLs were left unaltered.
npm's use of file URLs are usually to point at relative paths, which isn't something file URLs are really setup to support. The way npm does this is tack file: on the front, so npm -i -S ./foo/ becomes file:foo and npm -i -S ../foo becomes file:../foo.
As of v7, those become file://foo and file://../foo respectively. The first being a reference to / on the foo host, and the latter being a reference to /foo on the .. host.
Ok, so Node v7 is making legacy URLs "healthy" where it didn't used to and that bit our particular use case. We can update npm to work around that, but there is an additional wrinkle…
We don't actually call url.format('file:foo'). What we call is:
url.format({protocol: 'file', slashes: false, pathname: 'foo'})
It'd be legit to call this a doc bug and update there, imo, though ignoring properties in the object form feels odd to me. Either way, we'll be updating our code to not rely onurl.format.
Thanks for the use-case @iarna
@nodejs/ctc since we approved this we ought to consider the impact and whether and how we want to apply mitigation in this case. I see a few ways forward:
url.URL as the future to fix these kinds of mistakes and just leave url aloneurl.format({protocol: 'file', slashes: false, pathname: 'foo'}), it's arguably bad form of us to make an assumption that it's an absolute path, even if file urls officially don't do relative.@iarna Just to make sure I'm understanding: npm can and will work around this, so even if we update the docs and that's it, things will be OK with npm (assuming people upgrade npm). In other words, you are informing us of a use case that we seem to have failed to consider and we should make a decision as to whether or not we should accommodate that use case. But either way, npm will be fine. Or am I misunderstanding?
I'm OK with any of the options @rvagg listed, but (assuming @iarna's answer to my question in the preceding comment is "yes, that is correct") I'd prefer the first one: retain the current behavior and update the docs.
And my second choice is the last one (basically, keep the current behavior generally, but add relative paths as a possibility even though relative paths are not permitted in file urls according to spec).
But I can see reasons to be for and against each of the possibilities.
And I would (of course) feel differently if this were causing widespread breakage in the ecosystem. If that were the case, I'd probably be all for reverting.
Actually...I just did a test in a browser, and Chrome, at least, understands relative file paths. Given that "do what the browser does" seems to be the general philosophy for url, I have to change my preference to: Support relative paths.
While I would generally prefer url.parse() to work correctly, the key reason I implemented url.URL separately was to avoid ecosystem breakage. I'd rather leave url.parse() as it is, warts and all, and move users over to url.URL as we can.
I gotta say, while I understand that there are times when a strictly conformant URL implementation is useful, I have found url's easy going nature to be very useful, in both formatting and parsing, it allows it to be used for all kinds of non-internet standard encoding of url-like data, as npm found.
I have found url's easy going nature to be very useful
At the risk of getting a little too far off topic: I mostly agree, but...there are things lurking... Well, here's an example: Given a FQDN with one component longer than the officially-allowed 63 characters, url.parse() will truncate it to 63 characters (which is acceptable IMO) and decide that the leftover part should be pushed over to the path (which is a problem IMO). Behold:
> url.parse('http://www.ThisIsAnAbsurdlyLongHostNameButSurelyNoneOfThisShouldEndUpAsPartOfThePathBecauseWhoCouldPossiblyWantThatAMIRITE.com/CorrectPathBeginning')
Url {
protocol: 'http:',
slashes: true,
auth: null,
host: 'www.thisisanabsurdlylonghostnamebutsurelynoneofthisshouldendupaspar',
port: null,
hostname: 'www.thisisanabsurdlylonghostnamebutsurelynoneofthisshouldendupaspar',
hash: null,
search: null,
query: null,
pathname: '/tOfThePathBecauseWhoCouldPossiblyWantThatAMIRITE.com/CorrectPathBeginning',
path: '/tOfThePathBecauseWhoCouldPossiblyWantThatAMIRITE.com/CorrectPathBeginning',
href: 'http://www.thisisanabsurdlylonghostnamebutsurelynoneofthisshouldendupaspar/tOfThePathBecauseWhoCouldPossiblyWantThatAMIRITE.com/CorrectPathBeginning' }
>
Maybe I just have failed to think of an appropriate use case (a 63-character TLD maybe?), but I cannot imagine a situation where the above behavior is desirable. It's easy to think of situations where it leads to bugs. (Basically: Every situation where it might happen.) And I would not be surprised to find that it has the potential for security issues in the right context, although I may lack the creativity to contrive a reasonable example of that myself.
In any event, I think we should do what browsers do, which appears to be to not bother validating the char length of components for the FQDN. Garbage in/garbage out, but at least make it sensible garbage that clearly maps to the input.
But that's not what's being discussed here. Just a long aside. Thanks for indulging me. (If that topic _is_ of interest, it's being discussed in https://github.com/nodejs/node/pull/9292.)
@Trott
Just to make sure I'm understanding: npm can and will work around this, so even if we update the docs and that's it, things will be OK with npm (assuming people upgrade npm).
Yes, that's right.
@jasnell It's not url.parse that's causing issues. It's url.format. Though I suppose your point still stands. =)
Looking at the docs right now, I don't think this is a doc omission (although the docs could be more clear). For url.format(), it says:
- If either the
urlObject.slashesproperty is true,urlObject.protocol
begins with one ofhttp,https,ftp,gopher, orfile, or
urlObject.protocolisundefined, the literal string//will be appended
toresult.
That's not exactly the clearest prose expression of all time. But it does say that if the protocol property is set to file, the slashes are appended.
(FWIW, the word either should ideally be removed, as there are more than two options listed.)
So, unless I'm missing something, I think we're covered in the docs if it is determined that sticking with the current behavior is the most desirable course of action.
(I am going to file a PR to reword it for clarity, though.)
We shipped our own change for this in [email protected] which I'll be downstreaming to you all this Thursday, Dec 22nd. (Well, 4.1.1 actually, but yeah.)
This issue has been inactive for sufficiently long that it seems like perhaps it should be closed. Feel free to re-open (or leave a comment requesting that it be re-opened) if you disagree. I'm just tidying up and not acting on a super-strong opinion or anything like that.
Most helpful comment
We shipped our own change for this in [email protected] which I'll be downstreaming to you all this Thursday, Dec 22nd. (Well, 4.1.1 actually, but yeah.)