Node: Don't resolve symlinks when requiring

Created on 16 Oct 2015  ยท  204Comments  ยท  Source: nodejs/node

Currently, Node.js resolves symlinks when requiring and then uses the real location of the package/file as its __filename and __dirname instead of the symlinked one.

This is a problem because symlinked modules don't act the same as locally copied modules.

For example:

    index.js //require("dep1")
    node_modules
        dep1
            index.js //require("dep2")
        dep2
            index.js //console.log('fun!'):

works, but

    index.js //require("dep1")
    node_modules
        dep1 -> ../../dep1
        dep2
            index.js
dep1
    index.js //require("dep2")

does not, because dep1 does not act like it's located in the node_modules directory of module and thus cannot find dep2.

This is especially a problem when one wants to symlink modules that have peer-dependencies.

Therefore, I suggest changing the behavior to no longer resolve symlinks.
What do you think?

discuss module

Most helpful comment

Let's assume we have the following directory structure:

.
โ”œโ”€โ”€ index.js
โ””โ”€โ”€ node_modules
    โ”œโ”€โ”€ dep1
    โ”‚ย ย  โ”œโ”€โ”€ index.js
    โ”‚ย ย  โ””โ”€โ”€ node_modules
    โ”‚ย ย      โ””โ”€โ”€ dep2 -> ../../dep2
    โ””โ”€โ”€ dep2
        โ”œโ”€โ”€ index.js
        โ””โ”€โ”€ node_modules
            โ””โ”€โ”€ dep1 -> ../../dep1

7 directories, 3 files

Where dep1 -> dep2, dep2 -> 1 - a very simple circular dependency (index.js -> dep1).

sh-3.2$ cat index.js
require('dep1')

sh-3.2$ cat node_modules/dep
dep1/ dep2/

sh-3.2$ cat node_modules/dep1/index.js
require('dep2')
console.log('dep1')

sh-3.2$ cat node_modules/dep2/index.js
require('dep1')
console.log('dep2')

If I run node index.js, I get the following output:

sh-3.2$ node index.js
dep1
dep2
dep1
dep2
dep1
dep2
dep1
dep2
dep1
dep2
dep1
dep2
dep1
dep2
dep1
dep2
dep1
dep2
dep1
dep2
dep1
dep2
dep1
dep2
dep1
dep2
dep1
dep2
dep1
dep2
dep1
dep2
dep1

Which is definitely not expected.

require.cache:

[ '/Users/alex/repos/ied/sandbox/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/index.js' ]

So even under the assumption that the current _algorithm_ is correct, the actual implementation is currently incorrect. This PR broke circular dependencies.

Therefore I believe that it would absolutely make sense to revert this PR and think about a better solution.


Compared to 5.0.0

dep1
[ '/Users/alex/repos/ied/sandbox/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep2/index.js' ]

All 204 comments

This would break npm link, wouldn't it?

if you are requiring dep1, and it is a symlink, and it doesn't have dep2 in its path then something is off is it not? Should the module not include its own node_module folder?

That being said flat dependencies in npm@3 do offer a weird edge case to this. But it is worth bringing up that I am pretty sure they are deprecating peer dependencies.

@Trott why would it? In theory, everything that works with the real path should also work fine with the virtual path. The only real problem that i see is when a module gets required multiple times through different symlinks it would not be cached by npm and get loaded multiple times. But if that really happens, something is badly designed i think.

@TheAlphaNerd
No, because its a peer dependency.
I don't think NPM is depreciating peer dependencies in gerenal, but changing its behavior to just warn when a peer dependency is not fullfilled. Maybe they're even gonna drop the peerDependencies property, but that does not prevent people to peer depend on things. Devs will then just need to manage them on their own.

I don't think the need for peer dependencies will go away in the future.

I've also asked for a new npm command that would only work if we fix the behavior of require like i suggested here: https://github.com/npm/npm/issues/10000#issuecomment-148737934

@VanCoding I was thinking of certain edge cases but I wasn't thinking very deeply about it, so yeah, that particular comment may be a non-issue.

The real point I was sorta kinda trying to gently make was better put by @othiym23 (emphasis added):

I do think you're right that require()'s behavior is unhelpful in this case, but I also think that given how important having that behavior locked down is to the stability of the whole Node ecosystem, it's going to be very tough to change now.

Any semver-major change to require() could result in all sorts of (potentially hard-to-predict) ecosystem breakage. So the bar is very, very high.

I know, but I think the current behavior maybe even counts as a bug because
there seems to be no reason to behave like this..
Am 17.10.2015 7:06 nachm. schrieb "Rich Trott" [email protected]:

Moreover, the Modules API (which require is a part of) is Locked which
means:

Only fixes related to security, performance, or bug fixes will be accepted.
Please do not suggest API changes in this area; they will be refused.

โ€”
Reply to this email directly or view it on GitHub
https://github.com/nodejs/node/issues/3402#issuecomment-148934364.

The litmus test is this: is there a non-zero chance the proposed change is going to break someone's existing application? If the answer is 'yes' (and I think it is), then it should be rejected.

Well, how are we going to calculate that chance? And what does a zero
chance mean? Zero like not even module is going to break or not 1 % or more
of the pakages? If it's the latter, then i really doubt that 1% of the
modules would be affected by that change. If there really are modules that
break with this change then theyre making use of undocumented (in my
opinion buggy) behavior and therefore it's okay when it breaks.

Let's look at it like this: one can always get the real path from the
virtual path, but if one gets the real path, the virtual path is lost.

So even if we break some modules, isn't the right decision to fix this? I
really think it would help fix issues for npm as well.

But it of course is your decision.
Am 17.10.2015 7:34 nachm. schrieb "Ben Noordhuis" <[email protected]

:

The litmus test is this: is there a non-zero chance the proposed change is
going to break someone's existing application? If the answer is 'yes' (and
I think it is), then it should be rejected.

โ€”
Reply to this email directly or view it on GitHub
https://github.com/nodejs/node/issues/3402#issuecomment-148936831.

@VanCoding, maybe hard links would help. I haven't tried it.

@dlongley You can't hardlink a directory I think?

I am having the exact same issue as @VanCoding and I'm also a bit confused by the current behaviour...?

I would expect a linked package to resolve dependencies based on the location it was linked to. It should of course check it's own node_modules first, but when going back "up the tree" I would expect it to check the folder structure it was linked to, not where it was linked from.

@asbjornenge,

You can't hardlink a directory I think?

No, you can't, but maybe you could write a script to create a mock directory and hard link any js files. I _think_ that would be all it would take for simple modules. It's not a perfect solution but maybe it would help in some cases. Clearly there needs to be a better way to link peer dependencies together during development.

I would expect a linked package to resolve dependencies based on the location it was linked to.

Me too, but I believe this has been discussed before -- and there may be projects out there that are depending on the current behavior. We'll just need to come up with a way to specify that the other behavior is desired.

and there may be projects out there that are depending on the current behavior

As I said before:

  • while there most likely are such modules, there are only a few (estimated way below 1% of the modules)
  • they'll be easy to fix, while it's impossible to work around the current behavior
  • they're making use of undocumented behavior

And as it seems...

  • developers already expect it to behave like i suggested, yet they'd be surprised when they find out how it really behaves

We'll just need to come up with a way to specify that the other behavior is desired.

@dlongley, I don't like it. I'd rather 'keep it simple ..' in the long-term.

@VanCoding makes a compelling argument.

Perhaps we should make this minor 'breaking change', and just add an opt-out capability. The opt-out should be removed in a future version..

I agree with @VanCoding's argument.

Perhaps we should make this minor 'breaking change', and just add an opt-out capability. The opt-out should be removed in a future version.

:+1: I'm in favor of this if it can get consensus.

Perhaps we should make this minor 'breaking change', and just add an opt-out capability. The opt-out should be removed in a future version.

That is an addition, so semver-minor. I will warn you that we are very conservative about doing anything with the module system.

If someone wants to put together a PR for this (with the understanding that, again, the bar for this is very, very high and so even if it's a Great Idea and an Excellent Implementation, it's still very likely that this Will Not Happen), it might be interesting to use https://github.com/nodejs/citgm (if that tool is ready for general use?!) to see if anything Too Big To Fail in the ecosystem chokes on the change.

(To head off the obvious and explain the bold comment above: Yes, it's easy to fix any modules that break. That's not good enough though. Here's the deal: Let's say this breaks the async module. Fixing it in a new version of async is theoretically no problem. But then getting the thousands and thousands of modules that depend on async to update to the fixed version is a huge problem. And getting everyone using those thousands and thousands of modules to update _those_ modules to fixed versions... This is why a stable and imperfect module API is vastly preferable to a dynamic module API.)

Of course, if there was a way to confirm that no modules on npm depend on the current behavior, that would go a long way (in my opinion, anyway) towards making the change palatable. I have no idea how one would go about demonstrating that, though. But you can use citgm to at least confirm that it won't break any of a handful of the most important ones.

@Fishrock123, I tend to think my suggestion should wait until the next semver-major since I suggested a breaking change with 'opt-out'?

@Trott, These are good points. The last thing sysadmins want is trouble when upgrading a datacenter to a new version of Node.js.

We should also consider any arguments for current behavior besides compatibility. Are there cases where the current behavior is more intuitive than the proposed change? If not then I think the proposed change should be implemented sooner or later.

I know that Modules are at Stability 3 - Locked.

Only fixes related to security, performance, or bug fixes will be accepted.
Please do not suggest API changes in this area; they will be refused.

There are good ways to introduce a breaking change over a period of time. Statistics would indicate the best course of action - even if it means extra work on the front end before this change can be implemented.

This is causing me pain right now. I'm working on an enhancement to a library that automagically loads plugins by doing require(pluginName). But since I've used npm link to install the library, the require fails because pluginName can't be found in the tree.

I think this behavior is unexpected, especially given the interaction with npm link. I think a symlinked package should behave identically to an installed package in the same location.

From the modules docs:

If it is not found there, then it moves to the parent directory, and so on, until the root of the file system is reached.

I suppose this is ambiguous, but I would argue that the "parent directory" should be relative to the path used to load the module. If I do require('/home/me/project/node_modules/foo/index.js'), and index.js does require('something-else'), I think it is incorrect to look up something-else relative to anything other than the full path in the first require.

@Trott it might be worth taking a look at https://github.com/brson/taskcluster-crater to see how they test the rust ecosystem against potentially dangerous changes.

@fenduru :+1:

Looking at the examples from the documentation, it is, IMO, unexpected behavior for a file that happens to be a symlink to behave differently. I would expect it to behave the same as any other file, which is as @fenduru described.

Furthermore, the documentation gives one good reason for the way module searching is performed:

This allows programs to localize their dependencies, so that they do not clash.

The fact that searching does not work this way with symlink'd modules makes it very difficult to make use of localized dependencies (this is the chief complaint). As pointed out above, it also causes different (and unexpected) behavior with npm link. Perhaps this issue is better classified as an "unexpected behavior" bug rather than a request for change.

Just a note here that a fix to the unexpected behavior shouldn't change the existing behavior for module identifiers that are native or that begin with '/', '../', or './'. In those latter three cases it does make sense to use realpath; changing that would be incorrect and I imagine would wreak havoc.

I also found this require/symlink behavior to be unexpected and undesirable. It goes against common UNIX practice.

Out of curiosity - does node on Windows also have this require/symlink behavior? (At one time Windows did not support symlinks). If node behavior on UNIX and Windows differ in this regard that would strengthen to case against symlink resolving in require() on UNIX to be consistent.

At the very least an optional node command line flag could be introduced to disable resolving symlinks when require()ing.

@kzc,

Out of curiosity - does node on Windows also have this require/symlink behavior?

Assuming that fs.realPathSync dereferences symlinks on windows then it seems so (but I only checked quickly): https://github.com/nodejs/node/blob/master/lib/module.js#L150

It looks like the problem is that no distinction is made between loading a module with an identifier that starts with '/', '../', or './' and one that doesn't -- all are given the realpath treatment (and it doesn't matter which OS you're using). If, instead, realpath were only used in the former cases, then symlinks would be treated like other files (I think) and you'd get the expected behavior and the ability to localize dependencies for symlink'd modules.

Regardless of whether require()d modules have relative or absolute paths they still could be converted to canonicalized absolute paths _without_ resolving symlinks. I see the need for the module cache to use absolute paths so that it knows when it's dealing with the same module referenced from various different relative paths. But if someone explicitly introduces a symlink in the path of a module then I'd argue that most would expect it to be treated differently than the symlink resolved version according to its non-resolved directory position.

@kzc,

Regardless of whether require()d modules have relative or absolute paths they still could be converted to canonicalized absolute paths without resolving symlinks.

This is problematic for the case shown here: http://grokbase.com/t/gg/nodejs/12cbaygh7a/require-resolving-symlink

This is required because of the way that executables are typically
linked. On Unix, you have something like:

/usr/local/node_modules/foo/bin/foo.js

which does `require("../lib/foo-internals.js")`.

That script is linked to

/usr/local/bin/foo

If we didn't realpath, then it'd be looking for
/usr/local/lib/foo-internals.js instead of
/usr/local/node_modules/foo/lib/foo-internals.js.

The node start script is a special case that could be symlink resolved if the file has a shebang or a non .js suffix.

I'm having the same problem as @fenduru.

  1. Lib installs plugins via npm install, they end up in the correct directory (user project directory).
  2. I also checked require.main.paths[0], it's 100% correct and contains the user project directory.
  3. Nonetheless: require and require.resolve both throw.

This is unexpected.

It only happens when symlinked / npm link'd. In a normal installation everything works as expected.
I don't want to npm publish every single change - how am I gonna continue developing the lib?

Since the directory is actually included in require.main.paths this issue is undoubtedly a bug if you ask me. At the very least it's worth fixing.

:+1:

+1 this is a pain and makes npm link almost useless to me (see https://github.com/npm/npm/issues/5080).

Resolving symlinks this way is just plain wrong and node should not be doing it.

In order to fix this issue, it seems node would need to be able to load multiple instances of the same module file (same realpath location).

This means storing symlinked module files in the require.cache using their symlink names instead of their realpath names. There would need to be a new instance in the cache for each module that is loaded via a different path (different symlink) in order to ensure that the proper search paths are used for any require() calls made from within that module. Once this is done, the search path for finding another module could first walk up the symlink path and then, if no match is found, it could walk up the realpath. This might help with backwards compatibility.

However, this seems to be a significant departure from how the require cache currently works. How existing projects would be affected by this is unclear. Though, it may really only affect projects that use peer dependencies -- which are the projects that need this fix. Is a change to fix this unexpected behavior palatable? What do you think, @jasnell?

Is there anyone interested in this issue that feels comfortable enough to create a PR to try and resolve it in a sensible way?

To summarize the core problem here:

If you want to work on a dependency in the context of a dependent package, you add a node_modules sub-directory to the dependent package and link to its dependency there. Whatever changes you make to the dependency will be reflected in the dependent package.

If you want to work on one of N peer dependencies in the context of a dependent package, you have no such remedy. There is no simple mechanism by which to link to a single peer dependency in the context of the dependent package. This makes developing systems that use peer dependencies unnecessarily difficult.

If we change the module search path behavior to walk up looking for <parent>/node_modules using the symlink path first and then the real path, this problem would be eliminated. Developers of projects that do not use peer dependencies but want to ensure the proper dependencies are loaded, may continue to use the current behavior of adding a node_modules sub-directory. This is how dependency conflicts were already resolved in existing projects.

The non-peer dependency use case that would change would be the following:

A dependent package required a non-peer dependency that was symlinked. That dependency did not have all of its sub-dependencies in a local node_modules sub-directory. Therefore, its realpath parent directories were walked looking for the sub-dependency in <parent>/node_modules until it was found. With this new behavior, its symlink path parent directories would be walked first, prior to walking its realpath parent directories. This could potentially find a different sub-dependency. The original behavior can be easily restored by adding a local node_modules sub-directory.

The only question is, were packages previously depending on this behavior?

This behavior should not affect any npm installed packages unless they have custom install scripts that create symlinks in this manner (and it seems questionable why this would be done at all). There is a potential for this to affect globally installed and npm linked packages, but only if npm install -g does not ensure all required sub-non-peer-dependencies are installed in a node_modules sub-directory. If it doesn't have this assurance, it could be added in future versions to ensure correct behavior with new versions of node.

Here's an alternative proposal to resolve this issue that should not impact any packages unless they are already using peer dependencies, and therefore, may not need to be considered a potentially breaking change.

Node could be made aware of "peerDependencies" by checking for this value in package.json:

Suppose we have a package top which depends on two packages framework and plugin, where plugin is symlinked to another directory containing a development version of plugin:

|--top
   |--node_modules
     |--framework
     |--plugin (symlink to /dev/plugin)
|--dev
   |--plugin
      |--package.json ({"peerDependencies": {"framework": "^1.0.0"}})
   |--framework

In this scenario, we want top to load plugin from the dev directory and we want plugin to load top/framework not dev/framework.

The plugin package includes a peerDependencies entry in its package.json that has framework as a key. When the top script calls require('plugin'), node finds plugin via its normal mechanism. However, when loading plugin, node sees its package.json file and that it has a peerDependencies entry. This causes plugin to be identified via its absolute path (which may be a symlink) instead of its realpath. From an implementation perspective, an entry in require.cache uses the absolute path as the key and a new module instance as the value (unless previously cached from the absolute path).

Now, when the plugin script calls require('framework'), since framework is in the peerDependencies entry in package.json, it will be be searched for by first walking plugin's module.parent (which will be top's path) before walking its realpath. Any other require target (not present in peerDependencies would be searched for using the module's realpath as usual.

I believe this approach would handle the use case at hand (developing packages that use peer dependencies and using npm link with peer dependencies) without affecting any other packages.

Does anyone here have feedback on this proposal? Would people prefer this one over the previous one -- or neither? General thoughts?

I really don't think node should be looking in package.json and doing special things with peerDependencies. It should work as expected for any symlinked package. There is no need to special case peerDependencies if node ceases to mess around with resolving symlinks.

I don't see the need to walk up the realpath at all and I would suggest that (excepting any case for backwards compatibility, possibly temporarily) this should be avoided if possible.

Consider:

A/node_modules/B (symlink to B)
A/node_modules/C
B
  • require('C') from A should use A/node_modules/C
  • require('C') from B should use B/node_modules/C if available, and fallback to A/node_modules/C (this does not happen at the moment)

Why would you _ever_ want to start searching in A/B's parent?

Even if you think about linking sub-dependencies:

A/node_modules/B/node_modules/C (symlink to C)
C

Searching the realpath _still_ makes absolutely no sense - that would mean searching A/C's parent.

The only possible scenario I can think of where searching the realpath would be necessary would be to link to a dependency which is halfway down someone else's tree:

A/node_modules/D (symlink to B/node_modules/D???)
B/node_modules/C
B/node_modules/D

...which makes no sense whatsoever. Why would you do this!?!?

So, never resolve symlinks ever, always use the virtual path and don't do anything special.

I'd be interested if someone could make a patch for this (even if imperfect) so we could go about testing it. I also think we could get someone to help whoever is interested, if necessary. :)

Related issue:

[rfc] module: change load order #4595

I agree 100% with @SystemParadox, I see no reason what so ever to resolve the symlink.
Links are supposed to help development by simulating a classic node environment and still be able to make changes easily.
If changing this affects anyone it most likely is only in local environment using crazy hacks because of this behavior.
You can't write a module assuming your parent or dependencies are at a different "real" location since you don't have any control on the file structure when your module is being installed on another machine. Plus you most likely can't write any code that will create symlinks since you can't assume you are even allowed to (Creating symlinks are disabled for users on Windows by default).
Plus, I still fail to see a use case where the current behavior is desirable or even used anywhere at the moment. The only thing I see are people trying very hard to circumvention this bad behavior in node to be able to work with symlink. Sometimes, I had to resolve to copy my module to the final destination and develop directly there so I could test it.

I created an issue a long time ago at nodejs/node-v0.x-archive/issues/18203 and I think I will repost the issue here since this is where the discussion is.

I have been working for a while with link to develop multiple module or simply reuse my local version.
However, every time I link a module using peerDependencies the link just becomes very painful.

For instance, if I have a structure like

[email protected] D:\temp\pkg1
โ”œโ”€โ”€ [email protected]
โ””โ”€โ”€ [email protected] -> D:\temp\pkg2

And for whatever reason pkg2 tries to require lodash then I get the error Error: Cannot find module 'lodash'.
However, without the link, it works perfectly.
An even greater problem is if I happened to have lodash installed globally it works with the link, but not using the version intended because I could have version 2.8.0 installed globally.
This is worst, because you don't get an error, everything runs and at one point it might break and you will have to search everywhere to finally realise you were not using the right version because of your link.

Now I wanted to try to find the source of the problem and maybe fix it.
I realized that when you require pkg2 in my example, its path will be D:\temp\pkg2\index.js instead of D:\temp\pkg1\node_modules\pkg2\index.js which result in the resolve paths to be

paths: Array[3]
0: "D:\temp\pkg2\node_modules"
1: "D:\temp\node_modules"
2: "D:\node_modules"

Instead of

paths: Array[4]
0: "D:\temp\pkg1\node_modules\pkg2\node_modules"
1: "D:\temp\pkg1\node_modules"
2: "D:\temp\node_modules"
3: "D:\node_modules"

I tried to change the function tryFile in module.js as follow

// check if the file exists and is not a directory
function tryFile(requestPath) {
  var fs = NativeModule.require('fs');
  var stats = statPath(requestPath);
  if (stats && !stats.isDirectory()) {
    return requestPath; // <= Fix: use the symlink directly
    //return fs.realpathSync(requestPath, Module._realpathCache);
  }
  return false;
}

And this was the extend of the change required to fix the issue

Instead of using fs.realpathSync() for module require resolution, it ought be normalized with path.resolve() which returns an absolute canonical path without resolving symlinks.

Conceptually it would better to have something like this patch against node-v5.5.0:

--- lib/module.js.orig  2016-01-24 12:25:44.000000000 -0500
+++ lib/module.js   2016-01-24 12:41:44.000000000 -0500
@@ -123,7 +123,7 @@
 }

 function toRealPath(requestPath) {
-  return fs.realpathSync(requestPath, Module._realpathCache);
+  return path.resolve(requestPath);
 }

 // given a path check a the file exists with any of the set extensions
@@ -287,6 +287,7 @@
     debug('Module._load REQUEST %s parent: %s', request, parent.id);
   }

+  if (isMain) request = fs.realpathSync(request);
   var filename = Module._resolveFilename(request, parent);

   var cachedModule = Module._cache[filename];

The sole place where resolving symlinks is necessary is if the node start script happens to be a symlink - see isMain above.

Of course lstat data would have to be cached through some other means since Module._realpathCache would no longer be used. Patch above did not address that implementation detail.

@VanCoding

The only real problem that i see is when a module gets required multiple times through different symlinks it would not be cached by npm and get loaded multiple times.

Yes, caching is a real problem. Currently, you could just symbol link all deps to a global store because they share the same cache. Though npm doesn't do that but theroically someone could create a package manager use such method. See: https://nodejs.org/dist/latest-v5.x/docs/api/modules.html#modules_addenda_package_manager_tips

I created a PR for this issue here: https://github.com/nodejs/node/pull/5950

Passes all of Canary in the Gold Mine on linux and windows, things are looking pretty good as far as stability goes. W/R/T caching, modules still get cached, just according to their canonical symbolic path -- I'd argue that modules should never get cached according to their realpath because then we're just back to fighting the symbolic path abstractions that the filesystem gives us.

I think this issue can be closed out now.

Done!

Unfortunately, just like my previous comment metioned, it may break the cache behavior of alternative package managers. And I tested ied, pnpm, npminstall under Node V6 today. All those alternative package installers are rely on the original symbol link behavior and now they all lose cache in Node V6! That means if you install babel6 with these tools, you will suffer from performance issue same as babel6 in npm2. ๐Ÿ˜ญ

@jasnell I suggest revert this PR before we found the solution.

cc the authors of ied @alexanderGugel , pnpm @rstacruz, npminstall @fengmk2 and @dead-horse

https://github.com/nodejs/node/pull/5950/files#diff-d1234a869b3d648ebfcdce5a76747d71R119 It's broken all faster node package installers. I think module api is locked long ago, why we need to chagne this behavior?

@hax @fengmk2 Well, it's a new major version, so it's allowed (and expected) to introduce breaking changes. If you're relying on the old behavior, there are still the old versions, including LTS.

Also please do not forget, that the previous behavior was not documented in any way, so packages that now are broken made use of undocumented behavior, which you always have to expect to break.

I think me and others made very clear why this change was needed.

The good thing is: There should be ways to fix the now broken packages to work with Node 6.x again, since you can always get the real path from the virtual path. It was not possible the other way around.

It needs changing because resolving symlinks like this is wrong and broken.

Can someone please explain to me why package managers are having issues with this? It just sounds to me like they're using the same fragile behaviour of resolving symlinks when they shouldn't. If they were doing it right already I don't think they would have any issues... regardless of which way node is doing it.

Also please do not forget, that the previous behavior was not documented in any way, so packages that now are broken made use of undocumented behavior, which you always have to expect to break.

It is actually documented.

Since Node.js looks up the realpath of any modules it loads (that is, resolves symlinks), and then looks for their dependencies in the node_modules folders as described here, this situation is very simple to resolve with the following architecture:

Source: Addenda: Package Manager Tips

We're heavily relying on symlinks in ied. So do a lot of other projects. This change will result into hard to debug bugs and the benefit is IMO very limited. The previous behaviour is still well-documented and it's even described under Addenda: Package Manager Tips. If not on that, what can we rely on when writing a package manager?

cc @TheAlphaNerd

And module API is Stability: 3 - Locked https://nodejs.org/api/modules.html#modules_modules, it should be keep very stable.

image

If we lost the module cache, there is no way to make npm install faster and efficient.

@alexanderGugel wow, you're right. I didn't know this exists.

But still, it's not documented clearly. Actually, it only states that it resolves symlinks when searching for a module. But It does not state that __dirname and __filename in the loaded module will also have the resolved path. It also does not state, that requires from such modules will originate from the resolved path.

So, basically, we could keep the documented behavior and still break your code ;)

@fengmk2 only documented behavior is locked, I think. Changes are still allowed, as long as it follows the spec.

But still, it's not documented clearly.

Since Node.js looks up the realpath of any modules it loads (that is, resolves symlinks), and then looks for their dependencies in the node_modules folders as described here, this situation is very simple to resolve with the following architecture:

I would argue it is extremely clear and breaks a locked API.

This change will result into hard to debug bugs and the benefit is IMO very limited.

In addition to being poorly documented the previous behavior of symlinked modules was counter-intuitive, ran counter to the common practise of UNIX symlinks and prevented developers from easily testing their modules without copying and installing.

@kzc That's a different question and the original reasoning behind the change. It does not, however, justify breaking a locked API that was mentioned under an agenda for package managers.

@alexanderGugel This new behavior is consistent with my interpretation of the documented module spec.

What breaks in these other package managers exactly? Perhaps we help can find a fix.

@VanCoding @kzc

As my understanding, changes are allowed but not in this manner.

This change is too rush (just 2 days before 6.0 release) and lack the careful thought and testing. I already pointed out current behavior is not an undocumented and give the document link, and warned the possible breakings in the comment 29 days before.

Now it breaks all alternative package installers. :rage1:

@hax,

Now it breaks all alternative package installers.

When you say it "breaks" them, do you mean they don't function properly or do you mean they are now less efficient?

As I said here:

https://github.com/nodejs/node/pull/5950#issuecomment-215101726

This was actually _broken_ behavior for another use case, with no recourse, prior to the fixing of this bug.

Thanks for the heads up on this. Indeed this will break pnpm's structure,
bummer!
On Apr 27, 2016 9:48 PM, "HE Shi-Jun" [email protected] wrote:

@VanCoding https://github.com/VanCoding @kzc https://github.com/kzc

As my understanding, changes are allowed but not in this manner.

This change is too rush (just 2 days before 6.0 release) and lack the
careful thought and testing. I already pointed out current behavior is not
an undocumented and give the document link, and warned the possible
breakings in the comment 29 days
https://github.com/nodejs/node/issues/3402#issuecomment-202921652.

Now it breaks all alternative package installers [image: :rage1:]

โ€”
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
https://github.com/nodejs/node/issues/3402#issuecomment-215088696

Since Node.js looks up the realpath of any modules it loads (that is, resolves symlinks)

https://nodejs.org/download/release/v6.0.0/docs/api/modules.html#modules_addenda_package_manager_tips v6.0.0 module document still tell us will resolves symlinks.

cc @nodejs/ctc

@fengmk2,

v6.0.0 module document still tell us will resolves symlinks.

For the main script, it will dereference, otherwise it will not.

@dlongley any modules it loads it's meaning every module, not only the main script.

and then looks for their dependencies in the node_modules folders...

It still looks for dependencies the same way. If you symlink to a module that has a node_modules subdir in its realpath, it will look there for dependencies. This behavior has not changed. What it will not do is change the identifier for the module to the realpath, thereby breaking module resolution behavior when moving _up_ the path. That documentation does not say what path will be used to identify the module you loaded, it just says how it will find dependencies.

When looking for dependencies, the most sane, expected behavior is to look in any local node_modules directories and then move up the path -- from where the module was required, not from its dereferenced location on disk. If you do use the dereferenced path, you won't pull in modules that are specified in your project, you'll pull things from somewhere else, which is unexpected _and_ you can't do anything to fix it. If you specify a framework in your project, that is the one that should load your plugin, not some other framework elsewhere on disk.

I've been thinking about this more in the context of caching package managers and the package manager tips. Perhaps I am missing something, but shouldn't that still work as described even with this change?

Maybe someone with package manager issues could explain in more detail what's going wrong? @rstacruz, how does this break pnpm? I looked at your .store layout and I can't see why that would have any issues with this fix (on a side note, pnpm looks awesome! I am definitely going to try it).

Let's take a step back just a bit. Before considering reverting anything or debating whether or not this breaks a locked API, we should work on finding a solution that addresses _both_ issues. As I understand it so far (please correct me if I'm wrong) the new behavior does not _stop_ the alternative package managers from operating, just makes their operation less efficient? Is that correct?

If I'm reading the the above conversation correctly, this change renders a performance optimization used by some of the alternative package managers ineffective because it causes the module to be cached under according to the symlink rather than the realpath, leading to a situation where the same module code can be loaded multiple times if the symlinks are different? Is that also correct?

Assuming that is the situation, it would seem to me that the real problem is that we are inappropriately conflating module identity with lookup paths. A proper solution should seek to decouple the lookup path from the cache key.

@dlongley ... can I ask that you (and whomever else you've been working on this) look into finding a solution to the identity issue that also preserves the fix to problem this issue was originally looking to solve.

There is another issue here: it would appear that we had zero tests verifying the old behavior and no coverage of alternative package management strategies in our ecosystem smoke-testing. @TheAlphaNerd, can you please see about adding some of these alternative package managers to the citgm tests?

@jasnell,

If I'm reading the the above conversation correctly, this change renders a performance optimization used by some of the alternative package managers ineffective because it causes the module to be cached under according to the symlink rather than the realpath, leading to a situation where the same module code can be loaded multiple times if the symlinks are different? Is that also correct?

If a symlink is different, a different module instance is assumed, yes.

Assuming that is the situation, it would seem to me that the real problem is that we are inappropriately conflating module identity with lookup paths. A proper solution should seek to decouple the lookup path from the cache key.

Well, for peer dependencies, you _want_ a different in-memory instance of a module if the symlink path is different. If you have the same instance in memory, you could potentially erroneously communicate with two different frameworks (with the same in memory module) if your application happens to load two different versions of the same framework. I do think this is perhaps an unlikely use case, but it would be unexpected, unfixable behavior if you did encounter it. Specifying a different path to a module is a way to establish a clear boundary. Perhaps a cache could still be used but avoid instantiating the code.

In any case, I am +1 for finding a solution for both issues.

I take a look at lib/module.js#L388, may be we can change the Module._cache key to realpath and solve lose cache problem.

  var filename = Module._resolveFilename(request, parent, isMain);
  var cacheKey = fs.realpathSync(filename);

  var cachedModule = Module._cache[cacheKey];
  if (cachedModule) {
    return cachedModule.exports;
  }

The node 6.0.0 behavior is completely consistent with the documented behavior of require:

https://nodejs.org/api/modules.html#modules_module_caching_caveats

Module Caching Caveats#

Modules are cached based on their resolved filename. Since modules may resolve to a different filename based on the location of the calling module (loading from node_modules folders), it is not a guarantee that require('foo') will always return the exact same object, if it would resolve to different files.

Additionally, on case-insensitive file systems or operating systems, different resolved filenames can point to the same file, but the cache will still treat them as different modules and will reload the file multiple times. For example, require('./foo') and require('./FOO') return two different objects, irrespective of whether or not ./foo and ./FOO are the same file.

@kzc ... at this point what would likely be more productive would be to say that there is some ambiguity in the docs as to what is or is not the expected behavior, and the module system is a currently a bit inconsistent with itself in several regards. Rather than dwell on that, we have one specific problem to address: how can we _preserve_ the corrected behavior this change implements while _also_ keeping the alternative package managers from having to suffer a performance hit. If at all possible, let's focus the conversation on that without dwelling any longer on what the docs happen to say (or not say)

I think if you're not using peer dependencies (or that is, modules that are identified by a path located _up_ path in your parent), you should not assume that you'll be getting the same copy of a particular module in memory. The good news is -- if you do want modules to share the same instance of a module in memory, you can be guaranteed to if you place that shared module (or a symlink to it) in your parent's node_modules path. I'm not familiar with alternative node module installers -- can they be reworked to take advantage of this guarantee?

I am very for solving the cache issue and can devote some time into coming up with a solution.

@fengmk2 unfortunately, caching on the realpath will just introduce more issues here -- say you want to have two different versions of a framework in your app (that still share the same name). In v6, symlinking to each framework directory accomplishes that and will load each in as separate, but if we cache on realpath then one of the projects is going to load in and supersede the other in a kind of non deterministic way.

@jasnell
In theory, there are ways how package managers could work around the problem:

On installation, modify the package.json's "main" property, to point to a newly created file that's in the same directory as package.json and contains:

module.exports = require("/absolute/path/to/module");

It's not pretty, but I think it could work.

but if we cache on realpath then one of the projects is going to load in an already cached and initialized object.

@lamara I think same realpath should be require once only.

Here's an attempt to pin down the semantics underlying the problem here:

Modules need to be uniquely identified.

Alternative package managers are essentially indicating that the same package.json version of a module should indicate the uniqueness of a module. I'm only guessing, but I would expect that they are written with this guarantee but implement it by exploiting the old realpath behavior. I assume they put a module on disk somewhere, and for that particular version of the module, they always symlink to that one instance on disk.

Now, that kind of uniqueness isn't explicitly true in node; it never has been. The reason is that node doesn't actually know about the version field in package.json. In node, modules are _actually_ uniquely identified by path. This is also a simple way to reason about modules and how they'll be loaded -- the simplest identifiers we have to work with are file names. The documentation also indicates that loading a module using a different path should give you a different module.

In any case, in order to implement the kind of uniqueness alternative package managers desired, an undocumented (and unexpected) implementation detail, namely, the use of realpath to establish module identifiers, was exploited. To be clear, the child paths searched when loading modules is documented by example -- but this behavior isn't actually reliant upon the use of realpath which is misleading in the documentation. This same loading behavior persists when realpath is not used. To reiterate @jasnell's point -- clearly the documentation is inconsistent and needs work.

Now, symlinks are files in their own right. They are a filesystem abstraction that allow you to create new identifiers but, when read, actually pull data from elsewhere. In keeping with the mechanism used to identify modules, a symlink identifies a new module, even if the contents of that module on disk is the same. This ensures that you get a fresh copy of that module if it hasn't loaded yet. Furthermore, by preserving them as identifiers, the lookup path for modules that are in the _parent path_ functions in an expected fashion now. Previously, modules that you had specifically indicated in your project would not be loaded, but rather, undesirable modules from elsewhere on disk would load.

Is it possible for alternative package managers to similarly exploit this now consistent, expected behavior to achieve the kind of uniquness that they want? Can you find ways to place modules in parent paths in an appropriate fashion? npm v3 takes a similar approach to ensure versions of the same module are only installed once and instantiated once.

I should also mention -- that the form of uniqueness put forth by alternative package managers (if I'm correct about it), won't actually work in some cases for users of peer dependencies, if they are employing more than one version of a framework in their application. So if that use case is to be supported, a change would be needed.

_Author of ied here (from which pnpm was derived AFAIK)_

Alternative package managers are essentially indicating that the same package.json version of a module should indicate the uniqueness of a module.

No, the shasum of the package is being used in order to uniquely identify it. That being said, the way the package manager itself identifies those modules is irrelevant. What is relevant is that fact that those are being used in order to construct the path names from which the actual (as in "no symlinks") are being linked.

I'm only guessing, but I would expect that they are written with this guarantee but implement it by exploiting the old realpath behavior.

Correct. The fact that Node follows the realpath was exploited. I would argue that's the way it is being documented:

Since Node.js looks up the realpath of any modules it loads (that is, resolves symlinks), and then looks for their dependencies in the node_modules folders as described here [...]

I assume they put a module on disk somewhere, and for that particular version of the module, they always symlink to that one instance on disk.

Correct as well. If a depends on b, a symlink will be created in a/node_modules/b -> /some/where/b.

Now, that kind of uniqueness isn't explicitly true in node; it never has been.

I would argue it was, since

  1. the API is locked
  2. the Module cache has always been implemented that way
  3. it's pretty common to require singletons assuming that the same package will be returned

The reason is that node doesn't actually know about the version field in package.json. In node, modules are actually uniquely identified by path.

Which is perfectly fine. Node doesn't (nor should it) care about a specific version of the package. Instead it should use the real path (like it did before). That way uniqueness can be guaranteed without sacrificing performance due to redundant modules.

In any case, in order to implement the kind of uniqueness alternative package managers desired, an undocumented (and unexpected) implementation detail, namely, the use of realpath to establish module identifiers, was exploited.

To be clear, the child paths searched when loading modules is documented by example -- but this behavior isn't actually reliant upon the use of realpath which is misleading in the documentation. This same loading behavior persists when realpath is not used. To reiterate @jasnell's point -- clearly the documentation is inconsistent and needs work.

Agreed. Again, even if it wasn't documented, the API is locked.

Is it possible for alternative package managers to similarly exploit this now consistent, expected behavior to achieve the kind of uniquness that they want? Can you find ways to place modules in parent paths in an appropriate fashion? npm v3 takes a similar approach to ensure versions of the same module are only installed once and instantiated once.

Unfortunately not. We need to be able to explicitly reference packages and get the exact same module instance back. Otherwise the only way to implement circular dependencies is by flattening trees and basically doing what npm3 does.

Let's make this a bit more concrete. Suppose we have the following...

     /index.js
     /node_modules
                  /foo --> ../../modules/foo
                  /bar (@version=1)
                  /baz
                      /node_modules
                                   /foo --> ../../../../modules/foo
                                   /bar (@version=2)
modules
       /foo
          require('bar')
       /bar (@version=3)

Suppose the the foo module exports the following, making the assumption that bar is a peer-dependency:

const bar = require('bar');
module.exports = bar;

Now suppose that our three different versions of bar each do (in order of version)

myApp/node_modules/bar/index.js

module.exports = 1;

myApp/node_modules/baz/node_modules/bar/index.js

module.exports = 2;

modules/bar/index.js

module.exports = 3;

And baz does the following:

module.exports = require('foo');

What is the expected result when running node myApp/index.js?

const foo = require('foo');
const baz = require('baz');
console.log(foo);
console.log(baz);

Under v5.11.0, the behavior is for an error to be thrown because bar cannot be resolved relative to the realpath of foo:

bash-3.2$ node -v
v5.11.0
bash-3.2$ node index
module.js:341
    throw err;
    ^

Error: Cannot find module 'bar'
    at Function.Module._resolveFilename (module.js:339:15)
    at Function.Module._load (module.js:290:25)
    at Module.require (module.js:367:17)
    at require (internal/module.js:20:19)
    at Object.<anonymous> (/Users/james/tmp/symlink/modules/foo/index.js:3:13)
    at Module._compile (module.js:413:34)
    at Object.Module._extensions..js (module.js:422:10)
    at Module.load (module.js:357:32)
    at Function.Module._load (module.js:314:12)
    at Module.require (module.js:367:17)
bash-3.2$ 

Note that v5.11.0 throws _even_ with /modules/bar existing at the same level as /modules/foo.

The behavior in master (and v6) with this change applied is for foo to find the version of bar that is appropriate for it:

bash-3.2$ ~/node/main/node/node -v
v7.0.0-pre
bash-3.2$ ~/node/main/node/node index
1
2
bash-3.2$ 

This has the side effect that despite foo being pulled from the same location on disk, it maintains different internal state (one instance uses bar version 1, the other uses bar version 2, which, based on the way the symlinks are set up, is _exactly_ what one would expect it to do. The two symlink'd instances of foo are treated as if they were not symlinks at all and the peer dependencies are resolved relative to their location in the relevant node_module paths.

@jasnell,

This has the side effect that despite foo being pulled from the same location on disk, it maintains different internal state (one instance uses bar version 1, the other uses bar version 2, which, based on the way the symlinks are set up, is exactly what one would expect it to do. The two symlink'd instances of foo are treated as if they were not symlinks at all and the peer dependencies are resolved relative to their location in the relevant node_module paths.

+1, yes, this is exactly the expected behavior and what node 6.x does. Thinking from the perspective of the person that set up myApp, that's just the output I'd want to see.

@VanCoding

module.exports = require("/absolute/path/to/module");
It's not pretty, but I think it could work.

Yes this could work and lerna use this method. Maybe alternative package managers could also use this method, I'm not sure...

Before we resort to "not pretty" hacks ;-) ... let's see if there's a better solution to be found.

@alexanderGugel ... I note that you gave my comment https://github.com/nodejs/node/issues/3402#issuecomment-215146463 a thumbs down, care to explain your thoughts further?

@jasnell I do not agree with the premise that that's the expected behaviour. I'm currently putting together an example as well.

I make a pr https://github.com/nodejs/node/pull/6425 to let require cache back.

@jasnell,

I haven't put much thought into this, so just throwing the idea out there ...

One potential way forward is to change the caching behavior based on whether a module is loaded from a child node_modules directory, or from somewhere up the parent path. The repercussions of this would have to be thought through. However, I'm not a fan, in general, of anything more complex than uniquely identifying modules by path, because it gets difficult to reason about.

@fengmk2 I don't think your pr can solve the problem completely... because with current change of behavior (treat symbol link same as copy), it means it's possible to get different lookup path with different module refer same module by symbol links. I think we need to solve the semantic problem first.

Let's assume we have the following directory structure:

.
โ”œโ”€โ”€ index.js
โ””โ”€โ”€ node_modules
    โ”œโ”€โ”€ dep1
    โ”‚ย ย  โ”œโ”€โ”€ index.js
    โ”‚ย ย  โ””โ”€โ”€ node_modules
    โ”‚ย ย      โ””โ”€โ”€ dep2 -> ../../dep2
    โ””โ”€โ”€ dep2
        โ”œโ”€โ”€ index.js
        โ””โ”€โ”€ node_modules
            โ””โ”€โ”€ dep1 -> ../../dep1

7 directories, 3 files

Where dep1 -> dep2, dep2 -> 1 - a very simple circular dependency (index.js -> dep1).

sh-3.2$ cat index.js
require('dep1')

sh-3.2$ cat node_modules/dep
dep1/ dep2/

sh-3.2$ cat node_modules/dep1/index.js
require('dep2')
console.log('dep1')

sh-3.2$ cat node_modules/dep2/index.js
require('dep1')
console.log('dep2')

If I run node index.js, I get the following output:

sh-3.2$ node index.js
dep1
dep2
dep1
dep2
dep1
dep2
dep1
dep2
dep1
dep2
dep1
dep2
dep1
dep2
dep1
dep2
dep1
dep2
dep1
dep2
dep1
dep2
dep1
dep2
dep1
dep2
dep1
dep2
dep1
dep2
dep1
dep2
dep1

Which is definitely not expected.

require.cache:

[ '/Users/alex/repos/ied/sandbox/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/node_modules/dep2/node_modules/dep1/index.js' ]

So even under the assumption that the current _algorithm_ is correct, the actual implementation is currently incorrect. This PR broke circular dependencies.

Therefore I believe that it would absolutely make sense to revert this PR and think about a better solution.


Compared to 5.0.0

dep1
[ '/Users/alex/repos/ied/sandbox/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep1/index.js',
  '/Users/alex/repos/ied/sandbox/node_modules/dep2/index.js' ]

We're still not quite at the "hey let's revert this" stage just yet. Let's keep working this through to see if we can solve _both_ problems. Great example tho @alexanderGugel ... I'm working through it now to see if I can spot a path forward.

:+1: For reverting. The current implementation is incorrect and the behaviour unexpected.

I hope any module resolution decision is not based on a contrived circular dependency loop example. The previous undocumented symlink resolution mechanism had many problems of its own. Let's discuss real life examples.

@mcollina Care to elaborate on the downvote? Could you not reproduce the bug?

@kzc Circular dependencies are very real-world. And the above one is literally only the simplest one. I don't think it's going to be a pleasure to debug this in a 5-level deep nested node_modules directory.

The documented example for modules cycles works as expected:

https://nodejs.org/api/modules.html#modules_cycles

@alexanderGugel,

I don't think it's going to be a pleasure to debug this in a 5-level deep nested node_modules directory.

Under what circumstances would such a scenario arise, other than with an alternative package manager that does installation this way?

@dlongley When symlinking (private) dependencies for local development.

E.g. I'm working on an application that depends on 3 private packages, let's call them priv-a, priv-b, priv-c.

I decide to move some functionality from priv-a into priv-b and priv-c, but I want to try out the final application before I merge anything.

So I might do the following:

  1. Clone down priv-a, priv-b, priv-c
  2. Clone down the application
  3. Remove priv-a/my-company/priv-b, priv-b/my-company/priv-c etc.
  4. Symlink the individual private dependencies so that the latest, local version can be used within them.
  5. Start the app
  6. Wonder why require.cache is massive.

Point taken, it's not 5 levels deep. What I'm trying to say is that this becomes less trivial when locally working with private packages.

@alexanderGugel,

If those modules aren't designed to be used in a cyclical pattern already (e.g. delayed require), won't their behavior change by referencing them in a cyclical way (see: Module Cycles) if they are the same instance?

@dlongley No, assuming you don't override module.export (but use export instead), everything would work fine. That being said, this was rather meant to provide a "kind of" real-world example.

@alexanderGugel I agree with what jasnell has said repeatedly, this discussion should not be about reverting the patch, but instead about finding the best way forward. I believe that this patch corrects a defect in the node module system. Package managers, modules and developer dependency management strategies that relied on the defective behavior will need to be revised. Hopefully we can come up with a way to make that process as painless as possible. Furthermore, I think that a version release is the ideal time to make a change of this magnitude.

@alexanderGugel ... just wanting to draw out your scenario a bit more... Let's modify it a bit such that one of the circular dependencies has an additional peer dependency. In the example below, foo and bar are circular dependencies exactly like in your example. However, foo has a peer dependency on baz. bar depends on foo but also depends on version 2 of baz. myApp3 uses foo but uses baz version 1.

      /index.js
      /node_modules
                   /baz (@version = 1)
                   /foo
                       /node_modules
                                    /bar --> ../../bar
                   /bar
                       /node_modules
                                    /baz (@version = 2)
                                    /foo --> ../../foo

myApp3/index.js is:

console.log(require('foo'));

myApp3/node_modules/baz/index.js is

module.exports = 1;

`myApp3/node_modules/foo/index.js' is

module.exports.baz = require('baz');
module.exports.bar = require('bar');

`myApp3/node_modules/bar/index.js' is

module.exports = require('foo');

`myApp3/node_modules/bar/node_modules/baz/index.js' is

module.exports = 2;

Running myApp3/index.js using v5.11.0 yields:

bash-3.2$ node -v
v5.11.0
bash-3.2$ node index.js
{ baz: 1, bar: [Circular] }
bash-3.2$

Running myApp3/index.js using master yields:

bash-3.2$ ~/node/main/node/node -v
v7.0.0-pre
bash-3.2$ ~/node/main/node/node index.js
{ baz: 1, bar: { baz: 2, bar: { baz: 2, bar: [Object] } } }
bash-3.2$ 

Again, this would seem to be the expected behavior as far as the _peer dependency_ is concerned.

You're absolutely correct that the cyclic dependency is an issue here but the peer dependency issue is equally legitimate. It's certainly not a trivial problem.

@alexanderGugel,

  1. Wonder why require.cache is massive.

Would you actually wonder that -- or just not notice (or care) in that scenario? If your module is actually not just exposing an API, I would expect you'd need to plan for circular dependencies anyway.

@mattcollier

Package managers, modules and developer dependency management strategies that relied on the defective behavior will need to be revised.

The reason why I think this is not a reasonable request is that it provides very little benefit for 90 % of the use-cases, while fundamentally "breaking" (in the sense that it is backwards incompatible) a locked API. Which is not the end of the world, but what I'm worried about is the fact that this becomes extremely difficult to debug for the average user.

I think it is safe to assume that there are in fact a lot of modules out there that depend on the current behaviour. E.g. some people symlink files for one reason or another, not necessarily individual packages, but even files that have have some sort of state or that have side effects when being required. E.g. they might establish a data base connection.

But I feel like there are actually three different discussions going on here:

  1. A current bug in module.js that incorrectly resolves circular dependencies.
  2. A discussion about the correct behaviour when requiring symlinked packages.
  3. A request to revert the changed behavior.
  4. Various PRs about either reverting the commit or solving the issue in some other way

I think we should probably concentrate on 2.

@alexanderGugel,

Wonder why require.cache is massive.

Would you actually wonder that -- or just not notice (or care) in that scenario?

Other than through marginally increased memory usage, you probably wouldn't notice it at all I assume. What I'm worried about is that fact that you suddenly have three different "version" (as in module instances) of the same dependency, making it significantly harder to debug. Especially when there is state being stored in one way or another in them.

E.g. the typical scenario could be a file (I'm specifically talking about files here, not (external) dependencies) that establishes a database connection when being required.

E.g. Something like this:

const redis = require('node-redis')
const client = redis.createClient()
module.exports = client

@alexanderGugel,

E.g. the typical scenario could be a file (I'm specifically talking about files here, not (external) dependencies) that establishes a database connection when being required.

If you're sharing like that, then I think you should be using a peer dependency.

You also can't guarantee that some other module you load that you don't control won't come in and load that same dependency and do something to change its configuration. If you want to hook into some shared infrastructure/framework, you should use peer dependencies, that's what they are for.

@dlongley What about packages that have side effects then?

E.g. packages that decorate String.prototype to support colours, or override console.log - not saying that's a great idea, but those kind of cases are problematic here.

@alexanderGugel,

Are you saying they are problematic because they may repeat their code more than once -- if the requiring application has some symlinks setup in some non-optimal way? What are the specific problems?

@alexanderGugel,

Note that a package that has side effects should certainly check to see if those side effects have already been applied. Similar to a polyfill. What if a different version of the package (on disk) was already loaded? The same problem would arise.

@alexanderGugel ... I definitely agree that the circular dependency is problematic here. Just working through various scenarios to draw this out further. Let's take another modified version of the peer dependency case I describe in my last comment (https://github.com/nodejs/node/issues/3402#issuecomment-215177747)

Let's make baz and _optional_ peer dependency of foo. bar depends on foo and _requires_ baz. myApp4, depends on foo but does not require baz. We have a structure like:

      /index.js
      /node_modules
                   /foo
                       /node_modules
                                    /bar --> ../../bar
                   /bar
                       /node_modules
                                    /baz
                                    /foo --> ../../foo    

myApp4/node_modules/bar/index.js is (to simulate the requirement on baz being present)

module.exports = require('foo');
require('assert')(module.exports.baz);

myApp4/node_modules/foo/index.js is (to simulate baz being optional to foo)

try {
  module.exports.baz = require('baz');
} catch (err) {

}

module.exports.bar = require('bar');

Keeping in mind that bar was written with the specific assumption that baz would be present, running this example in v5.11.0 yields:

bash-3.2$ node -v
v5.11.0
bash-3.2$ node index

assert.js:90
  throw new assert.AssertionError({
  ^
AssertionError: undefined == true
    at Object.<anonymous> (/Users/james/tmp/symlink/myApp4/node_modules/bar/index.js:3:18)
    at Module._compile (module.js:413:34)
    at Object.Module._extensions..js (module.js:422:10)
    at Module.load (module.js:357:32)
    at Function.Module._load (module.js:314:12)
    at Module.require (module.js:367:17)
    at require (internal/module.js:20:19)
    at Object.<anonymous> (/Users/james/tmp/symlink/myApp4/node_modules/foo/index.js:8:22)
    at Module._compile (module.js:413:34)
    at Object.Module._extensions..js (module.js:422:10)
bash-3.2$ 

Running this example on master yields:

bash-3.2$ ~/node/main/node/node -v
v7.0.0-pre
bash-3.2$ ~/node/main/node/node index
{ bar: { baz: 2, bar: { baz: 2, bar: [Object] } } }
bash-3.2$ 

Which, again, is what (I think) one would expect.

Sorry @jasnell I just need to ask to revert this PR asap again.

Because I just found it not only affect performance due to cache, but also BREAK all three alternative package managers totally!

Just try install babel-cli in ied, pnpm, npminstall, and run babel-node xxx.js

Result:

pnpm and npminstall just hang until exaust 1.4GB memory limit. (Same as https://github.com/nodejs/node/issues/3402#issuecomment-215166795 case)

ied report error:

> @ test /Users/hax/repo/contributing/test-module-cache/test/node6-ied
> babel-node .

module.js:440
    throw err;
    ^

Error: Cannot find module 'core-js/library/fn/get-iterator'
    at Function.Module._resolveFilename (module.js:438:15)
    at Function.Module._load (module.js:386:25)
    at Module.require (module.js:466:17)
    at require (internal/module.js:20:19)
    at Object.<anonymous> (/Users/hax/repo/contributing/test-module-cache/test/node6-ied/node_modules/5f08788603a474f19afc575a8394071afd5fbe0c/node_modules/babel-core/node_modules/babel-types/node_modules/babel-traverse/node_modules/babel-types/node_modules/babel-traverse/node_modules/babel-types/node_modules/babel-traverse/node_modules/babel-types/node_modules/babel-traverse/node_modules/babel-types/node_modules/babel-traverse/node_modules/babel-types/node_modules/babel-traverse/node_modules/babel-types/node_modules/babel-traverse/node_modules/babel-types/node_modules/babel-traverse/node_modules/babel-types/node_modules/babel-traverse/node_modules/babel-types/node_modules/babel-traverse/node_modules/babel-typenpm ERR! Test failed.  See above for more details.

(Note the last longlong line of the error info...)

@alexanderGugel what comment are you talking about?

@mcollina .. I _believe_ that was a misedit... I think the comment was meant for @mattcollier

@hax Node 6.x is still fresh, there's no one depending on that specific version already. So we have time to talk about this and package managers have time to find ways to support it.

It's also important to keep in mind that reverting the commit requires a PR which would require some review time before it could be landed, which would then sit for a bit while a new release was put together. That's at least a few days process and the earliest we'd put out a new release is likely next monday or tuesday. Yes, we know that some folks would like the commit reverted but please understand that simply reverting the commit today doesn't actually solve the issues. We can afford to take a bit of time to review the issue further and see if there's a better path forward. Reverting is always a possible course _if_ we can't find a reasonable solution to _both_ problems.

@mcollina
@jasnell Yes, sorry about that.

Node 6.0.0 module resolution solves a number of historical problems and makes module resolution much more logical and predictable. Please don't assume node is necessarily the code base that must be altered due to user code reliance on poorly specified behavior. Take the process.exit/stdout output truncation case - evidently it was decided that user modules must change.

Guys... can't we just agree that the new behavior is more natural and expected by most devs at this point? Sorry, @alexanderGugel, I know you disagree here, but maybe you're being dishonest to yourself about this because you're being blinded by your wish to revert to the old behavior. It's not meant offinsive, I really respect your opinion, but I see the possibility of it being the case.

You've mentioned a lot of use cases that are not really a problem with the new behavior, as @jasnell already could point out. I also think that your case with the dependency circle is a non-issue. There's no way you would ever encounter such a problem using a properly working package manager. And if you do it by hand, you won't do it for so many packages, that it will be hard to debug.

I mean, if we argue like this, we could also state that we should ban loops and recursive functions from JavaScript because it allows for endless loops. But if you do it right, this won't happen.

How do other languages treat symlinks?
I thought it might be interesting to see how other languages treat symlinks and thus did some research:

C/C++: pragma once does load the file multiple times when it's included through different symlinks
Python: import does load a module multiple times when it's imported through different symlinks
PHP: include does load a file multiple times when it's included through different symlinks

For me, this makes even more clear that this is expected and wanted that way by developers. I know, I've said it multiple times already, but since it's important I say it again: You can always get the real path from the virtual path. But not the other way around. This means we can always work around the new behavior (pretty or not), but it's almost impossible to work around the old behavior.

Also, I've learnt in programming that it's always best to design the code to support as many use cases as possible, so that more specific things can be built on top of it. I think this is the case for the new behavior.

With this said, I strongly recommend to focus on searching ways to fix the other package managers now. I've already shown a possible way of doing it. Maybe there are better ways. Let's discuss!

while the effort that went into this discussion is great, we're heavily using symlinks for all kinds of stuff. I just tried running our app with Node 6.0.0, but it failed for the reasons from above (private packages, etc). I'm really glad we found this issue, because those kind of changes are really hard to track down.

While certainly purer, the current behavior is documented and we've built a on it expecting it would be there. is there any chance it can be kept?

I want to point out the following comment that was made by @bnoordhuis at the beginning of this conversation:

I mean, if we argue like this, we could also state that we should ban loops and recursive functions from JavaScript because it allows for endless loops. But if you do it right, this won't happen.

Reductio ad absurdum. That being said, it seems like this does in fact introduce at least one bug into the current require mechanism.

Sorry, @alexanderGugel, I know you disagree here, but maybe you're being dishonest to yourself about this because you're being blinded by your wish to revert to the old behavior. It's not meant offinsive, I really respect your opinion, but I see the possibility of it being the case.

I am pretty sure I am not. Resolving to the "real" target paths is the only way how you can guarantee the uniqueness of required packages without doing any of the npm3 "flatten tree" magic (see pnpm). That being said, I'm more than happy to be convinced of the opposite.

The litmus test is this: is there a non-zero chance the proposed change is going to break someone's existing application? If the answer is 'yes' (and I think it is), then it should be rejected.

I think after this discussion it is safe to assume that this _does_ in fact break application code.

With this said, I strongly recommend to focus on searching ways to fix the other package managers now. I've already shown a possible way of doing it. Maybe there are better ways. Let's discuss!

I apologise if I missed that, but what path did you propose for those package managers?

@alexanderGugel No problem. That happens fast in a big discussion like this. Here's the comment

Resolving to the "real" target paths is the only way how you can guarantee the uniqueness of required packages without doing any of the npm3 "flatten tree" magic (see pnpm).

We don't need to uniquely identify files by their absolute path. We already have a system to uniquely identify them: the package-name + the file's relative path in that package. That's all that's needed. You don't even have to use the flatten tree magic, you could do it the way i proposed. And there are potentially other ways.

Reductio ad absurdum. That being said, it seems like this does in fact introduce at least one bug into the current require mechanism.

That's not a bug, it's just the way you've programmed it. It's the same as calling while(true) console.log('hello world') a bug. You could do the same with PHP as well by the way. Is it a bug? I don't think so.

The litmus test is this: is there a non-zero chance the proposed change is going to break someone's existing application? If the answer is 'yes' (and I think it is), then it should be rejected.

I think after this discussion it is safe to assume that this does in fact break application code.

Good point. Did @bnoordhuis approve merging this change?

@alexanderGugel No problem. That happens fast in a big discussion like this. Here's the comment

This does not work, since the directory into which the package has been downloaded needs to be shared by multiple dependencies.

This does not work, since the directory into which the package has been downloaded needs to be shared by multiple dependencies.

Can you maybe further explain this?

So - for example - you can no longer require specific files from a package, e.g. require('my-package/some-file.js') won't work. Most popular example might be rxjs.

@alexanderGugel,

The litmus test is this: is there a non-zero chance the proposed change is going to break someone's existing application? If the answer is 'yes' (and I think it is), then it should be rejected.

I think after this discussion it is safe to assume that this does in fact break application code.

That's the litmus test for whether or not the change should be part of a semver-patch bump as a bug fix. Instead, the change went into a semver-major bump, which is appropriate.

@alexanderGugel -- can you think of a way to modify alternative package managers so that they'll work with node 6.x?

I'm not ready to call it for either side of this discussion just yet as I firmly believe there is a path forward that addresses both concerns. As I said, we can afford a bit of time to explore the issue further.

Drawing out the thought exercise even further (bear with me...). Let's assume that rather than using _either_ the realpath _or_ the symlink as the cachekey, we assume some function of a tuple of both... let's call it f(symlink,realpath). Let's walk through it.

Assuming the following structure using a circular path:

      /index.js
      /node_modules
                   /baz (@version=1)
                   /foo
                       /node_modules
                                    /bar --> ../../bar
                   /bar
                       /node_modules
                                    /baz (@version=2)
                                    /foo --> ../../foo

(The tuple is f(symlink , realpath))

Instance 1 of foo =

f(myApp4/node_modules/foo, myApp4/node_modules/foo)

In this case Instance 1 of foo will get baz (@version=1)

Specifically, when resolving foo relative to myApp, it is determined that there is currently no instance of foo relative to myApp, so an instance of foo is created.

Instance 1 of bar =

(myApp4/node_modules/bar, myApp4/node_modules/bar)

Specifically, when resolving bar relative to myApp, it is determined that there is currently no instanceof bar relative to myApp, so an instance of bar is created.

Instance 1 of bar, Instance 2 of foo =

(myApp4/node_modules/bar/node_modules/foo, myApp4/node_modules/foo)

Instance 2 of foo would see baz (@version=2)

Specifically, when resolving foo relative to bar relative to myApp, it is determined that an existing instance of bar exists relative to myApp so it is used, however, an instance of foo relative to bar relative to myApp does not exist, so an instance of foo is created.

Instance 1 of foo, Instance 2 of bar =

(myApp4/node_modules/foo/node_modules/bar, myApp4/node_modules/bar)

Specifically, when resolving bar relative to foo relative to myApp, it is determined that an existing instance of foo exists relative to myApp so it is used, however, an instance of bar relative to foo relative to myApp does not exist, so an instance of bar is created.

Instance 1 of foo, Instance 2 of bar, Instance 1 of foo =

(myApp4/node_modules/foo/node_modules/bar/node_modules/foo, myApp4/node_modules/foo)

Here, when resolving foo relative to bar relative to foo relative to myApp, it is determined that an existing instance of foo exists relative to myApp so it is used. It is determined that an existing instance of bar exists relative to foo relative to myApp, so it is also used. It is then determined that an existing instance of foo relative to bar relative to foo exists as an ancestor to bar, so it is used. A new instance of foo is _not_ created.

Instance 1 of bar, Instance 2 of foo, Instance 1 of bar =

(myApp4/node_modules/bar/node_modules/foo/node_modules/bar, myApp4/node_modules/bar)

Here, when resolving bar relative to foo relative to bar relative to myApp, it is determined that an existing instance of bar exists relative to myApp so it is used. It is determined that an existing instance of foo exists relative to bar relative to myApp, so it is also used. It is then determined that an existing instance of bar relative to foo relative to bar exists as an ancestor to foo, so it is used. A new instance of bar is _not_ created.

This can continue out... note that by checking the ancestor, we can determine if a new instance is required or not.

Instance 1 of bar, Instance 2 of foo, Instance 1 of bar, Instance 2 of foo

(myApp4/node_modules/bar/node_modules/foo/node_modules/bar/node_modules/foo, myApp4/node_modules/foo)

Instance 1 of foo, Instance 2 of bar, Instance 1 of foo, Instance 2 of bar

(myApp4/node_modules/foo/node_modules/bar/node_modules/foo/node_modules/bar, myApp4/node_modules/bar)

Going through this, it appears obvious to me that there's an obvious opportunity to switch up the way the cache key is being computed to ensure that the circular path is dealt with _and_ the peer dependencies are handled appropriately.

Let's make it a bit more extreme... and add a zzz module somewhere in the mix:

      /index.js
      /node_modules
                   /baz (@version=1)
                   /foo
                       /node_modules
                                    /bar --> ../../bar
                   /bar
                       /node_modules
                                    /baz (@version=2)
                                    /foo --> ../../foo
                                    /zzz
                                        /node_modules
                                                     /baz (@version=3)
                                                     /foo --> ../../../../foo
(myApp/node_modules/bar/node_modules/zzz/node_modules/foo, myApp/node_modules/foo)

When resolving foo relative to zzz relative to bar relative to myApp, there is no instance of foo in the direct ancestry. There's an instance of foo relative to bar relative to myApp, but there is no instance of foo relative to zzz relative to bar relative to myApp, so a new instance of foo is created.

Following this approach we would need to split the cache in two. One cache that caches the actual code pulled from the realpath location, and a secondary cache that contains the actual instances of the objects generated by that code. This is so we don't have to go hunting for that code again, we already have a copy in memory that we execute again when a new instance of the module is required.

The tl/dr version is: for every module, we locate it on disk and cache the actual code, we then determine if a new instance of that module is required or not based on whether or not there is an existing instance in the direct require ancestry.

Now, _obviously_ this kind of approach would have a performance impact so we'd have to test the heck out of it to verify that it not only met the expectations but actually performed decently enough to use... and I'm quite certain that I've missed some nuanced detail in here somewhere.

(I seriously hope at least some of that made sense ;-) ...)

(written before @jasnell )

No, I think it is impossible to fully support it. I'm not too worried about the specific strategies of pnpm, ied or npm install, but what I'm kind of afraid of is that this basically means that there is no other way other than either super long path names that break under Windows or a flat dependency graph that takes forever to compute to support a sensible installation mechanism that serves identical package versions to different dependencies.

The old behaviour opened the door to alternatives that relied on symlinks. The new algorithm is limiting in that sense. That being said I fully understand and appreciate the reasoning behind this change, but I'm disappointed that the decision was made to break existing packages rather than finding a solution that adds the desired functionality in a backwards compatible manner.

(WHOA! I got a thumbs up from @alexanderGugel !! ;-) Maybe these github reactions aren't too bad afterall!)

I don't agree "the new behavior is more natural and expected by most devs".

I will try my best to explain my opinion. (pls forgive my poor english)

Symbol links is NOT equals to copy.

From the start, symbol links naturally have two different aspect, follow the link OR not follow the link. For example, what's the parent directory of a symbol link? It depends on whether you follow or not follow. If not follow, you just treat it like copy. You may think it's more natural. But there are some IMPORTANT cases you must follow. Eg. executing script. That's why there is a special case even in this new behavior -- main script.

I could argue that most devs familiar with the symbol links of main script so why they should change the mind in other place? Such inconsistent is painful.

But I'd like to clarify that I never against this issue, and I also meet the npm link and peerDeps problem (that's why I watch this issue.)

The problem here is we should not ship such changes without carefully thought and testing. In my memory, node core is conservative. For example, it cost several years to discuss whether to add promise-based api though adding api will not break any existing codes. On the other side, this case are too hurry and break all alternative package managers which is valuable part of Node ecosystem. I can not understand those actions are based on coherent rules.

It's harmful and unfair to block the users of alter package managers to upgrade to 6.0. Most alt package users are enterprise users and use it in CI env for performance. They give many useful feedback. For example as my knowledge Alibaba which is one of the biggest node heavy player use tnpm which rely on npminstall. So current situation make Alibaba node devs can not try 6.0 and give the feedbacks.

That's why I suggest this problem should be resolved asap.

Of course I hope we can find a solution satisfy both requirements. But as my experiences in module resolutions I suspect we can achieve this goal very quickly. I hate revert too, but sometimes it's a best choice.

Thank you for hear my voice.

@hax ... we are definitely working to resolve the problem as quickly as possible but it's not something that can be fixed _today_ as it would likely still take several days (up to a week) before a new release could be made. Right now we're discussing options for fixing and reverting the most recent change is _definitely_ one of the options. What I'm trying to do before we decide is take just a bit of time to figure out what the additional options are.

It's about time for me to get the kids from school now so I'm going to drop off for a bit. Will be back on later to continue the discussion / exploration of this.

@jasnell The problem with your approach is that you still get different instances of the module, when using different symlinks, right? If I understand the problem of other package managers correctly, they want to store all packages in one single folder, and symlink them together, so that the hierarchy is completely flat.

@alexanderGugel Thanks, I didn't think about that. But is still think there are ways to do it, but they may be more complex. For example, package managers could add the following to the beginning of every .js file:
var __filenameReal = require("fs").realpathSync(__filename); if(__filenameReal != __filename) return require(__filenameReal);

But that would also just be solving the tip of the iceberg. There are a lot of edge cases, like files with shebangs, files with other endings like coffescript, or files that never get required at all. The package manager would have to know which files are directly required from other packages, which would need a static analysis, like for example browserify does, which would maybe be even worse performance-wise.

But, as I said in the 2nd part of this comment, we could also just conclude that our problems with peer dependencies originate from NPM and not from require and that maybe NPM needs to fix this instead, so that we can revert to the previous behavior.

But then again, it just feels wrong to use the absolute path to uniquely identify a module. Maybe, we could also make it configurable through a command-line option. That might not be the worst of all options we have.

I'm using pnpm and :heart: it as much as the next person.

I still think this new approach of resolving dependecies based on the symlinked path is the preferred behaviour. Anything else is just super confusing (had a really hard time debugging this the first time I encountered it).

I also think that requiring from two different paths -> two instances is preferred behaviour. Having the behaviour change depending on if it's a symlinked file or not is much more confusing.

Both of the old behaviours require implicit knowledge of the inner workings. The new approach provides the better abstraction, less implicit knowledge and less cognitive overhead.

Keep it :+1:

Not quite. With the approach I suggest there are two caches. One that
caches the code based on realpath, and a second that caches instances
relative to ancester context. A module requiring 'foo' would get either an
existing or new instance of foo based on whether or not an instance already
exists in its ancestry. If the symlinks point to the same location, the
code would come from that same location.
On Apr 27, 2016 3:03 PM, "Patrik Stutz" [email protected] wrote:

@jasnell https://github.com/jasnell The problem with your approach is
that you still get different instances of the module, when using different
symlinks, right? If I understand the problem of other package managers
correctly, they want to store all packages in one single folder, and
symlink them together, so that the hierarchy is completely flat.

@alexanderGugel https://github.com/alexanderGugel Thanks, I didn't
think about that. But is still think there are ways to do it, but they may
be more complex. For example, package managers could add the following to
the beginning of every .js file:
var __filenameReal = require("fs").realpathSync(__filename);
if(__filenameReal != __filename) return require(__filenameReal);

But that would also just be solving the tip of the iceberg. There are a
lot of edge cases, like files with shebangs, files with other endings like
coffescript, or files that never get required at all. The package manager
would have to know which files are directly required from other packages,
which would need a static analysis, like for example browserify does, which
would maybe be even worse performance-wise.

But, as I said in the 2nd part of this
https://github.com/nodejs/node/pull/5950#issuecomment-215109810
comment, we could also just conclude that our problems with peer
dependencies originate from NPM and not from require and that maybe NPM
needs to fix this instead, so that we can revert to the previous behavior.

But then again, it just feels wrong to use the absolute path to uniquely
identify a module. Maybe, we could also make it configurable through a
command-line option. That might not be the worst of all options we have.

โ€”
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
https://github.com/nodejs/node/issues/3402#issuecomment-215244154

@jasnell if this that either or perhaps maybe with multiple caches and a symlink. real clear, easy debugz :scream:

Well I might have another option:

What if we make require's behavior configurable through an additional file, for example ".require_realpath".

Example:
We do require('foo/lib/bar.js');

Now, when require finds the file and is about to load it, it checks if 'bar/.require_realpath' exists. If it does not, the file is required the new way. If it exists, it is loaded the old way.

This would be very easy to implement for package managers and would need very little changes for node.js.

But this sounds too good to be true, what are the drawbacks of this?

FWIW, after having skimmed this thread, resolving using symlinks will be important when using es6 modules. Since the path cannot be resolved at runtime, and when you want to choose between a Debug or Release build for a native module. That's really far ahead, but just throwing it out there.

@trevnorris Can you elaborate on that? Or can you give an example? I'm not sure I understand how ES6 modules would affect this.

@VanCoding I was just about to suggest something similar to your proposal with dot files except at directory granularity. Please correct me if it is the same proposal as yours. If a dot file .node_legacy exists in a directory then any symlinked file in that directory will be followed as node 5.x and earlier versions did. If it's not there then the new non-symlink-following module resolution heuristics would be used. The main script would still follow legacy symlink resolution rules regardless.

I'd also ask for a new node command line flag or environment variable to override the module resolution behavior to be all legacy or the all new way and the dot files would be ignored altogether.

@kzc yes, that's pretty much the same as my proposal. With the only exception that require only looks up the '.node_legacy' file when the path starts with a package name. Absolute and relative paths are always required the new way.

But I'm not sure what would be better. We'd have to elaborate the use cases...

Also, I'm +1 for the additional command line flag.

Also, I'm +1 for the additional command line flag.

An environment variable setting of some sort has the advantage of letting you run executable shebang JS scripts using the node binary already in your path. But at this point, I'd accept either way or both.

I think an environment variable would make more sense, but it might also confuse people even more....

Why I think an environment variable makes more sense: The primary use case for the new require strategy is when you symlink packages for local development. So I would imagine that that's something you only want to do temporarily.... then you should fall back to what is actually installed...

@alexanderGugel One could also argue that the primary use case for the old behavior is to allow package managers to do cache optimization, and for everything else, the new system can be used,

Then, the benefit of having dotfiles would be that you don't have to mess with environment variables or command line arguments because the package manager does it for you.

But honestly, I like the direction this is going. Make it configurable and everyone is happy either way.

@trevnorris ... it's important to point out that the new behavior _still allows_ resolving via symlinks. The difference, however, is that the module instances are cached based on the symlink path rather than the realpath, which means multiple instances of the same code will be loaded based on the symlink.

@alexanderGugel and @VanCoding ... let's walk through the dot file approach a bit and see where things land (let's use .module_legacy for this hypothetical file)

Scenario 1: No .module_legacy

     /index.js
     /node_modules
                  /baz (@version=1)
                  /foo --> ../../foo
                  /bar
                      /node_modules
                                   /baz (@version=2)
                                   /foo --> ../../../../foo
modules
      /foo
          /node_modules
                       /bar --> ../../../myApp/node_modules/bar

In this case, there are no .module_legacy files anywhere. The new semantics apply. Each symlinked foo is it's own instance and we end up with the circular dependency huge cache problem. However, the baz peer dependency is correctly resolved by each separate symlinked instance of foo.

Scenario 2: myApp/.module_legacy

     /.module_legacy
     /index.js
     /node_modules
                  /baz (@version=1)
                  /foo --> ../../foo
                  /bar
                      /node_modules
                                   /baz (@version=2)
                                   /foo --> ../../../../foo
modules
      /foo
          /node_modules
                       /bar --> ../../../myApp/node_modules/bar

In this case, the .module_legacy file exists at the top level of myApp. In this instance, the old rules would apply in all cases. You'd have a single instance of foo loaded loaded avoiding the circular dependency problem but Node would fail to locate the baz peer dependency because it is not in a path relative to the realpath of foo.

Scenario 3: modules/foo/.module_legacy

     /index.js
     /node_modules
                  /baz (@version=1)
                  /foo --> ../../foo
                  /bar
                      /node_modules
                                   /baz (@version=2)
                                   /foo --> ../../../../foo
modules
      /foo
          /.module_legacy
          /node_modules
                       /bar --> ../../../myApp/node_modules/bar

In this case the .module_legacy file appears only within foo. What we would end up here is a mix of new and old behaviors. The module myApp/node_modules/bar would be loaded under the new rules, while the module/foo would be loaded under the old. It's a bit more difficult to reason about what it happening here because of the circular dependency. For instance, if I do require('bar') from inside myApp/index.js would I get the same instance of bar that I would get if I do require('bar') from inside module/foo/index.js?

Also, which version of baz is foo going to get when it does require('baz').

Scenario 4: myApp/node_modules/bar/.module_legacy

     /index.js
     /node_modules
                  /baz (@version=1)
                  /foo --> ../../foo
                  /bar
                      /.module_legacy
                      /node_modules
                                   /baz (@version=2)
                                   /foo --> ../../../../foo
modules
      /foo
          /node_modules
                       /bar --> ../../../myApp/node_modules/bar

This is similar to Scenario 3 except the .module_legacy file is moved to bar. The same issues apply here as Scenario 3.

Scenario 5: myApp/node_modules/bar/.module_legacy and `modules/foo/.module_legacy

     /index.js
     /node_modules
                  /baz (@version=1)
                  /foo --> ../../foo
                  /bar
                      /.module_legacy
                      /node_modules
                                   /baz (@version=2)
                                   /foo --> ../../../../foo
modules
      /foo
          /.module_legacy
          /node_modules
                       /bar --> ../../../myApp/node_modules/bar

In this scenario, both foo and bar have their own .module_legacy files. The app itself uses the new rules. Suppose my myApp/index.js does the following:

const bar = require('bar');
const foo = require('foo');

A single instance of foo should be returned per the old rules, but it will fail to find the peer dependency baz. This ends up suffering the same problem as Scenario 2.

Looking at this, allowing .module_legacy at arbitrary levels gets complicated quickly, and makes it quite difficult to reason about what is happening. Having the .module_legacy file at the top level (or not at all) makes it much easier -- either we're using the new behavior or we're not. A command line flag or environment variable would do the same, but each come with their own cost.

Given this, one possible approach we can take in the short term would be to add a command line flag that, when set, reverts module loading to the prior behavior such that, if a developer does encounter issues with circular dependencies in their dependency graph, they could switch on the flag to revert. I personally think this is a much better approach than using the dot files.

@jasnell

I'm not a big fan of the environment variable / legacy file approach either.

It seems like we're currently talking about two objectives here:

  1. not breaking application code
  2. allowing package managers and users that depend on the old semantics to require the resolved path in one way or another

Assuming we're ok with breaking application code (seems like we are pretty ok with it), I want to propose a slightly more radical approach:

Instead of having a single node_modules directory, either have an additional shared_node_modules directory from which symlinks will be resolved according to the old algorithm or somehow indicate that symlinks should be resolved according to the old algorithm using their name.

Approach 1: shared_node_modules

- project
  - node_modules
    - a -> ../some/where/else (might not even be a symlink)
  - shared_node_modules
    - a -> ../some/where

Since a is in shared_node_modules, it will be required using the old algorithm.
If node_modules and shared_node_modules are present, shared_node_modules takes precedence.

Approach 2: Prefix

- project
  - node_modules
    - a -> ../some/_where

_where is prefixed with _, which indicates that _where is _internal_, meaning it should be resolved according to the old behavior.

Approach 3: Prefix on linkname (probably prettier)

- project
  - node_modules
    - _a -> ../some/where

_a is prefixed with _, which indicates that _a is _internal_, meaning it should be resolved according to the old behavior.

Notice that users would still require _a using require('a').


The opposite would also work: We could...

  1. revert to the old behavior
  2. resolve symlinks in extra_node_modules according to the new algorithm.

That way it would be a backwards compatible change, both cases would be supported and it's easier to differentiate between the two strategies. It's also fairly easy to change.

Thoughts?


Thinking more about this I really think approach 3 would be a good compromise. That way we could support both cases, while still allowing package managers (of any kind really) and users that require a more advanced workflow that involves a lot of symlinks (e.g. if you work with a lot of private packages at the same time) to use the original require mechanism.

Would be more than happy to make this change if everyone is ok with it.


Not that this wouldn't conflict with npm package names either: https://github.com/npm/validate-npm-package-name#naming-rules (_ prefix).

cc @rstacruz

@jasnell In the dot file scheme I envisioned the dot file would literally be in the same directory where the symlink lives. If it is present the symlink is followed as in the previous module resolution algorithm, if not then the new resolution scheme would apply. A dot file in a parent directory would not apply to its child directories. Each directory would be independently evaluated as to the resolution scheme used.

Approach 3: Prefix on linkname (probably prettier)

@alexanderGugel this approach look great and easy to follow symlink realpath or not.

@kzc ...thank you for the clarification :-) ... let's walk through that a bit. Hopefully I understand your intent correctly...

Scenario 1:

     /index.js
     /node_modules
                  /.module_legacy
                  /baz (@version=1)
                  /foo --> ../../foo
                  /bar
                      /node_modules
                                   /baz (@version=2)
                                   /foo --> ../../../../foo
modules
      /foo
          /node_modules
                       /bar --> ../../../myApp/node_modules/bar

This would have the result of creating multiple instances of foo. The first (at symlink myApp/node_modules/foo) would fail to find it's peer dependency baz (@version=1), while the second find it's peer dependency baz (@version=2). Multiple instances of bar would be created,

Our cache would look something like...

{
  "modules/foo": { },
  "modules/foo/node_modules/bar": { },
  "modules/foo/node_modules/bar/node_modules/foo": { },
  "modules/foo/node_modules/bar/node_modules/baz": { },
  "modules/foo/node_modules/bar/node_modules/foo/node_modules/bar": { },
  ...
  "myApp/node_modules/bar": { },
  "myApp/node_modules/bar/node_modules/foo": { },
  "myApp/node_modules/bar/node_modules/baz": { },
  "myApp/node_modules/bar/node_modules/foo/node_modules/bar": { },
  "myApp/node_modules/bar/node_modules/foo/node_modules/bar/node_modules/foo": { },
  "myApp/node_modules/bar/node_modules/foo/node_modules/bar/node_modules/baz": { },
  ...
}

In other words, all we've done here is _combined_ both problems we're trying to solve :-)

Scenario 2:

     /index.js
     /node_modules
                  /baz (@version=1)
                  /foo --> ../../foo
                  /bar
                      /node_modules
                                   /.module_legacy
                                   /baz (@version=2)
                                   /foo --> ../../../../foo
modules
      /foo
          /node_modules
                       /bar --> ../../../myApp/node_modules/bar

Again, two instances of foo would be created. One would find it's peer dependency, the other would not.
Two instances of bar would be loaded.

Our cache would look something like:

{
  "myApp/node_modules/baz": { },
  "myApp/node_modules/foo": { },
  "myApp/node_modules/foo/node_modules/bar": { },
  "modules/foo": { },
  "modules/foo/node_modules/bar": { },
}

Scenario 3:

     /index.js
     /node_modules
                  /.module_legacy
                  /baz (@version=1)
                  /foo --> ../../foo
                  /bar
                      /node_modules
                                   /.module_legacy
                                   /baz (@version=2)
                                   /foo --> ../../../../foo
modules
      /foo
          /node_modules
                       /bar --> ../../../myApp/node_modules/bar

Only a single version of foo would be loaded, which would fail to find it's peer dependency. Two instances of bar would be loaded.

Our cache would look something like:

{
  "modules/foo": { },
  "modules/foo/node_modules/bar": { },
  "myApp/node_modules/bar": { }
}

Scenario 4:

     /index.js
     /node_modules
                  /.module_legacy
                  /baz (@version=1)
                  /foo --> ../../foo
                  /bar
                      /node_modules
                                   /.module_legacy
                                   /baz (@version=2)
                                   /foo --> ../../../../foo
modules
      /foo
          /node_modules
                       /.module_legacy
                       /bar --> ../../../myApp/node_modules/bar

Once instance of foo would be created, that would fail to find it's peer dependency. One instance of bar would be created.

Our cache would look something like:

{
  "modules/foo": { },
  "myApp/node_modules/bar": { }
}

Scenario 5:

     /index.js
     /node_modules
                  /.module_legacy
                  /baz (@version=1)
                  /foo --> ../../foo
                  /bar
                      /node_modules
                                   /baz (@version=2)
                                   /foo --> ../../../../foo
modules
      /foo
          /node_modules
                       /.module_legacy
                       /bar --> ../../../myApp/node_modules/bar

This is an interesting one... one instance of bar would be created. I _think_ two instances of foo would be created that would each point to the single bar, which points to only one of the foo instances.

Our cache would look something like:

{
  "modules/foo": { },
  "myApp/node_modules/bar": { },
  "myApp/node_modules/bar/node_modules/baz": { },
  "myApp/node_modules/bar/node_modules/foo": { }
}

Scenario 6:

     /index.js
     /node_modules
                  /baz (@version=1)
                  /foo --> ../../foo
                  /bar
                      /node_modules
                                   /baz (@version=2)
                                   /foo --> ../../../../foo
modules
      /foo
          /node_modules
                       /.module_legacy
                       /bar --> ../../../myApp/node_modules/bar

One instance of bar would be created. Two foo instances that point to bar. The bar would point to one of the instances of foo.

Our cache would look something like:

{
   "myApp/node_modules/baz": { },
   "myApp/node_modules/foo": { },
   "myApp/node_modules/bar": { },
   "myApp/node_modules/bar/node_modules/baz": { },
   "myApp/node_modules/bar/node_modules/foo": { },
}

@alexanderGugel ... at this point, walking through the various exercises on this, It appears that any scenario where we have _both_ the old and new approaches being used within a single process only makes the problem worse in that we end up with the downsides of both approaches more than it actually solves anything. We need a single behavior, not multiple. It needs to be either (a) Revert to the old behavior, (b) Stick with the new behavior or (c) Find a behavior that works for both.

@jasnell I think @alexanderGugel's Approach 3 can fix both issues. No need to revert and keep symlink path has a way to use the realpath for require.

In other words, all we've done here is combined both problems we're trying to solve :-)

@jasnell The old symlink scheme never found the correct peer dependencies. So that legacy "behavior" would be preserved. I'm not suggesting it's an improvement. Hence the push for the newer resolution algorithm. The only way the legacy symlink behavior worked in my experience was in a flat node_modules scenario where every module was at the same level relative to each other.

We need a single behavior, not multiple. It needs to be either (a) Revert to the old behavior, (b) Stick with the new behavior or (c) Find a behavior that works for both.

  • @jasnell

We already have both scenarios in there, which is perfectly fine. On isMain, we resolve to the real path, otherwise we use the linkname. Those are exactly the two cases we need to cover.

@jasnell I think @alexanderGugel's Approach 3 can fix both issues. No need to revert and keep symlink path has a way to use the realpath for require.

  • @fengmk2

The suggested change is actually pretty trivial! Instead of using the real path when isMain, we use it if the name filename start with _ (assuming we want to keep things backwards compatible, otherwise it would be the exact other way around: Use the real path only if the name start with _).

@jasnell The old symlink scheme never found the correct peer dependencies. So that legacy "behavior" would be preserved. I'm not suggesting it's an improvement. Hence the push for the newer resolution algorithm. The only way the legacy symlink behavior worked in my experience was in a flat node_modules scenario where every module was at the same level relative to each other.

That's why I'm ๐Ÿ‘Ž on using a the filename + real path as a key for the module cache. It actually combines both issues and that's the solution that introduces an additional strategy (legacy file).

Hmm... I'm going to remain skeptical until I see a working example ;-) Running through all of the scenarios I'm not seeing a consistent path forward with an implementation that uses both semantics. I'm happy to be proven wrong tho! :-)

@jasnell PR is on its way :)

@alexanderGugel
You mean _ detection is reversed for main script? It seems weird...
Maybe we could make it consistent by only use new behavior if start with _ , and change npm link behavior to always prefix _.

@hax No, there is no prefix detection in there (at least currently not). The "second" strategy I was mentioning referred to the fact that require is currently already using the resolved real path for the first (=main) modules, which is indicated via isMain in module.js.

I'm actually gonna have to go to bed now. I didn't completely get it to work yet, but it should be fairly straightforward (something along those lines).

+1 to continuing to explore the prefix option in @alexanderGugel's gist.

The common point of both the prefix and dot file schemes is to add one extra bit of information to a symlink to know whether to realpath it or not. It would be nice keep the symlink name and encode that bit in another way. On Mac it is possible to change the permission bits on the symlink itself which otherwise are not used but I don't think it's true of other OSes. Are there any other unused bits related to symlinks at our disposal that would work on all platforms? Just throwing the idea out there.

I see more than a couple of JS files with _ prefixes in node_modules for my projects. lodash is one. And one instance of a directory beginning with _ - node_modules/fbjs/lib/__mocks__

Using the lesser used @ may be a better alternative for a prefix.

I don't see adding a prefix as an option. The reason for this is that a lot of modules assume that they can reach their direct dependencies folders under ./node_modules/x

So, if someone has an express app like this:

app.use(express.static("/public/bootstrap","./node_modules/bootstrap/dist/"))

This would no longer work. It would need to require.resolve the module's location first and then path.resolve relative from there. I assume that would affect a lot of modules.

Basically, we're talking about 2 different approaches here:

A) adding the information to the symlink
B) adding the information to the module

What @kzc and @alexanderGugel are proposing falls under approach A. The proplem with A is that it is really complicated, and hard to reason about.

Let's assume the following scenario:

/
    a/ -> symlink to ./b, configured to use real path
    b/
        index.js -> symlink to ../b.js, configured to use virtual path
    b.js -> console.log(__filename)
    index.js -> require("./a/index.js");

question: what would the output?
/a/index.js?
/b/index.js?
/b.js?

The problem is, that there can be multiple symlinks in a path, and we would have to walk up the whole path and check if it's a symlink and how it is configured and decide on which configuration has precedence over the other. Not a trivial task.

That's why I proposed doing it with approach B and configure it where the path points to: in the module. That means, if we require a module through different symlinks, we always get the same configuration. But it also means that we can't configure it differently for each symlink.The question is: does it matter? Maybe not.

Let's look at the 3 real world scenarios:

NPM

app/
    node_modules/
        a/
            index.js -> require("b");
            node_modules/
                b/
                     index.js -> require("c"); //peer dependency
        c/
            index.js
    index.js -> require("a"); require("c");

In this scenario, everything works fine in in both the new and the old way, since no symlinks were used.

NPM developing a

app/
    node_modules/
        a/ -> symlink to ../../a_dev
        c/
            index.js
    index.js -> require("a"); require("c");
a_dev/
    index.js -> require("b");
    node_modules/
        b/
            index.js -> require("c"); //peer dependency

In this scenario, everything works fine in in the new way, but did not work in the old way, because b/index.js could not find 'c'.

other package manager

app/
    node_modules/
        a/ -> symlink to ../../a
        c/ -> symlink to ../../c
     index.js -> require("a"); require("c");
a/
    node_modules/
        b/ -> symlink to ../../b
     index.js -> require("b");
     .modules_legacy
b/
    node_modules
        c/ -> symlink to ../..c
    index.js -> require("c"); //peer dependency
    .modules_legacy
c/
    index.js
    .modules_legacy

This had worked in the old way, but does not currently work in the new way. But using .modules_legacy files, we could fix it.

It's important to note, that the .modules_legacy file is nothing a developer would add manually to his module. In fact, a developer shouldn't maybe even know that it exists. The proposed feature would basically be for package-manager exclusive use. So package managers decide on adding the file or not depending on the module hierarchy they build. We should prevent developers from adding a '.modules_legacy' file to their modules manually, because they would no longer work with NPM if they do so.

Yes, we could also just go for the commandline/environment var option but then either NPM users would need to provide it when they're developing the app and a module simultaneously, or users of other package managers would have to provide it every the time they run their application.

I have no :cow: with either of the emerging approaches (_symlink, .modules_legacy or env variable), I'm a :+1: for either one.

The important thing is that the standard resolve strategy is clear and simple.

Adding a fallback hook in either one of those ways is an acceptable corner case that provide great benefits (yay pnpm :heart:) :rocket: Let's just remember to document it well :stuck_out_tongue_winking_eye:

@alexanderGugel Example code:

import p from `mod/${process.config.target_defaults.default_configuration}/p`;

In this case I'm attempting to load a native resource (namely p.node) and determine whether it should load the Release or the Debug build based on the build type of node I'm running. Unfortunately es6 modules don't allow runtime path resolution. So I'd have to symlink p.node before running my script, if I didn't want to change my script every time I wanted to switch configurations that is.

This could be pretty easily automated while building, with a small patch. e.g. using a command like ./node_g $(which node-gyp) it could auto detect which configuration of node you're building with and automatically rewrite the symlink.

@VanCoding Yes, under the "other package manager" scenario with a flat node_modules structure the dot file approach at the module level would work as well as legacy resolution did. We're not trying to make the legacy resolution mechanism better than it was, just to preserve its behavior.

In your proposal would the use of .modules_legacy only apply to symlinks of individual javascript files in the same directory as the dot file and any symlinks within the immediate node_modules child directory?

Also, in your example:

c/
    index.js
    .modules_legacy

was .modules_legacy here superfluous? It would still work the same if removed because there are no symlinks present in the immediate directory nor does c/ have a child node_modules directory containing symlinks, correct?

What @kzc and @alexanderGugel are proposing falls under approach A.

I was just pointing out that _ as a prefix would not be a good choice due to its widespread use. I prefer a dot file approach or encoding the information within the actual symlink itself somehow instead of its name.

Edit: encoding the legacy realpath behavior bit within the OS symlink itself wouldn't survive tarring and untarring so I withdraw this idea as unworkable.

๐Ÿ‘ for using _ as a prefix for symlinks that should be resolved according to the old algorithm.

I was just pointing out that _ as a prefix would not be a good choice due to its widespread use.

As @alexanderGugel pointed out above, package names starting with _ aren't valid npm package names. Linking packages is by far the most common use case for using the old mechanism.

@kzc Can you point out a specific use case where a _ prefixed symlink is actually being used to share a node_modules directory located at the real path? I think this should be extremely uncommon.

Also I wouldn't call this "legacy behaviour". I think the above discussion is proof enough that there are legitimate use cases for both approaches.

That being said, prefixing using . also makes sense. . is hidden, so it kind of implies that it behaves differently.

๐Ÿ‘Ž for using a separate file. It's less flexible to enforce this behaviour on a "real" module-by-module basis, rather than per-link.

This does affect code completely outside of node_modules as well. And breaks one of the (to my understanding) old and reliable node module system rules: Each file is executed exactly once. Especially given the status of the module system as "locked", that shouldn't be expected (I think).

This file for example:

const bar = require('./bar'); // bar contains an index.js
const foo = require('./foo'); // foo is a symlink to bar
console.log('foo === bar', foo === bar); // expected to be `true`

In node 4 this runs the top-level code in bar/index.js once and in node 6 it will run it multiple times - including any side effects it might have (putting stylistic concerns about top-level side effects aside). Making such a change to a locked part of the API without any deprecation / opt-in period seems weird.

As @alexanderGugel pointed out above, package names starting with _ aren't valid npm package names. Linking packages is by far the most common use case for using the old mechanism.

@nowtvicebergteam That's fine for package names, but as @jkrems points out what about individual symlinked JS files? They can potentially have _ prefixes.

The dot file approach handles both symlinked files and symlinked directories.

@jkrems,

Each file is executed exactly once.

A symlink is a file. With the new behavior, this statement is now actually true.

@jkrems that depends on your definition of a file. Simply said - a symlink _is_ a file that points to another file. The question is now: Should the symlink be resolved and should the resolved path used for caching, module lookup etc. or should the unresolved symlink be used instead?Half of the posts in this issue are exactly about that question.

About your specific problem: I don't think this affects the ecosystem, but rather only local installations. Simple reason: A lot of modules are under git version control and git doesn't allow symlinks. But that's just my personal view. I might be wrong.

Edit: What @dlongley said ๐Ÿ˜„

@benurb git does support symlinks (at least when you only care about *nix, not sure about support on Windows). And I know of quite a few projects that check symlinks into git. And that do depend on the fact that the require semantics are "same file, as in you-edit-one-and-both-change". It was part of the contract for all this time, after all.

To be clear: I'm not saying "never ever make this change". I'm saying: making a change to a locked part of the API contract of node _without deprecation and/or opt-in transition_ doesn't look good.

@jkrems Windows doesn't support them afaik. But that's another topic. Sorry for bringing this up. Back to the main topic: "same file, as in you-edit-one-and-both-change". As @dlongley and I pointed out this is not the case. If you edit the content of the symlink target only its content changes. Changing the content of a symlink itself means changing the path the link points to. The link and the target are technically two files. You can see that e.g. from the fact how rm behaves. You don't need to call rm -r even if a symlink points towards a directory. Simply because it's a file.

In *nix a file consists of a name and an inode (its content). As both, the symlink and its target, have a name and a content they _are_ in fact two different files.

@benurb I don't think a philosophical argument about "what's the content of a symlink / what's in a name" is furthering the discussion. All I said was: There are apps that depend on the current behavior and it was well-known to be part of a locked API node provided. And, again, I'm not saying that it's better or worse than the new behavior. I'm saying changing it without any process around rollout isn't in line with how "locked" is described in the docs.

Yeah, sorry for the lengthy explanation. We're on the same page here as I agree the change didn't happen as smooth as it should have. The thing is that this change fixes a problem I experienced in a lot of projects since I started using peer deps regularly. I like the new behavior and don't think it should be reverted. On the other side I see that this causes problems with linked modules, but these can probably be fixed with the solutions provided above (_ dirname, etc.). But linking files instead of modules/directories is something completely different and imho behaves correctly the way you observed it.

@jkrems, as hashed out earlier in this thread, the documentation is inconsistent on this issue. This led to people having mixed expectations and surprises -- and some cases simply not working at all. Because the old behavior could be considered a bug, an original proposal was to make this change without changing semver-major, but that was rejected because it was determined that it was possible that some people were explicitly relying on the old behavior. This was also before it came to light that alternative package managers used it to achieve greater efficiency. So the change was made as a semver-major change instead.

I agree that, if possible, we should try to provide a means by which people can use the old behavior as needed -- and I think most people on this thread support that idea and have proposed ways forward to achieve that.

@jasnell The problem with using the _ prefix version is that we essentially need to stat twice the number of files, which might result into significant performance regressions for require().

Therefore - just as another idea - what if we choose the appropriate mechanism based on the target of the link, rather than the linkname?

If the symlink points to an absolute path, use the old mechanism, otherwise use the new one.

If the symlink points to an absolute path, use the old mechanism, otherwise use the new one.

Using symilnks pointing to absolute paths is legitimate in the new resolution scheme. So I'd prefer we come up with another explicit mechanism.

Here's yet another unifying module resolution proposal that's a variation of the _ prefix idea:

If a symlinked file or directory has a file of the same name in the same directory with a @ suffix then the old realpath resolution scheme is used on that link, otherwise the new resolution scheme is used.

New resolution scheme:

foo1.js --> ../../foo1.js

bar1 --> /usr/local/whatever1/

Old (realpath) resolution scheme:

foo2.js --> ../../foo.js
foo2.js@

bar2 --> /usr/local/whatever2/
bar2@

It has the advantage of keeping the symlink name intact as well as being simple to understand. Symlinks within the same directory can either use the new or old module resolution scheme on a case by case basis.

@kzc I like this version as well. Would be even simpler to implement actually (could be done by additional stat in readFilePath). Are you working on a PR for this? Or can I pick that up?

I just submitted a PR for the _ solution, but @ as a suffix would be trivial to add. I think both solutions would be fine.

@alexanderGugel I saw your PR which already does most of the work. I'd be happy if you picked it up. Thanks!

The additional @-suffixed-files beside symlinks indicating old realpath behavior has the advantage that it would work with old-style symlink installs using node versions 5.x and prior.

While I think the solution with the @-files is way better that prefixing symlinks with "_", I still believe we're on the wrong road now.

I again want to point out the following scenario:

/a/b/c/d/e -> /e
/e/f/g/h/j.js

now let's do require('/a/b/c/d/e/f/g/h/j.js');

Currently, that's absolutely no problem, just read the file and that's it.
With your proposal, we'd have to do the following checks first:

isSymlink('/a')
isSymlink('/a/b')
isSymlink('/a/b/c')
isSymlink('/a/b/c/d')
isSymlink('/a/b/c/d/e')
isSymlink('/a/b/c/d/e/f')
isSymlink('/a/b/c/d/e/f/g')
isSymlink('/a/b/c/d/e/f/g/h')
isSymlink('/a/b/c/d/e/f/g/h/j.js')

And then when we've discovered that /a/b/c/d/e is a symlink, we'd also have to check if /a/b/c/d/e@ exists.

A lot of checks to load a single file, don't you think?

But it gets even more complicated when a path consists of 2 symlinks:

/a/b/c -> /c
/c/d/e -> /e
/e/f/g.js

In this case, we'd have to check for the existence of /a/b/c@ and /c/d/e@ and to decide which one has precedence over the other. Even better, maybe we would have to resolve to some semi-real path. If for example /a/b/c@ exists, but /c/d/e@ does not, the semi-realpath could be /c/d/e/f/g.js.

One mor case:

/a/b/c -> /c
/c -> /d
/d/e/f.js

This is a dangerous situation, because /c@ would not be visible through the virtual path. So we would have to realpath('/a/b/c') first to then check for the existence of /c@, because checking for the existence of /a/b/c@ is a different thing.

I think this approach is just too complex and potentially means a big performance loss.

I know that this approach would give us the maximum flexibility. But do we really need it? Isn't our goal here to keep the new bahavior and find the most lightweight & elegant solution to still support the other package managers?

What I've proposed earlier would exaclty do that. With as little extra work as possible. The only thing that would be even more lightweight would be the environment and command line options. Maybe I should also do a PR so you can see how lightweight it is.

@VanCoding you're talking about your .module_legacy file proposal, right? Maybe it would also be an option to apply a property like "_loadingBehavior": "realpath" to the package.json.

@benurb exactly! Putting it into the package.json file would also be possible, since there always would be a package.json file where we would put a ".module_legacy" file.

It would be faster to just look up the existence of a file than parsing a json file an check if a key exists, though. But you definitely got the concept :)

@benurb ๐Ÿ‘Ž On doing this on a module-by-module rather than symlink-by-symlink basis. It's less flexible and @alexanderGugel's PR shows that the impact on performance is very limited: https://github.com/nodejs/node/pull/6460#issuecomment-215617902

The _-solution does not require an additional check if a file exists, only if the file (or symlink) could not be found in the first place.

On a side note, I would not call it module_legacy, since requiring via the real path of a package should not be deprecated due to the flexibility it provides.

The comparison with PHP above is not really a valid argument IMO, since Node's require is significantly different in the first place. One could also argue that falling back to the parent's node_modules directory is "wrong" from a user's perspective, which is definitely not the case and enables e.g. circular dependencies in the first place.

@VanCoding

With your proposal, we'd have to do the following checks first:

isSymlink('/a')
isSymlink('/a/b')
isSymlink('/a/b/c')
isSymlink('/a/b/c/d')
isSymlink('/a/b/c/d/e')
isSymlink('/a/b/c/d/e/f')
isSymlink('/a/b/c/d/e/f/g')
isSymlink('/a/b/c/d/e/f/g/h')
isSymlink('/a/b/c/d/e/f/g/h/j.js')

I suggest that parent directory symlinks are not followed to keep the scheme simpler if the alternate package manager folks do not have issue with it. Only the final module directory or JS file link is followed in the presence of a @-suffixed file of the same name in the same directory.

Edit: I previously quoted the wrong section. Fixed.

And then when we've discovered that /a/b/c/d/e is a symlink, we'd also have to check if /a/b/c/d/e@ exists.

That's correct. Let's wait for @alexanderGugel's @-suffix symlink peer file PR before we make any performance judgements.

Yet another variation on the _-prefix and '@' file proposals...

List the symlink names in a file (let's say .module_realpath) in each directory that you wish to use the old realpath symlink module resolution behavior. This would result in fewer stat() calls than the "@ file" proposal.

Still another variation...

Node would only read in a single .module_realpath file at startup located in the same directory where the realpath main script lives and only those named symlinks with be realpath'ed. The location of this .module_realpath file could be optionally overridden by a command line flag --realpath-symlink-file to point to a different file. That way all decisions whether to realpath or not can be done in memory and be completely explicit. (Edit: --module-realpath-file for the flag name may be better.)

After thinking a bit more about this, I really think that the _-prefix approach is the right one. I played around with the @-file idea as well. I think there might also be some confusion about how the currently proposed resolution mechanism works.

@VanCoding @kzc

While I think the solution with the @-files is way better that prefixing symlinks with "_", I still believe we're on the wrong road now.

I again want to point out the following scenario:

/a/b/c/d/e -> /e
/e/f/g/h/j.js
now let's do require('/a/b/c/d/e/f/g/h/j.js');

Currently, that's absolutely no problem, just read the file and that's it.
With your proposal, we'd have to do the following checks first:

isSymlink('/a')
isSymlink('/a/b')
isSymlink('/a/b/c')
isSymlink('/a/b/c/d')
isSymlink('/a/b/c/d/e')
isSymlink('/a/b/c/d/e/f')
isSymlink('/a/b/c/d/e/f/g')
isSymlink('/a/b/c/d/e/f/g/h')
isSymlink('/a/b/c/d/e/f/g/h/j.js')
And then when we've discovered that /a/b/c/d/e is a symlink, we'd also have to check if /a/b/c/d/e@ exists.

This is an interesting scenario. Let's have a look what actually happens.

The easiest way to find out is by adding a couple of debugs to 6460:

diff --git a/lib/module.js b/lib/module.js
index 249b010..67b2aeb 100644
--- a/lib/module.js
+++ b/lib/module.js
@@ -113,6 +113,7 @@ function readPackagePath(filename, exts, isMain) {
 }

 function readFilePath(requestPath, isMain) {
+  debug(`readFilePath ${requestPath}`);
   if (isMain) {
     return fs.realpathSync(requestPath);
   }
@@ -123,6 +124,7 @@ function readFilePath(requestPath, isMain) {
 // resolve to the absolute realpath if running main module,
 // otherwise resolve to absolute while keeping symlinks intact.
 function tryFile(requestPath, isMain) {
+  debug(`tryFile ${requestPath}`);
   const rc = stat(requestPath);
   if (rc === 0) {
     return readFilePath(requestPath, isMain);

Because that's the "expensive" part where we're actually resolving the filename / stating it.

If we now create the directory structure you proposed in your original example, we end up with a tree that looks something like this:

/Users/alex/a
โ””โ”€โ”€ b
    โ””โ”€โ”€ c
        โ””โ”€โ”€ d
            โ””โ”€โ”€ e -> /Users/alex/e

4 directories, 0 files
/Users/alex/e
โ””โ”€โ”€ f
    โ””โ”€โ”€ g
        โ””โ”€โ”€ h
            โ””โ”€โ”€ j.js

3 directories, 1 file

Now, let's see what _actually_ happens when we require the a file "under" the symlink:

NODE_DEBUG=module ./out/Release/node -e "require('/Users/alex/a/b/c/d/e/f/g/h/j.js')"

Output:

MODULE 93110: Module._load REQUEST vm parent: [eval]
MODULE 93110: load native module vm
MODULE 93110: Module._load REQUEST /Users/alex/a/b/c/d/e/f/g/h/j.js parent: [eval]
MODULE 93110: looking for "/Users/alex/a/b/c/d/e/f/g/h/j.js" in ["/Users/alex/repos/node/node_modules","/Users/alex/repos/node_modules","/Users/alex/node_modules","/Users/node_modules","/Users/alex/.node_modules","/Users/alex/.node_libraries","/Users/alex/repos/node/out/lib/node"]
MODULE 93110: readFilePath /Users/alex/a/b/c/d/e/f/g/h/j.js
MODULE 93110: load "/Users/alex/a/b/c/d/e/f/g/h/j.js" for module "/Users/alex/a/b/c/d/e/f/g/h/j.js"
required j.js

In fact the _ solution is fast, because of the very fact that we do not need to check whether or not an @ file exists.

TL;DR In short, that's simply not the way the module resolution works. Keep in mind that we also never check if a file is a symlink, but instead we resolve its real path.

@alexanderGugel I'm okay with the _-prefixed symlink scheme. It's simple and easy to reason about what behavior you're getting. I just thought the @-suffix-file proposal or the other variations would make the install tree backwards compatible with older node version behavior as it does not alter the symlink name.

Regarding https://github.com/nodejs/node/issues/3402#issuecomment-215597796

If the symlink points to an absolute path, use the old mechanism, otherwise use the new one.

Could you please clarify the use of "absolute path" in that statement?

Under the '_'-prefix proposal would the following symlink use the new resolution scheme or the old realpath one:

foo ---> /usr/local/foo

Here is another idea (conceptually similar to the @kzc's module_realpath proposal)....

Instead of relying on the dynamic behavior of require, we could provide some
way to "bootstrap" the require cache, essentially overriding Node's default
behavior.

Given an index.js file that looks something like this:

require('a')
require('b')

And a node_modules.json file in the same directory:

{
  "a": "/some/completely/different/path"
}

Now, when running node index.js, Node would read in the node_modules.json
file and bootstrap Module._cache with it.

This would provide an easy migration path for use-cases that rely on the old
mechanism (the node_modules.json file could easily be generated), while
enabling possible performance improvements through some form of static
declaration of resolved, required paths (as in "no node_modules path fallback
or symlink real path resolving").

This kind of information could be encoded in the package.json as well.
But the require strategy would be defined in the package that requires some
dependency, rather than in the package.json file of the dependency itself (the
package being required).

... just an idea that might be worth investigating (potentially independent of
this discussion).

@kzc

The _-prefix proposal and the absolute path idea are separate.

Under the '_'-prefix proposal would the following symlink use the new resolution scheme or the old realpath one:

foo ---> /usr/local/foo

When using the _-approach, this path would be resolved according to the v6.0.0 mechanism, since the linkname (foo) does not start with an underscore.

If the link would be called _foo, it would be resolved according to the original real-path mechanism.

When using the absolute path solution, this link would be resolved according to the pre-v6.0.0 mechanism (real path).


If the symlink points to an absolute path, use the old mechanism, otherwise use the new one.
Could you please clarify the use of "absolute path" in that statement?

Absolute as in path.absolute (starting with / relative to the root of the filesystem).

Absolute: /hello/world
Not absolute: hello/world

foo ---> /usr/local/foo

When using the _-approach, this path would be resolved according to the v6.0.0 mechanism

Thanks for the clarification. Just wanted to verify that the underscore prefix is the sole "decider" of realpath or v6.0.0 module resolution behavior in your _-prefix proposal.

@alexanderGugel I think the node_modules.json bootstrap proposal is good for a few reasons:

  1. It is explicit and clear.
  2. It does not alter the symlinks' names.
  3. It will be as performant as the _-prefix proposal.

Just kill circular dependencies. All your problems are gone. (except from the whiny abusers of this misfeature).

@domenkozar

Just kill circular dependencies. All your problems are gone. (except from the whiny abusers of this misfeature).

Circular dependencies have nothing to do with the ability to symlink a package.

Besides that, the majority of heavily modularised packages (e.g. browserify) have circular dependencies in some form or another. And that's totally cool. There is no reason this use-case shouldn't be supported.

(just in general, I don't think it's constructive to call people "whiny" - just my opinion)

@kzc

@alexanderGugel I think the node_modules.json bootstrap proposal is good for a few reasons:

  1. It is explicit and clear.

I would argue that prefixing symlinks with _ is also a pretty good indicator that those are somehow "different". Keep in mind that the majority of users wouldn't create those links manually, but would use tools like npm link, pnpm, npminstall, or ied that create them.

  1. It does not alter the symlinks' names.

That's a fair point. I'm not sure that's a big disadvantage though, since we would still keep things backwards compatible to v6.0.0.

  1. It will be as performant as the _-prefix proposal.

It will be slightly less performant, because you now need to read and parse the node_modules.json file for every directory in the path that might potentially contain any symlinks.

Circular dependencies have nothing to do with the ability to symlink a package.

All this nesting is a result of circular dependencies. Otherwise you could calculate a DAG graph of dependencies and be done. Just like we do it in Nix.

@domenkozar

All this nesting is a result of circular dependencies. Otherwise you could calculate a DAG graph of dependencies and be done. Just like we do it in Nix.

While I like Nix, I don't think this is the right place to rant about the way modules are being resolved in general. I would suggest filing a new issue describing the proposed behaviour.

Apparently node uses symlink resolution (https://github.com/nodejs/node/issues/6500#issuecomment-216378313) to protect itself from loading a binary module more than once.

Until another solution is found to assert particular module identity in some other, portable way (not easy!) this cannot be solved at the moment. (There would have been two different require code paths for binary and JavaScript modules). https://github.com/nodejs/node/issues/6500 is a nice example how the implementation https://github.com/nodejs/node/pull/5950 fails in that respect.

Apologies for being late here, but can anyone tl;dr what was discussed? It's a long thread, and I'm not sure what the resolution is at the moment.

Tl;Dr is this: we had a bug in the module loader that prevented symlinked
peer dependencies from finding each other. We fixed it but the fix broke
other things. So we are currently looking to revert that fix and look at
solving it a different way.
On May 3, 2016 5:21 AM, "Rico Sta. Cruz" [email protected] wrote:

Apologies for being late here, but can anyone tl;dr what was discussed?
It's a long thread, and I'm not sure what the resolution is at the moment.

โ€”
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
https://github.com/nodejs/node/issues/3402#issuecomment-216511065

Thank you!
On May 3, 2016 9:04 PM, "James M Snell" [email protected] wrote:

Tl;Dr is this: we had a bug in the module loader that prevented symlinked
peer dependencies from finding each other. We fixed it but the fix broke
other things. So we are currently looking to revert that fix and look at
solving it a different way.
On May 3, 2016 5:21 AM, "Rico Sta. Cruz" [email protected] wrote:

Apologies for being late here, but can anyone tl;dr what was discussed?
It's a long thread, and I'm not sure what the resolution is at the
moment.

โ€”
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
https://github.com/nodejs/node/issues/3402#issuecomment-216511065

โ€”
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
https://github.com/nodejs/node/issues/3402#issuecomment-216520014

@jasnell Peer dependencies was just one advantage. The 6.0.0 module resolution behavior also facilitates testing of modules with symlinks without the need for copying files or installs. It also allows the layout of customized hand built installs to save disk space on constrained devices. It offered a lot of flexibility that the old scheme lacks.

It would be useful if the 6.0.0 module resolution behavior could exist behind a command line flag defaulted to false rather than removing it altogether.

That very well could be an option but we need to unbreak things first, and
then move forward from there.

I am definitely considering the flag approach to enable the new behavior.
I've already started exploring that, in fact, and it shouldn't be too
difficult to do at all. I'd just also like to continue looking into whether
we can solve it without a flag tho.

@jasnell Thanks. A non-flag solution is always preferable. But even with the flag it would be very helpful.

Tl;Dr is this: we had a bug in the module loader that prevented symlinked peer dependencies from finding each other. We fixed it but the fix broke other things. So we are currently looking to revert that fix and look at solving it a different way.

I don't entirely agree that this is a bug.

However, if we are going to say that the require() look path should include the node_modules folders based on the symlinked location of a module, then it isn't acceptable to _remove_ the node_modules folders based on the realpath location of a module.

Here is a git repo with 2 clear examples of what changed, and why this is either subtle and inefficient, or outright harmful and surprising, based on how module dependencies have been linked: https://github.com/isaacs/node6-module-system-change

(Note: I've written programs that require() a module from on a symlinked location and expect it to still be able to load its deps, so while the example is contrived in its minimal-ness, it's not contrived in principle and does reflect some real-world usage.)

An ideal solution, if it is a goal to have symlinked modules find one another if they are not otherwise dependent on one another (which, again, I am highly skeptical about as being a good idea) would have to prioritize the pre6.0 realpath-location-based lookup behavior as the first priority lookup path, and _then_ add the node_modules lookup locations as a lower-priority set of paths.

The cache entry should still be based on the realpath for the sake of efficiency and minimizing the semantic change in what is a singleton.

If you are affected by this today, you can easily work around the bug by setting the NODE_PATH environment variable to the node_modules folder where you're sticking stuff.

I think that it'd be an interesting idea to add the main module's lookup paths to the require() calls done by other modules, but even that should be messaged front and center as a significant and potentially hazardous change.

Updated the git repo with a few specific proposals.

@isaacs

if it is a goal to have symlinked modules find one another if they are not otherwise dependent on one another (which, again, I am highly skeptical about as being a good idea)

Well, with the growing trend of the monorepo approach and a significant boost to development speed that thought deserves the benefit of the doubt.

Symlinking can be fixed, and works great :)

Since this issue has the _discuss_ label, I thought I could ask here ...

The second best thing to being able to do const WebSocket = require('../vendors/node/node_modules/ws'); (which cannot be done) is having a symlinked node_modules directory like below.

Which version of the behavior would allow the following?

/var/www/project/
โ”œโ”€โ”€ app
โ”‚ย ย  โ””โ”€โ”€ ws
โ”‚ย ย      โ”œโ”€โ”€ node_modules -> /var/www/project/vendors/node/node_modules
โ”‚ย ย      โ””โ”€โ”€ wsserver.js
โ””โ”€โ”€ vendors
    โ””โ”€โ”€ node
        โ”œโ”€โ”€ node_modules
        โ””โ”€โ”€ package.json

@majid4466 symlinking that node_modules dir should work with the current node.

As for the original opening question: dep1 probably shouldn't require dep2, but instead provide a factory that can produce whatever based on dep2. The factory could have a meta data property to declare which other modules the app (or a plugin manager) needs to provide. It's one flavor of dependency injection.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

mcollina picture mcollina  ยท  3Comments

addaleax picture addaleax  ยท  3Comments

danielstaleiny picture danielstaleiny  ยท  3Comments

vsemozhetbyt picture vsemozhetbyt  ยท  3Comments

Brekmister picture Brekmister  ยท  3Comments