Pnpm: Fixing --preserve-symlinks. Enhancing node to exploit.

Created on 4 Dec 2016 · 54 Comments · Source: pnpm/pnpm

To all package managers, npm, yarn, ied, pnpm

Are Symlinks The Problem?

One day long ago, a version of node was released that purported to support symlinking of module directories. Its implementation was flawed. It offered no way to turn the "support" off. And thus the ecosystem proclaimed symlinks to be "_a very bad thing_", best avoided in practice, choosing to believe they could never possibly work well with, let alone actually improve node.

This simple issue at nodejs/node challenges that by:

  • Fixing three critical problems with --preserve-symlinks:

    • _Memory Bloat_

    • _Add-on Crashing_

    • And "The Fundamental Flaw": _That it always converts "main.js" to its real path_

  • Backward compatibly enhancing node so modules can be stored separately from the directory structures that dictate their dependency version resolutions, while keeping those version-specifying structures coupled to a given top-level '/node_modules' root. This seamlessly enables:

    • _Machine level stores_

    • _Simplified single-machine, concurrent development of dependent modules_

    • _50x reduction in install times (after initial install)_

    • _A way out of symlink directory cycles_

To access a fork/branch with fixes, and to learn a little more about why and how, please visit the issue at nodejs/node. If you see the value, please offer your resolute support.

All 54 comments

Awesome that you fixed the symlink issues!

And "The Fundamental Flaw": That it always converts "main.js" to its real path

I know about this issue! We use a "proxy file" as a workaround in pnpm (but only for binstubs).

Backward compatibly enhancing node so modules can be stored separately from the directory structures that dictate their dependency version resolutions, while keeping those version-specifying structures coupled to a given top-level '/node_modules' root. This seamlessly enables:

I am not sure why this is needed. When running projects with the --preserve-symlinks option, there are almost no issues. We already use a global store for packages (shared-store).

@zkochan Thanks for the hooray :)

Regarding pnpm's use of a global store, it was my understanding that once a package is installed there, the versions of dependencies it uses don't change, because they're symlinks under the package's '/node_modules' to specific versions also in the global store. This is not a bad thing in-and-of-itself, but assuming my understanding is correct, it can result in multiple versions of shared, common modules being created in memory, when a single version would have sufficed.

A larger potential concern, again assuming my understanding is correct, is that it would also mean the dependency versions for a package in a global store might change from machine to machine, which might not be ideal as it could result in subtle differences in behavior. Given those differences would exist in a transitive dependency, they might be a little harder to track down, and would then require changing the versions on one machine so they're in line with another, which could adversely affect some other project on the machine symlinked to the global package. This can be a problem when there are teams of developers, where there's a desire to ensure everyone is using all the same versions of all dependencies, transitive or otherwise, for a given root '/node_modules'.

The adjacent.node_modules approach addresses those issues in a very simple way that can be used in flat trees, like the ones pnpm and npm create, but also in what I call 'bubble' trees, which npm and yarn create (where they try to 'bubble' common dependencies up to the highest ancestor '/node_modules' directory), and even in 'exploded' trees (like the very first ones npm created way back when).

However, I am human. Did I overlook something with how pnpm uses a machine store? I'd be very interested to understand this further.

@zkochan Also, not sure if you looked into the testing tool created to verify these things, but these two test cases might be of interest to you:

anm: Same Mod Dif Dep Ver
anm: Flat Tree

Regarding pnpm's use of a global store, it was my understanding that once a package is installed there, the versions of dependencies it uses don't change, because they're symlinks under the package's '/node_modules' to specific versions also in the global store. This is not a bad thing in-and-of-itself, but assuming my understanding is correct, it can result in multiple versions of shared, common modules being created in memory, when a single version would have sufficed.

yes, this is correct. The dependencies will only change when doing a pnpm update.

A larger potential concern, again assuming my understanding is correct, is that it would also mean the dependency versions for a package in a global store might change from machine to machine, which might not be ideal as it could result in subtle differences in behavior. Given those differences would exist in a transitive dependency, they might be a little harder to track down, and would then require changing the versions on one machine so they're in line with another, which could adversely affect some other project on the machine symlinked to the global package. This can be a problem when there are teams of developers, where there's a desire to ensure everyone is using all the same versions of all dependencies, transitive or otherwise, for a given root '/node_modules'.

I agree, this is a problem. We wanted to fix it the same way yarn fixed it - using a dependencies lockfile (https://github.com/alexanderGugel/ied/issues/188)

@zkochan Oh, one more thing with adjacent.node_modules is that it also makes it possible to represent module dependency cycles without requiring symlink directory cycles.

FWIW, the yarn lock file can't work with machine level stores, but could with adjacent.node_modules

Oh, indeed! You are completely right about machine-level lockfiles!

By the way, we have some issues caused by circular symlinks (https://github.com/rstacruz/pnpm/issues/400), so this is great news that with your changes they will not be needed anymore.

I will try to read your documentation and look into the testing tool tonight. This is all very exciting!

Having looked at alexanderGugel/ied#188,

Say three months ago I was working on prjA that used modA@v1, that had a dependency on depB. When I installed prjA's dependencies for the first time, depB came in as depB@v1, and I created a lock file, locking depB@v1 for prjA.

A week ago another dev started prjB, that also used modA@v1, but when they installed, depB came in at [email protected]. They created a lock file.

Today I need to work on prjB. Assuming modA@v1 was in my machine store, it would still be using depB@v1, so now what happens? I can update it so it uses [email protected], but then prjA's lock file is no longer being honored.

Can you see the dilemma? adjacent.node_modules fixes all of that so simply... I hope you guys take a look and see the value, then get vocal with nodejs :)
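
The dilemma can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionaries stand in for lock files, `depB@v2` is a hypothetical placeholder for the newer depB release, and `store_conflicts` is a made-up helper, not part of pnpm, ied, or yarn.

```python
# Each lock file maps a package to the dependency versions that project pinned.
# "depB@v2" is a hypothetical placeholder for the newer depB release.
prj_a_lock = {"modA@v1": {"depB": "depB@v1"}}
prj_b_lock = {"modA@v1": {"depB": "depB@v2"}}


def store_conflicts(*locks):
    """Return (package, dependency, versions) triples that cannot all be honored
    when every project shares one store copy of a package and that copy's
    dependency links live inside the package itself."""
    pinned = {}
    for lock in locks:
        for package, deps in lock.items():
            for dep_name, dep_version in deps.items():
                pinned.setdefault((package, dep_name), set()).add(dep_version)
    return [
        (package, dep_name, sorted(versions))
        for (package, dep_name), versions in sorted(pinned.items())
        if len(versions) > 1
    ]


print(store_conflicts(prj_a_lock, prj_b_lock))
```

With a single shared copy of modA@v1, one of the two lock files has to lose. Keeping the dependency links next to each project's own link to modA, rather than inside the store copy, is what removes the conflict.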

@phestermcs I did a draft on the sjw branch. Probably tomorrow I will try it out with your fork of Node

Awesome!! You'll be the first. I'm looking forward to your experience. You might want to review the two test cases I referenced a couple of comments ago as well.

You can also use the testing tool to quickly lay down test directory structures and modules that, when run with node, will output require() resolutions. That might help inform how you could make minor changes to pnpm.

Good luck!

I love everything I'm hearing, I haven't comprehended how you've solved the single-source-code-location yet multiple-possible-dependency-resolutions-depending-on-the-parent with this new adjacent node_modules directory, but I'm excited to find out!

@wmhilton Take a look here. It's the output showing how this works, using a custom test tool I've built for this whole thing. You may want to spend a couple of minutes reading the tool docs, but it should be fairly straightforward.

Re the test result link on last comment, look at how libC's version is getting resolved.

@zkochan Just FYI, I made a couple of updates to v7.2.0-sjw. First was changing the .node_modules to +node_modules like we'd discussed. I also removed checking the 'execute' bit on the "main.js", as @sam-github showed me the error of my ways. Now, if "main.js" is a file-symlink, its resolved target always becomes the main module's __dirname (rather than converting to realpath). This should still work just fine, as someone symlinking a .js file to another .js, then placing the second away from the first, then launching with the second on the command line, and expecting the link to NOT be followed, is like never going to happen (knock on wood); today that scenario would still not work, but even worse, as the released version would still convert to its realpath, as I'm sure you're fully aware of.

I will say, I'm not entirely confident the guys over at nodejs are going to go for all this. I've been working on this for about a year. Not full time, but just thinking about it. I spent the last three months actually trying different things, and in my opinion, the only way to have machine stores, remain backwards compatible, be interoperable, prevent directory cycles, and then do it in just three lines of the simplest code ever, that's adjacent+node modules, which currently the key players on nodejs are adamantly viewing as a -1, and no one there has spent even a second looking at all the test cases I created, or the flawless citgm output, which they made me jump through for no reason (and trust me citgm is soooo painful to use).

I'm actually really bummed... But I have to thank you @zkochan, as over the last 3 months, in my various attempts at engaging people from nodejs, yarn, & npm, you're the _only_ one who actually 'got' it, and your support has been a warm light in a cold night.

I will be curious to see how you get a prototype pnpm working, and I'll always help you in any way I can.

Sounds good. Unfortunately, I wasn't able to work on it today (I had a long flight recently and I still have jet lag 😄), but I am reading all the threads and I am very excited about what you did!

Don't be bummed, I know that there are a lot of people who are excited about your proposals! I am sure all the pnpm and ied collaborators/users are totally on your side. You are saving our concept of shared stores! 🎆

Ah, alas I haven't had time to read the two links you just gave me @phestermcs , but I've been composing a reply, here it is:

Do I understand the concept of the parent.node_modules folder? From what I can decipher, it modifies the semantics of require enough that one could essentially create a "shadow" dependency tree where the actual dependencies live, separate from the physical location of the package itself? So say I wanted to implement a package installer that placed all packages in a fixed location in versioned subdirectories, like /{version}/{package name}, AND I wanted to implement these two conflicting lockfiles:

ied/yarn.lock - installed Dec 6, 2016

[email protected]:
  version "1.0.3"
  dependencies:
    mime-types "^2.1.11"
mime-types@^2.1.11:
  version "2.1.13"

pnpm/yarn.lock - installed Dec 8, 2016

[email protected]:        <-- `pnpm` uses the exact same package and version as `ied` BUT...
  version "1.0.3"
  dependencies:
    mime-types "^2.1.11"
mime-types@^2.1.11:
  version "2.9.0"     <--- The mime types author was super productive on Dec 7th

The goal: install pnpm without changing ied in any way, such as upgrading its version of mime-types.

I could do a directory structure like this:

/bin/ied -> /modules/ied/3.0.0
/bin/pnpm -> /modules/pnpm/4.0.0
/modules/ied/3.0.0/cli.js
/modules/ied/3.0.0/node_modules/awesome -> /modules/awesome/1.0.3
/modules/ied/3.0.0/node_modules/awesome.node_modules/mime-types -> /modules/mime-types/2.1.13
/modules/pnpm/4.0.0/cli.js
/modules/pnpm/4.0.0/node_modules/awesome -> /modules/awesome/1.0.3
/modules/pnpm/4.0.0/node_modules/awesome.node_modules/mime-types -> /modules/mime-types/2.9.0
/modules/awesome/1.0.3
/modules/mime-types/2.1.13
/modules/mime-types/2.9.0
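
The lookup this layout implies can be sketched as a small Python simulation. It is a sketch under assumptions, not the fork's actual code: it assumes the search starts from the module's symbolic path (where the symlink sits, not where it points), that at each level the adjacent `<dir>.node_modules` directory is tried before `<dir>/node_modules`, and that the dictionary below can stand in for the filesystem. The names `lookup_dirs` and `resolve` are made up for the illustration.

```python
import posixpath

# The two adjacent-directory links from the listing above, as link -> target.
LINKS = {
    "/modules/ied/3.0.0/node_modules/awesome.node_modules/mime-types":
        "/modules/mime-types/2.1.13",
    "/modules/pnpm/4.0.0/node_modules/awesome.node_modules/mime-types":
        "/modules/mime-types/2.9.0",
}


def lookup_dirs(symbolic_dir, suffix=".node_modules"):
    """List the directories to search, nearest first, walking up to the root."""
    dirs = []
    current = symbolic_dir
    while True:
        # A directory that is itself named node_modules is not extended again.
        if posixpath.basename(current) != "node_modules":
            if current != "/":
                # Assumed order: the adjacent directory wins over the nested one.
                dirs.append(current + suffix)
            dirs.append(posixpath.join(current, "node_modules"))
        parent = posixpath.dirname(current)
        if parent == current:
            break
        current = parent
    return dirs


def resolve(name, symbolic_dir):
    """Return the link target for `name` as seen from `symbolic_dir`, or None."""
    for directory in lookup_dirs(symbolic_dir):
        candidate = posixpath.join(directory, name)
        if candidate in LINKS:
            return LINKS[candidate]
    return None


print(resolve("mime-types", "/modules/ied/3.0.0/node_modules/awesome"))
print(resolve("mime-types", "/modules/pnpm/4.0.0/node_modules/awesome"))
```

The same source directory for awesome yields a different mime-types depending on which symlink it was reached through, which is the whole point of keeping the symbolic path.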

yes

I'm downloading your fork now... it's been a year since I've tried to compile node and I've got a lot on my plate, but hopefully I'll be able to test it too!

I read the documentation and stared at your test code for a while, but don't really understand what's going on. I'm hoping once I compile node and run the tests it will make more sense - in the meantime I opened an issue where I say I'm confused and need help understanding it.

You are saving our concept of shared stores! 🎆

and static repos! Don't forget them. I want that static store / repo so I can sync the whole thing over webtorrent or scuttlebutt. Mwahahaha!

@wmhilton They did a good job of making it super easy to compile node on either win or nix (on win you still need to install VS tool chain). Replied to your issue on the test repo; hope it clarifies things.

If we ever get nodejs to make this happen, I'd like to make a service that can run in the background on a team of developers' machines, and whenever one of them downloads and installs a module to the machine store, all the others are told to go get the same. So when you go and start working on something another developer was, most of the modules should already be on your machine, and then you symlink to them all in 2 seconds. ...nice dream anyways :)

@zkochan @wmhilton You'll want to set NODE_SUPPORT_SYMLINKS=1 btw.

Also, there is no command line switch. This is to ensure when a node program spawns node for some reason, like if pnpm runs lifecycle scripts that live or depend on symdirs inside /node_modules for example, they both always run with same switch setting.
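
The reasoning for an environment variable over a switch can be shown with a short Python sketch: a child process started without any extra arguments still sees a variable set in its parent. Here `sys.executable` merely stands in for a spawned node binary, which is an assumption made so the example runs anywhere.

```python
import os
import subprocess
import sys

# The parent sets the variable once in its own environment.
os.environ["NODE_SUPPORT_SYMLINKS"] = "1"

# The child is launched with no special flag and no explicit env argument,
# so it inherits the parent's environment by default.
child_code = "import os; print(os.environ.get('NODE_SUPPORT_SYMLINKS'))"
result = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,
    text=True,
    check=True,
)
flag_in_child = result.stdout.strip()
print(flag_in_child)
```

A command line switch, by contrast, would have to be passed along explicitly by every program that spawns node.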

One other thing, when you do node -v you'll get v7.2.0-sjw so you know which one you're running. But this also means if you try to install an addon, node-gyp will try to download the headers for that version, which don't exist. However, you can tell node-gyp to get the v7.2.0 headers, and then go to, I think ~/.node-gyp, and just copy the v7.2.0 directory to v7.2.0-sjw

Still downloading visual studio updates... 😆

Did my response to your issue help?

@zkochan I think isaacs' first approach really isn't viable, from my tests with hardlinking. His second approach _might_ work, but your point about require('foo/bar') no longer working I think may likely doom it. After looking into it some more, even putting the realpath in the search list won't help, because it would be the path of the module calling require('foo/bar'), and not foo/'s. He will have to add a fair amount of complexity in order to, effectively, convert foo/ into foo/contents, as right now the implementation does everything by simply dealing with the strings of path parts, and to handle mapping foo/ to foo/contents will require additional "housekeeping" and keeping maps of paths, not to mention also ensuring all __dirnames descend from "main.js"'s symbolic path (which right now I don't think he even sees as an important requirement of any solution). WAY more complicated than the couple of lines that is anm.

Further, like anm, it will require tooling like webpack to be altered to resolve modules the same way. To me, the only reason to try so hard to do everything through /node_modules is if it wouldn't require those kinds of changes, and if those kinds are on the table, then anm is a much simpler and comprehensible solution in my opinion.

I'm a bit frustrated, as not a single member from nodejs has actually acknowledged that the current anm prototype actually works at all, let alone very well, instead, if not just being outright silent and ignoring evidence to that effect, choosing to send out lots of -1's and :-1:'s on the edges of the thing or from dogma. It really takes the wind out of my sails.

Anyways, I'm curious if you're still working on a branch of pnpm that works with anm?

I ask because they use a tool called citgm to test versions of node for regressions. It simply has a list of about 60-some modules, and then sequentially uses npm to download, install, and run each one's test script. It is absolutely painful to use, as it takes quite a long time and can't scale (can't even run multiple instances on a multi-core server), and is not deterministic in the results it generates, but I digress. What I was planning to do was tweak a version of npm to use anm, and then run citgm with that version, because if THAT all worked, it would be a huge testament to anm's viability; HUGE.

So I'm wondering if we could instead use pnpm with citgm? How close in functional parity is pnpm with npm regarding execution of lifecycle scripts, bin symlinking (npm's "build" step), etc.? How hard, or easy (hope hope) do you think it would be for us to get pnpm working with citgm?

How close in functional parity is pnpm with npm regarding execution of lifecycle scripts, bin symlinking (npm's "build" step), etc.? How hard, or easy (hope hope) do you think it would be for us to get pnpm working with citgm?

It is very close. pnpm tries to do everything the same way npm does. All differences are only there to make it work with symlinked packages in node_modules.

Anyways, I'm curious if you're still working on a branch of pnpm that works with anm?

What I did so far is on a branch called sjw. I did not run it with your fork of Node yet. I just checked that the directory structure is using your anm approach.

I'll try to test it on Saturday (European time) or maybe earlier if I'll have time after work.

I like your style :), big TypeScript fan as well (since v0.8!!)

I installed pnpm then cloned the repo, and when I ran pnpm i, the engine checks failed because the node-sjw -v build outputs v7.2.0-sjw, and the -sjw at the end is a problem. I put it there because during citgm testing (which uses the version of node resolved from PATH), I had to run comparisons between it and the release version, and wanted a quick way to know from the command line which version of node was resolving from PATH.

I'm going to change the version output to v7.2.100. node-gyp's header store will still need to have the v7.2.0 folder copied to v7.2.100.
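
Why the -sjw suffix trips engine checks can be sketched as follows. Under node-semver's documented rules, a version carrying a prerelease tag only satisfies a range when the range itself names a prerelease on the same major.minor.patch, so `v7.2.0-sjw` fails a plain minimum-version range while `v7.2.100` passes. The code is a simplified Python model of that single rule; `satisfies_minimum` is a made-up helper (not the semver package API) and the `4.0.0` minimum is a hypothetical engine requirement.

```python
import re

VERSION_RE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?$")


def parse(version):
    """Split a version string into ((major, minor, patch), prerelease-or-None)."""
    match = VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"not a semver version: {version!r}")
    major, minor, patch, prerelease = match.groups()
    return (int(major), int(minor), int(patch)), prerelease


def satisfies_minimum(version, minimum):
    """Simplified '>=minimum' check, where `minimum` has no prerelease tag."""
    core, prerelease = parse(version)
    min_core, _ = parse(minimum)
    if prerelease is not None:
        # Prerelease versions are excluded from plain ranges.
        return False
    return core >= min_core


print(satisfies_minimum("v7.2.0-sjw", "4.0.0"))  # False
print(satisfies_minimum("v7.2.100", "4.0.0"))    # True
```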

@zkochan fwiw you should probably pull and rebuild v7.2.0-sjw. when I changed the . to + i didn't change it in all places. However, now knowing a + can't be in a mod name simplified one tiny piece of it. i've also changed the version from v7.2.0-sjw to v7.2.100 so you can still tell if you're running the version, but it should pass your engine checks.

as you may have discerned, i'm totally burnt out on this thing right now.. and you probably weren't aware that i've been trying to engage nodejs just in a discussion about this thing for almost 3 months now, and.. just tired of the general bias there.. so.. yeah.. just done with pushing this.

however, i still would like to see pnpm working with it!!! this is the only related issue i haven't unsubscribed from so if there's anything i can help or offer just let me know.. i'll also be playing with pnpm and this as well.

really dig your style @zkochan !! keep it up!!

I understand your frustration but don't give up! I really believe that it will work! Especially once we have a proof of concept with pnpm :smile:

If this change will fix the remaining issues of pnpm, then I will certainly use your branch of Node till it gets merged. Fast dependency installs are making my life so much easier!

@zkochan I believe I've quite effectively rattled the cage, although unfortunately to my great detriment it would seem. I hope you are still willing to work together on a POC using pnpm?

Sure!

You said you wanted to change dots to pluses in all places? Is your fork ready for experimenting?

Yes!

I've changed the version from v7.2.0-sjw to v7.2.100, which seems to quiet most of the engine checks, but you'll still be able to know what version is resolving from PATH with node -v.

You'll also need to set NODE_SUPPORT_SYMLINKS=1

A couple of things I'm curious to look at once there's something working:

  • The 'install' lifecycle events; you may already be doing something about this, but I think they should only run when a module is 'installed' to the store, and not necessarily every time it's referenced from a root /node_modules. npm's "build" step, which creates the bin and man stuff, should though. The challenge is that the install lifecycle scripts can depend on other modules.
  • Detecting and then handling circular dependencies without symdir cycles in a flat structure.

You said your sjw branch can already generate with anm? I'm a bit giddy to play with it myself. Is there anything special to do to have it work that way?

yes, you can try it yourself. I did not use any special options just changed the logic to use anm's instead of node_modules.

https://github.com/rstacruz/pnpm/compare/sjw

So the link still shows it using .node_modules?

My solution was to make a .node_modules folder (bad naming I guess) in the root of node_modules and store all the adjacent.node_modules there. I link around those adjacent.node_modules then to where they are needed.

Something like this:

+-- node_modules
     +-- .node_modules
     |    +-- [email protected] (this is the adjacent node modules)
     +-- foo (~/.store/[email protected])
     +-- foo.node_modules (./.node_modules/[email protected])
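
As a rough sketch of that layout, the Python function below returns the two symlinks such an installer would create for one dependency. It is illustrative only: `link_plan` is a made-up helper, `foo` at version `1.0.0` is a hypothetical package, and the `name@version` store key is an assumption.

```python
import posixpath


def link_plan(root, store, name, version, suffix=".node_modules"):
    """Return {link path: target} for one dependency of the project at `root`."""
    package_id = f"{name}@{version}"  # assumed store key format
    node_modules = posixpath.join(root, "node_modules")
    return {
        # The package itself lives in the machine store.
        posixpath.join(node_modules, name): posixpath.join(store, package_id),
        # Its dependencies live in a per-project directory, linked in next to it.
        posixpath.join(node_modules, name + suffix): posixpath.join(
            node_modules, ".node_modules", package_id
        ),
    }


for link, target in link_plan("/project", "~/.store", "foo", "1.0.0").items():
    print(f"{link} -> {target}")
```

Passing `suffix="+node_modules"` gives the naming the fork later switched to.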

Using .node_modules all by itself is just fine. I thought I saw it was not yet suffixing with +node_modules?

oh, yeah, I should update the suffixes. I'll do it in a minute

I pushed the fix

@phestermcs I just tried it with your fork of node. All tests passed except these two: "shrinkwrap compatibility" and "building native addons". The first one is obvious why. The second, I have to investigate the reason.

Now to answer some of your concerns

The 'install' lifecycle events; you may already be doing something about this, but I think they should only run when a module is 'installed' to the store, and not necessarily every time it's referenced from a root /node_modules. npm's "build" step, which creates the bin and man stuff, should though. The challenge is that the install lifecycle scripts can depend on other modules.

Yes, we currently run the lifecycle events once per installation... and you are right, this might be bad in some cases, but I haven't heard about issues yet.

Detecting and then handling circular dependencies without symdir cycles in a flat structure.

I'll probably try this tomorrow, and also I'll try different scenarios when using the store with several projects

Re addons, node-gyp keeps the headers by version in .node-gyp in your home dir. copy the 7.2.0 directory to 7.2.100, see if that works
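
A minimal Python sketch of that copy step, assuming the headers sit in one directory per version as described. `copy_headers` is a made-up helper, and the demo runs against a throwaway directory with a placeholder file rather than a real header store.

```python
import shutil
import tempfile
from pathlib import Path


def copy_headers(gyp_home, source_version, target_version):
    """Copy <gyp_home>/<source_version> to <gyp_home>/<target_version>."""
    source = Path(gyp_home) / source_version
    target = Path(gyp_home) / target_version
    if not source.is_dir():
        raise FileNotFoundError(f"no headers found at {source}")
    if not target.exists():
        # Leave an existing target alone instead of overwriting it.
        shutil.copytree(source, target)
    return target


# Demo against a temporary directory; the file below is only a placeholder.
with tempfile.TemporaryDirectory() as tmp:
    placeholder = Path(tmp) / "7.2.0" / "include" / "node" / "node.h"
    placeholder.parent.mkdir(parents=True)
    placeholder.write_text("/* placeholder header */\n")
    copied = copy_headers(tmp, "7.2.0", "7.2.100")
    header_copied = (copied / "include" / "node" / "node.h").exists()

print(header_copied)
```

Against a real setup the call would be something like `copy_headers(Path.home() / ".node-gyp", "7.2.0", "7.2.100")`.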

Regarding the install lifecycle, you run them once per installation? Not sure if you mean just once to put in store, or every time they're referenced?

once to put in store. Hence, once per installation per machine

That's good! I think that's how it should be; just my opinion. Now, here's a possible edge case. The install itself may depend on other modules that would 'normally' exist as a local copy, and knowing those versions might slightly change over time, in theory the install itself could technically end up a little different from install to install.

So, that's not necessarily a big problem, but I think ultimately, if/when you and ied maybe work together, and modules are keyed by the hash of their content, then any variance that might exist in a lock file because the install was a little different between two machines would be addressed... does that make sense?

And.. oh yeah.. IT'S KINDA WORKING!!! Right?? Yeah!!!

I know that you worked on it for months, but I am really surprised that almost all the pnpm tests passed from the first launch! Amazing!

So, that's not necessarily a big problem, but I think ultimately, if/when you and ied maybe work together, and modules are keyed by the hash of their content, then any variance that might exist in a lock file because the install was a little different between two machines would be addressed... does that make sense?

Actually I thought the checksum of a package would be calculated before the lifecycle events, but what you are saying might be a good point. We work on some specs in the ied repo, you can post your concerns here at the lockfile spec PR: https://github.com/alexanderGugel/ied/pull/191
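
Why the timing of the checksum matters can be shown with a small Python sketch. It assumes a package is represented as a mapping of relative paths to file bytes and that SHA-256 is the hash; `content_hash` is a made-up helper and the "compiled artifact" is a hypothetical placeholder for whatever an install script produces.

```python
import hashlib


def content_hash(files):
    """Hash a package given as {relative path: bytes}, independent of dict order."""
    digest = hashlib.sha256()
    for path in sorted(files):
        data = files[path]
        # Include the path and length so file boundaries cannot be confused.
        digest.update(path.encode("utf-8") + b"\0")
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()


package = {
    "package.json": b'{"name": "addon"}',
    "index.js": b"module.exports = 1;\n",
}
before = content_hash(package)

# A hypothetical install script drops a build artifact into the package.
package["build/Release/addon.node"] = b"placeholder binary"
after = content_hash(package)

print(before != after)
```

A hash taken before lifecycle scripts identifies what was downloaded; a hash taken afterwards identifies what actually sits in the store, and the two can differ from machine to machine.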

@zkochan I think it works so well because it kinda is the same way things already work; just replaced a / with a + :).

I'm curious if the addon failed simply because of node-gyp trying to find headers for v7.2.100. Let me know if copying ~/.node-gyp/7.2.0 to ~/.node-gyp/7.2.100 solves that problem?

Seems like the node-gyp thing helped.

Now there's another issue. I have to update pnpm to make it create the adjacent node_modules for packages already available in the store. There was no such need while the node_modules folder was inside the package.

Shouldn't be hard to implement. Probably tomorrow will do it

wow, its a long thread

@iamstarkov and you just made it a little longer! :)

I have to update pnpm to make it create the adjacent node_modules for packages already available in the store.

Can you describe that some more? maybe with a sample structure? just curious what your meaning is.

Another thing I'm curious on, is if you can remove your proxy files now that "main.js" is being preserved?

Hi @ghost. Don't burn out. I will make this work if I have to fork Node and write pull requests for every module in npm to make it compatible.

Personally, I'm not worried about node-gyp compile steps. In the near future, I think everyone will pre-compile native add-ons. Travis is free for open source projects: all that's needed is to cross-compile for all supported machine architectures before publishing to the registry. Which needs to be done in any case, because how else can you run the unit tests on all supported machine architectures? I've personally despised packages that try to compile themselves on my machine, because they cause a huge amount of trouble, particularly after I uninstalled Python2 so I could do some Python3 work.

Plus the hashing issue. I just can't think of a way to guarantee module security and authenticity if modules are allowed to modify themselves during the install, which is what compiling is doing. (Wait.... Unless the compile environment was a Docker container! Hmmmm I have something fun to think about now)

Oh holy fuck. He really deleted his Github? Talk about burn out! I hope that was a dummy account. Um... I've got a module store installer.... check it out? https://github.com/wmhilton/modinst

node-sls may be a nice compromise. It doesn't require changing node directly.

Seems like the new store design implemented in PR #524 works pretty well, even without --preserve-symlinks, so I am going to close this issue.
