Node: Suggestion: Centralized package repository that supports multiple versions of a package

Created on 19 Jan 2019 · 28 comments · Source: nodejs/node

Problem

Projects that use Node.js likely also use npm packages, and thus contain a node_modules folder. Having multiple such projects leads to multiple node_modules folders, which likely contain duplicated packages. This not only wastes disk space; it also wastes the bandwidth and time needed to install these packages.

(Other platforms, such as Haskell, Rust, and Java, avoid this by having a centralized package repository; I was quite surprised by Node's design decision.)

Description

  • When the user executes require("pkg-name") and "pkg-name" is not found in module.paths, Node.js should proceed to search a fixed location (let's call it $PREFIX/.node_package_store until we find a better name) for a "pkg-name" that matches the criteria specified in a manifest file within the project (preferably, but not necessarily, package.json). A sketch of this lookup follows the example structure below.
  • $PREFIX/.node_package_store is not a node_modules folder, and $NODE_PATH (i.e. module.paths) does not affect it.
  • $PREFIX/.node_package_store should support multiple runtime versions (es5, es6, nodejs versions, bundlers, etc.), multiple package versions and multiple registries.

Example structure of a .node_package_store

When npm is used to install React from registry.npmjs.org:

$PREFIX/.node_package_store
└── registry.npmjs.org
    └── react
        └── 16.0.0
            ├── content
            └── metadata

Alternatives I've considered

  • /node_modules, $HOME/node_modules and the like: these do not support multiple versions; as a result, different projects still require separate node_modules folders.
  • I can create a loader myself, but package managers (npm, yarn) don't support it, so it would be useless.
  • I can use pnpm — a package manager that uses hardlinks and symlinks to solve this problem, but it also makes things more complicated.
Labels: feature request, module


All 28 comments

This not only wastes disk space; it also wastes the bandwidth and time needed to install these packages.

That's up to the package manager (npm, yarn, etc.); it's not really within Node's remit. The popular ones do cache packages locally, however (e.g. $HOME/.npm).

The disk space argument isn't that strong in this age of multi-TB hard drives. (It's come up before.)

@bnoordhuis

That's up to the package manager (npm, yarn, etc.)

What could a package manager do if Node does not load the modules that the package manager installed? (I'm referring to a centralized package repository.)

The popular ones do cache packages locally, however (e.g. $HOME/.npm)

The popular ones copy files from the cache into the local node_modules.

The disk space argument isn't that strong in this age of multi-TB hard drives. (It's come up before.)

Why handicap your users? Do you assume/demand that every user has terabytes of disk space? Even if they do, why prevent them from using that space more efficiently? Efficiency is a feature; your counter-argument is not as strong.

_(BTW, there's a recurring meme portraying node_modules as more massive than a black hole, so you know how strong an argument this is.)_

(It's come up before.)

So I guess this request has been around for ages. It's good to know that I'm not the only one who wants this.

npm is experimenting with approaches to this in some of their recent tooling, IIRC, so I'm not sure this is an area that package managers can't innovate in.

Most languages only allow a single version of a package to be used at a time, process wide. Node's approach of allowing multiple versions of a dependency, possibly multiple incompatible versions, to all exist and be used in the same process is not so common, and makes the search you describe much more complex.

Have you taken a shot at implementing this, or are you hoping to motivate someone else to do it?

npm is experimenting with approaches to this in some of their recent tooling

That approach still wastes disk space and bandwidth.

Most languages only allow a single version of a package to be used at a time, process wide.

This isn't true of Rust (Cargo) or Haskell (Hackage). Rust uses semantic versioning (just like npm packages), where the major version indicates backward incompatibility. Haskell also has its own versioning system that supports backward-incompatible releases. What this means is that their compiler has to pick the correct version specified by the manifest files.

to all exist and be used in the same process is not so common

What would be the problem?

Another benefit this feature would provide is that implementing a package manager would become far easier.

Have you taken a shot at implementing this, or are you hoping to motivate someone else to do it?

* When the user executes `require("module")` and `"module"` is not found in `module.paths`, Node.js should proceed to search a fixed location (let's call it `~/.node_package_store` until we find a better name) for a `"module"` that matches the criteria specified in a manifest file within the project (preferably, but not necessarily, `package.json`).

* `~/.node_package_store` is **not** a `node_modules` and `$NODE_PATH` does not affect it.

* `~/.node_package_store` should support multiple runtime versions (es5, es6, nodejs versions, bundlers, etc.), multiple package versions and multiple registries.

Some notes:
module.paths already contains three global fallback locations at the end ($HOME/.node_modules, $HOME/.node_libraries and $PREFIX/lib/node) from which it will attempt to load modules (i.e. if not found earlier in module.paths):
https://nodejs.org/dist/latest-v11.x/docs/api/modules.html#modules_loading_from_the_global_folders
In practice I'm not aware of any current package managers that would actually install into those locations (e.g. npm is hard-coded to assume the folder is called node_modules). Lack of package manager support is going to be the biggest issue with any changes to the module resolving algorithm.

Packages and modules are not the same, and Node.js currently doesn't interpret any package metadata other than the main field (e.g. it doesn't even look at the version field in package.json). It has no idea whether a module (e.g. an addon) is compatible before attempting to load it.

module.paths already contains three global fallback locations at the end ($HOME/.node_modules, $HOME/.node_libraries and $PREFIX/lib/node) from which it will attempt to load modules (i.e. if not found earlier in module.paths):
https://nodejs.org/dist/latest-v11.x/docs/api/modules.html#modules_loading_from_the_global_folders

They are basically node_modules with different names.

Lack of package manager support is going to be the biggest issue with any changes to the module resolving algorithm.

Once Node adds this feature (behind an experimental flag), I'm pretty sure package managers will start supporting it.

Packages and modules are not the same

I've updated my comment, thanks for pointing this out.

Node.js currently doesn't interpret any package metadata other than the main field (e.g. it doesn't even look at the version field in package.json).

Node.js doesn't have to read the version field in package.json; the version value can be part of the package path. However, Node.js still has to read the dependency list from a manifest file (which might be package.json or a lock file).

It has no idea whether a module (e.g. an addon) is compatible before attempting to load it.

I've removed "multiple runtime versions (es5, es6, nodejs versions, bundlers, etc.)" part from the first comment.

I'd say the general consensus is that even reading "main" was, in retrospect, a mistake. It's unlikely Node.js will start parsing more of package.json, or any other file.

The current system works well enough; your proposal is at best a marginal improvement, worst case it's a regression because it slows down the common case. A lot of effort has been sunk into making the module loader _fast_.

worst case it's a regression because it slows down the common case.

Node.js can hide this feature behind a flag. Even when enabled, Node.js will always try to load from module.paths first. Besides, it's not like anyone in their sane mind would call require() 1000 times in a for loop.

I think you might be underestimating how many modules some apps require. :)

One I work on loads over 1,400 modules at startup and it's not even _that_ big and enterprise-y.

@bnoordhuis

I think you might be underestimating how many modules some apps require. :)

One I work on loads over 1,400 modules at startup and it's not even _that_ big and enterprise-y.

The act of loading a module comprises 3 steps: (1) resolving the module path, (2) reading the module file into memory, and (3) "eval"-ing the content of the file. How significant can step (1) be compared to the rest?

@sam-github

Have you taken a shot at implementing this, or are you hoping to motivate someone else to do it?

I will try creating a loader in the form of an npm package as a proof of concept when I have time. I will not touch the Node.js repo itself.

@KSXGitHub you may be interested in Yarn's 'Plug and Play' feature (RFC) which effectively turns Yarn's package cache into the central package store, via use of a custom resolver.

@edmorley It is Yarn-specific, and it requires changing the require.resolve algorithm, either by adding code to enable PnP (which is limited) or by changing require.resolve itself (which is what this issue is about).

What about the use case for npm link, or being able to edit files locally on disk without affecting other projects? Certainly for the simple cases you could use the package manager to install a copy in node_modules, but when the package you wish to edit is a singleton or part of a plugin ecosystem (like that of React, eslint, babel, etc.), what happens?

what about the use case for npm link

npm link is for linking local packages that aren't in the npm registry, and it thus only works on one machine. Share your repo over GitHub and it won't work on someone else's.

npm install does not invoke npm link.

or being able to edit files locally on disk without affecting other projects?

This feature does not prevent users from using good ol' node_modules.

again, though, that all seems like something the package manager, not the platform, needs to address. node already has a mechanism to require from a central place, if the package manager installs there.

@ljharb

node already has a mechanism to require from a central place, if the package manager installs there

That central place does not support multiple versions of the same package.

ah, that’s a fair point.

That's still just a packaging thing... if I npm i thing@1 then it can create ~/.npm/packages/thing/1.0.0/... and link it locally. Then if I install thing@2 somewhere else, it could create ~/.npm/packages/thing/2.0.0/... and link it.

The key here is that versions are tied to distribution, not loading. Node itself doesn't know whether the thing it's loading is version 1 or version 2, nor should it need to. These systems always end up being tied to the package manager, not to the runtime.

@devsnek

The key here is that versions are tied to distribution, not loading. Node itself doesn't know whether the thing it's loading is version 1 or version 2

Versions in Cargo (Rust) and Cabal (Haskell) are tied to loading. It is Node that is being unconventional. Node can read manifest files to learn about versions.

nor should it need to.

...unless there's gain in doing it.

Versions in Cargo (Rust) and Cabal (Haskell) are tied to loading. It is Node that is being unconventional. Node can read manifest files to learn about versions.

Cargo and Cabal are not the language; they are package managers. The Rust compiler doesn't know or care about the version being used; it just uses whatever linking information Cargo gives it when you run cargo build. It's the same story with GHC (Haskell), which just grabs whatever module happens to be there, while Cabal actually deals with the versioning. And it's the same story with Java, Python, C++, C, etc.

wastes disk space

When I first used Node.js, I really liked how once I'm done with a project, I can just delete the whole directory and get rid of all the modules I installed for it in one swoop. It felt like a huge improvement compared to Python, where you can easily end up with dozens of globally installed packages that you can't remove because you don't know if some script somewhere depends on them.

So your argument about disk space is a double-edged sword. Sure, you will have multiple copies of the same module if you have many projects depending on it. But I'd rather have many copies of a module that are actually used, rather than a pile of globally installed packages that might be long obsolete.

It doesn't seem like there is broad acceptance for this proposal. I'm leaving it open for now but unless something significant happens, I'll close it out in a few days.

The act of loading a module comprises 3 steps: (1) resolving the module path, (2) reading the module file into memory, and (3) "eval"-ing the content of the file. How significant can step (1) be compared to the rest?

@KSXGitHub Enough that several people (including yours truly) invested plenty of time in making it faster. Take a look at the history of lib/module.js and src/node_file.cc; the commit logs are informative.

@seishun

  1. You are not obligated to use the global store. You can use node_modules if you want to.

  2. If you use the latest version of npm, chances are you've forgotten that npm even keeps a cache. The same goes for yarn. If you don't mind the cache, why do you mind a global repository? If you can delete the cache, what prevents you from doing the same to a global repository?

Because you're not requiring from the cache (also, the cache has existed for many years; it's not new in either npm or yarn); the cache just makes installs faster. That makes the cache always safe to delete, since it can't possibly have any negative effect except making the next installs take longer.

Per my previous comment, I'll go ahead and close this out.
