Node: Consider methodologies for reducing the number of files within `node_modules`

Created on 16 Aug 2017 · 20 comments · Source: nodejs/node

At the moment, node_modules contains a comedic number of files for the majority of projects, so much so that /r/ProgrammerHumor has made hundreds of posts poking fun at it.

Jokes aside, I do think this touches on an important issue. Atom, for instance, contains nearly 40,000 files in its latest version. A cross-platform calculator dwarfs that at nearly 200,000 files. It's getting so obscene that a friend of mine actually ran out of inodes on his ext3 filesystem (which limits the number of files to 2.98 million) just by installing too many Node projects (48...).

I may be missing something, but I can't see many places where solutions have been discussed. Here are some suggestions my friend and I had:

  • ASAR (an Electron format that allows random access and acts like a zip file without compression). Since most OSes (at least Windows, in testing) read a few large files much faster than many small ones (I'm getting roughly 700 Kbps vs. 240 Mbps when copying small vs. large files), this solution may even speed up module load times.
  • Alter how require works to support versioning natively using the package.json file. Contrived example:

    • package.json still contains version numbers

    • All packages are kept at the top level, with a naming scheme like package-name_1.2.3

    • require('package-name') looks into package.json to find the allowed semver range and uses a matching version, vastly reducing the number of duplicates (a rough sketch of this lookup follows below).
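
To make the second idea concrete, here is a rough sketch of what such a version-aware lookup could look like. Nothing here is existing Node behaviour: resolveVersioned, the flat package-name_x.y.z layout, and the use of the semver package for range matching are all assumptions for illustration.

```js
'use strict';
const fs = require('fs');
const path = require('path');
const semver = require('semver'); // assumed to be available for range matching

function resolveVersioned(name, callerPackageJson, nodeModulesDir) {
  const pkg = JSON.parse(fs.readFileSync(callerPackageJson, 'utf8'));
  const range = (pkg.dependencies && pkg.dependencies[name]) || '*';

  // Collect every installed copy, e.g. "lodash_4.17.4" -> "4.17.4".
  const versions = fs.readdirSync(nodeModulesDir)
    .filter((dir) => dir.startsWith(name + '_'))
    .map((dir) => dir.slice(name.length + 1));

  const best = semver.maxSatisfying(versions, range);
  if (!best) throw new Error(`No installed version of ${name} satisfies ${range}`);
  return path.join(nodeModulesDir, `${name}_${best}`);
}
```

A require('package-name') built on top of this would then load the resolved directory's main entry point, so only one copy per distinct version would ever need to exist on disk.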

Neither of these solutions is without problems, though.

(Side note, not sure whether this is better to post on Node or NPM. Opted for here since it's an issue that affects the whole language and several solutions would require the support of the Node contributors)

module

Most helpful comment

Efforts have started in making a unified archive format for both web and node in https://github.com/WICG/webpackage with an upcoming Internet Draft once things finalize. Feel free to add to https://github.com/WICG/webpackage/issues/33 if you have concerns.

All 20 comments

ASAR (an electron format that allows random access and acts as a zip file without compression).

I know @bmeck has given loading archives a bit of support. I, for one, would love to see that supported by Node.

Alter how require works to support versioning natively using the package.json file. Contrived example:

I think that would not involve Node, only package managers; and I think pnpm already comes close to what you want, if I understand you correctly.

In the browser, due to the constraints around file size and processing efficiency, assets are generally minified and bundled into a single js file before being sent to the client.

On the other hand, for a Node.js application I haven't seen any kind of compilation step for performance optimization before, and I would be interested to hear what people think about this. The bundling could be achieved by something like webpack.

As far as I understand, reading one large file sequentially is much faster than reading many small files scattered across the disk, so a bundling step could reduce both the overall file count and the total size. There could also be a performance benefit in that.
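
For anyone wanting to experiment, a server-side bundle can be produced with a configuration roughly like the one below. The entry and output paths are placeholders, and the options assume a reasonably recent webpack:

```js
// webpack.config.js -- minimal sketch of bundling a Node.js app.
const path = require('path');

module.exports = {
  target: 'node',                 // keep Node built-ins like fs/path out of the bundle
  entry: './src/index.js',        // placeholder entry point
  output: {
    path: path.resolve(__dirname, 'dist'),
    filename: 'bundle.js',
  },
  mode: 'production',             // enables minification and tree shaking (webpack 4+)
};
```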

One critical shortcoming of ASAR, as far as I can tell, is that it doesn't support native modules. Most big node apps include a few so that's a pretty big downside.

In the Java world, they solve it by extracting the shared library to a temporary location but that doesn't always work (e.g., read-only fs) and is a security risk when the temp dir is world-writable.

Firefox at one time implemented its own ELF loader for that reason (still does, I think) but that's a big maintenance burden.

You can do clever things on Linux with memfd_create() and dlopen("/proc/self/fd/%d") but alas, it's not portable to other platforms.

Efforts have started in making a unified archive format for both web and node in https://github.com/WICG/webpackage with an upcoming Internet Draft once things finalize. Feel free to add to https://github.com/WICG/webpackage/issues/33 if you have concerns.

@bnoordhuis I would say that the Java solution is fine and can be augmented by installation tooling. I spoke to @groundwater on this a long while back.

One thing I'd like to see package managers do is to encourage authors to not include unrelated files in their packages. Many (if not most) packages on npm include unit tests and other fixtures, which are totally unnecessary at runtime and which contribute to long installation times.

I use the files field to include only the bare necessities in my packages, but it seems knowledge of this feature is rather limited among the broader community. Maybe npm could print a warning if neither files nor .npmignore is used during publish? Something along the lines of:

warning: your package includes 500 files. Use the 'files' option in package.json or create a `.npmignore` in your project directory to define or limit which files get published.
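
For reference, a package.json using the files allowlist might look like this (names are illustrative). Everything not listed, apart from a few always-included files such as package.json, the README and the main entry point, stays out of the published tarball:

```json
{
  "name": "example-package",
  "version": "1.0.0",
  "main": "index.js",
  "files": [
    "index.js",
    "lib/"
  ]
}
```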

@nodejs/npm

@silverwind, I already opened an issue about that some time ago: https://github.com/npm/npm/issues/7553

@rubennorte Interesting thread, sad nothing was done about it. Seems like there were some exceptionally strong points to implement it and few (if any?) drawbacks. It was mentioned that the NPM maintainers did not agree with the solution, could you or @sindresorhus shed some light on why this is the case?

One thing I'd like to see package managers do is to encourage authors to not include unrelated files in their packages.

I kind of agree with this sentiment. Right now, if you do make install, it copies 4,100 files (!), 4,000 of which (or 96%) belong to npm and its ridiculous number of dependencies.

FWIW, we use https://www.npmjs.com/package/kthxbai to address parts of that problem. You might find it useful too.

Does anyone know if npm shares download statistics? That would be useful for finding the most depended-upon packages, automatically analyzing them for unused dependencies and unnecessary files, and fixing them by sending pull requests.

@Ginden I'd argue that unused dependencies probably make up a tiny percentage of the files. It's just that a lot of people use only one or two functions from each package but they have 50+ of them.

I was considering whether something like an altered webpack would be able to help here by removing unused functions and other dead code when someone runs, for instance, npm install --production.

@popey456963 I had a look at npm, probably the most popular Node.js package.

This is code used by me - https://gist.github.com/anonymous/dcfb00ef2b583e658595276813be9e7c

And the command used to generate it, run in a folder whose package.json lists bluebird and lodash as dependencies:

npm cache clean --force && rm -rf node_modules && node -r ./list-referenced.js $(which npm) install --silent | grep "\.nvm"  | grep -v .json > required_packages.txt

938 js/json files required.

However, the command

find /Users/michal/.nvm/versions/node/v8.6.0/lib/node_modules/npm/ -type f -name "*.js" | wc

indicates the presence of 1,299 js files and 3,390 files in total. So only about 29% of the files are required for npm to operate!

All the .js files in npm's dependencies add up to 7,011,214 bytes, of which 4,521,382 bytes are unused.

List of "unused" js (some of them can be lazily required) files can be found here: https://gist.github.com/870dda439d67752bf2094d13fa31f403

I would totally support an archive- or link-based implementation (though the latter would be out of scope for Node.js core). I recently had to restore a couple of backups of a virtualized filesystem after running out of inodes, many of which were consumed by npm packages.

The recently released Node Prune uses a very basic blacklist to decrease the size of packages. It actually has pretty impressive results, but might not be for everyone since it removes source & other files which are sometimes necessary.
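
The general shape of such a tool is simple. Here is a rough and deliberately destructive sketch of the blacklist idea; this is not node-prune's actual list, the patterns are purely illustrative, and fs.rmSync needs a reasonably recent Node:

```js
'use strict';
const fs = require('fs');
const path = require('path');

const JUNK_EXTENSIONS = new Set(['.md', '.markdown', '.ts', '.map']);
const JUNK_DIRS = new Set(['test', 'tests', '__tests__', 'docs', 'example', 'examples']);

// Walk node_modules and delete anything matching the blacklist.
function prune(dir) {
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      if (JUNK_DIRS.has(entry.name)) {
        fs.rmSync(full, { recursive: true, force: true });
      } else {
        prune(full);
      }
    } else if (JUNK_EXTENSIONS.has(path.extname(entry.name))) {
      fs.unlinkSync(full);
    }
  }
}

// prune('./node_modules'); // destructive: try it on a throwaway install first
```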

I'm wondering what the actual concern is here. I suspect "there are many files" is less of a concern in itself than other resource usage; file count tends to be a placeholder for other concerns.

Are you worried that ls ./node_modules takes too long? Try npm i --global-style, and you'll only have your direct dependencies in ./node_modules, with other deps nested in there (and flattened).

Are you worried about disk usage? pnpm, as @addaleax said, is a great tool for that, and I have an npm branch that works similarly which might get integrated later -- essentially globally reusing inodes for _individual files_ by hard-linking to the global npm cache. I don't particularly want this to become the default, because you lose some consistency guarantees, but eventually a good compromise would be to exploit CoW filesystems as they become more common to have our cake and eat it too: again, globally deduplicating files!

There's some stuff out there (which has already been linked) for deleting unneeded files. Would be nice for more folks to use the files array, and we might end up changing our defaults around that someday to generally reduce package size -- such as ignoring test/ directories unless explicitly included. But that would be a pretty serious breaking change and quite a cheese-move for users.
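
To make the hard-link idea concrete, here is a toy sketch, not how npm or pnpm actually implement it, of installing a single file by linking it to a shared cache instead of copying (paths are hypothetical):

```js
'use strict';
const fs = require('fs');

function installFile(cachePath, destPath) {
  try {
    fs.linkSync(cachePath, destPath);        // same inode, no data copied
  } catch (err) {
    if (err.code === 'EXDEV') {
      // Hard links cannot cross filesystems; fall back to a real copy.
      fs.copyFileSync(cachePath, destPath);
    } else {
      throw err;
    }
  }
}

// installFile('/path/to/global-cache/lodash/index.js',
//             './node_modules/lodash/index.js');
```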

Are you worried about node itself taking too long to load stuff because there's too many files? I imagine Node might find some more opportunities for optimization over time, though I've generally only really felt the pain of heavy load times when I'm hyper-optimizing startup time, and it's usually _big_ files that really kill perf rather than many small files. It'd be nice if lazy-loading modules was more common, too. Node integrating ESM might even open the doors for even more interesting optimizations as adoption spreads. And, of course, there's always application bundling.
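
For what it's worth, lazy-loading in CommonJS can be as simple as deferring a require() until first use; the module name and API below are placeholders:

```js
'use strict';
let _heavy; // cached module instance

function heavy() {
  // 'some-heavy-module' is a placeholder; the require cost is paid on first call only.
  return _heavy || (_heavy = require('some-heavy-module'));
}

module.exports = function doWork(input) {
  return heavy().process(input);
};
```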

So... I think there are plenty of solutions to be had here before we even start talking about things like package archives. One thing that I don't think should be included in those solutions is any idea of discouraging small packages: I believe they are an incredibly valuable asset of the ecosystem and a huge reason for its success. I think we should continue working on tools and other facilities to make it easier to have even more small modules!

It seems like perhaps this should be closed. Feel free to re-open (or leave a comment requesting that it be re-opened) if you disagree. I'm just tidying up and not acting on a super-strong opinion or anything like that.

I think this issue should be given more light.
While it's not entirely Node's fault that node_modules has a tendency to bloat up, it highlights a core problem that should be considered.
The cause has several factors:

  • package developers bundling useless files
  • npm allowing them to do so
  • node not providing a way to bundle modules into single files

Since the old days, libraries have been compiled so that they can be used in projects without having to carry the whole source around every time. It's true that JavaScript isn't really compiled to native code, but that doesn't mean a library/module can't be bundled into a single file. The suggestion of using uncompressed archives is actually a valid one, in my opinion.

The main problem with node_modules is not its size but the sheer number of files. For some reason, people nowadays seem to treat file allocations as free, which they are not.
These small files are a big problem: they waste allocation space, pollute the filesystem, make fs integrity checks longer, slow down backup operations, bloat search indexes and take ages to upload if you are unfortunate enough to have to use FTP connections.
Moreover, having these files out in the open adds unnecessary overhead both when opening a project and when searching for files inside it (unless the user has an optimized IDE or knows how to ignore the folder, which is not that common).

So, as you can see, "many small files" is a real problem with deep ramifications, and fortunately one that can be solved.
There isn't even a need for a complex format: a simple file with a TOC in the header will do the job, like in the old days. Or even build a separate TOC and then read from uncompressed tars.
You can surely come up with the best approach for Node.js.
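
To illustrate that last point, here is a rough sketch of the tar TOC idea: scan the 512-byte ustar headers once, remember each entry's offset and size, then read individual files by offset. No error handling, and the archive path is hypothetical:

```js
'use strict';
const fs = require('fs');

// Build a name -> { offset, size } map by walking the 512-byte tar headers.
function buildToc(fd) {
  const header = Buffer.alloc(512);
  const toc = new Map();
  let offset = 0;
  while (fs.readSync(fd, header, 0, 512, offset) === 512) {
    const name = header.toString('utf8', 0, 100).replace(/\0/g, '');
    if (!name) break;                                     // empty block marks end of archive
    const size = parseInt(header.toString('utf8', 124, 136).replace(/\0/g, ''), 8) || 0;
    toc.set(name, { offset: offset + 512, size });
    offset += 512 + Math.ceil(size / 512) * 512;          // entry data is padded to 512 bytes
  }
  return toc;
}

function readEntry(fd, toc, name) {
  const { offset, size } = toc.get(name);
  const buf = Buffer.alloc(size);
  fs.readSync(fd, buf, 0, size, offset);
  return buf;
}

// const fd = fs.openSync('node_modules.tar', 'r');
// const toc = buildToc(fd);
// const source = readEntry(fd, toc, 'lodash/index.js').toString('utf8');
```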

@AlanDrake Of the three bullet points you list, only the last one is within Node.js core's remit.

I would suggest opening a new issue and linking back to this one to keep the discussion focused.

It would be good to summarize the state of things vis-a-vis ASAR and webpackage in the new issue.

ASAR (an Electron format that allows random access and acts like a zip file without compression). Since most OSes (at least Windows, in testing) read a few large files much faster than many small ones (I'm getting roughly 700 Kbps vs. 240 Mbps when copying small vs. large files), this solution may even speed up module load times.

👍
