Do we have something for specifying the included files for publishing yet?
I just happen to look through some of the files in node_modules for one of my apps and this is what I found:
coverage, .nyc_output.idea.eslintrc, .travis.yml, etctest files.Kind of related to https://github.com/nodejs/package-maintenance/issues/77
We should have some thing that suggest ensuring files exist in package.json and include essential files for publishing.
First of all, "files" is super dangerous - forgetting to add a new file causes breakage. Nobody should ever use it.
.npmignore is fine, but also, imo npm explore foo && npm install && npm test should always work - i consider it a best practice to always ship full tests with every package.
I have no opinion about .npmignore or files except .npmignore should be in .npmignore itself. We should have some mention about being mindful to publish without extra files and npm offers two ways.
I approach this from practical stand point and prefer my node_modules smaller. I think publishing packages without extra files is one of the most overlooked thing and I've installed packages that publish demo images or temp files in the 10s of MB size. I think if there are a lot of tests it's better to leave them, but it's up to individual package owner. The practice I follow is prepublishOnly runs npm test.
The cost of publishing an extra file is slower install times; the cost of omitting a needed file is breakage. There’s no semver contract around package sizes to break.
as a user, I prefer both: faster install and no breakage.
In the better case, sure! But in the worse case, which would you prefer, a fast broken install, or a slow working one?
I'd like to clarify what you actually have issue with.
It's clear you are against files and that test should be published. I am sure we can word it to your satisfaction.
Do you actually have issue with the fact that we should have guideline about the broader topic I raised?
In general, yes - there are very few files I think should ever be excluded from the published tarball, and I'd be concerned that we'd be unable to come up with guidance that wouldn't over-weight "package size" as a consideration.
In other words, I don't think that package authors should be solving this problem - I think npm should (and, with their "tink" project, they are).
OK, let me clarify. The primary motivation for bringing this up is because I don't think files like these should be published:
.DS_Store, .nyc_output, coverage, .idea, .vscodeI don't think tink is solving that.
npm could default to ignore those files, but I imagine that's not something they want to do either.
I think having a guideline regarding be mindful of what files you published make sense.
oh sure. But those should all be gitignored - which makes them ignored by npm by default.
Depending on how you structure your project, you can have files be very safe. Basically, put stuff in lib/ and bin/ and you're grand. And docs/ as well.
Re tests - that's very debatable. I wish there was tooling to _easily_ "switch to use this deep dep to be installed from their git" so that you can get tests, but shipping tests together with the package is an antipattern in my eyes, partly because it complicates security efforts as there is more code to review/analyse - and that is code that's never going to be used in production (yes, it's a tradeoff between dev needs and prod needs). Keep in mind, that the payload for event-stream was hidden inside tests... Either way, because it's debatable, I agree that we can't say that it's a "best practice".
Re other files - as a random one-off contributor I find it very uncomfortable when I need to add .idea to someone's .gitignore... and I know there's people who don't like it... so having _that_ as a baseline would be _very_ nice, but I'm not sure that's going to fly, because the list of common files is very large and it bloats the .gitignores. Instead, I'd really like it if npm actually respected a global .gitignore as well as the local one... There's an issue open for years.
I think @styfle should join this discussion. Mainly because we sit on the opposite side of this fence: he wrote https://packagephobia.now.sh/ and I'm a strict believer that sources, docs, tests and examples are key part of a package. Specifically, I think I should be able to build/test that package without having to open up git. I'm a huge fan of "show source feature" of the 90s web.
Reducing the size of packages is important to reduce the size of the artifacts that we deploy. I think we should build a tool that traverse node_modules and remove not-needed files for production (tests, docs). This would be very nice to build as part of this group.
packages written in ts, some are including the TS source, is that needed?
No, it is not needed. You can publish a just the type declaration file (think header file if you use C++) which is perfect for anyone installing your package.
i consider it a best practice to always ship full tests with every package.
I disagree. I think npm publish should be the minimum build artifacts only. When I run npm install yourpackage, I can't even run your tests because npm only installs the production dependencies, not the devDependencies from yourpackage. And I can tell you from experience, that most people who publish tests to npm do so by accident, not on purpose. That's why I built Package Phobia.
I'm a huge fan of "show source feature" of the 90s web.
Met too! But that's not realistic because people who are publishing to npm are not necessarily writing JavaScript. They might be writing Rust and compiling to WASM and then deploying the WASM to npm. They don't need to deploy the Rust source code, nor do I want them too. Their tests are probably written in Rust too.
Probably a better approach to this "show the source feature" is to somehow enforce a build step when publishing to npm so that the source is guaranteed to match the build artifacts. Something like ZEIT Now Builders. But that's problem/topic for a different thread.
Depending on how you structure your project, you can have
filesbe very safe.
I like files because I can just add files: ["dist"] and only the compiled output is going to be published. However, there is a known bug that is only affecting files but not .npmignore filtering. https://npm.community/t/ds-store-files-show-up-after-npm-publish/831
npm’s tink project will allow you to publish gigabytes if you like, but consumers will only install what they need - so this “phobia” will go away soon anyways, without forcing everyone to hamstring offline usage.
Again, even if you want to restrict what shows up on npm, the “files” field is very dangerous, and i do not want this project to endorse its use in any way.
While a few things are debatable with opinions on either sides of the fence, I hope we can agree on the basic things that a guideline about having the proper setup to ensure some files are not published is a good practice.
The guideline should only refer to the methods npm provides to do this, and avoid having any preferences on files or .npmignore.
Maybe note some gotchas and tips:
.npmignore makes npm ignore .gitignore, which is why some package is publishing .nyc_output and coverage, like this https://github.com/nodejitsu/node-http-proxy.npmignore would publish any unintended files left over, for example temp files.files has a bug https://npm.community/t/ds-store-files-show-up-after-npm-publish/831files ensure no unintended files get publish accidentally, but it could be dangerous, but following a well defined directory structure makes that less of an issuetest is potentially a security attack vector, ie: incident in event-stream.Other than that, I think we would leave the choice of whether to publish things like docs and test, or other files, to the individual package owner and their preferences.
also suggest to run a npm pack to inspect the tgz files content before doing a real publishing. I think there are tools out there that makes the publish process better. Maybe we can suggest one.
If you have 2FA enabled, npm itself now shows you a preview of the file list before actually publishing.
Some existing guidance: https://docs.npmjs.com/misc/developers
Nobody was in today's meeting that had context on it. Pushed to the next meeting, hopefully someone with context can join the next meeting 👍
Personally, I am a fan of _not_ including extraneous source in npm published packages, and having well structured git tags on the source repo, so the corresponding source is easy to fetch. I understand the opposite point of view, there are some pros and cons both ways, but I'm not sure there is package size and install time is an issue, and some packages have incredibly large sets of test data, I've been pretty astonished and the kinds of stuff that shows up in packages when exploring node_modules size on disk. Install time can be pretty brutal, too. These are pains that I feel often, and I can honestly say I've never wanted to run one of my dep's tests unless I was intending to modify them -- in which case I checkout the repo. I suspect many packages do publish their tests... but I also suspect that isn't a conscious choice, its because it all gets published by default.
Perhaps the guidance should be clear that if tests are published, they should be runable from a package install after an install of their dev deps, and that if tests include 100s of MB of test data, consider not publishing them.
@sam-github although i disagree, that’s fine - let’s just advocate for npmignore in that case and not “files”.
I agree with your last paragraph regardless.
Yes, I agree on .npmignore, its more robust than files.
Its particularly important to have because with transpiling, .npmignore defaults to .gitignore, which is the right choice for npm, but the wrong choice for anyone working with transpilers, because transpiled output should be git ignored, but MUST be packaged, and won't be by default.
I think there are areas where there is consensus (but still a few will disagree with the consensus viewpoint), and areas where there is not (yet?) consensus, and including tests is the latter category.
Even with disagreements on what to do, I think its possible for the guidelines to lay out the pros and cons. Its more satisfying to tell people "this is the right way, do it" than "there is disagreement on this", but even in areas of disagreement we can give some information. Perhaps particularly in areas of disagreement we should give information, because it might be harder for people to figure out why there isn't agreement yet, and waste time surfing blogs to understand the different points of view.
I prefer including the tests. They are immensely helpful as a form of docs!
I think laying out pros and cons is important. I would also recommend to create some sort of prune utility - test are not needed in a docker container after all.
@mcollina is it that people find in easier if the tests are there after the npm install? I'm trying to understand the value as the first thing I do when looking to understand an npm is to go directly to the github repo....
@mhdawson when you npm install before, say, getting on a plane, it’s very useful to be able to explore the full code without having internet access. Also not every package has a GitHub repo, or any repo - and some packages (like indexof) used to have one that was never linked from the package.json but has since been deleted - npm is forever, GitHub is not.
All of what @ljharb said. npm does not support publishing a source package and build package. It would be great to have such a support tbh.
There are a lot of valuable items but the initial ask does seem to have expanded. I take on board the comments about tests. Perhaps we should split those into a different issue. I'm actually more inclined to think that GitHub is forever. I am thinking on @ljharb comment about wanting the code. If this is to be the case we have to have a policy on code that is run through babel where only the dist folder and not the src folder would make it to the package.
? My whole desire is to not recommend just the dist folder from being published; i want both src and dist
(Separately tho, GitHub can’t be forever because repos can be deleted, always, but npm packages can’t be unpublished)
Understood. Then we need to write this up.
It looks like, as with many things, this is more complex than initially thought. Initially
@jchip asked about baseline files that should be ignored. These would include the known suspects
...
But when we examine the issue at a higher level from @ljharb he is saying that it would be
a good idea to include a number of other things in the package such as
@ljharb also makes the point that a github repo may be deleted but essentially a published
package cannot.
There does not appear to a middle ground upon this subject because we are either minimalist or
not.
I think we need to decide on the following
If we were to take on a package with no repo, we would create one and would then own it. So that really
leaves the question of are we minimalist or not.
Given that package phobia was created by @styfle to highlight non minimalism we need to address which side of
the fence we are on.
I agree that it is very handy to have all the code and tests in the package and on that front @ljharb most
likely suffer from the same issues when in transit. That is, we have to go to github to get the rest of the
useful code.
So the question is, do we want to impose and make go to github to get the rest of the useful code?
I think we should encourage packages to have a GitHub repo, surely. I think we should also only suggest npmignore for suppressing files.
I’d like to see us encourage a default of minimal suppression (i find package phobia to be spreading FUD; it doesn’t accurately report on alternate entry points or tree-shaken size, as well), and explain the tradeoffs if authors decide to suppress source, tests, etc.
@ljharb I think FUD is inaccurate and insulting.
Package Phobia is measuring a metric that was previously ignored. This metric is clearly stated and the README compares prior art and differences in other tools.
https://github.com/styfle/packagephobia/blob/master/README.md#prior-art
If you think there is a bug in the alt entry point, then please create an issue.
But it sounds like you would be more interested in a different tool, Bundle Phobia, which is measuring a different data point.
@styfle i apologize, i did not intend to insult. Specifically, I think that the size of packages doesn't actually matter - the only thing that matters is the final bundle size impact, which can only be measured at the app level, not the package level. I appreciate that there's many different metrics and many tools to measure these metrics, but that doesn't mean all of these metrics are equally valuable, nor that some of them aren't harmful overall to the ecosystem.
I'm hoping this satisfies both sides.
The maintenance team recognises that there are two schools of thought in what should be included in the released package. The two schools being minimalist and fully-inclusive.
The deployed package should contain only what is required for it to actually function. All the non executing files should be in the source code repository. If this approach is taken it is the maintenance team's opinion that many of the extraneous files and folders that end up in packages do so by mistake. Examples of this would be
Suitable use of the .npmignore and or the package.json files property shall be made to ensure these files and folders do not end up in a package by mistake.
For the minimalist approach the maintenance team recognise the advantages of using the package phobia tool to measure the size of the package.
It is also recognised that there are considerable advantages to having all the source code, tests and all other information that could assist in debugging a module included in the package. In the case of transpiled code both the src and dist folders would be included in the package along with test cases.
When a module is adopted by the package maintenace committee a decision shall be made as to whether the built module shall follow the minimalist or fully-inclusive model.
The decision will be made with the package owners and the maintainers on the package maintenance team. If the module was intened to be minimalist but some files crept in by mistake they will be removed. However if the adopting team believe that fully-inclusive approach should be followed decisions as to how much shall be included will be taken.
@ghinks under "suitable use", it'd be great to talk about the tradeoffs of npmignore vs files, as well.
I'd also like to mention the caveats for package phobia; namely that it doesn't measure non-"main" endpoints, and its "bundle size" metric doesn't account for deduplications across your graph, nor for bundler-time optimizations.
(i can't PR review that comment, but please also use the oxford comma throughout :-p )
@ljharb You quoted "bundle size" but that wording is not used anywhere on https://packagephobia.now.sh
It does account for deduplications the same way npm does because it runs npm install.
It doesn't measure bundle-time optimizations because thats what https://bundlephobia.com is for. These are different tools measuring different things.
Additionally, you haven't explained what "non-main endpoints" means.
It doesn’t account for your entire graph, because you don’t have the entire app’s graph available when you run npm install.
I mean, any deep import that isn’t in package.json’s “main” field.
@ghinks I would provide some edits on the above text.
For Minimalist approach
test and docs should be excluded. The goal is to ship only the minimum amount of files that are necessary to use the lib.
For Fully Inclusive
Some non essential files should be excluded, as an example:
The goal is to provide a full development environment out of the box (npm install && npm test should pass). It includes test, documentation and examples.
I agree that both approach are valid.
I prefer both. When developing, I'd like full docs and tests available. When deploy to production, the minimalist.
There are different solutions with tools to achieve the minimalist path, through bundling, minification, etc, but that's not ideal and there isn't a clear winner for doing it (correct me if there is).
Generally, I steer clear of any package that doesn't have a link back to a public repo, but as some valid points made regarding github being not permanent and mutable, even a public repo may mean inconsistent docs or code.
I think the goal is two fold:
| discussion point | Elaboration and work to do |
|------------------|----------------------------|
| Github repos can be deleted| When a package is adopted by the maintainers there must be a github repo. The only viabale solution is to own the rights to deletion or keep a synchronized fork of the repo. If an adopted package contained source code and the github repo was already deleted then the maintainers would create the repo. Without removing rights from the originators of an adopted package the originators may accidently delete it. A solution is a synched fork. This is quite common in the bitbucket enterprise environment. But is not provided by github. I am not sure if anyone has a connection with github or someone knows of some tool to do this. But keeping a synced fork would avoid this issue.|
| the final bundle size impact at the application != package size | It is recognized that the final impact of any package in a published application may be quite different from the published size. |
| Tooling | We should develop a tool to trim down from either a fully inclusive or minimalist package. This tool would incorporate the best practices we have. It would be capabale of both trimming down and identifying and removing unwanted and frequently overlooked files such as logos, CI integrations and configuration files. This would work in either minimalist or fully inclusive cases. It would be incorporated into our practices. The goal is for a simple npm install & npm test to work out of the box for a fully inclusive package|
| minimalist and fully inclusive | It may be best to develop package fully inclusive mode initially, and publishing it as such, then at a later date publishing it as a smaller minimalist package is also an option. But not dictating that this must happen |
| .npmignore file and files property | A full discussion of the use and differences between the .npmignore file and the package.json files property will be made. Although this information is available in several places we shall provide links to and make discussions of this file and property within the context of our best practices. |
For this item(164) I'm not quite sure where to go with it next as a number of similar items are now open. I would like to talk about it at the next meeting. Plenty of interesting stuff, we just need to decide how and what to document into our set of guidelines.
I am going to go back to the original intent for this issue as raised by @jchip which is advice on which files should be ignored. I will open separate issues for the discussion points
these items I summarized above and we shall come back to them but Joel’s initial intent was just to get a list of regular files such as .vscode and log files etc that should be ignored. My intent is to look at the existing github git ignore repo and see what additional files should be ignored. It may even be a good idea to raise a PR against the node gitignore that is being used.
From meeting: Maybe we give people pros and cons of choices, multiple best practices (plural) - offer different perspectives of different solutions. A blog post and updates to NPM docs? The blog post asks the question, doesn't necessarily provide The Solution. We could outline several use cases and examples for different paths.
agreed @craftninja . I am looking at using the gitignore.git from http://github.com/github/gitignore as a base ignore with modifications and adding, as you said a number of options, I am going off grid for 10 days though.
Since you mentioned http://github.com/github/gitignore maybe I can plug this module I maintain.
https://www.npmjs.com/package/create-git
The module uses those gitignore templates and lets you select multiple. If we want to make a "best practice" for .gitignore in node packages I agree we should PR there, and then I can add it as the default for this package. This might be the start of the "tooling" we want for this as well.
agree with the blog post showing the different solutions. Showing the pros and cons of each solution is a good compromise on the different schools of thought.
while i don't like to compare node modules to maven packages, there is something nice about being able to "go download the source" of a particular package, at least in development
From discussion in the meeting this week, sound like the next steps might be a PR for the guidance and then a blog post to publicize but the guidance might be more around discussing the options versus making a specific recommendation.
I like the blog post idea, and would support it further in hosting (or cross-posting) it on the official npm blog as a guest post by whoever wants to author it.
@jchip will you have time to put together a PR, authoring a blog post?
@mhdawson yes, I can work on it next week (end of summer, which was usually busy for me with kids).
Close as PR has landed
@mhdawson Can you link to the PR and the final web page?
there are 2 PRs about this issue: https://github.com/nodejs/package-maintenance/pull/178 https://github.com/nodejs/package-maintenance/pull/243
and the final files are:
https://github.com/nodejs/package-maintenance/blob/master/docs/drafts/PUBLISH-GUIDELINES.md
https://github.com/nodejs/package-maintenance/blob/master/docs/drafts/PUBLISH-BLOG.md
but I'm not sure the blog post has been posted jet
When I use an external dependencies I like it to be as small as possible. I want to see more I can always use npm info ws repository.url. But there are problems when git and npm are out of sync, so at the end including everything in the npm package is also ok.
Most helpful comment
While a few things are debatable with opinions on either sides of the fence, I hope we can agree on the basic things that a guideline about having the proper setup to ensure some files are not published is a good practice.
The guideline should only refer to the methods npm provides to do this, and avoid having any preferences on
filesor.npmignore.Maybe note some gotchas and tips:
.npmignoremakesnpmignore.gitignore, which is why some package is publishing.nyc_outputandcoverage, like this https://github.com/nodejitsu/node-http-proxy.npmignorewould publish any unintended files left over, for example temp files.fileshas a bug https://npm.community/t/ds-store-files-show-up-after-npm-publish/831filesensure no unintended files get publish accidentally, but it could be dangerous, but following a well defined directory structure makes that less of an issuetestis potentially a security attack vector, ie: incident in event-stream.Other than that, I think we would leave the choice of whether to publish things like docs and test, or other files, to the individual package owner and their preferences.