Yarn: Yarn --production tries to resolve devDeps as well

Created on 14 Jun 2017 · 50 Comments · Source: yarnpkg/yarn

Do you want to request a feature or report a bug?
This is a bug, or unexpected behaviour.

What is the current behavior?

  1. Have a package.json with devDependencies, one of which is private
  2. Run yarn --production
  3. Yarn tries to resolve all packages and fails, because it has no access to the private package

To fix that I need an additional workaround with an authToken (the problem arises in the Docker build process).

What is the expected behavior?
Yarn should only resolve production packages when the --production flag is passed.

Please mention your node.js, yarn and operating system version.

node -v
v8.1.0
yarn --version
0.24.6
Labels: cat-feature, help wanted, needs-discussion, triaged

All 50 comments

This happens because Yarn needs to consider the devDependencies so that it can produce a deterministic build between prod and dev builds (packages will be hoisted to the same locations).

Not sure I got it... let's consider two situations:

  1. (rm -rf ./node_modules) We have package.json with deps and devDeps, and we run yarn --production
  2. (rm -rf ./node_modules) We remove all devDeps from that package.json and run yarn

Will there be a difference in node_modules? If yes, why?

Actually, I think this is related to another problem of mine: https://github.com/yarnpkg/yarn/issues/1379

Possibly. Consider something like:

dependencies {
  A@1.0.0
}
devDependencies {
  B@2.0.0
}

And A also has a dependency:

dependencies {
  B@1.0.0
}

With devDependencies, the dependency tree is:

A@1.0.0
|- B@1.0.0
B@2.0.0

so B@1 can no longer be hoisted to the top level, resulting in:

node_modules/
|- A/ (1.0.0)
|  |- node_modules/
|     |- B/ (1.0.0)
|- B/ (2.0.0)

Without devDependencies, the dependency tree is:

A@1.0.0
|- B@1.0.0

But B can be hoisted to the top level, so the result is:

node_modules/
|- A/ (1.0.0)
|- B/ (1.0.0)

(This is non-deterministic because B@1 could end up in 2 different paths; either in node_modules/B or node_modules/A/node_modules/B depending on the type of install)


What Yarn wants to do for a deterministic build is to make sure that B@1.0.0 is installed to the same location every time, so a --production build should notice that B@2.0.0 would be hoisted to the top level _if_ the devDependencies were there, and the resulting install should really be:

node_modules/
|- A/ (1.0.0)
   |- node_modules/
      |- B/ (1.0.0)

I hope that makes sense :) Basically all the devDependencies are built into the dependency tree when figuring out where to hoist packages to, and calculating that dependency tree needs the metadata for all those packages.
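The hoisting argument above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Yarn's actual hoister: a package goes to the top-level node_modules unless a different version of the same name already sits there, in which case it is nested under its parent. The helper names (`hoist`, `lookup`, `production_layout`) are hypothetical; `production_layout` hoists as if the devDependencies were present and then keeps only the paths that production packages resolve to, reproducing the A/B layouts from the example.

```python
from collections import deque

# Toy hoisting model (an assumption, NOT Yarn's real algorithm). It ignores
# shadowing by intermediate node_modules folders and scoped package names.


def hoist(roots, graph):
    """roots: [(name, version)] installed directly by the project.
    graph: {(name, version): [(name, version), ...]} package dependencies.
    Returns {install_path: version}."""
    layout = {}
    for name, version in roots:  # direct dependencies claim the top level first
        layout[f"node_modules/{name}"] = version
    queue = deque((f"node_modules/{name}", (name, version)) for name, version in roots)
    while queue:  # breadth-first walk over transitive dependencies
        parent_path, pkg = queue.popleft()
        for name, version in graph.get(pkg, []):
            top = f"node_modules/{name}"
            if layout.get(top, version) == version:
                path = top  # slot is free or already holds this exact version
            else:
                path = f"{parent_path}/node_modules/{name}"  # conflict: nest it
            if path not in layout:
                layout[path] = version
                queue.append((path, (name, version)))
    return layout


def parent_of(path):
    # "node_modules/A/node_modules/B" -> "node_modules/A"; top level -> "" (project root)
    head, sep, _ = path.rpartition("/node_modules/")
    return head if sep else ""


def lookup(layout, from_path, name):
    """Node-style lookup: nearest node_modules/<name> walking up from from_path."""
    path = from_path
    while True:
        candidate = f"{path}/node_modules/{name}" if path else f"node_modules/{name}"
        if candidate in layout:
            return candidate
        if not path:
            raise KeyError(f"{name} not found from {from_path}")
        path = parent_of(path)


def production_layout(prod, dev, graph):
    """Hoist as if devDependencies were installed, then keep only the paths
    that production packages actually resolve to."""
    full = hoist(prod + dev, graph)
    keep, stack = set(), [f"node_modules/{name}" for name, _ in prod]
    while stack:
        path = stack.pop()
        if path in keep:
            continue
        keep.add(path)
        name = path.rsplit("/", 1)[1]
        for dep_name, _ in graph.get((name, full[path]), []):
            stack.append(lookup(full, path, dep_name))
    return {p: full[p] for p in sorted(keep)}


graph = {("A", "1.0.0"): [("B", "1.0.0")]}
print(hoist([("A", "1.0.0")], graph))
# {'node_modules/A': '1.0.0', 'node_modules/B': '1.0.0'}
print(production_layout([("A", "1.0.0")], [("B", "2.0.0")], graph))
# {'node_modules/A': '1.0.0', 'node_modules/A/node_modules/B': '1.0.0'}
```

Dropping the dev root moves B@1.0.0 from node_modules/A/node_modules/B to node_modules/B, which is why, in this model, the devDependencies' metadata is needed even for a --production install.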

@Diokuz I haven't tried it myself, but you might see if NPM5 has the same issue. If I understand the NPM5 lockfile correctly, I think it saves the resolved / hoisted position for each package in their lockfile (Yarn does not), so it might not need to query the metadata for your private repos. As much as I hate to steer anyone toward NPM, this might be a case where NPM5 works better. It's worth a test anyway!

You might also be able to solve this using the Yarn offline mirror, since the metadata and package would be saved there and shouldn't have to query the actual server. However that might have other security concerns for you, since a copy of your private repo code would be in that mirror directory.

@rally25rs thank you so much for explanation!

So I think I've got my answer. The only question now: will that be done in the near future? I mean saving positions in the lockfile.

It's not only a question of position. Let's say you have this package.json:

{
    "dependencies": {
        "a": "^1.0.0"
    },
    "devDependencies": {
        "b": "^1.0.0"
    }
}

And that the following are true:

  • the latest published version of a matching ^1.0.0 is 1.1.0
  • b depends on a@1.0.0

If we don't resolve devDependencies, then in prod we'll have the following:

/node_modules/
/node_modules/a (@1.1.0)

And in dev, because of the deduplication optimization, we'll have the following:

/node_modules/
/node_modules/a (@1.0.0)
/node_modules/b (@1.0.0)

Note that we end up using a different version of a in prod than in dev - that might not be good.

Now that I think about this, maybe in practice it could work, since the lockfile would still pin the version - at least assuming that this lockfile has been generated in dev mode.

Still, I'm not convinced this is an issue worth fixing right now. I feel like this is something that could probably be better solved through your build environment.

Well, we have solved our problem via the NPM_TOKEN env var. But if _npm_ is better here, why don't you want to make _yarn_ even better in the near (or far) future?)

Anyway, thanks for the answers)

@arcanis We don't control the build environment. For example, I am using Netlify, which does not have access to private dependencies through GitHub (not the npm private registry, so I can't do something like NPM_TOKEN). I thought it would be OK to move the private deps to devDependencies, but yarn in production mode still tries to resolve them. The lock file should be enough to determine the positions of the packages.

Edit: Just tested npm5. It does not need to resolve devDependencies.

I love Yarn but this has repeatedly been raised as an issue since last year and is always closed. Surely this is a common use case to have access to different packages in production and development?

My own situation is that I'm using a private Git dependency in development and in CI/production I am using a private module hosted on NPM. I don't want to give my CI system or production systems access to GitHub - Yarn should not require access to that dependency. If NPM can figure out a solution that works, so can Yarn.

For now I am going to have to switch back to NPM.

Hey, bumping into this issue as well.

yarn --version
0.27.5

How to reproduce:

  • Create an empty folder, cd into it
  • Create a package.json:
{
  "devDependencies": {
    "doesnt-exist": "file:./nope"
  }
}
  • Run: yarn --production

npm i --production does work.

Have any of you tried using --prefer-offline or --offline? That should work IMO.

Just tried, doesn't seem to work.

@BYK Opened a PR for this fix: #4210

That PR has been closed without merging, any idea when a new one might make it onto the schedule?

The issue is slightly complex. I believe the solution could be attaching metadata of sorts to the lockfile, but I defer to @arcanis @BYK

The thing is, we still need those package.json files to have a consistent image of the resolved file tree, even if those dependencies won't be installed.

An alternative would be to use the offline mirror feature, including the dev dependency packages or stripped-down versions of them that contain only the package.json files.

@olingern's suggestion also sounds good but I don't know what the implications for that are. @kaylieEB, @arcanis, @bestander - any thoughts?

I would imagine we could add a "node_modules path where the package is hoisted to" in the metadata. I think that would eliminate the need to re-query the metadata to build the hoisted dep tree. To me the bigger issue is compatibility. Maybe this becomes something for a yarn v2 roadmap?
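A minimal sketch of the lockfile idea above, assuming a hypothetical lockfile format in which every entry records its hoisted install path and whether it is reachable only through devDependencies. Yarn v1's yarn.lock stores neither field, and the names `LockEntry` and `production_plan` are illustrative only. With that information, a --production install reduces to a filter and never needs dev-only metadata:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LockEntry:
    # Hypothetical fields: Yarn v1's yarn.lock has neither the install path
    # nor a dev-only flag; this only models the proposal.
    path: str     # where the package was hoisted, e.g. "node_modules/A/node_modules/B"
    version: str
    dev: bool     # True if the package is reachable only through devDependencies


def production_plan(entries):
    """With positions recorded, a --production install is a pure filter over
    the lockfile; no registry metadata is needed for dev-only packages."""
    return {e.path: e.version for e in entries if not e.dev}


# Entries for the A/B example, as if written by a dev-mode install.
lock = [
    LockEntry("node_modules/A", "1.0.0", dev=False),
    LockEntry("node_modules/A/node_modules/B", "1.0.0", dev=False),
    LockEntry("node_modules/B", "2.0.0", dev=True),
]
print(production_plan(lock))
# {'node_modules/A': '1.0.0', 'node_modules/A/node_modules/B': '1.0.0'}
```

The trade-off is the one raised above: the positions would have to live in the lockfile, which changes its format.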

I think there are a lot of ways to go around this issue: linking, workspaces, fake offline mirror + tweaking yarn.lock.
And this issue is IMHO not common enough to justify significant changes to the package resolver, which is the core of Yarn.

At this stage a better solution is to document the easiest way to go around this limitation and send a PR to our blog https://github.com/yarnpkg/website/tree/master/_posts

I have found several people with this problem, which doesn't occur on NPM. Sometimes you have a local dependency which you don't want to include in production, and you have to use NPM on the production env (docker or such), which can lead to problems, caused by the differences between Yarn and NPM. Could you please ELI5 why this works on NPM but not on Yarn? Thanks!

Yarn is designed to produce the same results on all computers: it verifies that all dependencies are present and folds/deduplicates them the same way.
E.g.

A -> B -> C@^1.0.1
  -> (devDependency) D -> C@1.3.0

Yarn will try to install A, B, C, D flat and resolve versions ^1.0.1 and 1.3.0 to a single version.
If a production build ignored the D branch, it might fetch a different version of C.
Before Yarn, people ended up with different non-development dependencies (A, B, C) in dev and production mode.

That is why Yarn requires for all the four to be present at resolution time.
There are ways around this that would satisfy your case without making changes to Yarn, e.g. use a file: or link: specifier for the local dependency in dev mode and provide a fake one in your production environment.
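The deduplication step can be illustrated with a toy resolver. Assumptions: only exact versions and caret ranges with a major version of at least 1 are supported, the list of published versions of C is made up, and the rule "prefer one version that satisfies every range" is a simplification, not Yarn's actual resolution algorithm:

```python
def parse(version):
    # "1.3.0" -> (1, 3, 0) so versions compare numerically
    return tuple(int(part) for part in version.split("."))


def satisfies(version, spec):
    """Tiny semver subset: exact versions ("1.3.0") and caret ranges ("^1.0.1")
    for major >= 1, where ^X.Y.Z means >=X.Y.Z and <(X+1).0.0."""
    v = parse(version)
    if spec.startswith("^"):
        base = parse(spec[1:])
        return v >= base and v[0] == base[0]
    return v == parse(spec)


def resolve(specs, available):
    """Map each spec to a version, preferring a single version that satisfies
    all specs (deduplication); otherwise pick the highest match per spec.
    Raises ValueError if some spec matches no available version."""
    common = [v for v in available if all(satisfies(v, s) for s in specs)]
    if common:
        chosen = max(common, key=parse)
        return {s: chosen for s in specs}
    return {s: max((v for v in available if satisfies(v, s)), key=parse) for s in specs}


# Made-up registry state for package C.
available_c = ["1.0.1", "1.2.0", "1.3.0", "1.4.0"]
print(resolve(["^1.0.1"], available_c))           # production branch only
# {'^1.0.1': '1.4.0'}
print(resolve(["^1.0.1", "1.3.0"], available_c))  # with devDependency D's range
# {'^1.0.1': '1.3.0', '1.3.0': '1.3.0'}
```

With only the production branch, ^1.0.1 picks the newest 1.x (1.4.0 in this made-up registry); once D's exact 1.3.0 is considered, both ranges collapse to 1.3.0, so the production copy of C depends on whether D was part of resolution.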

Thanks for the explanation!
Wouldn't resolving devDependencies last (quite a naive approach) be a solution? This way, no dependencies would depend on devDependencies.

The order does not matter much, Yarn still tries to de-duplicate deep dependencies (C in the example above) and move them to the root of node_modules.

I read this post a few weeks ago, which states that "npm 5 has stronger guarantees across versions and has a stronger deterministic lockfile". Then how can it work without resolving devDependencies? Is it the core implementation? (I'm trying to fully understand why the problem can't be "fixed" here. I understand there are workarounds which can be used to make Yarn do the job)

@Zephir77167, this is a great discussion to have but this issue is not a place to have it.
Feel free to jump into the chat https://discordapp.com/invite/yarnpkg and discuss it with the community.

Alright, thanks a lot for taking the time to answer!

still seeing this with v1.3.2

OK, but why include devDependencies in the --production dependency tree to begin with?? Isn't the whole point not to include/resolve ANY of those devDependencies and their dependencies? I don't care about hoisting devDependencies' dependencies.

Is your reasoning as such:
"Since devDependencies' dependencies are hoisted in development, we want to ensure consistent behaviour in a production environment by also hoisting those same dependencies, even if their parent modules aren't installed/used."
???

@heisian Yeah, your statement is basically correct. It's to produce a deterministic node_modules hoisting tree, which is one of Yarn's key features. I wasn't part of the original team that created Yarn and implemented all that, so I'm not sure if having to re-resolve devDeps was an actual design decision, or just a side-effect of the implementation. There has been talk of putting the hoisting tree location in yarn.lock which would mean Yarn would no longer have to resolve those devDeps, but that would change the structure of the lock file, so would likely warrant a major-version change (yarn v2) and hasn't been worked on yet as far as I know.

@rally25rs gave a good answer.
I'll just add a few notes.

  1. The reason why devDependencies matter in a --production install is so that you have the same set of dependencies in both your dev and production environments.
    For example, you can have left-pad@1.0.1 in devDependencies and left-pad@^1.0.0 somewhere deep in sub-dependencies.
    Yarn would resolve both to the same version, 1.0.1, for dev and production builds, and you won't be surprised by an unintended version bump.

  2. Calcifying the hoisting tree in yarn.lock seems to have quite a few disadvantages.
    But we were talking about trying this as an experimental opt-in feature, feel free to fork and propose it.

OK, thank you both for the additional clarification.

My main use case for storing the hoisting tree in yarn.lock would be to reduce Docker build times and image size. A couple of the Webpack plugins we use compile binaries at install, leading to long build times if devDependencies are resolved.

Since yarn --production is working as intended, the root cause then is that our clients and servers share the same repo, and we use devDependencies as a dumping ground for all client-related build dependencies. If we opted to split their package.jsons up this would be a non-issue.

thanks again.

Got it.

So Yarn should only resolve (download) the dev dependencies, but it should not be linking them (running install scripts). Is that what you observe?

Technically Yarn should not need to download the dev dependencies during a --production install if you already have an up-to-date yarn.lock file.
It is quite possible that we cut a corner here and Yarn fetches and unzips those dependencies even though they are not copied to node_modules in the final install.
If that happens, it is a great opportunity to improve Yarn here; please go ahead and send a PR.
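To make the "should not need to download" point concrete, here is a sketch of computing which lockfile entries a --production install needs, by walking only from the production dependencies. Assumptions: the lockfile is modelled as a plain dict with one key per `name@range` pattern (real yarn.lock files group patterns that resolve to the same version on one line), the ranges for B and D are made up, and `fetch_set` is a hypothetical helper, not a Yarn function:

```python
def fetch_set(lockfile, prod_dependencies):
    """lockfile: {"name@range": {"version": str, "dependencies": {name: range}}}
    prod_dependencies: the "dependencies" object from package.json.
    Returns the lockfile keys reachable from production dependencies only."""
    needed = set()
    stack = [f"{name}@{rng}" for name, rng in prod_dependencies.items()]
    while stack:
        key = stack.pop()
        if key in needed:
            continue
        needed.add(key)
        deps = lockfile[key].get("dependencies", {})
        stack.extend(f"{name}@{rng}" for name, rng in deps.items())
    return needed


# Hypothetical lockfile for the earlier example: the project depends on B
# (which needs C@^1.0.1) and has devDependency D (which needs C@1.3.0).
# It was generated in dev mode, so C@^1.0.1 is already pinned to 1.3.0.
lockfile = {
    "B@^1.0.0": {"version": "1.0.0", "dependencies": {"C": "^1.0.1"}},
    "C@^1.0.1": {"version": "1.3.0"},
    "C@1.3.0": {"version": "1.3.0"},
    "D@^1.0.0": {"version": "1.0.0", "dependencies": {"C": "1.3.0"}},
}
needed = fetch_set(lockfile, {"B": "^1.0.0"})
print(sorted(needed))
# ['B@^1.0.0', 'C@^1.0.1']
```

This only decides what to fetch; where each package lands in node_modules still depends on the full tree, as discussed earlier in the thread.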

Here are a few ideas that might help your situation:

  1. Have you tried the offline mirror feature and committing the .tgz files?
    Even if you have 1000 dependencies total, it should be around 30MB on disk, maybe that would make build times faster.

  2. What if your production app is a workspace and your dev dependencies live in the workspace root?
    Then you could deploy to docker only the prod workspace.

  3. You could strip devDependencies from package.json before sending your app for prod build.
    Yarn.lock will still be used but the missing deps will be ignored.

I agree with (2); it could be a clean and definitive way to go and help with our own file structure.

(3) is also a good-sounding (and very easy) option.

I've not tried the offline mirror feature before, perhaps I will take a look at that.

I think I am observing the install scripts being run, b/c node-sass will go through compilation with node-gyp, and my resulting Docker image size balloons to ~180MB, whereas if I strip the devDependencies altogether, there is no compilation and the image size is around 68MB.

I will see if I can start down the path of finding out what is causing this and make a PR if I do. thanks for the suggestions! :]

This is still an issue, and can even stop the installation when a dev dependency fails to resolve (e.g. flatmap-stream).

Repro:

git clone https://github.com/thelounge/thelounge
cd thelounge
git checkout v3.0.0-rc.5
yarn --production

Which ends up with:

yarn install v1.12.3
[1/5] Validating package.json...
[2/5] Resolving packages...
[3/5] Fetching packages...
error https://registry.yarnpkg.com/flatmap-stream/-/flatmap-stream-0.1.1.tgz: Extracting tar content of undefined failed, the file appears to be corrupt: "Unexpected end of data"

flatmap-stream along with event-stream are only required in a devDependency, so there is absolutely no reason as to why Yarn should try to resolve them.

I ran into this in a similar scenario. My build server has an old version of node which isn't compatible with eslint 5 (which is a devDependency), but yarn install --production still fails with

error [email protected]: The engine "node" is incompatible with this module. Expected version "^6.14.0 || ^8.10.0 || >=9.10.0".

@mkopinsky This doesn't have anything to do with production. You need to give your package.json a range for valid node versions if your build server is running a different version of node.

I would recommend being consistent, though.

https://yarnpkg.com/lang/en/docs/package-json/#toc-engines

From the docs:

{
  "engines": {
    "node": ">=4.4.7 <7.0.0"
  }
}

I'm not sure we're understanding each other.

eslint's engine constraint is totally fine. My issue is that I wouldn't expect the engine constraint of a dev dependency to block a production install.

I think this is purely subjective and doesn't take the overall architecture in mind. I'm tempted to close as wontfix.

Giving it a second thought the problem you mention is a bug, @mkopinsky, but it's a different topic than what is discussed in this thread. You should open a new issue.

I ran into an issue related to this: since yarn tries to resolve devDependencies when installing in production, it fails when a devDependency is not resolvable. This happens if the devDeps are part of an unpublished monorepo but aren't needed anymore after the compilation is done and a package is published.

I second a vote for embedding this information in the yarn.lock file so devDeps are required to be resolvable.

This is a really big problem when deploying to Heroku because the bundle size is capped at 500 MB and some devDependencies are really big. Can we get rid of the devDependencies installation?

@billouboq

This is a really big problem when deploying to Heroku because the bundle size is capped at 500 MB and some devDependencies are really big. Can we get rid of the devDependencies installation?

The issue is not installing devDependencies in production mode, it is merely resolving them. If your Heroku instance is trying to install devDependencies, make sure your NODE_ENV is set to production.

@akshetpandey

I second a vote for embedding this information in the yarn.lock file so devDeps are required to be resolvable.

This should already be the case for the initial creation of the lock file. After that, you can run Yarn with --prefer-offline and --frozen-lockfile options to skip another resolution.

@BYK --prefer-offline --frozen-lockfile doesn't work :(

@BYK By "it is merely resolving them" do you mean it is "adding them to node_modules"?
Because for most packages that is indistinguishable from installing them.

If you _don't_ mean that, then this flag doesn't work at all (v1.21.1)

Very puzzled, the packages appearing in dependencies and devDependencies should be unique.
How can there be the same package?

@BYK By "it is merely resolving them" do you mean it is "adding them to node_modules"?
Because for most packages that is indistinguishable from installing them.

Resolving means reading their package.json to determine the whole install tree. It does _not_ put them under node_modules. That step is the linking step.

If you don't mean that, then this flag doesn't work at all (v1.21.1)

It does, since we have tests for this behavior. It may be getting overridden by your environment variables or something else.

Temporary fixes (deleting all the dev dependencies from package.json):

jq 'del(.devDependencies)' package.json > package.json.tmp && mv package.json.tmp package.json \
    && yarn install --prod --frozen-lockfile

(requires installing jq)

or

awk '/},/ { p = 0 } { if (!p) { print $0 } } /"devDependencies":/ { p = 1 }' package.json > package.json.tmp && mv package.json.tmp package.json \
    && yarn install --prod --frozen-lockfile
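The same workaround can be done with a JSON parser instead of text matching. The awk version stops skipping at the first line containing `},`, so it assumes devDependencies is not the last key in package.json; parsing the file avoids that. A minimal sketch using only Python's standard library (`strip_dev_dependencies` is a hypothetical helper name):

```python
import json
from pathlib import Path


def strip_dev_dependencies(path="package.json"):
    """Remove the devDependencies key from a package.json in place, similar in
    intent to `jq 'del(.devDependencies)'`. Returns the removed object (or None)."""
    file = Path(path)
    manifest = json.loads(file.read_text(encoding="utf-8"))  # key order is preserved
    removed = manifest.pop("devDependencies", None)
    file.write_text(json.dumps(manifest, indent=2) + "\n", encoding="utf-8")
    return removed


if __name__ == "__main__":
    strip_dev_dependencies()
```

Run it before `yarn install --prod --frozen-lockfile`, as in the commands above. Note that `json.dumps` rewrites the file with two-space indentation.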

Use of --frozen-lockfile (as in yarn install --prod --frozen-lockfile) AVOIDS pulling (downloading, "installing") dev dependencies. You just need to generate yarn.lock elsewhere beforehand.

Tested with yarn 1.17.3

@dvdotsenko At least not if a dev dependency points to an external Git repo... Yarn wants to fetch the remote repo when I run yarn install --production --frozen-lockfile in a CI/CD pipeline.
