Yarn: "npm install" deletes everything from a Yarn-produced tree

Created on 17 Jan 2018  ·  25Comments  ·  Source: yarnpkg/yarn

Do you want to request a feature or report a bug?

Hard to say. An integration issue where neither of the parties is directly responsible but the result is confusing to everyone.

What is the current behavior?

Running npm install <something> over a Yarn-produced tree corrupts it, removing everything Yarn has previously installed.

If the current behavior is a bug, please provide the steps to reproduce.

npm --version # 5.6.0
yarn --version # 1.4.0

yarn init -y
yarn add react-scripts

# ./node_modules/.bin/react-scripts exists

npm install --save react

# ./node_modules/.bin/react-scripts is deleted with all other package content

What is the expected behavior?

I’d expect that this either works, or somehow fails gracefully. I don’t have a specific proposal. To Create React App users, this is extremely confusing because their project gets created with Yarn (if Yarn exists on the system) but then they follow the docs of some library that tells them to npm install, and that blows away the whole tree.

I reached out to @zkat about this out of band, and the response is that npm verifies integrity using metadata in fetched package.json. If Yarn wrote that metadata by default, perhaps npm could work with Yarn trees without this issue.

I want to stress I don’t blame either party for this. I understand both have reasons for this behavior. But is highly confusing to users of both packagers, and IMO hurts the overall ecosystem. I wish we could find a way to solve or work around this; even if it’s npm refusing to run install when yarn.lock exists, or Yarn writing additional metadata that npm requires.

I’m filing this in the Yarn repository because the npm repository is too flooded with issues, and the maintainers have indicated it is unlikely to get a response there.

Please mention your node.js, yarn and operating system version.

macOS Sierra
Node 8.6.0
Yarn 1.4.0 (and earlier versions)

cat-compatibility cat-feature help wanted triaged

Most helpful comment

@gaearon np!

Another note: making this change also means that npm would be able to generate a package-lock for Yarn trees. That is, yarn && npm i --package-lock-only would give you a yarn.lock and a package-lock.json that would result in _the exact same tree_ when you run yarn or npm i afterwards -- so you could actually make create-react-app bundle Yarn if you wanted to, and always do the initial install with Yarn (I know that's probably too much for some of your users, but bear with me...), and people wouldn't even notice a difference when they go and use npm as they usually would. We won't do the removal, we won't reshape the tree a ton, and you can generate a package-lock.json right out the door right in c-r-a if you want to.

There's of course potential line noise issues if you switch between the two when you're adding or removing things, but npm itself does _minimal_ changes to existing packages, and I think most of the time, it would probably generate the exact same tree Yarn would if it adds a single (or a couple) of deps like that. Would be nice to have a smoother interop story like this, but we can't do it alone and I don't feel great about shouldering the burden if folks aren't gonna work _with_ me on this.

All 25 comments

Yes, this behavior is from NPM, and is a known issue. There isn't much we can do about it unfortunately.

https://github.com/npm/npm/issues/17379#issuecomment-345042377

It comes down to either use Yarn or NPM, not both. Complicating the matter is that many install scripts in packages explicitly call NPM or Yarn, which can break things. I forget the name of it off the top of my head, but both NPM and Yarn set an npm_{something} env var that indicates which package manager is being used to run this command, so scripts should use that to run additional commands instead of explicitly using npm or yarn.

Good that last version of create-react-app has --use-npm

To be more concrete: https://github.com/npm/npm/blob/latest/lib/install.js#L695-L703

Basically, all you need to write out is a _resolved and an _integrity (or _shasum) field into the installed package.json, and npm should "just" pick that up and be perfectly happy to keep Yarn's trees around. _resolved is the full resolved URL of the tarball, if it's a registry dep (or an http dep, or a local tarball file). For git deps, it's the full resolved git URL with the resolved SHA appended (that is, foo/bar#mybranch -> git(+https/+ssh)://github.com/foo/bar#deadbeefc0ffee).

This seems like the most straightforward and least-effort way to solve this problem all-around: it's important for us to have this because it's how we can make sure that the dep that is there is actually the one that was requested, when we read the tree to verify it.

I do not believe this will have a measurable performance impact on Yarn, since these fields are cachable and it'll be a one-time cost when you populate your cache on download (you'll have to figure out how to handle old Yarn caches that lacked this stuff).

@zkat Thank you for chiming in and providing these pointers!

@gaearon np!

Another note: making this change also means that npm would be able to generate a package-lock for Yarn trees. That is, yarn && npm i --package-lock-only would give you a yarn.lock and a package-lock.json that would result in _the exact same tree_ when you run yarn or npm i afterwards -- so you could actually make create-react-app bundle Yarn if you wanted to, and always do the initial install with Yarn (I know that's probably too much for some of your users, but bear with me...), and people wouldn't even notice a difference when they go and use npm as they usually would. We won't do the removal, we won't reshape the tree a ton, and you can generate a package-lock.json right out the door right in c-r-a if you want to.

There's of course potential line noise issues if you switch between the two when you're adding or removing things, but npm itself does _minimal_ changes to existing packages, and I think most of the time, it would probably generate the exact same tree Yarn would if it adds a single (or a couple) of deps like that. Would be nice to have a smoother interop story like this, but we can't do it alone and I don't feel great about shouldering the burden if folks aren't gonna work _with_ me on this.

@zkat thanks for this information! Is there any docs around this feature? Based on your comment I'm reading this as: when Yarn copies a package's package.json into node_modules, if we add _resolved and _shasum to the package.json, then NPM won't delete it. Does NPM also add these fields to package's package.json, or was this added just for the potential to fix this issue?

@yarnpkg/core see zkat's comment above. This seems like a good thing to explore implementing!

@rally25rs this has been the case for a while, pretty far back. We've been writing metadata into package.json on install since I can remember.

Also, talking to @iarna, there's a good chance that if Yarn adds this, this would work all the way back to npm@3 (I mean, maybe npm@2 would be fine with it too but lol good luck with that). It's not really a documented feature because it's kind of internal workings of the installer -- it's basically some minimal metadata required so that we can tell packages installed from different sources apart, even if they have the exact same package.json, as they usually would.

Thanks for being open to implementing this! I think y'all should have all the necessary metadata available already to do this, so it would be a matter of writing it out at extract time and you can set it and forget it. More than that would be a bigger ask (and you can probably ignore all the _other_ _underscore fields npm happens to add, I think)

p.s. I'm going to stop capitalizing "Yarn" if people keep capitalizing "npm", I swear. I will be so petty about this.

Chiming in on this as I think I was the first one to report one of the popular variants of this issue to npm and get notifications this every other day even though the issue has been closed for a long time now.

I'd love to see this happen and I am willing to put effort into it happening if needed, as essentially my issue lies with that

It comes down to either use Yarn or NPM, not both.

breaks the promise of

Same Packages
Install any package from npm and keep your package workflow the same.

For example Yarn doesn't run postinstall scripts, so yarn && npm install package-that-depends-on-postinstall && rm package-lock.json && rm -rf node_modules && yarn has become a common place for me when I notice that something goes awry. I am totally _ok_ with yarn not running postinstall scripts. I would be totally ok if both (or either of) yarn/npm would fail gracefully with clear instructions on what went wrong and how to proceed. And I of course loved if there was some sort of solution where yarn and npm just lovely went together (possibly with warnings why this might not be optimal). Yarn became super popular because it didn't break people's workflows after all.

Like @gaearon stated,

I want to stress I don’t blame either party for this. I understand both have reasons for this behavior.

I love both the people behind yarn ❤️ :heart: ❤️ and ❤️ npm ❤️ . It's just that at large use either yarn or npm will never happen at large, as there's approximately half a million packages out there that tell the end user to install a thing with npm install which would explode pretty much any app created with create-react-app for example.

I see both _integrity and _shasum in a package installed via npm. I'm guessing _shasum is the hash of the resolved file but what's _integrity? Does it matter which one yarn provides? Is there a reason to include one vs. the other or both?

_shasum is the legacy one and is a sha1 when present. _integrity is the new Subresource Integrity feature in npm5 (and there's a PR to add that to Yarn too)

If anybody wants to PR this then the line to insert any package.json writing logic to is here.

All the relevant information should be available in scope so all that would be required is something like:

await fs.writeFile(
  path.join(this.dest, constants.NODE_PACKAGE_JSON),
  JSON.stringify(
    {
      ...pkg,
      _resolved: remote.resolved,
      _integrity: hash,
    },
    null,
    '  ',
  ),
);

Haven't tested that but it should work, some invariants are probably required to keep proper type checking.

I’d like to work on this. I can submit a PR today.

@iansu One thing of note, we also need _from for git and tarball deps. _from holds the specifier that was used to fetch the package original. So for example, if you had a dependency of:

"aproba": "github:iarna/aproba#v1.2.0"

Then you should have the following in its package.json:

  "_from": "github:iarna/aproba#v1.2.0",
  "_resolved": "github:iarna/aproba#ee43ce68c9992e8f9d0d925dc2b1f2e1e5c976de",

The _from tells us that the source was the same as what you have installed.

The _resolved tells us exactly what that tag resolved to at install time (and is what goes in the lockfile).

While we produce shasums of git tarballs, they aren't useful as historically tar implementations don't produce bit-for-bit identical tarballs on each run. (The very most recent versions of node-tar can do this.)

From a tarball dep:

"aproba": "https://github.com/iarna/aproba/archive/v1.2.0.tar.gz"

we get

"_integrity": "sha1-D6eMQF7aFyEj1jPj4GmJOOUC7jc=",
"_resolved": "https://github.com/iarna/aproba/archive/v1.2.0.tar.gz",
 "_from": "https://github.com/iarna/aproba/archive/v1.2.0.tar.gz",

And from a registry dep:

  "_integrity": "sha512-Y9J6ZjXtoYh8RnXVCMOU/ttDmk1aBjunq9vO0ta5x85WDQiQfUF9sIPBITdbiiIVcBo03Hi3jMxigBtsddlXRw==",
  "_from": "aproba@^1.2.0",
  "_resolved": "https://registry.npmjs.org/aproba/-/aproba-1.2.0.tgz",

In either case, npm honors the _shasum entry if _integrity isn't available.

Thanks for the additional information. That complicates things a bit as I don't think we have the original specifier at the point where we're writing out package.json. We also don't have the resolved ref for git dependencies, just the final download URL. I'm trying to get all that information in place now and then I will test with the different use cases you provided.

A _resolved of the download url you're using would be usable as well.

The download URL we have for GitHub packages is something like: https://codeload.github.com/foo/bar/tar.gz/c2e5e3381ccbabfe300164afc75c432aee961ece. That works?

Yup, and imo, that's more appropriate anyway. _resolved should be what you ultimately decide your source is. (npm actually uses git to clone things, so a git url makes sense for it.)

Great! That makes my life a bit easier. 😀

One caveat, having just gone and double checked these fields by manually editing stuff:

Having something in _shasum or _integrity is necessary, but for git deps the value doesn't matter.

Okay, getting close. A couple of issues:

  1. With git and tarball dependencies it's prepending the package name to _from. For example, with @iarna's first use case I end up with _from: aproba@github:iarna/aproba#v1.2.0. This doesn't happen with registry dependencies. If this is a problem it should be fairly easy to strip that.
  2. When installing a tarball dependency that comes from the yarn cache _resolved points to the cache. Not sure how to resolve that one.

I've created an initial PR for this issue. It's incomplete but I'm interested in getting feedback.

I ran yarn install for a cloned project, yet I got the issue npm start fails: sh: react-scripts: command not found #3256. What went wrong? Do I need to remove the node_modules and install them again?

Please try that, yes.

The cause of this issue is they have ":"(colon) or "/"(slash which converts to colon) in their path, as you can see from the pic in CRA#3256

temporary solution: when you are naming your directories, don't use slash or colon, stick with space, hyphen or underscore.

Looks like path library can't escape colon ¯_(ツ)_/¯

Was this page helpful?
0 / 5 - 0 ratings

Related issues

jiripospisil picture jiripospisil  ·  128Comments

carlosduclos picture carlosduclos  ·  86Comments

OshotOkill picture OshotOkill  ·  80Comments

donaldpipowitch picture donaldpipowitch  ·  75Comments

fahrradflucht picture fahrradflucht  ·  120Comments