Yarn: Bug: running 2 (maybe more) yarn commands on different projects at the same time fails

Created on 11 Oct 2016 · 59 comments · Source: yarnpkg/yarn

I installed yarn for the first time, then I ran yarn on project A; seconds later I ran yarn on project B.

Both installations failed. Project A output:

yarn install v0.15.1
info No lockfile found.
[1/4] Resolving packages...
[2/4] Fetching packages...
error Couldn't find a package.json (or bower.json) file in /Users/USERNAME/.yarn-cache/npm-yargs-3.10.0
    at /usr/local/lib/node_modules/yarnpkg/lib-legacy/config.js:363:13
    at next (native)
    at step (/usr/local/lib/node_modules/yarnpkg/node_modules/babel-runtime/helpers/asyncToGenerator.js:17:30)
    at /usr/local/lib/node_modules/yarnpkg/node_modules/babel-runtime/helpers/asyncToGenerator.js:28:20
    at run (/usr/local/lib/node_modules/yarnpkg/node_modules/core-js/library/modules/es6.promise.js:87:22)
    at /usr/local/lib/node_modules/yarnpkg/node_modules/core-js/library/modules/es6.promise.js:100:28
    at flush (/usr/local/lib/node_modules/yarnpkg/node_modules/core-js/library/modules/_microtask.js:18:9)
    at nextTickCallbackWith0Args (node.js:415:9)
    at process._tickCallback (node.js:344:13)

project B output:

yarn install v0.15.1
info No lockfile found.
[1/4] Resolving packages...
[2/4] Fetching packages...
error Couldn't find a package.json (or bower.json) file in /Users/USERNAME/.yarn-cache/npm-yargs-4.8.1
    at /usr/local/lib/node_modules/yarnpkg/lib-legacy/config.js:363:13
    at next (native)
    at step (/usr/local/lib/node_modules/yarnpkg/node_modules/babel-runtime/helpers/asyncToGenerator.js:17:30)
    at /usr/local/lib/node_modules/yarnpkg/node_modules/babel-runtime/helpers/asyncToGenerator.js:28:20
    at run (/usr/local/lib/node_modules/yarnpkg/node_modules/core-js/library/modules/es6.promise.js:87:22)
    at /usr/local/lib/node_modules/yarnpkg/node_modules/core-js/library/modules/es6.promise.js:100:28
    at flush (/usr/local/lib/node_modules/yarnpkg/node_modules/core-js/library/modules/_microtask.js:18:9)
    at nextTickCallbackWith0Args (node.js:415:9)
    at process._tickCallback (node.js:344:13)

When I run those commands sequentially it works as expected.

versions:

  • node 4.5
  • yarn 0.15.1
  • osx 10.11.6
Labels: cat-bug, triaged

Most helpful comment

This bug has to be resolved in order to make Lerna compatible with Yarn, because Lerna spawns multiple concurrent processes of npm install.

All 59 comments

Confirmed.

$ node -v
v5.7.1
$ yarn -v

Edit: Now I am just stuck in this state.

This bug has to be resolved in order to make Lerna compatible with Yarn, because Lerna spawns multiple concurrent processes of npm install.

@timche Thanks for the update. I'll check it out again tonight.

@moneytree-doug I think you might have misunderstood what he said. He didn't say that it has been resolved but rather has to be resolved :)

@steelbrain Haha, whoops!

I ended up with a partially written file in ~/.yarn-cache due to such a collision in parallel invocation from Jenkins tests. (gulp-jshint/src/reporters/fail.js was truncated to 512 bytes.) yarn would then successfully run, just copying the broken file from .yarn-cache to the destination directory. :weary:

Adding my case here since we're in a similar situation, with multiple yarn instances running concurrently. In my case the error I got was
Trace: Error: https://registry.yarnpkg.com/@types/node/-/node-6.0.46.tgz: EEXIST: file already exists, mkdir '/home/administrator/.yarn-cache/npm-@types/node-6.0.46' at Error (native)

Seeing this as well if Jenkins happens to run multiple builds at the same time.

The error is very short:

Trace:
  Error: http://x.x.x.x:xxxx/autoprefixer/-/autoprefixer-6.5.3.tgz: ENOENT: no such file or directory, utime '/var/lib/jenkins/.yarn-cache/npm-autoprefixer-6.5.3/package.json'
      at Error (native)

It seems the intended "solution" is to use the --mutex option to yarn:

This effectively turns off concurrency completely, which I find rather ridiculous, but maybe preferable to a complete crash?

Also, the default location of the mutex is NOT the location of the cache (which would make sense, since it is access to this cache that it is guarding) but the current working directory, where you are probably alone anyway.

So you have to come up with your own unique location of the mutex file, and agree on that with all your other users, otherwise it won't help.

Apart from a global mutex being an "interesting" solution to implement a concurrency safe cache, one can only wonder why this isn't enabled by default, and with a sensible default location as well...

There's --mutex network, which works globally. Good for use in CI environments but unfortunately undocumented.
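For reference, the two forms discussed in this thread look like this on the command line (the path and port are just examples, pick whatever suits your setup):

yarn install --mutex file:/tmp/.yarn-mutex
yarn install --mutex network:31997

The file variant locks on a path that every invocation must agree on; the network variant binds a local port, so it is host-global without needing a shared path.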


Looks like the --mutex network option is documented here: https://yarnpkg.com/en/docs/cli/#toc-concurrency-and-mutex

Isn't that the link I posted? :)

--mutex network is similar to --mutex file:/tmp/.yarn-mutex in that it is a global lock on the host. You still have to specify it everywhere you run yarn. Also, the port number is much more likely to collide with something already in use, so I would prefer the file.

We have some projects that set their own cache location, and as such would not be affected by the concurrency issue. Setting this option globally would mean they cannot run in parallel to others anymore.

None of this would be a problem if the --mutex option would have sensible defaults or the cache would be concurrency safe in the first place.

It would be great to be able to set the default behaviour via an env variable, so one doesn't need to specify it on every task of the CI server.

It should be possible to allow for concurrency (on at least UNIX) here if each yarn process creates a cache entry in a directory with a randomized name and then creates a symlink to it. Races between processes should then be harmless (just leaving an orphaned directory around if a process loses the race). The symlinks are needed since directory renames aren't atomic.

http://stackoverflow.com/questions/307437/moving-a-directory-atomically
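A minimal sketch of that idea (an illustration of the comment above, not Yarn's actual code; the function name publishCacheEntry is made up):

const fs = require('fs');
const path = require('path');
const crypto = require('crypto');

// Populate a cache entry under a randomized directory name, then publish it
// atomically via a symlink. Losing the race just leaves an orphaned temp
// directory behind, which could be garbage-collected later.
function publishCacheEntry(cacheDir, entryName, populate) {
  const tmpDir = path.join(cacheDir, '.tmp-' + crypto.randomBytes(8).toString('hex'));
  fs.mkdirSync(tmpDir);
  populate(tmpDir); // write package.json, unpacked tarball contents, ...

  const entryPath = path.join(cacheDir, entryName);
  try {
    fs.symlinkSync(tmpDir, entryPath, 'dir'); // symlink creation is atomic
  } catch (err) {
    if (err.code !== 'EEXIST') throw err; // another process won the race; use theirs
  }
  return entryPath;
}

Readers would then always see either no entry or a fully populated one, which is the property the current cache lacks.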

It would be simpler if yarn could use files in the cache rather than directories, but maybe that's not viable or performant.

Creating node_modules/ dirs is almost always going to be on the critical path on test servers running things concurrently, so allowing concurrency here would be desirable.

Putting all the cache entries in one directory will fail to scale properly with a sufficiently large number of npm package versions, so the cache might also want to be split across multiple directories (directory creation should be easy to make work concurrently).

How would existing caches be migrated to a new format safely? A new independent cache dir, subdir or naming scheme could be used, though this would make people download all cache entries again.

If no extra directories are added, a mixture of dirs and symlinks could be allowed. If a dir already exists, nothing would want to replace it with a symlink, so there's no timing issue.

If extra subdirectories are added, symlinks to the old top-level directories could be created where necessary.

I get this exact same error, but my use case is slightly different:

On Jenkins, I get this message with Yarn > 0.19.0 but NOT 0.18.1.

I'm only running 1 Jenkins job, and I clean the workspace each time.

npm install -g yarn

These options appear to have no effect on the outcome:
yarn install --ignore-engines --cache-folder .yarn_cache --mutex file:/usr/local/share/.yarn-mutex
yarn install --ignore-engines
./build-tools/node_modules/.bin/yarn install --ignore-engines

Fails 0.20.0

00:02:03.218 yarn install v0.20.0
00:02:03.505 [1/4] Resolving packages...
00:02:06.490 [2/4] Fetching packages...

00:02:48.882 error Couldn't find a package.json file in ".yarn_cache/npm-lodash-3.10.1-5bf45e8e49ba4189e17d482789dfd15bd140b7b6"

Fails 0.19.0

00:01:50.156 yarn install v0.19.0
00:01:50.413 [1/4] Resolving packages...
00:01:52.091 [2/4] Fetching packages...
00:02:07.750 error An unexpected error occurred: "http://npm.paypal.com/lodash/-/lodash-3.10.1.tgz: ENOENT: no such file or directory, lstat '/workspace/p2pnodeweb-release-jamis/.yarn_cache/npm-lodash-3.10.1-5bf45e8e49ba4189e17d482789dfd15bd140b7b6/collection/map.js'".
00:02:07.750 info If you think this is a bug, please open a bug report with the information provided in "/workspace/jamis/yarn-error.log".
00:02:07.750 info Visit https://yarnpkg.com/en/docs/cli/install for documentation about this command.

Passes 0.18.1

00:01:34.276 yarn install v0.18.1
00:01:35.084 [1/4] Resolving packages...
00:01:40.234 [2/4] Fetching packages...


00:03:02.051 warning [email protected]: The platform "linux" is incompatible with this module.
00:03:02.051 info "[email protected]" is an optional dependency and failed compatibility check. Excluding it from installation.
00:03:02.058 [3/4] Linking dependencies...

We see this too, even when using the mutex option. If any attempt at using yarn to install simultaneously occurs, the cache is forever screwed and no installation will go forward (we even tried yarn cache clear, npm cache clean, etc.). If package.json or yarn.lock is updated, there's a chance a subsequent install will run.

This is super annoying; is there any way to debug this?

@jgreen210 Symlinks are a bad idea - you need to be an admin user for them to work on Windows.

There's no need to use the same caching scheme on Windows. Maybe there's another way of getting atomic directory replacement on Windows? E.g. maybe straightforward renames just do what is required? If so, the symlink op would just need to be replaced with a rename on Windows (ignoring the failure if the dir rename race is lost and the dir exists). Maybe you need to use a lock file there. It's not a platform I am familiar with though.

This thing should handle the cache issues from concurrent yarn runs.

How does that thing solve the problem? All I could find in the README is a link pointing back here.

It would still need to set the global lock for the cache, something which we already do automatically in our CI setup, so I'm not sure how it would help (at least in a CI setup).

@marc-guenther how do you run yarn in parallel in CI, and why do you need this? Do you have a monorepo with multiple packages?

Not sure I understand the question. Our CI contains thousands of jobs. A lot of them run in parallel, each of which could run yarn at any time. As all those running in the same CI slave share the same cache, we get bitten by this bug a lot.

We made a wrapper around the yarn binary, which adds the mutex option, so people don't have to do it in their jobs. This limits us to one yarn execution per CI slave, which is unfortunate, but correctness of builds is more important than speed.

Regarding repositories, we have hundreds of them, each one can choose whatever layout they want, but most are single project per repo.

From looking at your changes, it seems that you simply scan the log for a couple of hardcoded error messages, and then simply run the command again?

I see the following problems with that:

  • We run a LOT of builds on our systems. Which means even when we run yarn again, there is a high probability it will simply fail again, because new ones have started in the meantime. And retrying more than once would only make matters worse.
  • When I understand this comment correctly, any concurrent yarn calls can corrupt the cache. A simple re-run won't help, as yarn won't even notice the corruption. Instead, it seems, the cache cannot be trusted after any concurrent invocations and has to be deleted.

I don't think any external tool can do anything useful about this (except hacky workarounds). This is a bug in yarn itself and needs to be fixed there. But given that this is open for 7 months now, it doesn't seem likely. Apparently the original authors don't use yarn in a way which triggers this behavior...

@marc-guenther
For big complex environments, ideally there would be some kind of "yarn server" that would be the only authority allowed to make network requests and write to the cache. But that seems tough.

Though there can be other, simpler solutions to the problem. For example, when a client wants to start working with a cache folder (because it doesn't find the needed package version there), just before writing it should check whether that folder is locked and wait until it is unlocked; if it isn't locked, lock it (using some file in that folder or some other location). This is not so difficult to implement and submit as a PR; I think a good engineer could handle it in a couple of days, if you have the resources and the desire.
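For illustration, a bare-bones version of that lock-file idea could look like this (just a sketch of the suggestion above, not Yarn code; the lock file name and polling interval are made up):

const fs = require('fs');
const path = require('path');

// Take an exclusive lock on a cache folder by creating a lock file with the
// 'wx' flag (fails if the file already exists), run the work, then release it.
async function withFolderLock(folder, work) {
  const lockPath = path.join(folder, '.yarn-folder-lock');
  for (;;) {
    try {
      fs.writeFileSync(lockPath, String(process.pid), { flag: 'wx' });
      break; // we own the lock
    } catch (err) {
      if (err.code !== 'EEXIST') throw err;
      await new Promise(resolve => setTimeout(resolve, 100)); // someone else holds it; wait and retry
    }
  }
  try {
    return await work();
  } finally {
    fs.unlinkSync(lockPath); // release the lock
  }
}

A real implementation would also need to handle stale locks left behind by crashed processes, which is where most of the complexity lives.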

Are there any plans for a better solution to this than using --mutex?

Are there any plans for a better solution to this than using --mutex?

Need to implement this:

when a client wants to start working with a cache folder (because it doesn't find the needed package version there), just before writing it should check whether that folder is locked and wait until it is unlocked; if it isn't locked, lock it (using some file in that folder or some other location)

@whitecolor excellent! will that progress be tracked here or in another issue?

@whitecolor I don't see how a mutex only for writers would be enough.

If the writes are not atomic, readers need to be prevented from reading partial writes, so we'd want RW locks.

If the writes can be made atomic, then mutual exclusion would not be needed. (Except as an optimization.)

@whitecolor: It seems you missed an important implication of what I wrote, so let me try to repeat it. You wrote a tool whose whole purpose is to run yarn in a way which triggers this bug here. This bug can lead to corrupt cache directories and therefore broken builds. You do not warn your users about this, instead you even claim that you "handle" this bug (which you don't).

Regarding "yarn server", there is no need for such a thing, at least not in respect to this bug (we already use Nexus, but that is a different topic). A concurrency safe cache implementation would be more than sufficient (which is not rocket science, though not as simple as you seem to think).

@aij: Exactly!

@aliatsis: I don't think anyone is working on this issue so far... :-/

@marc-guenther

You wrote a tool whose whole purpose is to run yarn in a way which triggers this bug here.

This is a tool for monorepos, not for big environments. If the cache error happens, it fixes up the cache directory and re-runs the failed processes without concurrency. It's a kind of hack of course, but it works. And in non-first-time installations, where you already have most of the packages in the cache, the cache error is not likely to appear at all. If you want to be safe, you may simply not use concurrency.

By the way, the new npm 5 now claims that "The new cache is very fault tolerant and supports concurrent access".

A concurrency safe cache implementation would be more than sufficient (which is not rocket science, though not as simple as you seem to think).

So you may help the Yarn team. :)

re-run failed processes without concurrency

This can fail silently, see https://github.com/yarnpkg/yarn/issues/683#issuecomment-258863739

the cache error is not likely to appear at all

This depends on many variables, number of packages and their dependencies, how often versions change. I think it's next to impossible to calculate the probability of a collision and I'd strongly advise against trying out your luck here in any non-toy projects.

Thanks!

Is there any reason to not make --mutex network the default?

Is there any reason to not make --mutex network the default?

Because sometimes it may work without it. :)

@pauldraper: Apparently, because the original authors use Yarn in a way which does not trigger this bug. Or they simply don't care. This is a serious serious bug, and I have not seen a single comment from them here so far.

As changing that default is so trivial, I will not even try to make a PR, as I guess it won't be merged. If what you wrote were the desired behavior for the maintainers, they would have done it a long time ago.

Speaking of the mutex, imho the default should be --mutex file:<location of the cache directory>, as it's the cache which is not concurrency safe. No need to lock anything, if it's using different cache locations.

@whitecolor: This is a build tool. Its output is run on production systems. As such, its output HAS to be correct, not something which sometimes works. If you optimize for speed rather than correctness, you have no business writing such a tool.

Also, correctness HAS to be the default. If you want an option --silently-screw-up-my-build-every-now-and-then-just-to-annoy-me-and-everybody-else, then its default MUST be off.

I believe everyone agrees that it is a bug, not a design decision. I'm pretty sure eventually yarn will be able to deal with concurrent installs correctly while still performing as well as possible.

As for "production systems" that run automatically: you can (and need to) always tune them, supply all the flags and options, work around bugs, etc., so you don't have to worry and can be happy.

Hi folks. I feel like using .yarnrc with https://yarnpkg.com/en/docs/yarnrc#toc-cli-arguments would solve most of the complaints for now. What do you think?

In the mean time I'll try to understand the decision process behind this and whether it makes sense to defaulting to a --mutex value.
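If the CLI-arguments support works as documented there, a checked-in .yarnrc could pin the flag for every install, something along these lines (untested; the exact key syntax is an assumption based on that page):

--install.mutex network

That would at least spare people from having to remember the flag on every invocation.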

@BYK If this has drawn your attention, why not solve the problem of concurrent cache usage?

@whitecolor that's also something we can look into. I'd rather stop the immediate bleeding first.

This problem needs to be addressed, badly. Running parallel Jenkins builds that invoke yarn is highly problematic at the moment. We tried mutexes, we tried to redirect the yarn cache to the build directory, but nothing helped. There seems to be some part of yarn that always accesses a global location, and that access isn't safe for concurrency.

Which version are you using? Starting from 1.0, we made a few changes to the mutex codebase that might solve your issues.

We're on 1.0.2. We are running Jenkins declarative Pipelines, which call Gradle, which in turn uses the Gradle Node Plugin in order to execute the yarn tasks.

The yarn install task in my gradle looks like this:

task yarnInstall(type: YarnTask, dependsOn: yarnSetup){
    def cacheFolder = file("${project.buildDir}/yarn-cache")
    logger.lifecycle("Yarn Cache Dir: " + cacheFolder.getAbsolutePath());
    // run yarn with these arguments
    args = ['install', '--cache-folder ' + cacheFolder.getAbsolutePath(), '--mutex network:31997']
}

So what is executed is yarn install --cache-folder <something local to the build> --mutex network:31997. And still, I get the following error when more than one build runs in parallel:

An unexpected error occurred: "https://registry.yarnpkg.com/core-js/-/core-js-2.5.0.tgz: ENOENT: no such file or directory, utime '/home/user/.cache/yarn/v1/npm-core-js-2.5.0-569c050918be6486b3837552028ae0466b717086/library/fn/math/rad-per-deg.js'" (the offending JS file varies). The interesting part is that yarn attempts to open something in /home/user/..., even though --cache-folder is set to something else.

I currently cannot eliminate the possibility that the gradle node plugin has a bug, or is interfering in any way here. However, as Yarn integrates with all popular build servers except for Jenkins, we have little choice. An official yarn gradle plugin would be nice too.

I think there's an error in your code. Have you tried using

args = ['install', '--cache-folder', cacheFolder.getAbsolutePath(), '--mutex', 'network:31997']

instead of

args = ['install', '--cache-folder ' + cacheFolder.getAbsolutePath(), '--mutex network:31997']

?

... interesting point. Will give it a try, thanks!

@MartinHaeusler as a workaround you may also try using a separate cache folder for each invocation. That would trade space for time (mutex makes other instances wait for each other).
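For example, on Jenkins each build already gets its own workspace, so something along these lines would keep the caches apart, at the cost of downloading packages once per workspace ($WORKSPACE is the variable Jenkins sets to the job's directory):

yarn install --cache-folder "$WORKSPACE/.yarn-cache"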

@BYK yes, true. We used --cache-folder when we realized that --mutex wasn't doing what we intended on our Jenkins. But maybe it was just my configuration that was wrong all along (as @arcanis pointed out). I will investigate this further.

Unfortunately the fix proposed by @arcanis (even though it was a nice hint!) did not change the situation.

That's really weird :hushed: We use Yarn internally at fb, and as you can imagine we do have a fair amount of parallel tests running with the network mutex, so I have no idea why it doesn't work on your side.

I am suspecting the gradle node plugin by now which runs our yarn tasks to be at fault here... This issue will be one tough nut to crack.

@MartinHaeusler maybe run Yarn with --verbose to see how the processes interact with each other with the mutex?

Nice suggestion, I'll give it a shot. Somehow I suspect that either the gradle node plugin does not properly pass the --mutex option to yarn, overrides it somehow, or yarn ignores it. Looking at the verbose logs should provide some insight.

By the way, apparently I'm not the only one struggling with this problem on Jenkins:
Ticket 2629

I feel like we can close this. Objections? @MartinHaeusler @neophob or anyone else?

Being able to corrupt the cache if yarn is run concurrently with itself using its default options is a bug. Using a sensible default for --mutex (i.e. a per-cache-folder path) or reimplementing the cache so that it doesn't need a mutex would solve this (see my above comments for how this might be possible).

Close this? Has it been fixed in a different ticket? Cause I haven't seen anything here. What's the fix?

Using a sensible default for --mutex (i.e. a per-cache-folder path) or reimplementing cache so that it doesn't need a mutex would solve this (see my above comments for how this might be possible).

Correct, and we will happily review and merge any PR that would help with that.

Close this? Has it been fixed in a different ticket? Cause I haven't seen anything here. What's the fix?

Using the --mutex flag solves the issue reported in the OP (running two Yarn instances concurrently will not corrupt the cache anymore). At this point what you're asking for seems to be an improvement over this, which deserves its own issue (but as you might have guessed from my previous statement, we would very much prefer a PR over an issue ;)).

ah, the best kind of QA: sweep it under the rug ;-)

Yes, it is completely ridiculous. First ignore the bug for a year, then suddenly close it without any fix, because there is a horrible workaround (which existed the entire time!), which everyone has to enable manually every time. Ah, and most important, don't bother to update the documentation.

Yarn is an open-source and community-driven project, which means that by design you're just as welcome to contribute to it as you are to use it. The documentation is here, the source code here. If you feel strongly about this feature, please feel free to give it a try - we'll happily spend time reviewing it and giving you pointers to unblock you if you have questions.

@Spongman @marc-guenther I am sorry that you feel this way about the issue, and Yarn in general. That said we have documented this command line flag some time ago with https://github.com/yarnpkg/website/pull/280. I am not claiming it is the most visible or best description but at least it is there, searchable, and anyone who feels differently is welcome to try improving it. It is actually best for people like you who feel frustrated and not served well by the project to take a stab at improving docs because you know exactly where the pain is.

Regarding fixing the root cause of the issue, which is unsafe concurrent read/write access to a global cache, I agree that this should be solved. That said, anyone who tried to fix this would also know that it is not an easy problem to solve, especially if you don't want to compromise performance. Moreover, concurrent Yarn instances do not seem to be the common case since people usually work on a single project at a time (at least install dependencies one project at a time) and CI systems are isolating builds (with some exceptions). This is why the issue hasn't received much attention from the core maintainers. That said if you feel strongly about it and if you have a good solution in mind, we would be more than happy to assist you throughout the contribution process and champion your pull request if you decide to submit one.

Thanks a lot for sharing your concerns again. All that being said, I'd also appreciate it if you keep in mind that we are all working towards better software here, and the best way to get there is collaboration instead of blaming.

I'd love to continue this discussion without pinging everyone on this thread continuously, so feel free to join our Discord server and ping me directly or on any of the channels. I'd love to hear more about why you feel your contributions are not welcome and do our best to resolve them ❤️

