Berry: Benchmark PnP runtime overhead

Created on 8 Sep 2020  ยท  43Comments  ยท  Source: yarnpkg/berry

  • [ ] I'd be willing to implement this feature
  • [ ] This feature can already be implemented through a plugin

Describe the user story

I saw that the yarn pnp.js file extends some nodejs native functionality like require, path and fs functions

To prevent performance degradation we should make benchmark and stress tests that measure the runtime overhead, for example creating a bundle with webpack of a big project with at least 50 packages

Other performance degradations could be in glob searches

I am also scared that some of these functions reimplementations have time complexity that grows with the workspace packages you have, some of us have workspaces with 200 packages

Describe the solution you'd like

Implement a benchmark to measure the runtime overhead

Describe the drawbacks of your solution

Describe alternatives you've considered

Additional context

It is still not very clear to me why yarn reimplements fs, it would be cool to read more about it

enhancement

Most helpful comment

Regarding the initial topic, yep, we currently track install times but not run times. It would be a good idea to have a dashboard for runtimes as well ๐Ÿ‘ Also, we should put the dashboard link somewhere on the website ๐Ÿค”

All 43 comments

I just did on a very small scale example, and there is already a significant impact of yarn berry that I did not expect.

https://github.com/crubier/yarn-benchmarks

For example, on a simple create-react-app + storybook, build storybook takes 23s with yarn berry, VS 12s with npm.

I think we suffer from this internally, on our big monorepo with many storybooks, the storybooks take a total of 90 min to build.... reducing to 45min would already help a lot...

@crubier Have you tried to check if the impact in Yarn berry due to zip compression, i.e. compessionLevel Yarn berry option

@larixer no I didn't know about this option. Let me try it now..

BTW What is the reason for compressing the cache? Is it only for saving space on the hard drive?

@crubier Yep, to save space on the hard drive or in the Git repo

Putting compessionLevel: 0 improves performance by around 0-10%, not much.

For example storybook goes from 21s-25s to 19s-24s

build storybook takes 23s with yarn berry, VS 12s with npm.

That's not a good comparison, what you actually want to compare is PnP vs node_modules in V2

@crubier Yep, to save space on the hard drive or in the Git repo

I think the issue raised by @remorses here is a very good idea. I think the runtime performance impact of yarn on real-life benchmark has been very overlooked, and now might be the time to start thinking about it a bit more. On my part for example, I dont care about saving a few GB on my disk if this meant getting faster performance.

An example of the overlooking of runtime performance was https://github.com/yarnpkg/berry/issues/973 , which was mostly resolved thankfully (on our monorepo there is now "only" a delay of 7s before each yarn command, instead of 5minutes...)

What surprised me in this little benchmark is that yarn berry seems to have a 150-200% performance impact on many real life scripts.

That's not a good comparison, what you actually want to compare is PnP vs node_modules in V2

And yes, I include PnP when I say Yarn v2. The point of this issue is mostly about PnP I think.

I could add yarn v1 with pnp and yarn v2 without pnp in the benchmark.

Your install benchmark isn't useful the way it's currently done, since it doesn't take into account if the archives are already in the cache or if the lockfile is already generated. If you need an example of a good install benchmark, here is ours; the results are here.

I could add yarn v1 with pnp

That's wouldn't be useful, as Yarn 1's PnP was just an experiment and not production ready.

yarn v2 without pnp in the benchmark.

Yes, Yarn 2 with the node-modules linker would be useful.

@paul-soporan

Your install benchmark isn't useful the way it's currently done, since it doesn't take into account if the archives are already in the cache or if the lockfile is already generated.

I know about that of course, I ran the benchmark only after installing in each folder (And the benchmark is ran several times in a row without cleanup). So it's a "reinstall" benchmark, not a "first install" benchmark.

npm and Yarn v1 are irrelevant imo, what we're interested in is the performance overhead of Yarn V2's PnP mode vs node_modules. npm and Yarn v1 has hoisting bugs which can give them unfair advantages (v2 as far as we know, doesn't)

I think the runtime performance impact of yarn on real-life benchmark has been very overlooked

That's a false statement, i've done optimizations on real world usecases, for example:

You can see all the performance related PRs i've done here https://github.com/yarnpkg/berry/pulls?q=is%3Apr+author%3Amerceyz+perf+is%3Aclosed

I know about that of course, I ran the benchmark only after installing in each folder. So it's a "reinstall" benchmark, not a "first install" benchmark.

The only case our benchmarks doesn't cover is runtime overhead of PnP vs node_modules, the rest we cover already https://p.datadoghq.eu/sb/d2wdprp9uki7gfks-c562c42f4dfd0ade4885690fa719c818

@merceyz "overlooked" does not mean "completely forgotten":

In the past 3 months you made at least 2 PRs that each improved by 100% the performance on certain use cases, this is very good and going in the right direction, but it also shows that there are still wide margins for improvement.

As I mentioned I also made a PR 7 months ago ( https://github.com/yarnpkg/berry/pull/998 ) which improved by 2500% the runtime performance on large monorepos. There again, there was large margin for improvement of the runtime performance.

My only point is that we should indeed start measuring this runtime performance improvement because this is now becoming the limiting factor for us internally.

@merceyz

Comparing npm with yarn berry without pnp gives indeed smilar results.

So yes, as expected, we can compare yarn pnp vs yarn node_modules, and we get:

build-storybook ok

real    0m24.025s
user    0m30.199s
sys     0m2.798s

berry-no-pnp
build-storybook ok

real    0m13.496s
user    0m13.646s
sys     0m2.200s

I have just run yarn run --inspect start-storybook on a workspace with 250 packages and made a profile, it turns out traverseDependencyTree takes 72% of the time

traverseDependencyTree's time complexity increases with number of dependencies, this is not a good idea, we should find another way

Schermata 2020-09-09 alle 15 18 40

@remorses Out of curiosity, where does traverseDependencyTree come from? We don't have such a function inside the PnP hook and it doesn't appear when I profile https://github.com/crubier/yarn-benchmarks/tree/master/single-berry. Is it from the fsevents patch? Is it from another dependency?

It comes from .yarn/unplugged/fsevents-patch-5aa22a6c1e/node_modules/fsevents/vfs.js

@remorses Indeed this algorithm is very inefficient because of seen.delete, it can run for ages on some dependency graphs. It can be rewritten in much more efficient manner

Regarding the initial topic, yep, we currently track install times but not run times. It would be a good idea to have a dashboard for runtimes as well ๐Ÿ‘ Also, we should put the dashboard link somewhere on the website ๐Ÿค”

I updated my benchmark repo to start performing benchmark on monorepos at runtime.

https://github.com/crubier/yarn-benchmarks

Here are some quick results:

  • Storybook with Yarn v2 + Node Module resolution = 12s to build โœ…
  • Storybook with Yarn v2 + PnP resolution = 24s to build ๐Ÿ˜ž
  • Storybook with Yarn v2 + Node Module resolution + Monorepo with 200 packages = FAIL to build ๐Ÿ”ด
  • Storybook with Yarn v2 + PnP resolution + Monorepo with 200 packages = 48s to build โš ๏ธ

That's a x4 performance penalty by doing the same thing, but in a monorepo (without importing dependencies from the monorepo, just by _being_ in the monorepo), and using PnP (because it doesn't work without it anyway, probably storybooks fault again.)

This seems to match with what @remorses says, the 4x performance hit seems to match the 72% of the time spent in traverseDependencyTree .

So the good news is that this particular problem seems fixable! I think I can help, but the file in question lacks documentation so it's hard to know where to start

@larixer Do you intend to open a PR improving the performance of traverseDependencyTree from the fsevents patch? (Asking you since you're the one that discovered the problem of that algorithm)

@paul-soporan To be fair I'm busy now with nohoist so I don't plan to hack traverseDependencyTree anytime soon. I think removing this line seen.delete should be enough, but someone needs to double check that of course.

@crubier

Storybook with Yarn v2 + Node Module resolution + Monorepo with 200 packages = FAIL to build ๐Ÿ”ด

This worries me, a bit :) NM_DEBUG_LEVEL=1 yarn on that Monorepo throws or not?
NM_DEBUG_LEVEL=1 performs self-checking if produced node_modules install tree is correct
NM_DEBUG_LEVEL=2 performs self-checking and also outputs the reasons why packages were not hoisted to the top node_modules

โžœ NM_DEBUG_LEVEL=1 yarn
โžค YN0000: โ”Œ Resolution step
โžค YN0002: โ”‚ multi@workspace:. doesn't provide @testing-library/dom@>=5 requested by @testing-library/user-event@npm:7.2.1
โžค YN0002: โ”‚ multi@workspace:. doesn't provide webpack@>=2 requested by babel-loader@npm:8.1.0
โžค YN0002: โ”‚ react-color@npm:2.18.1 doesn't provide react@* requested by @icons/material@npm:0.2.4
โžค YN0002: โ”‚ @mdx-js/loader@npm:1.6.16 doesn't provide react@^16.13.1 requested by @mdx-js/react@npm:1.6.16
โžค YN0002: โ”‚ @storybook/addon-essentials@npm:6.0.21 [3c62c] doesn't provide @babel/core@^7.0.0-0 requested by @storybook/addon-docs@npm:6.0.21
โžค YN0002: โ”‚ @storybook/preset-create-react-app@npm:3.1.4 [3c62c] doesn't provide typescript@>= 3.x requested by react-docgen-typescript-plugin@npm:0.5.2
โžค YN0002: โ”‚ react-docgen-typescript-loader@npm:3.7.2 [cbc10] doesn't provide webpack@^3.0.0 || ^4.0.0 requested by @webpack-contrib/schema-utils@npm:1.0.0-beta.0
โžค YN0002: โ”‚ @storybook/react@npm:6.0.21 [3c62c] doesn't provide typescript@>= 3.x requested by react-docgen-typescript-plugin@npm:0.5.2
โžค YN0002: โ”‚ foobar@workspace:packages/single-yarn doesn't provide @testing-library/dom@>=5 requested by @testing-library/user-event@npm:7.2.1
โžค YN0002: โ”‚ foobar@workspace:packages/single-yarn doesn't provide webpack@>=2 requested by babel-loader@npm:8.1.0
โžค YN0000: โ”” Completed in 2.16s
โžค YN0000: โ”Œ Fetch step
โžค YN0000: โ”” Completed in 1.49s
โžค YN0000: โ”Œ Link step
hoist: 954.502ms
โžค YN0000: โ”” Completed in 7.16s
โžค YN0000: Done with warnings in 11.25s

Then

โžœ cd package/the-package-to-build-storybook-on
โžœ yarn build-storybook                                                                                     
info @storybook/react v6.0.21
info 
info clean outputDir..
info => Copying static files from: public
info => Copying prebuild dll's..
info => Building manager..
info => Loading manager config..
info => Loading presets
info => Compiling manager..
info => manager built (5.03 s)
info => Building preview..
info => Loading preview config..
info => Loading presets
info => Loading config/preview file in "./.storybook".
info => Loading config/preview file in "./.storybook".
info => Adding stories defined in ".storybook/main.js".
ERR! Failed to resolve a `react-scripts` package.
info => Using default Webpack setup.
info => Compiling preview..
ERR! => Failed to build the preview
ERR! ./src/stories/assets/code-brackets.svg 1:0
ERR! Module parse failed: Unexpected token (1:0)
ERR! You may need an appropriate loader to handle this file type, currently no loaders are configured to process this file. See https://webpack.js.org/concepts#loaders
ERR! > <svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="48" height="48" version="1.1" viewBox="0 0 48 48"><title>illustration/code-brackets</title><g id="illustration/code-brackets" fill="none" fill-rule="evenodd" stroke="none" stroke-width="1"><path id="Combined-Shape" fill="#87E6E5" d="M11.4139325,12 C11.7605938,12 12,12.5059743 12,13.3779712 L12,17.4951758 L6.43502246,23.3839989 C5.85499251,23.9978337 5.85499251,25.0021663 6.43502246,25.6160011 L12,31.5048242 L12,35.6220288 C12,36.4939606 11.7605228,37 11.4139325,37 C11.2725831,37 11.1134406,36.9158987 10.9453839,36.7379973 L0.435022463,25.6160011 C-0.145007488,25.0021663 -0.145007488,23.9978337 0.435022463,23.3839989 L10.9453839,12.2620027 C11.1134051,12.0841663 11.2725831,12 11.4139325,12 Z M36.5860675,12 C36.7274169,12 36.8865594,12.0841013 37.0546161,12.2620027 L47.5649775,23.3839989 C48.1450075,23.9978337 48.1450075,25.0021663 47.5649775,25.6160011 L37.0546161,36.7379973 C36.8865949,36.9158337 36.7274169,37 36.5860675,37 C36.2394062,37 36,36.4940257 36,35.6220288 L36,31.5048242 L41.5649775,25.6160011 C42.1450075,25.0021663 42.1450075,23.9978337 41.5649775,23.3839989 L36,17.4951758 L36,13.3779712 C36,12.5060394 36.2394772,12 36.5860675,12 Z"/><rect id="Rectangle-7-Copy-5" width="35.57" height="4" x="5.009" y="22.662" fill="#A0DB77" rx="2" transform="translate(22.793959, 24.662305) rotate(-75.000000) translate(-22.793959, -24.662305)"/></g></svg>
ERR!  @ ./src/stories/Introduction.stories.mdx 9:0-46 204:9-13
ERR!  @ ./src sync ^\.(?:(?:^|\/|(?:(?:(?!(?:^|\/)\.).)*?)\/)(?!\.)(?=.)[^\/]*?\.stories\.mdx)$
ERR!  @ ./.storybook/generated-stories-entry.js
... And 1000s of similar errors

The output of NM_DEBUG_LEVEL=2 yarn is 6700 lines long so I will spare you this. You can check it on my repo in the folder multi-yarn-nm

Storybook doesn't always work in monorepos, it depends on the hoisting layout you get. PR here https://github.com/storybookjs/storybook/pull/11753

@larixer indeed removing seen.delete seems to make the situation better without breaking anything as far as I know.

After removing seen.delete() the situation is:

  • Storybook with Yarn v2 + Node Module resolution = 12s to build โœ…
  • Storybook with Yarn v2 + PnP resolution = 24s to build ๐Ÿ˜ž
  • Storybook with Yarn v2 + Node Module resolution + Monorepo with 200 packages = FAIL to build ๐Ÿ”ด (see @merceyz message above)
  • Storybook with Yarn v2 + PnP resolution + Monorepo with 200 packages = 28s to build ๐Ÿ˜ž

So to recap: removing seen.delete() makes the situation better for monorepos, by removing a large part of the monorepo-induced overhead spent in traverseDependencyTree as hown by @remorses. But pnp (monorepo or not) still has a ~2x performance penality as compared to node module)

@crubier Looks like we have another bottleneck then. seen.delete affects only large dependency tree, but there is no effect on small dependency trees which is expectable

I opened a PR for the seen.delete part of the problem @larixer , I'll run tests locally, is there anything else I can do to gain confidence that it does not break things ?

@crubier I think it should be all fine, This method just registers all the locations, so visiting graph nodes exactly once should be enough. You should also run yarn version check -i on your PR and select patch for plugin-compat and for yarnpkg-cli and possibly add an entry to CHANGELOG.md

@crubier One observation that bothers me is that from the same set of source files the storybook produces output files that significantly differ in size! Compare output files for node_modules and for pnp

I'm curious of why fsevents is involved in a build, where file watching shouldn't happen

@larixer

One observation that bothers me is that from the same set of source files the storybook produces output files that significantly differ in size! Compare output files for node_modules and for pnp

This is something I have noticed on my internal monorepo as well.

Here is the webpack bundle analyzer output:

For yarn-node-modules

Screenshot 2020-09-09 at 21 07 46

For yarn-pnp

Screenshot 2020-09-09 at 21 07 25

The problem seems to be that yarn/cache ends up in the bundle, while only yarn/$$virtual should actually end up there.

@merceyz since you seem to know a bit about storybook maybe you have an idea? Again this is default create-react-app + default storybook

@crubier I see a problem with Storybook/Webpack not been able to delegate to sb_dll/*_dl.js files under PnP and thus recompiling all the stuff it needs from node_modules again and again. To check this you can create .storybook/yarn-preset.js:

async function yarn2Config(config, options) {
  const newConfig = {
    ...(config || {})
  };
  newConfig.optimization = {minimize: false, splitChunks: false, runtimeChunk: false};
  newConfig.mode = 'development';
  return newConfig;
}

module.exports = { managerWebpack: yarn2Config, webpack: yarn2Config, webpackFinal: yarn2Config };

basically turn off all optimizations, switch to development mode and generate single bundle.
You need to modify .storybook/main.js and add "presets": [require.resolve("./yarn-preset")]
Run with this preset both NM and Yarn PnP, then diff produced bundles. You will see in the diff something like this:

-/***/ "./node_modules/@emotion/core/dist/core.browser.esm.js":
-/*!************************************************************************************************!*\
-  !*** delegated ./@emotion/core/dist/core.browser.esm.js from dll-reference storybook_docs_dll ***!
-  \************************************************************************************************/
+/***/ "./.yarn/$$virtual/@emotion-core-virtual-283d648f44/0/cache/@emotion-core-npm-10.0.35-565218757c-5
+/*!*****************************************************************************************************
+  !*** ./.yarn/$$virtual/@emotion-core-virtual-283d648f44/0/cache/@emotion-core-npm-10.0.35-565218757c-5
+  \*****************************************************************************************************

e.g. NM bundle delegates many functions to DLLs, but PNP bundle compiles all of them into bundle

@larixer Oh, the storybook is distributing prebuilt DLLs:

info => Copying prebuild dll's..

inside DLLs you can see:

 !*** /Users/shilman/projects/baseline/storybook/node_modules/@storybook/semver/internal/debug.js ***!

This is a suboptimal decision, because this way the DLLs are dependent on hoisting layout, so they will be reused to a less degree by the package manager that is different from the one used to build the DLLs, even if package manager is the same, but only the version is different the hoisting layout will be different. And of course node_modules prebuilt DLLs won't be used by PnP at all.

They should at least have an option to build dll on a user side. But the functionality like this will not be added anytime soon I think.

You can try disabling DLL usage via --no-dll and configure Webpack to use some compilation caching plugin like hard-source-webpack-plugin instead. I think this way the effect will be more or less the same as with DLLs, but more reliable

@larixer I just tried, and using --no-dll succeeds in making both yarn-nm and yarn-pnp similar in terms of bundle size, but probably not as you (or at least I) expected.

It does not make the yarn-pnp bundle smaller (it does not change anything for pnp basically), but by it make the yarn-nm bundle larger... So it's not really a solution..

It's an original caching, I wonder if it could be made compatible with PnP, perhaps some with extra Webpack config ๐Ÿค” Probably worth opening an issue on the Storybook repository rather than here, since it's not actionable on our side (let's keep this issue here about building a runtime perf dashboard).

cc @gaetanmaisse

@crubier --no-dll places both NM and PnP into the same conditions, after that their perfs are the same for me. The "solution" would be to use build caching that is not hardcoded to node_modules, for example hard-source-webpack-plugin or Webpack 5 or ... some other caching solution that works both for PnP and for NM

@arcanis to go back to the idea of a runtime benchmark, here are a few requirements for a good benchmark in my opinion, which I started to implement on my repo https://github.com/crubier/yarn-benchmarks :

  1. Testing several package configurations

    1. Tests done on a single package

    2. Tests done on single packages inside of a monorepo (large monorepo with 200 packages is good)

    3. Tests done on all packages of a monorepo (lerna style stuff)

  2. Testing several use cases

    1. Run yarn on a simple command (like yarn echo)

    2. Run yarn on a complex command involving only one package of a monorepo (like yarn build)

    3. Run yarn on a complex command involving many packages of a monorepo (like yarn jest at the root, with many jest "projects")

    4. Run yarn on one complex command per package of a monorepo (like yarn lerna run build )

The tests above should allow to benchmark different kinds of overheads that we have seen:

  • Overhead caused by pnp setup, on simple commands (I run yarn echo "ok" on 200 packages, it takes 20 minutes)
  • Overhead caused by pnp resolution, on commands that involve a lot of resolutions (like yarn build a complex package, it takes 4 times the time it would take without pnp)
  • Overhead caused by plugin-compat or other "hacks" like this. For example rewriting fs or glob has a significant performance impact in some cases
  • Overhead caused by interaction with 3rd party tooling like jest or storybook or webpack.

Besides the pnp resolution, I would also love to see a benchmark for the patched fs layer

Was this page helpful?
0 / 5 - 0 ratings

Related issues

mormahr picture mormahr  ยท  3Comments

danreg picture danreg  ยท  3Comments

juanpicado picture juanpicado  ยท  4Comments

dzintars picture dzintars  ยท  3Comments

thealjey picture thealjey  ยท  4Comments