Parcel: [RFC] Automatically extract modules shared across multiple bundles into their own bundle

Created on 23 Feb 2018  ·  13 Comments  ·  Source: parcel-bundler/parcel

Given the following output from today:

- bundle-1
  - module-1
  - module-2
  - module-3
- bundle-2
  - module-1
  - module-3
  - module-4
- bundle-3
  - module-4
  - module-5

Parcel should extract shared modules into their own bundles, which should be side-linked with the bundles that require them:

- bundle-1
  - (link shared-bundle-1 for module-1 and module-3)
  - module-2
- bundle-2
  - (link shared-bundle-1 for module-1 and module-3)
  - (link shared-bundle-2 for module-4)
- bundle-3
  - (link shared-bundle-2 for module-4)
  - module-5
- shared-bundle-1
  - module-1
  - module-3
- shared-bundle-2
  - module-4

Having linked bundles should modify the corresponding async import sites to include both the imported bundle and its linked bundles.

<script src="bundle-1"></script>

becomes

<script src="shared-bundle-1"></script>
<script src="bundle-1"></script>
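
A minimal sketch of that rewrite, assuming a hypothetical link manifest that maps each bundle to the shared bundles it is linked with. The `LinkManifest` type and the `scriptTagsFor` function are illustrative names, not part of Parcel:

```typescript
// Hypothetical manifest shape: bundle name -> shared bundles it links to.
type LinkManifest = Record<string, string[]>;

// Returns the script tags for one import site: linked shared bundles
// first, then the imported bundle itself.
function scriptTagsFor(bundle: string, manifest: LinkManifest): string[] {
  const linked = manifest[bundle] ?? [];
  return [...linked, bundle].map((src) => `<script src="${src}"></script>`);
}

const tags = scriptTagsFor("bundle-1", { "bundle-1": ["shared-bundle-1"] });
```

A bundle with no entry in the manifest yields only its own tag, so unshared bundles are left untouched.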

This can produce much more efficient bundles with zero duplication, and potentially much better cacheability:

- bundle-1
  - module-2
- bundle-2 (empty)
- bundle-3
  - module-5
- shared-bundle-1
  - module-1
  - module-3
- shared-bundle-2
  - module-4

The initial implementation can come in two stages:

  1. If a module is seen in more than one bundle, extract it into its own bundle
  2. If two modules are always seen together, put them in the same extracted bundle
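
The two stages could be sketched roughly as follows. This is only an illustration under stated assumptions, not how Parcel implements it: the `BundleGraph` and `ExtractionResult` types and the `extractShared` function are hypothetical, each module is assumed to appear at most once per bundle, and bundle names are assumed not to contain `|`. Modules are treated as "always seen together" when they have exactly the same set of owning bundles.

```typescript
// Hypothetical types and names; this is not Parcel's internal API.
type BundleGraph = Record<string, string[]>; // bundle name -> module names

interface ExtractionResult {
  bundles: Record<string, { links: string[]; modules: string[] }>;
  shared: Record<string, string[]>; // shared bundle name -> module names
}

function extractShared(input: BundleGraph): ExtractionResult {
  // Stage 1: record which bundles each module appears in.
  const owners = new Map<string, string[]>();
  for (const [bundle, modules] of Object.entries(input)) {
    for (const mod of modules) {
      const list = owners.get(mod) ?? [];
      list.push(bundle);
      owners.set(mod, list);
    }
  }

  // Stage 2: modules with an identical set of owning bundles go into
  // the same shared bundle.
  const groups = new Map<string, { owners: string[]; modules: string[] }>();
  for (const [mod, bundles] of owners) {
    if (bundles.length < 2) continue; // not shared, stays where it is
    const key = [...bundles].sort().join("|");
    const group = groups.get(key) ?? { owners: bundles, modules: [] };
    group.modules.push(mod);
    groups.set(key, group);
  }

  // Keep only unshared modules in the original bundles.
  const result: ExtractionResult = { bundles: {}, shared: {} };
  for (const [bundle, modules] of Object.entries(input)) {
    result.bundles[bundle] = {
      links: [],
      modules: modules.filter((m) => (owners.get(m) ?? []).length < 2),
    };
  }

  // Name each group and link it from every bundle that needs it.
  let index = 1;
  for (const group of groups.values()) {
    const name = `shared-bundle-${index++}`;
    result.shared[name] = group.modules;
    for (const owner of group.owners) result.bundles[owner].links.push(name);
  }
  return result;
}

const extracted = extractShared({
  "bundle-1": ["module-1", "module-2", "module-3"],
  "bundle-2": ["module-1", "module-3", "module-4"],
  "bundle-3": ["module-4", "module-5"],
});
```

Run on the example from the top of this issue, the sketch puts module-1 and module-3 in one shared bundle and module-4 in another, and leaves bundle-2 with no modules of its own.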

This should all happen automatically without creating a "commons chunk" or anything manually configured. I'm convinced there is an ideal solution here that won't need to be configured any further. But it will probably take us a little while to get it right.


Note: Implementing an equivalent to Webpack's CommonsChunkPlugin has come up a few times. I think we should avoid ever creating something like that.

Creating one giant "vendor" bundle is actually the worst possible solution to this problem. It makes for an extremely low cache hit rate across multiple builds and adds the maximum amount of overhead to every bundle that needs it.

Feature RFC ✨ Parcel 2

All 13 comments

Here is some example output from what happens today.

[Screenshot, 2018-02-22: example bundle output; image not included]

Big +1 on this. This feature is very useful for many multi-page applications.

But one of my concerns is how we choose the threshold for extracting a common chunk (like the minChunks option in CommonsChunkPlugin). This option is important for the strategy that generates shared bundles.

"If two modules are always seen together, put them in the same extracted bundle"

"Two" is not the best threshold for large multi-page applications that contain 1000+ modules and lots of shared modules. In the worst case it will generate something like this:

- bundle-1
  - (link shared-bundle-1 for module-1 and module-3)
  - (link shared-bundle-2 for module-5 and module-7)
  - (link shared-bundle-3 for module-9 and module-11)
  - (link shared-bundle-4 for module-4 and module-6)
  - (link shared-bundle-5 for module-8 and module-10)
  ...
  - module-2

Just note that too many shared bundles should also be avoided.

Long-term I think it will work out as a good strategy. Those bundles will tend to align with individual packages and their exclusive sub-dependencies. This will cache really well because they will only be invalidated when you upgrade that package.

I spoke to @addyosmani about initial load. Right now V8 kicks in streaming scripts at 30kb, but with parallelized script parsing on the way, small modules loaded asynchronously will likely be the bigger win long term. Firefox is already starting to do some of this work with their Quantum project.

Additionally, with the web packaging format (which I hope will change its mind about https://github.com/WICG/webpackage/issues/6), cache manifests, HTTP/2 server push, and more, I expect many small bundles to win out over the massive bundles we're creating today.

Parcel should be aiming for where the web will be long term rather than filling the same space that other bundlers already fill today. Addy is scheduling a meeting with members of the Chrome team to talk about the ideal future and how we can align together over the coming years.

@jamiebuilds did you see webpack 4's new SplitChunksPlugin? I like the design a lot: https://medium.com/webpack/webpack-4-code-splitting-chunk-graph-and-the-splitchunks-optimization-be739a861366

This wouldn't work out. Dynamically importing all modules independently is not a silver bullet. Header cost aside, it will erode the performance of the asset server/CDN, since you will have to load a lot of packages over HTTP one by one as they are needed. I'm not particularly familiar with static servers, but my instinct tells me that getting so many connections at once is not a good thing.
It is also not really worth having so many cache entries in the client-side browser. As far as I know, the browser cache is implemented as little more than a hash table, and access time will gradually degrade as the table grows and more buckets/links are required.
Even though the download itself is asynchronous, the parsing cost is still a huge penalty.

@stevefan1999 All of those concerns are things browsers/specs are working to solve right now. We should be working with them towards that goal.

I think we will need to add a few more heuristics here. I implemented this strategy and tried it on a fairly large app, which already had 6 split points defined with import(). This resulted in 66 files being produced: basically a JS + CSS + map file for most of the combinations of these 6 (22 JS files, for example). So, in order to import one of the 6 split points, something like 11 JS files would need to be loaded in parallel. I guess some of those should probably get combined together in some way.
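
To make that cost concrete, here is a rough sketch that counts how many JS files each bundle would need to load in parallel if every distinct set of owning bundles became its own shared bundle. It is an illustration only: `BundleGraph` and `parallelLoads` are hypothetical names, the example input is the small one from the top of this issue rather than the app described above, and bundle names are assumed not to contain `|`.

```typescript
// Hypothetical type and function names; not Parcel's API.
type BundleGraph = Record<string, string[]>; // bundle name -> module names

// For each bundle: 1 (the bundle's own file, counted even if it ends up
// empty) plus the number of distinct shared groups it would link to.
function parallelLoads(input: BundleGraph): Record<string, number> {
  // Which bundles contain each module?
  const owners = new Map<string, string[]>();
  for (const [bundle, modules] of Object.entries(input)) {
    for (const mod of modules) {
      owners.set(mod, [...(owners.get(mod) ?? []), bundle]);
    }
  }

  // Each distinct owner set with two or more bundles is one shared group.
  const groupsPerBundle = new Map<string, Set<string>>();
  for (const bundles of owners.values()) {
    if (bundles.length < 2) continue;
    const key = [...bundles].sort().join("|");
    for (const bundle of bundles) {
      const keys = groupsPerBundle.get(bundle) ?? new Set<string>();
      keys.add(key);
      groupsPerBundle.set(bundle, keys);
    }
  }

  const counts: Record<string, number> = {};
  for (const bundle of Object.keys(input)) {
    counts[bundle] = 1 + (groupsPerBundle.get(bundle)?.size ?? 0);
  }
  return counts;
}

const loads = parallelLoads({
  "bundle-1": ["module-1", "module-2", "module-3"],
  "bundle-2": ["module-1", "module-3", "module-4"],
  "bundle-3": ["module-4", "module-5"],
});
```

A heuristic for combining groups could use a count like this as its input, for example by merging groups whenever a bundle's count exceeds some limit; which limit to pick is left open here.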

The duplication you were seeing was a bug in the logic for hoisting modules. See #1310. Now the modules will be properly hoisted up to the parent bundle and deduped.

This issue should still be left open though since that generally results in one giant bundle of all the common deps at the top rather than splitting things out in parallel.

This is implemented in #2401 and will be part of Parcel 2!

Is this related to the following scenario?

I have an entry point that contains several dynamic imports. Those imports share several large dependencies. When I build my project, all of those large shared dependencies are hoisted into my app.js, which effectively defeats the purpose of code splitting. I apologize if I'm misunderstanding.

Just wondering if there is a workaround until Parcel 2 arrives?

Given this has been implemented in Parcel 2, should we close out this ticket? If we'd prefer to keep it open for discussion, we can also remove it from the Parcel 2 Alpha 1 milestone.

Going to close as complete.
