This issue is a continuation from the Twitter thread discussing usage of webpack-bundle-analyzer with CRA.
webpack-bundle-analyzer
can be used either as a CLI, providing it the stats.json
webpack JSON output _and_ the built files. With these, we are able to:
Here's an example I built from a project using create-react-app
after first having to eject.
The changes I had to do after ejecting are in this commit: https://github.com/valscion/arvontakone/commit/fbbc3069ef7a152bce6eb2087d91beeda7a9ec36
EDIT: Here's a GIF of that visualization:
Ideally this wouldn't require any changes to the CRA code, but I don't think that is possible. We'd need to at least get our hands on the stats.json
that webpack can output if ran with the --json
flag.
Are we able to come up with some way of gaining access to the module stats from the outside, or would it fall under the category of providing configuration and being excluded from CRA?
I highly recommend the Bundle Buddy tool from @samccone
https://twitter.com/franklinjavier/status/931324987654443008?s=17
What's the latest thinking on this? Which tools work best?
I'm biased, naturally, but still think webpack-bundle-analyzer would be the best. I'm still contemplating about the best way to handle outputting stats.json from the build -- this issue https://github.com/webpack-contrib/webpack-bundle-analyzer/issues/119 makes me a bit unsure how much work we'd need to do in CRA to get the stats.json out without issues.
I haven't forgotten about this issue here, though. When I've had time with webpack-bundle-amalyzer, I've worked on the next major version PR https://github.com/webpack-contrib/webpack-bundle-analyzer/pull/97
OK, let me know if you need something on our side (as long as it doesn't negatively affect build perf).
Sure, will do βΊοΈ
A "lighter touch" could be to just have an option to output JSON, and then that JSON can be handled however the user sees fit (I prefer to use Chris Bateman's webpack visualizer).
I imagine an addition to package.json scripts like this:
"scripts": {
"start": "react-scripts start",
"build": "react-scripts build",
"build-stats": "react-scripts buildStats",
"test": "react-scripts test --env=jsdom",
"eject": "react-scripts eject"
}
build-stats
will just create a stats.json
in the root dir, which they can then use however they see fit. For the recommended webpack-bundle-analyzer
plugin, this can be done through their CLI utility, by adding another script (in userland) to the package.json:
"scripts": {
"start": "react-scripts start",
"build": "react-scripts build",
"build-stats": "react-scripts buildStats",
"test": "react-scripts test --env=jsdom",
"eject": "react-scripts eject",
+ "analyze": "webpack-bundle-analyzer stats.json"
}
I'm happy to work on this as a feature if folks agree this is the direction to go!
Yeah the stats.json
is indeed what webpack-bundle-analyzer
would also need. I think we've agreed that is a good direction. This point β "as long as it doesn't negatively affect build perf" β is the crucial one that might need some manual fiddling.
Feel free to work on this @joshwcomeau, it would be amazing :relaxed:
@joshwcomeau you probably want to be aware of this: https://github.com/webpack-contrib/webpack-bundle-analyzer/issues/119
Basically, if a bundle is large enough, JSON.stringify
on the stats object coming from webpack can crash the build for an out-of-memory error. Most likely this would also affect webpack
CLI itself when ran with --json
flag, as they seem to be using plain JSON.stringify
, too: https://github.com/webpack/webpack/blob/30329fb3648537ddc801b71f14904a3e762d4ad4/bin/webpack.js#L370-L371
I haven't reproduced this error myself, but if you find something, I'd love to hear about it.
This point β "as long as it doesn't negatively affect build perf" β is the crucial one that might need some manual fiddling.
Right, so I was proposing we don't do it as part of the main build
task; I have a hard time imagining it not affecting build perf, since it's non-trivial extra work being done. By having it as a separate task (eg. build-stats
or something), you can run that task as-needed. In my experience, this isn't something that needs to be done with every build, it's more something you do occasionally when you notice your bundle getting bloated. As a bonus, that separate task might be even faster? Although for that, I'm really not sure, don't know enough (or, really, anything) about Webpack internals.
@joshwcomeau you probably want to be aware of this: webpack-contrib/webpack-bundle-analyzer#119
Ah, interesting! I've never run into that myself. Do you think it's something we need to be concerned about? It seems like a pretty edge-case thing (otherwise I'm sure I would have run into it by now). My instinct is that we proceed anyway, and hope that a fix is implemented within webpack shortly.
I'll spend some time tomorrow looking at this.
I'm in favor of the separate entry point npm script proposal by @joshwcomeau, https://github.com/facebookincubator/create-react-app/issues/1858#issuecomment-357458170 - the webpack visualizer they're referring to and webpack-bundle-analyzer alike are both great because of their interactivity. I actually made a workaround with a similar approach https://github.com/facebookincubator/create-react-app/issues/3518 that can be used without ejecting.
I don't think we need to be that concerned about build performance here because, in my experience, this is not information most users will find very useful on every recompile. Usually, I check these tools every few days to make sure that the bundle size hasn't grown much larger than expected, due to a recently added dependency.
Ah, interesting! I've never run into that myself. Do you think it's something we need to be concerned about? It seems like a pretty edge-case thing (otherwise I'm sure I would have run into it by now).
We should care, one way or the other. If we go the direction of adding a new react-scripts
task (like the buildStats
you proposed), then this is not crucial β building the project normally would still work. We could tell about this edge case in the documentation.
However, if we were to create the stats.json
file on react-scripts build
, that would change things. The error coming from JSON.stringify
is not rescuable, so we couldn't just slap a try { } rescue
block around the call, as the JSON.stringify
call will _crash_ the node process itself. It would be awful if some users of create-react-app
with huge applications would no longer be able to build their app at all.
Nevertheless, both approaches could benefit if we'd change JSON.stringify
to some other code supporting streaming JSON stringification. Mainly, it could be much faster if the stringification of a huge object would be chunked. https://github.com/philbooth/bfj package seems like a nice solution.
My instinct is that we proceed anyway, and hope that a fix is implemented within webpack shortly.
Hmm I'm not so sure about whether webpack would fix it in any short time period. If the stats.toJson
method would start crashing, then I'd bet they would do something. But if it is only the stringifying part, then it seems like more of a power user feature to be able generate a big JSON file.
Did some digging, I have some questions.
It turns out the Node API for webpack supplies the stats object as part of the build process.
I still need to figure out exactly how to get the stats object to match the options when using the CLI, but one thing's for sure: we will need to JSON.stringify
it so we can write it to file, which means that performance irrespective, I think it should be a flag to the build script.
For the bug being discussed about stringifying blowing up on huge JSON blobs, I imagine there's some kind of way to stream it? But
(I did do some perf checks and it seems like it'll take around 20ms to stringify and write to file, on a 2014 iMac, with a small project. I imagine it could be quite a bit longer on a larger project, but given this is an opt-in thing that won't run automatically, I think that's ok?)
So yeah, I think the best approach would be to have a CLI flag for the build script, since it's essentially the same script. For an ejected project:
"scripts": {
"start": "node scripts/start.js",
"build": "node scripts/build.js",
"build:with-stats": "node scripts/build.js --stats",
"test": "node scripts/test.js --env=jsdom"
}
I can use the convention established in the test script, and just use process.argv
to get whether or not the flag was provided. And then I can use that to optionally do something like this:
const resolveArgs = {
stats,
previousFileSizes,
warnings: messages.warnings,
};
if (buildStats) {
fs.writeFile('bundle-stats.json', JSON.stringify(stats.toJson()), err => {
if (err) {
reject(err);
}
return resolve(resolveArgs);
});
} else {
return resolve(resolveArgs);
}
Lemme know if this sounds good (or what ought to be changed) and I'll get a PR submitted!
EDIT: Ah, just saw the comment about bfj. Yeah, that works as well! I misunderstood originally and thought the stringification happened within Webpack. Since we have control over it, we may as well use something like bjf.
Something with streaming probably sounds good to me
As far as I'm concerned, your thoughts seem good to me, @joshwcomeau :+1:
It might be useful to write the bundle-stats.json
to the same directory as the built files (was it dist/
? It's been a while I ran CRA.), as that will let you only use one argument to the webpack-bundle-analyzer
CLI.
The CLI will by default read the bundled files from the same folder as the stats JSON file is in β if the stats JSON file is in a different directory than the bundle contents, CLI will need you to supply the bundle directory as the second argument.
Sorry for the delay in this. It's a particularly busy week for me, traveling for work. Haven't had the time I expected to work on this! Will try and pick it up over the coming weekend.
It might be useful to write the bundle-stats.json to the same directory as the built files (was it dist/? It's been a while I ran CRA.), as that will let you only use one argument to the webpack-bundle-analyzer CLI.
Hmm... so my concern with that is that I think of the /build
directory as the shippable production assets. My deploy script, for example, just copies that directory to the server. It doesn't fit, semantically, with the rest of the directory's contents, IMO.
I suspect it would cause more confusion as folks try and figure out where the stats file is written. Because of how I view the /build
directory, I don't think it'd occur to me to check it for the stats file. Feels surprising to me, and I'm not sure if the benefit of being able to omit an arg for webpack-bundle-analyzer
is worth the trade-off.
Maybe that's just me, though? Curious to get someone else's take on it, if /build
is an intuitive place for the stats JSON to be written.
I think you have valid points. We should then figure out how we can help users get the stats reported correctly. I think documenting this will be key :). Users will have to remember to use both CLI arguments with CRA.
webpack-bundle-analyzer ./build-stats.json ./build
Curious to get someone else's take on it, if /build is an intuitive place for the stats JSON to be written.
@joshwcomeau I understand not wanting to include the stats file in the build folder because it's not a production asset, but the /build
folder is the first place I would look for stats.json
, considering it's an output of the build process.
My 2 cents when it comes to exporting these types of perf/stats from build pipelines
TL;DR
Add the ability for users to output full source included sourcemaps β making it trivial to use any sourcemap compatible bundle analyzer.
less tl;dr
My current thinking is that the webpack stats object only gives a small subset of the entire picture and is coupled to the a specific build pipeline :/ which encourages hyper specific tooling that is not portable across other bundle analysis pipelines.
What is the most common denominator when it comes to looking and understanding bundles then? I would say it is source-maps.
The nice thing about sourcemaps with embedded source (debug sourcemaps in webpack)
is that the sourcemaps are a portable bundle representation that do a pretty excellent job of capturing the entire picture of what is in a bundle or bundles for a given application.
From a user's perspective what kind of questions arise when they are looking at a bundle or sets of bundles?
Based on my user research these are the questions that continually pop up.
Luckily of all these questions can be answered via source maps introspection except for (why is lib X making it into my bundle).
For this reason I think that if CRA wants to make it easy for users to understand their bundle without having to eject their webpack config, outputting debug source maps in dev mode by default and optionally outputting debug source maps for a prod build is a solid approach that will be maximally is portable across tools.
As always thanks for your time and everyone's thoughts.
I think its really valuable to get a better understanding of bundling by looking at the output. If #3145 makes it into 2.0 it will be even more useful.
Generating the stats.json seems to be a fair way so that the user can pick the tool of flavour for them self.
@valscion are you up for a PR? I can help with docs.
@valscion are you up for a PR? I can help with docs.
Which PR are you talking about? I think @joshwcomeau might be the one you should be asking here :relaxed:
@valscion Even i could manage to get the json file out of webpack going but you seemed to have knowledge about some concerns regarding it. Anyway I will appreciate and be thankful for anyone who wants to make an effort!
Interesting, @samccone! I've never heard of sourcemaps being used for anything other than understanding stack traces in dev.
Of the points you listed, are there any that stats.json can't answer? Of the webpack-specific analyzers/visualizers I've seen, none have tackled finding duplication across bundles, although it feels like it really ought to be possible.
I suppose from my perspective, it's about balancing the benefit of a tool-agnostic method vs. supporting what users actually use. I've been meaning to try Bundle Buddy for a while, but I'd also hate to lose the tool I already use, Webpack Visualizer. I'd wager that Webpack Bundle Analyzer is the most popular tool used by folks, and it feels like an incomplete solution if we don't support that. Unless we think tool-agnosticism is important enough to override that?
My personal sentiment is that we should support the most common pattern, since ejecting can handle more advanced/niche usecases. Just based on NPM downloads it looks like webpack-bundle-analyzer is leading the pack atm. Can always revisit in the future if that changes (or support both?)
For the location of stats.json
in /build
, my preference is weakly-held. If folks think it makes more sense to output it in /build
, I'm alright with that!
My life continues to be busier than I expect. I spent some time this morning playing with this; I still need to test everything, but I think I should have a PR up in the next day or two. That said, I don't want to make any promises I can't keep! If anyone's getting impatient, don't let me be a blocker on this.
Based on my comment above I put a design document for understanding bundles via sourcemaps
https://docs.google.com/document/d/1ycGVBJmwIVs34yhC0oTqv_WH5f0fs2SAFmzyTiBK99k/edit
your comments are appreciated!
--
, none have tackled finding duplication across bundles, although it feels like it really ought to be possible.
It is possible, bundle-buddy does just that https://bundle-buddy.firebaseapp.com/
but I'd also hate to lose the tool I already use, Webpack Visualizer
There is actually no reason that I know of why webpack vis can not gather its info from the sourcemaps vs stats.json :)
Add the ability for users to output full source included sourcemaps β making it trivial to use any sourcemap compatible bundle analyzer.
Don't we already do that?
Cool doc, @samccone!
It is possible, bundle-buddy does just that https://bundle-buddy.firebaseapp.com/
I know :) I mentioned it earlier. I meant specifically tools that use Webpack's stats.json
There is actually no reason that I know of why webpack vis can not gather its info from the sourcemaps vs stats.json :)
For sure, but "it's possible" and "it exists" are two very different things. I'd like to support the tools as they are now, not as they could be. Unless I've misunderstood, and you're saying that these tools already support it?
I think I'm gonna submit my PR as-is, with support for stats.json
. Important to stress that I'm not saying that it's a better solution than sourcemaps! It's just the one that I believe solves the greatest user need as it is now (based on npm download numbers).
If someone else wants to add support for sourcemaps, I think it'd be pretty straightforward! I'd imagine we just need to find how they're being generated in dev, hook into it in prod, and write them to the build dir. But yeah, I do think we want both, and I don't have the time to tack on another task. Happy to coordinate with someone else on that though.
Oh, there's totally already support for sourcemaps, at least according to the README template. Yay solving for both usecases!
If anyone from earlier in the discussion wants to take a gander at the code, PR is here: https://github.com/facebook/create-react-app/pull/3945
Yassss #3945 was merged so this issue can be closed, I think! :tada:
:100: :+1: work @joshwcomeau
Most helpful comment
My 2 cents when it comes to exporting these types of perf/stats from build pipelines
TL;DR
Add the ability for users to output full source included sourcemaps β making it trivial to use any sourcemap compatible bundle analyzer.
less tl;dr
My current thinking is that the webpack stats object only gives a small subset of the entire picture and is coupled to the a specific build pipeline :/ which encourages hyper specific tooling that is not portable across other bundle analysis pipelines.
What is the most common denominator when it comes to looking and understanding bundles then? I would say it is source-maps.
The nice thing about sourcemaps with embedded source
(debug sourcemaps in webpack)
is that the sourcemaps are a portable bundle representation that do a pretty excellent job of capturing the entire picture of what is in a bundle or bundles for a given application.From a user's perspective what kind of questions arise when they are looking at a bundle or sets of bundles?
Based on my user research these are the questions that continually pop up.
Luckily of all these questions can be answered via source maps introspection except for (why is lib X making it into my bundle).
For this reason I think that if CRA wants to make it easy for users to understand their bundle without having to eject their webpack config, outputting debug source maps in dev mode by default and optionally outputting debug source maps for a prod build is a solid approach that will be maximally is portable across tools.
As always thanks for your time and everyone's thoughts.