Node: Consider minifying JS sources?

Created on 17 Apr 2017 · 24Comments · Source: nodejs/node

JS sources seem be shipped as is, converted into a .cc file. They could be shipped minified though. Has there been previous discussions around this?

Obviously publicly visible identifiers must not change. Advantages are faster scanning, lower shipping binary size, and slightly lower memory use due to shorter identifiers. Disadvantage would be worse debuggability, but solvable by making this an option or providing source maps; and different inlining decisions with Crankshaft, which is going to be the past soon though.

V8 does minify its source in a step in the js2c build target. The minifier script is kind of bad though, as it uses regexes for parsing. It serves its purpose okayish, but sometimes parses incorrectly, resulting in confusing bugs.

Another option is to not ship the JS sources at all, but the precompiled bytecode. This likely increases the binary size though. V8's uses about 4x memory in the startup snapshot for precompiled bytecode vs equivalent minified source. That would also increase GC pressure.

V8 Engine discuss lib / src

Source

hashseed

👍2 ❤1

Most helpful comment

Advantages are faster scanning, lower shipping binary size, and slightly lower memory use due to shorter identifiers.

^--- No none of those are an advantage until the values of those numbers are provided.

Disadvantages are obvious. In my experience, no one wants to see node internal line numbers... until they have a problem that leads to node internals, then they can't live without them.

sam-github on 17 Apr 2017

👍6 😄2

All 24 comments

I have a faint recollection about this from two years ago, but couldn't find pointers (@indutny?)
I would add an optimizer step (de-deopter).
+11 for source-maps.

refack on 17 Apr 2017

I've often considered this but have yet to form a solid opinion on it. The one major immediate impact it would have is on error stack traces, which would force it to be a semver-major change. I'm definitely open to exploring it if it can be done in a way that minimizes any potential for breaking changes.

jasnell on 17 Apr 2017

I'm leaning towards -1. Being able to look up the line number from a stack trace in the original source without any other tooling/resources is important to me.

mscdex on 17 Apr 2017

👍7

FWIW I think this should not actually be labelled with "V8 Engine" since the files I would consider for minifying are in lib/.

hashseed on 17 Apr 2017

But it's the closest? And it's V8 dependent?

refack on 17 Apr 2017

I'm leaning towards -1. Being able to look up the line number from a stack trace in the original source without any other tooling/resources is important to me.

@mscdex I partially agree, since in the bigger picture way more stack traces are created for userland code. Internal frames are almost useless to anyone but us. Also it might be solvable? (Easiest/fastest solution is to exclude from debug builds, or even make it flagable)

refack on 17 Apr 2017

Internal frames are almost useless to anyone but us.

I disagree. They are useful for everyone.

lpinca on 17 Apr 2017

👍6

Everyone? I'm sure you mean that we differ on the ratio of people who they are interested them 😉
From my experience with asynctrace (my own longjohn ancestor) the most common request was to filter those out...

refack on 17 Apr 2017

FWIW, there's my somewhat related February PR (https://github.com/nodejs/node/pull/11417), but I'm not really sure at this point that it is a good idea, not to mention minifying the sources completely.

aqrln on 17 Apr 2017

Advantages are faster scanning, lower shipping binary size, and slightly lower memory use due to shorter identifiers.

^--- No none of those are an advantage until the values of those numbers are provided.

Disadvantages are obvious. In my experience, no one wants to see node internal line numbers... until they have a problem that leads to node internals, then they can't live without them.

sam-github on 17 Apr 2017

👍6 😄2

I would suggest:

## Keep line numbers is a critical target !
### Keep frame names is a target

refack on 18 Apr 2017

Although I think that is we have a --no-minify flag for people to use for debug traces, it's a decent fallback.

refack on 18 Apr 2017

@hashseed do you know if source-maps are supported by the Stack-Trace-API
loc: MaybeHandle<Object> ErrorUtils::FormatStackTrace
Worse case we could use Error.prepareStackTrace

refack on 18 Apr 2017

No, and I would prefer not using prepareStackTrace for this. I could imagine a simple C++ callback that provides the source offset and source URL and expects a line/column pair in return.

hashseed on 18 Apr 2017

(everyone has their own favorite 🛠️) Where would you hook it to?
Could also return a triplet with func name.

refack on 18 Apr 2017

@hashseed @nodejs/v8 Do you have any thoughts on adding source map support to V8 itself (#6471)? I think if that were to happen, there wouldn’t be much standing in the way of this, right?

addaleax on 22 Apr 2017

Not sure whether it makes sense for V8 to support source maps internally, or just offer a callback mechanism to allow the embedder implement this (as mentioned in earlier).

hashseed on 22 Apr 2017

👍1

I don't see any point in minification of code that isn't being sent to some client web browser. How many bytes, and it would certainly be measured in less than megabytes since the current lib directory is 1.1MB, do you think will be saved by this proposal? I don't think it would be enough to warrant:

the added build complexity
the reduced clarity for users stepping through bugs

jsumners on 22 Apr 2017

I don't see any point in minification of code that isn't being sent to some client web browser. How many bytes, and it would certainly be measured in less than megabytes since the current lib directory is 1.1MB, do you think will be saved by this proposal? I don't think it would be enough to warrant:

IMHO binary size will not be a see benefit (especially if we add source-maps), but as @hashseed points out "faster scanning" and "slightly lower memory use" will benifit. Also some code "cosmetics" cause deopts (like character count of a function, including comments).

the added build complexity

Writing it once, yes, but manageable. Running it on every build IMHO negligible.

the reduced clarity for users stepping through bugs

This is a blocking issue, but a target to achieve anyway (source-maps support in debugger).

refack on 22 Apr 2017

@hashseed Yeah, I’m not sure either, but if Node wants to have source map support in general (and I think we’d want it for this feature, and users have been requesting it independently too), then practically speaking it’s something that all of V8’s major consumers use, right?

like character count of a function, including comments

That’s gone with TF+I, I wouldn’t care about this peculiar thing.

addaleax on 22 Apr 2017

👍1

After some internal discussions, I think it's unlikely that V8 will include support for source maps. To prevent XSS attacks, V8 would require, in a browser context, the source map to be the same origin as the script itself. For Node.js this is not an issue at all. I think it's a lot easier to only implement a callback so that the embedder can implement source map support.

hashseed on 24 Apr 2017

Ok, can we help?

refack on 25 Apr 2017

Advantages are faster scanning, lower shipping binary size, and slightly lower memory use due to shorter identifiers.

@hashseed, still, what are the numbers here?

I expect to see less than 0.5MiB binary size decrease (somewhere around 1% of the total binary size) and near-zero memory usage change if we don't include the sourcemaps.

I expect to see an _increase_ in distrubution size if we do include the sourcemaps. Not sure about the memory usage — that depends on when the sourcemaps will be loaded, but at the moment when they will be loaded — I expect to see an increase on the memory usage.

Also note that loading sourcemaps will either slow things down by default when errors are generated, or will require some kind of opt-in, and that will make things harder to debug (e.g. for issues reported on this issue tracker, etc.).

My current estimation is that it's simply not worth it, hence -1.