UglifyJS: Enhancement: Maps for string literals & reserved names

Created on 30 Jul 2014 · 9 comments · Source: mishoo/UglifyJS

UglifyJS currently leaves string literals and reserved function names untouched and in place.

To make the produced code both harder to read and smaller, I suggest introducing tables for (1.) string literals and (2.) reserved functions and properties.

For (1.): collect all string literals, remove duplicates, put all literals in a map, and refer to this map whenever a string literal is needed. For example, the original code:

var a = "aaa" + q + "bbb" + q + "aaa";
var b = "aaa" + w + "xxx" + w + "bbb";
var c = "Running " + 1 + " times";
var d = "Template %1 example %2".replace("%1", i).replace("%2", j);

should produce a string literals table with the short name 's':

var s = {
  a: "aaa",
  b: "bbb",
  c: "xxx",
  d: "Running ",
  e: " times",
  f: "Template %1 example %2"
};

The original code should be transformed to:

var a = s.a + q + s.b + q + s.a;
var b = s.a + w + s.c + w + s.b;
var c = s.d + 1 + s.e;
var d = s.f.replace("%1", i).replace("%2", j);
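A minimal sketch of step (1.), assuming the string literals have already been collected in source order by a parser (that step is not shown). `shortName`, `buildStringTable` and `emitTable` are hypothetical helpers, not UglifyJS APIs. The sketch deduplicates the literals, assigns short keys in order of first appearance, and emits the table declaration:

```javascript
// Bijective base-26 short names: 0 -> "a", 25 -> "z", 26 -> "aa", 27 -> "ab".
function shortName(index) {
  let name = "";
  let n = index + 1;
  while (n > 0) {
    const rem = (n - 1) % 26;
    name = String.fromCharCode(97 + rem) + name; // 97 is "a"
    n = Math.floor((n - 1) / 26);
  }
  return name;
}

// Deduplicate literals (first occurrence wins) and give each a short key.
function buildStringTable(literals) {
  const keyOf = new Map(); // literal value -> short key, in insertion order
  for (const value of literals) {
    if (!keyOf.has(value)) keyOf.set(value, shortName(keyOf.size));
  }
  return keyOf;
}

// Emit `var <name>={key:"value",...};`; JSON.stringify escapes quotes and backslashes.
function emitTable(keyOf, tableName) {
  const entries = [...keyOf].map(([value, key]) => `${key}:${JSON.stringify(value)}`);
  return `var ${tableName}={${entries.join(",")}};`;
}

// Literals from the example, in source order. The "%1"/"%2" arguments are left
// out, as in the transformed code above. A real tool would collect these with a parser.
const literals = [
  "aaa", "bbb", "aaa",
  "aaa", "xxx", "bbb",
  "Running ", " times",
  "Template %1 example %2",
];
const table = buildStringTable(literals);
console.log(emitTable(table, "s"));
// var s={a:"aaa",b:"bbb",c:"xxx",d:"Running ",e:" times",f:"Template %1 example %2"};
console.log("s." + table.get("xxx")); // s.c
```

Each occurrence of a literal would then be replaced by `s.` plus its key.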

For (2.), tables of all used reserved functions and property names could be built:

var f = {
  a: console.assert,
  b: alert,
  c: console.log,
  d: "prototype"
};

Occurrences of console.assert should then be rewritten as f.a.
Access to the prototype could be transformed to myclass[f.d].a = ...
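One caveat when storing functions such as `console.assert` directly in the table: a method detached from its object loses its `this` binding, and some environments (older browsers in particular) throw when a detached console method is called. Storing the property *name* and using bracket notation, as in `myclass[f.d]`, keeps the receiver. A minimal sketch, with a hypothetical `Greeter` class standing in for a host object:

```javascript
// Hypothetical stand-in for a host object whose method relies on `this`.
class Greeter {
  constructor(prefix) {
    this.prefix = prefix;
  }
  greet(name) {
    return this.prefix + name;
  }
}

// Property-name table: values are strings, not detached functions.
const f = { g: "greet", p: "prototype" };

const greeter = new Greeter("Hello, ");

// Bracket access keeps `greeter` as the receiver, same as greeter.greet("x").
console.log(greeter[f.g]("x")); // Hello, x

// Greeter[f.p] is Greeter.prototype, so prototype access works the same way.
console.log(Greeter[f.p].greet === greeter.greet); // true

// Storing the function itself detaches it. Class bodies are strict mode,
// so `this` is undefined inside the call and reading this.prefix throws.
const detached = greeter.greet;
try {
  detached("x");
} catch (e) {
  console.log(e instanceof TypeError); // true
}
```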

rondonjon

This should only be done if the length of duplicated strings is greater than the length of the table, which should be easy to figure out.
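The break-even check can be sketched as a raw character count in minified form, before gzip. This is a hypothetical cost model (`tableSaves` is an illustrative name), assuming every occurrence of a literal becomes `s.<key>` and the table is emitted as `var s={key:"value",...};`:

```javascript
// Hypothetical raw-size cost model (characters, before gzip) for a table `s`.
// counts: literal -> number of occurrences; keys: literal -> short key.
function tableSaves(counts, keys, tableName = "s") {
  let inlineCost = 0;
  let tabledCost = `var ${tableName}={};`.length; // declaration overhead
  let first = true;
  for (const [value, count] of counts) {
    const quoted = JSON.stringify(value).length; // literal including quotes
    const key = keys.get(value);
    inlineCost += count * quoted;
    // Every use becomes `s.key`.
    tabledCost += count * (tableName.length + 1 + key.length);
    // Every entry adds `key:"value"`, plus a comma after the first entry.
    tabledCost += key.length + 1 + quoted + (first ? 0 : 1);
    first = false;
  }
  return { inlineCost, tabledCost, worthIt: tabledCost < inlineCost };
}

// "aaa" (3 uses) and "bbb" (2 uses), as in the example: the table loses.
const shortCase = tableSaves(
  new Map([["aaa", 3], ["bbb", 2]]),
  new Map([["aaa", "a"], ["bbb", "b"]])
);
console.log(shortCase); // inlineCost 25, tabledCost 39, worthIt false

// A longer literal used three times, like "non-scaling-stroke": the table wins.
const longCase = tableSaves(
  new Map([["non-scaling-stroke", 3]]),
  new Map([["non-scaling-stroke", "a"]])
);
console.log(longCase); // inlineCost 60, tabledCost 40, worthIt true
```

Short literals rarely pay for their table entry; long, frequently repeated ones can.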

ghost on 1 Aug 2014

Disagree.

The idea behind the string table is not only to reduce the code size but also to obfuscate the code, since strings (and particularly debug statements) often reveal what the code around them is doing.

So even if the size increased by a few bytes, I still think the uglifier should allow string tables to be created, maybe via an override option, e.g. --enforce-string-table?

rondonjon on 1 Aug 2014

Take the following random snippet from a Closure-Compiler-obfuscated application, for instance, and guess what it could be doing.

!0===r.De&&d.setAttribute("vector-effect","non-scaling-stroke");return d};r.prototype.Zv=function(b,c,f){c||(c="#000");isNaN(f)&&(f=1);var d=window.document.createElementNS(z.sb,"path");d.setAttribute("d",b);d.setAttribute("fill","none");0<f&&(d.setAttribute("stroke",c),d.setAttribute("stroke-width",f));!0===r.De&&d.setAttribute("vector-effect","non-scaling-stroke");return d};r.prototype.$v=function(b,c){c||(c="#000");var f=window.document.createElementNS(z.sb,"polyline");f.setAttribute("points",b);f.setAttribute("fill",c);!0===r.De&&f.setAttribute("vector-effect","non-scaling-stroke");

The code would be a lot more difficult to read if all the attribute names had been moved into a string table, and if reserved names such as window, document, createElementNS, setAttribute had been replaced with references to a property table. Imagine:

t.w instead of window
w[t.d] or just t.d instead of window.document (though the latter could raise some difficulties)
x[t.p] instead of x.prototype
f[t.s]() instead of f.setAttribute()

The string literals from this snippet occur many more times in the surrounding code, which is why I am suggesting a string table.

Plus, given the many repetitions of the reserved word setAttribute alone in the snippet above, I expect a significantly reduced size from a property table besides the benefit of the improved obfuscation.
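The expected saving can be estimated by counting. A minimal sketch, assuming each `.setAttribute` member access is rewritten to `[t.s]` and an existing table gains one entry `s:"setAttribute",`. `memberSavings` is a hypothetical helper, and the figures are raw characters before gzip. The snippet above contains nine `setAttribute` calls, which would give 9 × (13 − 5) − 17 = 55 characters; the sketch checks four of them:

```javascript
// Estimate raw savings from routing `obj.name` through a name table `t`.
// Text-level only: assumes every matched `.name` is a member access
// (a real tool would need a parser to be sure).
function memberSavings(code, name, key, tableName = "t") {
  const escaped = name.replace(/\$/g, "\\$");
  // Lookahead stops ".setAttribute" from matching ".setAttributeNS".
  const re = new RegExp("\\." + escaped + "(?![\\w$])", "g");
  const replacement = `[${tableName}.${key}]`; // e.g. "[t.s]"
  const occurrences = (code.match(re) || []).length;
  const rewritten = code.replace(re, () => replacement);
  const entryCost = `${key}:${JSON.stringify(name)},`.length; // one table entry
  return { occurrences, rewritten, saved: code.length - rewritten.length - entryCost };
}

// Excerpt from the snippet above (four of its nine setAttribute calls).
const excerpt =
  'd.setAttribute("d",b);d.setAttribute("fill","none");' +
  '0<f&&(d.setAttribute("stroke",c),d.setAttribute("stroke-width",f));';
const result = memberSavings(excerpt, "setAttribute", "s");
console.log(result.occurrences); // 4
console.log(result.saved);       // 4 * (13 - 5) - 17 = 15
console.log(result.rewritten);   // d[t.s]("d",b);d[t.s]("fill","none");...
```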

rondonjon on 1 Aug 2014

Despite its name, UglifyJS is not an obfuscator. The main goal is compression (and such "obfuscation" will actually increase the code size after gzip).

Also, de-obfuscating such code is so easy that it's just not worth the trouble.

mishoo on 1 Aug 2014

> The main goal is compression (and such "obfuscation" will actually increase the code size after gzip).

Because gzip always does such a bad job at removing frequently used text sequences, like [t.d].

> Also, de-obfuscating such code is so easy that it's just not worth the trouble.

Right, why waste time removing long, recurring, human-readable string literals when there's time for something useful, such as optimizing ternary operations that evaluate to true or false.

Thanks anyway for even taking the time to respond. I guess I'm going to build a preprocessor step on my own then.

rondonjon on 1 Aug 2014

IMO, obfuscating JavaScript is a rather silly process to even consider. Are you trying to ensure that no one can read your code, as if they're going to steal it? You're giving them the source code no matter what you do; the kind of obfuscation that Uglify does is already enough for most people to give up on reading the code, let alone modifying it. If you're sending any network requests, people can just open up the network panel in their browser, and if you try to change the DOM, people will already be able to see the changes that you made. What kind of JavaScript are you trying to hide that you don't want people to know about that badly?

ghost on 3 Aug 2014

> IMO, obfuscating JavaScript is a rather silly process to even consider. Are you trying to ensure that no one can read your code, as if they're going to steal it? You're giving them the source code no matter what you do;

Right. I am aware of how Javascript is delivered and executed in the runtime engine, and everything you say is true.

> the kind of obfuscation that Uglify does is already enough for most people to give up on reading the code.

Wrong. Please take a look at the example above. I have a web app with 700 KB of JS code here, and while some parts are indeed unreadable after being processed by UglifyJS or the Closure Compiler, others look almost unchanged, due to string literals (debug output, message texts, etc.) and repeated access to "reserved" names (e.g. prototype, createElement, ...).

Not that I'm going to waste my time this way, but if I were interested in how this application works and how to initiate this or modify that, I would consider such sections a tempting invitation.

> If you're sending any network requests, people can just open up the network panel in their console, and if you try to change the DOM, people will already be able to see the changes that you made.

That's a whole different story. I am talking about the plain readability of the code.

> What kind of JavaScript are you trying to hide that you don't want people to know about that badly?

Let's keep this technical. My point is that uglifyJS does a poor job of both compressing and obfuscating code that contains a lot of reserved names and/or string literals. Backing out of that discussion with gzip is inconsistent. If compression is to be delegated to gzip, then uglifyJS already does loads of questionable work by removing spaces, shortening variable names etc., and could as well leave all that to gzip too, which is of course nonsense and raises the question: why not add another (did you say "silly"?) task for apps that currently fall through the cracks?

Please bear in mind that every serious compiler/linker combo in the past has worked with string and symbol tables for good reasons, and (although JS admittedly has little in common with these) the removal of duplicate string literals has always been used to reduce the size of the output.

rondonjon on 4 Aug 2014

> That's a whole different story. I am talking about the plain readability of the code.

Don't forget that, as @mishoo already mentioned, UglifyJS is not an obfuscator.

> Backing out of that discussion with gzip is inconsistent.

It really isn't. Some things compress well with gzip, some don't. Your suggestion would remove lots of repeating patterns and replace them with smaller repeating patterns, which would now also occur in different places and with more surrounding (unrelated) context, because you have to define them. It won't give better compression post-gzip, and that _is_ quite important.

Don't forget that most JS is delivered for execution over a network, while most compiled binaries are not. We have to look at more than just raw file size, because focusing on that alone might (for example) give you extremely bad performance (extra lookups, etc.) or more latency and network usage.
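Whether a table helps after gzip can be measured rather than argued. A minimal sketch using Node.js's built-in `zlib.gzipSync` on hypothetical synthetic input (repeated `setAttribute` calls loosely modeled on the snippet earlier in the thread). It prints raw and gzipped sizes for a plain variant and a table-based variant. Highly repetitive synthetic input like this is not representative of a real application, so no gzip outcome is claimed here; running the same comparison on a real bundle is the meaningful test.

```javascript
// Node.js only: uses the built-in zlib module.
const zlib = require("zlib");

// Synthetic input (hypothetical data).
const attrs = ["fill", "stroke", "stroke-width", "points", "vector-effect"];
const keys = ["a", "b", "c", "d", "e"];

// Plain variant: direct member access and inline string literals.
const plainParts = [];
// Table variant: method name via t.s, attribute names via s.<key>.
const tabledParts = [
  'var t={s:"setAttribute"};',
  "var s={" + attrs.map((a, i) => keys[i] + ":" + JSON.stringify(a)).join(",") + "};",
];
for (let i = 0; i < 200; i++) {
  const k = i % attrs.length;
  plainParts.push(`e${i % 7}.setAttribute(${JSON.stringify(attrs[k])},v${i % 3});`);
  tabledParts.push(`e${i % 7}[t.s](s.${keys[k]},v${i % 3});`);
}
const plain = plainParts.join("");
const tabled = tabledParts.join("");

// Raw UTF-8 size versus gzip size at maximum compression.
function sizes(code) {
  return {
    raw: Buffer.byteLength(code, "utf8"),
    gzip: zlib.gzipSync(code, { level: 9 }).length,
  };
}

console.log("plain ", sizes(plain));
console.log("tabled", sizes(tabled));
```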

rvanvelzen on 4 Aug 2014

> If compression is to be delegated to gzip, then uglifyJS already does loads of questionable work by removing spaces, shortening variable names etc., and could as well leave all that to gzip too

Just to show that it's doing a pretty good job:

[/tmp] $ ls -l kendo.all.js 
-rw-rw-r-- 1 mishoo mishoo 4602662 aug  4 22:45 kendo.all.js
[/tmp] $ uglifyjs kendo.all.js -cm > kendo.all.min.js
[/tmp] $ ls -l kendo.all.min.js 
-rw-rw-r-- 1 mishoo mishoo 1738914 aug  4 22:46 kendo.all.min.js
[/tmp] $ gzip -c kendo.all.js | wc -c
807379
[/tmp] $ gzip -c kendo.all.min.js | wc -c
514478

So: that's a 4.6M JS file (huge, I know) which gets minified to 1.7M. And the difference between gzipping the original file and the minified file is almost 300K (which, again, is huge). That's why minifying before gzip makes sense.
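The ratios behind these figures can be checked directly; this sketch only reuses the byte counts from the transcript above:

```javascript
// Byte counts taken from the transcript above.
const original = 4602662;
const minified = 1738914;
const gzipOriginal = 807379;
const gzipMinified = 514478;

const percent = (part, whole) => ((100 * part) / whole).toFixed(1) + "%";

console.log(gzipOriginal - gzipMinified);         // 292901 bytes saved by minifying first
console.log(percent(minified, original));         // 37.8% (minified vs. original)
console.log(percent(gzipMinified, gzipOriginal)); // 63.7% (minified+gzip vs. gzip alone)
```

So for this file, minifying before gzip leaves roughly 36% fewer bytes than gzip alone.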

Now, there was a huge patch in UglifyJS v1 that did half of what you're suggesting, with the notice _“Worsens the data compression ratio of gzip.”_ I'm not interested in doing something similar for v2.

mishoo on 4 Aug 2014
