Node: build: introduce zopfli compression

Created on 5 Jan 2017 · 11 comments · Source: nodejs/node

  • Version: any
  • Platform: any unix os with .gz tarball release
  • Subsystem: build

Dear Node.js maintainers,

Not sure if you've heard of zopfli, but I spent a while testing its performance and its impact on the Node.js release tarballs, and I'd like to share the results with you. Would you be willing to accept a PR to use it?

zopfli is a fully gzip-compatible compression algorithm introduced by Google in 2013. It saves roughly an additional 5% in size but can be around 80 times slower to compress (decompression speed is unaffected). That makes it a good fit for static resources like release tarballs: we compress each one once but it is downloaded enormously many times, so the bandwidth and disk-space savings add up. It is already packaged in the Ubuntu, Debian, and Fedora Linux distributions, and in FreeBSD; see the references for details.
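The compatibility claim is easy to verify: zopfli emits a standard gzip stream, so stock gzip can decompress it unchanged. A minimal sketch (falling back to gzip -9 when the zopfli binary isn't installed, so it runs anywhere):

```shell
# Compress a sample file and round-trip it through plain gzip to show
# the output is an ordinary .gz stream. Falls back to gzip -9 if the
# zopfli binary is not installed.
cd "$(mktemp -d)"
printf 'hello node release tarball\n' > sample.txt
if command -v zopfli >/dev/null 2>&1; then
    zopfli sample.txt              # writes sample.txt.gz, keeps the input
else
    gzip -9 -k sample.txt          # -k keeps the original for comparison
fi
gzip -dc sample.txt.gz > roundtrip.txt   # decompress with stock gzip
cmp sample.txt roundtrip.txt && echo "gzip-compatible: OK"
```

Since the format on the wire is unchanged, users and download tooling need no modification at all; only the producer side slows down.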

As an example, I took the node-v7.4.0-linux-x64.tar.gz release tarball and compared time and size on my machine for the original gzip -9 versus zopfli at different iteration counts
(gzip -9, zopfli --i1, zopfli --i9, zopfli --i19, zopfli --i50):

$ time gzip -9 node-v7.4.0-linux-x64.tar

real    0m6.452s
user    0m6.442s
sys     0m0.008s

size: 15537444
$ time zopfli --i1 node-v7.4.0-linux-x64.tar

real    1m49.536s
user    1m49.452s
sys     0m0.052s

node-v7.4.0-linux-x64.tar.gz 15537444 -> 14975152 (96.38%), 16 times slower
$ time zopfli --i9 node-v7.4.0-linux-x64.tar

real    3m44.684s
user    3m44.546s
sys     0m0.068s

node-v7.4.0-linux-x64.tar.gz 15537444 -> 14937034 (96.13%), 34 times slower
$ time zopfli --i19 node-v7.4.0-linux-x64.tar

real    6m11.833s
user    6m11.656s
sys     0m0.068s

node-v7.4.0-linux-x64.tar.gz 15537444 -> 14935402 (96.13%), 56 times slower
$ time zopfli --i50 node-v7.4.0-linux-x64.tar

real    13m49.044s
user    13m48.278s
sys     0m0.064s

node-v7.4.0-linux-x64.tar.gz 15537444 -> 14933583 (96.11%), 168 times slower

So compression time increases at least 16-fold, about 1 min 43 s on my machine (E3-1220 V2 @ 3.10GHz CPU, 8 GB DDR3-1333 RAM), while the size drops to 96.38% of the original. With more time and iterations the output gets smaller still, but the gains diminish quickly; that's the tradeoff.

I'm not sure how many downloads each release gets, but since we don't cut a new version every day (nightly builds excepted, and we could keep plain gzip for those), spending a few extra minutes per release to shave about 4% off every .gz tarball saves bandwidth and disk space on both the nodejs.org side and the user side. It may well be worth it.

Just FYI, I also tested the zopfli compression results for all of the .gz tarballs at different iteration counts:

zopfli --i1 size changes:

node-v7.4.0-headers.tar.gz          483170 ->   460281 (95.26%)
node-v7.4.0-darwin-x64.tar.gz     13416624 -> 12893965 (96.10%)
node-v7.4.0-linux-arm64.tar.gz    14711615 -> 14155294 (96.21%)
node-v7.4.0-linux-armv6l.tar.gz   14687810 -> 14147345 (96.32%)
node-v7.4.0-linux-armv7l.tar.gz   14666298 -> 14122855 (96.29%)
node-v7.4.0-linux-ppc64.tar.gz    15675943 -> 14887267 (94.96%)
node-v7.4.0-linux-ppc64le.tar.gz  15335840 -> 14750604 (96.18%)
node-v7.4.0-linux-s390x.tar.gz    15952274 -> 15138093 (94.89%)
node-v7.4.0-linux-x64.tar.gz      15537444 -> 14975152 (96.38%)
node-v7.4.0-linux-x86.tar.gz      14999886 -> 14469933 (96.46%)
node-v7.4.0-sunos-x86.tar.gz      15309353 -> 14751377 (96.35%)
node-v7.4.0.tar.gz                27904025 -> 26594185 (95.30%)

zopfli --i9 size changes:

node-v7.4.0-headers.tar.gz          483170 ->   458036 (94.79%)
node-v7.4.0-darwin-x64.tar.gz     13416624 -> 12848587 (95.76%)
node-v7.4.0-linux-arm64.tar.gz    14711615 -> 14111936 (95.92%)
node-v7.4.0-linux-armv6l.tar.gz   14687810 -> 14102389 (96.01%)
node-v7.4.0-linux-armv7l.tar.gz   14666298 -> 14080710 (96.00%)
node-v7.4.0-linux-ppc64.tar.gz    15675943 -> 14826696 (94.58%)
node-v7.4.0-linux-ppc64le.tar.gz  15335840 -> 14710986 (95.92%)
node-v7.4.0-linux-s390x.tar.gz    15952274 -> 15089968 (94.59%)
node-v7.4.0-linux-x64.tar.gz      15537444 -> 14937034 (96.13%)
node-v7.4.0-linux-x86.tar.gz      14999886 -> 14435642 (96.23%)
node-v7.4.0-sunos-x86.tar.gz      15309353 -> 14712741 (96.10%)
node-v7.4.0.tar.gz                27904025 -> 26472385 (94.86%)

zopfli --i50 size changes:

node-v7.4.0-headers.tar.gz          483170 ->   457820 (94.75%)
node-v7.4.0-darwin-x64.tar.gz     13416624 -> 12843218 (95.72%)
node-v7.4.0-linux-arm64.tar.gz    14711615 -> 14109192 (95.90%)
node-v7.4.0-linux-armv6l.tar.gz   14687810 -> 14096693 (95.97%)
node-v7.4.0-linux-armv7l.tar.gz   14666298 -> 14075986 (95.97%)
node-v7.4.0-linux-ppc64.tar.gz    15675943 -> 14821324 (94.54%)
node-v7.4.0-linux-ppc64le.tar.gz  15335840 -> 14706944 (95.89%)
node-v7.4.0-linux-s390x.tar.gz    15952274 -> 15086037 (94.56%)
node-v7.4.0-linux-x64.tar.gz      15537444 -> 14933583 (96.11%)
node-v7.4.0-linux-x86.tar.gz      14999886 -> 14431547 (96.21%)
node-v7.4.0-sunos-x86.tar.gz      15309353 -> 14703688 (96.04%)
node-v7.4.0.tar.gz                27904025 -> 26461851 (94.83%)

What do you guys think?

Thanks for your time :)

build feature request


All 11 comments

@nodejs/build @nodejs/release Hello guys! Would you like to give some suggestions? Thanks!

Ping @nodejs/build again? I think you are the only ones who can actually influence this kind of thing. It doesn't sound unreasonable to me, fwiw.

Thanks @addaleax !

BTW I'd like to send a pull request for it; I just want to make sure @nodejs/build would accept it first. Thanks!

Hmmmm... still no response?

@PeterDaveHello you might want to open an issue in the build repo.

But why? We already have xz for the space-conscious. Also, you've omitted memory requirements. Seeing as we bake releases on everything from a Raspberry Pi to multi-core machines, we have limited resources.

We still ship gz releases, which means the format is still important, isn't it?

@PeterDaveHello said:
We still have gz release which means it's still important, isn't it?

Not sure how to interpret your answer. If users are "byte-savings aware" they won't use it; so why go through the complexity and potential resource issues to save a few percent on the gz files? Run the same tests on a Raspberry Pi and let's review the results. How long does it take? How much memory does it use?

IMO, it's not about whether users care about the byte savings: gzip support is far more widespread than xz support can be, and I believe we haven't dropped the gz releases precisely because it is still a very important compression format. Distributing smaller tarballs saves disk space and bandwidth for both us and our users.

I'm not sure what the memory-consumption test should look like here. As for the time cost, maybe we could move to parallel compression at the same time, to minimize the extra time spent.
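On the parallelism point: zopfli itself is single-threaded, but each release ships roughly a dozen per-platform tarballs, so running one zopfli process per artifact already hides most of the wall-clock slowdown. A sketch using plain shell job control (gzip -9 stands in when zopfli is absent, so the snippet runs anywhere):

```shell
# Compress every per-platform tarball concurrently, one background job
# each; `wait` blocks until all of them have finished. Substitute the
# real artifact list for the node-*.tar glob.
ZOP="gzip -9 -k"                                  # stand-in compressor
command -v zopfli >/dev/null 2>&1 && ZOP="zopfli --i9"
for t in node-*.tar; do
    $ZOP "$t" &                   # one background job per tarball
done
wait                              # join all compression jobs
ls node-*.tar.gz
```

With ~12 artifacts on a multi-core release machine, the added wall-clock time approaches that of the single slowest tarball rather than the sum of all of them.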

It seems like perhaps this should be closed. Feel free to re-open (or leave a comment requesting that it be re-opened) if you disagree. I'm just tidying up and not acting on a super-strong opinion or anything like that.
