Dear Node.js maintainers,
I'm not sure if you've heard of zopfli. I spent a while testing its performance and its impact on the Node.js release tarballs, and I'd like to share the results with you. Would you accept a PR to use it?
zopfli is a fully gzip-compatible compression algorithm that Google introduced in 2013. It saves roughly an additional 5% in size, but can be about 80 times slower to compress (decompression speed is unaffected). That makes it a good fit for static resources like release tarballs: we compress each file once but it gets downloaded billions of times, so we save significant bandwidth and disk space on both sides. It is already packaged in the Ubuntu, Debian, and Fedora Linux distros, and in FreeBSD as well.
As an example, I took the node-v7.4.0-linux-x64.tar.gz release tarball and compared the time and size impact on my machine, using the original gzip -9 and zopfli at several iteration counts
(gzip -9, zopfli --i1, zopfli --i9, zopfli --i19, zopfli --i50):
$ time gzip -9 node-v7.4.0-linux-x64.tar
real 0m6.452s
user 0m6.442s
sys 0m0.008s
size: 15537444
$ time zopfli --i1 node-v7.4.0-linux-x64.tar
real 1m49.536s
user 1m49.452s
sys 0m0.052s
node-v7.4.0-linux-x64.tar.gz 15537444 -> 14975152 (96.38%), 16 times slower
$ time zopfli --i9 node-v7.4.0-linux-x64.tar
real 3m44.684s
user 3m44.546s
sys 0m0.068s
node-v7.4.0-linux-x64.tar.gz 15537444 -> 14937034 (96.13%), 34 times slower
$ time zopfli --i19 node-v7.4.0-linux-x64.tar
real 6m11.833s
user 6m11.656s
sys 0m0.068s
node-v7.4.0-linux-x64.tar.gz 15537444 -> 14935402 (96.13%), 56 times slower
$ time zopfli --i50 node-v7.4.0-linux-x64.tar
real 13m49.044s
user 13m48.278s
sys 0m0.064s
node-v7.4.0-linux-x64.tar.gz 15537444 -> 14933583 (96.11%), 127 times slower
So compression time increases by at least 16x, about 1 min 43 s on my machine (E3-1220 V2 @ 3.10GHz CPU, 8 GB DDR3-1333 RAM), while the size drops to 96.38% of the gzip -9 output. Given more time and iterations the file gets smaller still, but the gains taper off quickly; that's the tradeoff.
I'm not sure how many downloads each release gets, but we don't cut a new version every day (nightly builds excepted, and we could simply keep plain gzip for those). Spending a few extra minutes per release to shave roughly 4% off every gz tarball saves bandwidth and disk space on both the nodejs.org side and the user side, which may well be worth it.
Just FYI, I also ran zopfli over all the gz tarballs at different iteration counts; the results are below:
zopfli --i1 size changes:
node-v7.4.0-headers.tar.gz 483170 -> 460281 (95.26%)
node-v7.4.0-darwin-x64.tar.gz 13416624 -> 12893965 (96.10%)
node-v7.4.0-linux-arm64.tar.gz 14711615 -> 14155294 (96.21%)
node-v7.4.0-linux-armv6l.tar.gz 14687810 -> 14147345 (96.32%)
node-v7.4.0-linux-armv7l.tar.gz 14666298 -> 14122855 (96.29%)
node-v7.4.0-linux-ppc64.tar.gz 15675943 -> 14887267 (94.96%)
node-v7.4.0-linux-ppc64le.tar.gz 15335840 -> 14750604 (96.18%)
node-v7.4.0-linux-s390x.tar.gz 15952274 -> 15138093 (94.89%)
node-v7.4.0-linux-x64.tar.gz 15537444 -> 14975152 (96.38%)
node-v7.4.0-linux-x86.tar.gz 14999886 -> 14469933 (96.46%)
node-v7.4.0-sunos-x86.tar.gz 15309353 -> 14751377 (96.35%)
node-v7.4.0.tar.gz 27904025 -> 26594185 (95.30%)
zopfli --i9 size changes:
node-v7.4.0-headers.tar.gz 483170 -> 458036 (94.79%)
node-v7.4.0-darwin-x64.tar.gz 13416624 -> 12848587 (95.76%)
node-v7.4.0-linux-arm64.tar.gz 14711615 -> 14111936 (95.92%)
node-v7.4.0-linux-armv6l.tar.gz 14687810 -> 14102389 (96.01%)
node-v7.4.0-linux-armv7l.tar.gz 14666298 -> 14080710 (96.00%)
node-v7.4.0-linux-ppc64.tar.gz 15675943 -> 14826696 (94.58%)
node-v7.4.0-linux-ppc64le.tar.gz 15335840 -> 14710986 (95.92%)
node-v7.4.0-linux-s390x.tar.gz 15952274 -> 15089968 (94.59%)
node-v7.4.0-linux-x64.tar.gz 15537444 -> 14937034 (96.13%)
node-v7.4.0-linux-x86.tar.gz 14999886 -> 14435642 (96.23%)
node-v7.4.0-sunos-x86.tar.gz 15309353 -> 14712741 (96.10%)
node-v7.4.0.tar.gz 27904025 -> 26472385 (94.86%)
zopfli --i50 size changes:
node-v7.4.0-headers.tar.gz 483170 -> 457820 (94.75%)
node-v7.4.0-darwin-x64.tar.gz 13416624 -> 12843218 (95.72%)
node-v7.4.0-linux-arm64.tar.gz 14711615 -> 14109192 (95.90%)
node-v7.4.0-linux-armv6l.tar.gz 14687810 -> 14096693 (95.97%)
node-v7.4.0-linux-armv7l.tar.gz 14666298 -> 14075986 (95.97%)
node-v7.4.0-linux-ppc64.tar.gz 15675943 -> 14821324 (94.54%)
node-v7.4.0-linux-ppc64le.tar.gz 15335840 -> 14706944 (95.89%)
node-v7.4.0-linux-s390x.tar.gz 15952274 -> 15086037 (94.56%)
node-v7.4.0-linux-x64.tar.gz 15537444 -> 14933583 (96.11%)
node-v7.4.0-linux-x86.tar.gz 14999886 -> 14431547 (96.21%)
node-v7.4.0-sunos-x86.tar.gz 15309353 -> 14703688 (96.04%)
node-v7.4.0.tar.gz 27904025 -> 26461851 (94.83%)
What do you guys think?
Thanks for your time :)
@nodejs/build @nodejs/release Hello guys! Would you like to give some suggestions? Thanks!
Ping @nodejs/build again? I think you are the only ones who can actually influence this kind of thing. It doesn't sound unreasonable to me, fwiw.
Thanks @addaleax !
BTW, I'd like to send a pull request for it; I just want to make sure @nodejs/build would accept it first. Thanks!
Hmm... still no response?
@PeterDaveHello you might want to open an issue in the build repo.
But why? We already have xz for the space-conscious. Also, you omit memory requirements. Seeing how we bake releases on everything from an rpi to multi-core machines, we have limited resources.
We still have gz releases, which means the format is still important, doesn't it?
@PeterDaveHello said:
We still have gz releases, which means the format is still important, doesn't it?
Not sure how to interpret your answer. If users are "byte-savings aware" they won't use gz anyway, so why take on the complexity and potential resource issues to save a few percent on the gz files? Run the same tests on a Raspberry Pi and let's review the results. How long does it take? How much memory does it use?
IMO, it's not about whether users care about the byte savings. gzip support is far more widespread than xz's ever will be; I believe we haven't dropped it precisely because it's still a very important compression format. And distributing smaller artifacts saves disk space and bandwidth for us and for users alike.
I'm not sure how a memory consumption test should be set up here. As for the time cost, maybe we could move to parallel compression at the same time to minimize the time spent.
It seems like perhaps this should be closed. Feel free to re-open (or leave a comment requesting that it be re-opened) if you disagree. I'm just tidying up and not acting on a super-strong opinion or anything like that.