Https-everywhere: Use a compression algo better than gzip

Created on 18 Sep 2018  ·  11 Comments  ·  Source: EFForg/https-everywhere

Type: code issue

I personally use xz -9e or 7z with the PPMd algorithm, depending on which gives the smaller file.

sign-rulesets

All 11 comments

Good luck implementing decompression in JS.

There is also brotli: https://github.com/foliojs/brotli.js
But I'm not sure it's worth pulling in a whole library just to get smaller files.

LZMA2 is very memory- and CPU-hungry. Not an option for battery-powered resource-constrained devices.

Currently, the compression ratio is around 19.4% (1.3M compressed vs. 6.7M uncompressed, see below). I doubt we can get much better by simply switching the algorithm.

$ cat .git/HEAD
e14027653a97a9eb683e35b674dfc724a9366c47
$ python utils/merge-rulesets.py
$ du -sh rules/default.rulesets 
6.7M    rules/default.rulesets

$ wget https://www.https-rulesets.org/v1/default.rulesets.1537387579.gz
$ du -sh default.rulesets.1537387579.gz
1.3M    default.rulesets.1537387579.gz
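
For anyone who wants to reproduce this kind of ratio for other algorithms without installing extra tools, here is a minimal sketch using only Python's standard library. The function name compare_codecs is made up for illustration, brotli and PPMd are left out because they are not in the standard library, and the default xz preset is an assumption meant to mirror `xz -9e`.

```python
import bz2
import gzip
import lzma


def compare_codecs(data: bytes, xz_preset: int = 9 | lzma.PRESET_EXTREME) -> dict:
    """Return {codec name: compressed size / original size}.

    compare_codecs is a hypothetical helper, not part of the repo.
    The default xz preset (9 with PRESET_EXTREME) is memory-hungry on
    the compressing side; pass a lower preset on small machines.
    """
    compressed = {
        "gzip": gzip.compress(data, compresslevel=9),
        "bzip2": bz2.compress(data, compresslevel=9),
        "xz": lzma.compress(data, preset=xz_preset),
    }
    return {name: len(blob) / len(data) for name, blob in compressed.items()}
```

Reading rules/default.rulesets as bytes and passing it to compare_codecs should give numbers comparable to the du output above, although the exact values depend on the encoder versions in use.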

@cschanaj xz -9e should give 1.1 MiB.

Yes. But this would be a breaking change to the API at https://www.https-rulesets.org/v1/, which might affect downstream projects. It would also affect how third-party update channels must provide their files if we modify the way decompression works within the extension. I am not sure this is worth it.

brotli seems to perform a little bit better than xz. Maybe we could use Content-Encoding? https://caniuse.com/#search=brotli

> Maybe we could use Content-Encoding?

This issue was about the current storage format. Currently it stores base64-encoded gzip in JSON; base64 is inefficient, and using JSON also adds to flash writes. So the quick way to reduce the amount of data written to flash before #16551 is implemented is to just change the compression algorithm.
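
To put a number on the base64 overhead, here is a rough sketch of the gzip, then base64, then JSON layering described above. The helper name stored_size and the "rulesets" key are hypothetical; the extension's actual storage layout may differ.

```python
import base64
import gzip
import json


def stored_size(raw: bytes) -> dict:
    """Sizes in bytes at each stage of a gzip -> base64 -> JSON pipeline."""
    gz = gzip.compress(raw, compresslevel=9)
    # base64 encodes every 3 input bytes as 4 ASCII characters.
    b64 = base64.b64encode(gz).decode("ascii")
    # "rulesets" is a made-up key, used only for this illustration.
    as_json = json.dumps({"rulesets": b64})
    return {"gzip": len(gz), "base64": len(b64), "json": len(as_json)}
```

The base64 stage is always about a third larger than the gzip stage, so storing the compressed bytes in binary form would save more than switching from gzip to xz does in the size comparison further down the thread.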

What you propose is to enable live decompression. This would require recompressing the rulesets to write them to flash. Alternatively, we could skip writing them to flash at all and keep them in memory. I guess that would be optimal: a user generally needs an Internet connection to access the sites on the list anyway, and the list is updated regularly, so we can just download it (or a diff against a stable version bundled with the addon) on browser start and keep it in memory.

Given the clarification in https://github.com/EFForg/https-everywhere/issues/16552#issuecomment-423810925, I agree that the rulesets storage can be optimized. Moreover, this can be done without affecting the APIs. Maybe this issue can be merged with #16551?

Just 50 cents about compression algos:

$ ls -go --time-style=+"" -Sr
total 23720
-rw-r--r-- 1 1046871  default.rulesets.7zip.ppmd.zip
-rw-r--r-- 1 1059897  default.rulesets.7zip.PPMD.7z
-rw-r--r-- 1 1059897  default.rulesets.1537387579.7zip.LZMA.7z
-rw-r--r-- 1 1098768  default.rulesets.1537387579.xz
-rw-r--r-- 1 1099516  default.rulesets.1537387579.7zip.xz
-rw-r--r-- 1 1099622  default.rulesets.1537387579.7zip.lzma2.7z
-rw-r--r-- 1 1099697  default.rulesets.1537387579.br
-rw-r--r-- 1 1099777  default.rulesets.1537387579.7zip.lzma.zip
-rw-r--r-- 1 1199801  default.rulesets.1537387579.7zip.bz2
-rw-r--r-- 1 1199963  default.rulesets.1537387579.7zip.bzip2.7z
-rw-r--r-- 1 1199989  default.rulesets.1537387579.7zip.bzip2.zip
-rw-r--r-- 1 1223669  default.rulesets.1537387579.7zip.deflate64.zip
-rw-r--r-- 1 1247129  default.rulesets.1537387579.7zip.gz
-rw-r--r-- 1 1247271  default.rulesets.1537387579.7zip.deflate.zip
-rw-r--r-- 1 1316565  default.rulesets.1537387579.gz
-rw-r--r-- 1 6963578  default.rulesets.1537387579

All compressed with the maximum compression available in each tool. For 7-Zip, the GUI was used to set the compression parameters. The sizes include file headers, which can account for some of the differences. Some compression libraries allow omitting some headers and keeping that information elsewhere.
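
The point about headers can be illustrated with Python's zlib module, which can wrap the same DEFLATE stream in a gzip container, a zlib container, or no container at all. This is only a sketch; deflate_sizes is a made-up helper name, and it says nothing about the 7-Zip or xz containers in the list above.

```python
import zlib


def deflate_sizes(data: bytes, level: int = 9) -> dict:
    """Compressed size of `data` under three DEFLATE containers."""
    sizes = {}
    # wbits selects the container: 31 = gzip, 15 = zlib, -15 = raw DEFLATE.
    for name, wbits in (("gzip", 31), ("zlib", 15), ("raw deflate", -15)):
        compressor = zlib.compressobj(level, zlib.DEFLATED, wbits)
        sizes[name] = len(compressor.compress(data) + compressor.flush())
    return sizes
```

The gzip container adds 18 bytes over raw DEFLATE (a 10-byte header plus an 8-byte CRC32 and length trailer), and the zlib container adds 6. That is negligible for a file of about 1 MB, so the gaps between algorithms in the list come mostly from the encoders rather than from the framing.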

As expected, PPMd is the best, since it is optimized for compressing text.
