Hello!
zstd is faster than gzip at the same compression ratio (without the resource overhead). It would be good if ZFS supported zstd for dataset compression, like gzip, lz4, etc.
Benchmark from the zstd project:
| Compressor name | Ratio | Compression speed | Decompression speed |
| --------------- | ----- | ----------------- | ------------------- |
| zstd 1.1.3 -1 | 2.877 | 430 MB/s | 1110 MB/s |
| zlib 1.2.8 -1 | 2.743 | 110 MB/s | 400 MB/s |
| brotli 0.5.2 -0 | 2.708 | 400 MB/s | 430 MB/s |
| quicklz 1.5.0 -1 | 2.238 | 550 MB/s | 710 MB/s |
| lzo1x 2.09 -1 | 2.108 | 650 MB/s | 830 MB/s |
| lz4 1.7.5 | 2.101 | 720 MB/s | 3600 MB/s |
| snappy 1.1.3 | 2.091 | 500 MB/s | 1650 MB/s |
| lzf 3.6 -1 | 2.077 | 400 MB/s | 860 MB/s |
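The Ratio column above is simply raw size divided by compressed size, and the speed columns are throughput over a reference corpus. As a minimal sketch of how such rows are measured (zstd has no Python stdlib bindings, so stdlib `zlib` — the "zlib 1.2.8" row — stands in; absolute numbers depend entirely on your hardware and data):

```python
# Minimal sketch of how benchmark rows like the ones above are produced.
# zstd has no stdlib bindings, so Python's zlib (the "zlib" row) stands in;
# the methodology (ratio = raw bytes / compressed bytes) is the same.
import time
import zlib

def bench(data: bytes, level: int):
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    ratio = len(data) / len(compressed)
    mb_per_s = len(data) / (1024 * 1024) / elapsed if elapsed else float("inf")
    return ratio, mb_per_s

# Compressible sample payload (real benchmarks use corpora like Silesia).
data = b"The quick brown fox jumps over the lazy dog. " * 20000

for level in (1, 6):
    ratio, speed = bench(data, level)
    print(f"zlib -{level}: ratio {ratio:.3f}, {speed:.0f} MB/s")
```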
Thanks in advance!
Agreed. zstd works impressively on our dataset!
I believe zstd is currently being added to ZFS on FreeBSD; maybe we should pick this up after they upstream it to OpenZFS.
Zstd would be really nice.
As far as I know, @skiselkov developed a prototype at the OpenZFS Hackathon (see http://open-zfs.org/wiki/OpenZFS_Developer_Summit_2016).
Not sure if his branch is still up to date (https://github.com/skiselkov/illumos-gate/tree/zstd) or which version of zstd it is based on. The most recent release is v1.2.0 (https://github.com/facebook/zstd/releases).
I'll try to make a prototype of zstd integration in my spare time, but it would be more of a study exercise for me.
So, any news on this? Facebook is already proposing zstd-compressed Linux kernel images.
There will be a talk about zstd at the OpenZFS Developer Summit 2017; we will port that implementation as soon as it is released.
"ZSTD Compression" talk by Allan Jude at OpenZFS Developer Summit 2017, mentioned above.
https://redmine.ixsystems.com/issues/26816 seems to be the most recent upstream-ish work (partially done, partially stuck).
For anyone interested: the most ready-to-use patch is https://reviews.freebsd.org/D11124. We'll backport it as soon as it is merged into FreeBSD.
If possible, consider how a future implementation of custom dictionaries could be integrated into this work.
If anyone is interested: I ported the FreeBSD zstd patch to ZFS on Linux (hopefully without any mistakes). It's made against 0.8.0-rc1. You can find the patch here:
https://svn.dd-wrt.com/changeset/37376
and these follow-up fixes:
https://svn.dd-wrt.com/changeset/37385
https://svn.dd-wrt.com/changeset/37399
https://svn.dd-wrt.com/changeset/37401
Why not open a PR so someone can review it?
@darkbasic Because I don't use git, and I'm too busy with my own projects. If someone wants to test it, they may do so. It's too early for a pull request. (And I see a lot of outstanding pull requests, which leads me to the conclusion that most of them are ignored anyway.)
@BrainSlayer The open PRs are all labeled with their current status. They're not being ignored by the project.
@rlaager Yes, but it's still too early. I still have to run tests to verify that everything works correctly, and guess what: it doesn't yet. I have some crashing issues, so it's better to submit it after everything works correctly.
I added a new follow-up fix link to the original post, which solves the crash I was seeing. Just a stupid error I made while porting.
I made another patch. It's now finally working and has been tested to a limited extent. Please test it and tell me if it's worth doing a pull request.
```
root@DD-WRT:/tmp/mnt/zfs/usr# zfs get all | grep compress
zfs  compressratio     2.48x    -
zfs  compression       zstd-19  local
zfs  refcompressratio  2.51x    -
```

and with max compression:

```
root@DD-WRT:/tmp/mnt/zfs# zfs get all | grep compress
zfs  compressratio     2.56x    -
zfs  compression       zstd-22  local
zfs  refcompressratio  2.57x    -
```
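As a rough back-of-envelope (my own arithmetic, not part of the thread): `compressratio` is logical size divided by physical size, so the two values above translate into stored bytes per logical byte like this:

```python
# Back-of-envelope: how much raw space the two compressratio values above
# imply for 1 TiB of logical data (physical = logical / ratio).
logical_tib = 1.0

stored_19 = logical_tib / 2.48   # compressratio with zstd-19
stored_22 = logical_tib / 2.56   # compressratio with zstd-22

extra_savings = stored_19 - stored_22
print(f"zstd-19 stores {stored_19:.3f} TiB, zstd-22 stores {stored_22:.3f} TiB")
print(f"going from 19 to 22 saves roughly {extra_savings * 1024:.0f} GiB per TiB")
```

So on this particular dataset, the jump from level 19 to 22 buys only a few GiB per TiB; whether that is worth the extra CPU time is a per-workload call.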
Are the gains here really worth it?
Perhaps my read of the numbers is off, but it looks only marginally better... as a gzip replacement, and I'm not even sure gzip is that widely used, is it?
@cwedgwood zstd is significantly faster than gzip in compression and especially decompression: https://facebook.github.io/zstd/
@cwedgwood, from a technical perspective, zstd is superior to gzip.
I have a use case with a write-once-read-many dataset of many tens of terabytes, soon to grow into the multi-hundred-terabyte range. The files need to be accessed by closed-source software that doesn't know how to do compression. The most important concern right now is storage efficiency, but read throughput also matters, as a typical processing job churns through a couple of hundred gigabytes.
A few select performance measurements:
Algorithm | Compression ratio | Compression speed | Decompression speed
--------- | ----------------- | ----------------- | -------------------
lz4 | 1.044 | 532 MB/s | 2340 MB/s
lz4hc (-3) | 1.414 | 31.2 MB/s | 751 MB/s
gzip -1 | 1.655 | 25.3 MB/s | 56.7 MB/s
gzip -6 | 1.698 | 4.90 MB/s | 57.1 MB/s
zstd -1 | 1.619 | 454 MB/s | 926 MB/s
zstd -5 | 1.829 | 32.2 MB/s | 351 MB/s
As you can see, zstd is very interesting for this use case: gzip is an order of magnitude slower while having a worse compression ratio, and standard lz4 doesn't like this dataset at all.
zstd is awesome, period.