I've been adding some new compressors to the backup program I'm developing, HashBackup. Since they are rather new, I enabled compression verification: during backups, HB decompresses every compressed block and compares it against the original before adding it to the backup.
I have isolated a 256K block of data that 0.7.0 apparently doesn't compress correctly. It works fine with 0.6.1. Here is zstd 0.7.0 trying to compress this block:
[jim@mb ~]$ zstd --version
*** zstd command line interface 64-bits v0.7.0, by Yann Collet ***
[jim@mb ~]$ md5 bad
MD5 (bad) = 40e6a2ada63122557c55790b524521c2
[jim@mb ~]$ ls -l bad
-rw-r--r-- 1 jim staff 262144 Jun 23 21:23 bad
[jim@mb ~]$ zstd --check bad
bad : 7.72% (262144 => 20237 bytes, bad.zst)
[jim@mb ~]$ zstd --test bad.zst
Error 36 : Decoding error : Restored data doesn't match checksum
[jim@mb ~]$ zstd -d bad.zst -o bad2
Error 36 : Decoding error : Restored data doesn't match checksum
[jim@mb ~]$ rm bad.zst
[jim@mb ~]$ zstd bad
bad : 7.72% (262144 => 20237 bytes, bad.zst)
[jim@mb ~]$ zstd -d bad.zst bad2
zstd: bad already exists; not overwritten
zstd: bad2: unknown suffix (.zst expected) -- ignored
[jim@mb ~]$ zstd -d bad.zst -o bad2
zstd: bad2 already exists; do you wish to overwrite (y/N) ? y
bad.zst : 262144 bytes
[jim@mb ~]$ ls -l bad*
-rw-r--r-- 1 jim staff 262144 Jun 23 21:23 bad
-rw-r--r-- 1 jim staff 20237 Jun 23 21:31 bad.zst
-rw-r--r-- 1 jim staff 262144 Jun 23 21:31 bad2
[jim@mb ~]$ cmp bad bad2
bad bad2 differ: char 188453, line 1
[jim@mb ~]$
I have attached the block. Send me an email (address in my profile) if the md5 doesn't match.
bad.zip
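For context, the roundtrip verification described above follows a common compress → decompress → compare pattern. Here is a minimal generic sketch of that pattern; this is not HashBackup's actual code, and it uses zlib as a stand-in compressor because Python's standard library has no zstd bindings:

```python
import zlib

def compress_verified(block: bytes, level: int = 6) -> bytes:
    """Compress a block, then immediately decompress and compare the
    result to the original before trusting the compressed form.

    zlib stands in for zstd here purely for illustration.
    """
    compressed = zlib.compress(block, level)
    # Roundtrip check: any compressor bug like the one above is caught
    # before the bad data ever reaches the backup.
    if zlib.decompress(compressed) != block:
        raise ValueError("roundtrip verification failed")
    return compressed
```

A backup tool would typically fall back to storing the block uncompressed instead of raising, but the check itself is the point.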
Indeed, there was a roundtrip bug in 0.7.0;
it has been fixed in 0.7.1.
This specific test file works fine with 0.7.1.
0.7.1 works great. I didn't realize there had been a new release in the last few days; I should have checked that first. Thanks!
Today I ran an existing backup, about 5.5M blocks of data totaling 100GB (compressed), through zstd 0.7.1. All blocks were decompressed with the old compressor, then compressed with zstd at levels 1-7, decompressed, and verified. The block sizes ranged from tiny to 1MB. No problems! I'm leaving the roundtrip verification on for a while as a precaution, but it looks good. Thanks for your fantastic work on zstd! -Jim
Thanks for your great feedback !
For your information, note that v0.7.1 has checksum verification turned on by default,
so that any potential corruption in the future gets caught immediately.
That's only for the zstd command though, right? I don't want or need zstd to add a checksum on each block I compress, because I already have a SHA1.
Correct, that's only for the command line.
And it's just the default; it can be disabled using --no-check.
On the API side, the logic is reversed: no checksum by default, since zstd is generally combined with some other integrity process. It can be enabled using the _advanced() prototypes.
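The application-level integrity scheme Jim describes (an external SHA1 instead of a per-frame checksum) can be sketched generically like this. Again, this is an illustration, not HashBackup's real code, and zlib stands in for zstd since Python's standard library lacks zstd bindings:

```python
import hashlib
import zlib

def store_block(block: bytes) -> tuple[bytes, bytes]:
    """Compress a block and keep an external SHA-1 of the original data,
    rather than relying on a checksum embedded in the compressed frame."""
    return zlib.compress(block), hashlib.sha1(block).digest()

def restore_block(compressed: bytes, expected_sha1: bytes) -> bytes:
    """Decompress a block and verify it against the stored SHA-1."""
    block = zlib.decompress(compressed)
    if hashlib.sha1(block).digest() != expected_sha1:
        raise ValueError("restored block fails SHA-1 check")
    return block
```

Since the application already authenticates each block this way, a second checksum inside every compressed frame would only add overhead, which is why a library default of no checksum makes sense.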
Ran another backup through zstd levels 1-7, this one with 14M blocks. All good!