Hi,
While importing v1.4.1 into zstd-jni, I noticed that my tests for decompressing legacy v5 format are failing. Using the zstd CLI utils, they are also truncating the output file for v5:
~ zstdcat xml_v04.zst > v4_dc
~ zstdcat xml_v05.zst > v5_dc
~ zstdcat xml_v06.zst > v6_dc
~ zstdcat xml_v07.zst > v7_dc
~ ls -al
total 43072
drwxr-xr-x 2 luben luben 4096 юли 21 01:41 .
drwxr-xr-x 4 luben luben 4096 юли 21 00:51 ..
-rw-r--r-- 1 luben luben 5345280 юли 21 01:39 v4_dc
-rw-r--r-- 1 luben luben 5334107 юли 21 01:40 v5_dc
-rw-r--r-- 1 luben luben 5345280 юли 21 01:40 v6_dc
-rw-r--r-- 1 luben luben 5345280 юли 21 01:40 v7_dc
-rw-r--r-- 1 luben luben 5345280 дек 18 2015 xml
...
-rw-r--r-- 1 luben luben 707718 апр 15 2016 xml_v04.zst
-rw-r--r-- 1 luben luben 706452 апр 15 2016 xml_v05.zst
-rw-r--r-- 1 luben luben 703596 юни 20 2016 xml_v06.zst
-rw-r--r-- 1 luben luben 703154 авг 3 2016 xml_v07.zst
Notice, the size of v5_dc that was decompressed from xml_v05.zst is smaller than the others, or the original file xml.
All files can be downloaded from: https://github.com/luben/zstd-jni/tree/master/src/test/resources
Regards,
luben
The regression is introduced by 0fd322f812211e653a83492c0c114b933f8b6bc5. Now I have to figure out if it is a bug in the commit, or if the v0.5 compressor is producing buggy data.
I was able to decompress the file using previous versions - it's part of the zstd-jni test suite.
Yeah, I am able to decompress with earlier versions too. But, I'm think that maybe the zstd-0.5 compressor produced bad output that we also happened to decompress successfully.
CC @Cyan4973, since you're more familiar with the code. I commented the line (L3194) that I can revert to get the original data back. Do you see anything wrong with the code?
Is it possible that dumpsEnd is meant to be inclusive inclusive? So *dumpsEnd is the last byte of the dumps?
I've found the bug, and I will put up a PR to fix it shortly. It only effects version 0.5, both earlier versions and later versions are not effected.
In the meantime, we have the zstd versions test, but it seems to be disabled in our CI. I just ran it, but it didn't catch the bug.
@felixhandte can you look into running the zstd versions test in at least our master branch CI, and also add it to our manual tests in the release checklist?
@Cyan4973, @felixhandte should we make another release for this bug? I'm thinking yes.
That's a good question.
I guess it partially depends how frequently users decompress v0.5 legacy format.
Unfortunately, we have no stats on such event :(
I intend to patch zstd-jni-1.4.1 (still unreleased), but I think it's good idea to release it as 1.4.2.
Most helpful comment
I've found the bug, and I will put up a PR to fix it shortly. It only effects version 0.5, both earlier versions and later versions are not effected.
In the meantime, we have the zstd versions test, but it seems to be disabled in our CI. I just ran it, but it didn't catch the bug.
@felixhandte can you look into running the zstd versions test in at least our master branch CI, and also add it to our manual tests in the release checklist?