Zstd: v1.4.1 corruption decompressing legacy v5 files

Created on 21 Jul 2019  Â·  8Comments  Â·  Source: facebook/zstd

Hi,

While importing v1.4.1 into zstd-jni, I noticed that my tests for decompressing legacy v5 format are failing. Using the zstd CLI utils, they are also truncating the output file for v5:

~ zstdcat xml_v04.zst > v4_dc
~ zstdcat xml_v05.zst > v5_dc
~ zstdcat xml_v06.zst > v6_dc
~ zstdcat xml_v07.zst > v7_dc
~ ls -al
total 43072
drwxr-xr-x 2 luben luben     4096 юли 21 01:41 .
drwxr-xr-x 4 luben luben     4096 юли 21 00:51 ..
-rw-r--r-- 1 luben luben  5345280 юли 21 01:39 v4_dc
-rw-r--r-- 1 luben luben  5334107 юли 21 01:40 v5_dc
-rw-r--r-- 1 luben luben  5345280 юли 21 01:40 v6_dc
-rw-r--r-- 1 luben luben  5345280 юли 21 01:40 v7_dc
-rw-r--r-- 1 luben luben  5345280 дек 18  2015 xml
...
-rw-r--r-- 1 luben luben   707718 апр 15  2016 xml_v04.zst
-rw-r--r-- 1 luben luben   706452 апр 15  2016 xml_v05.zst
-rw-r--r-- 1 luben luben   703596 юни 20  2016 xml_v06.zst
-rw-r--r-- 1 luben luben   703154 авг  3  2016 xml_v07.zst

Notice, the size of v5_dc that was decompressed from xml_v05.zst is smaller than the others, or the original file xml.

All files can be downloaded from: https://github.com/luben/zstd-jni/tree/master/src/test/resources

Regards,
luben

bug

Most helpful comment

I've found the bug, and I will put up a PR to fix it shortly. It only effects version 0.5, both earlier versions and later versions are not effected.

In the meantime, we have the zstd versions test, but it seems to be disabled in our CI. I just ran it, but it didn't catch the bug.

@felixhandte can you look into running the zstd versions test in at least our master branch CI, and also add it to our manual tests in the release checklist?

All 8 comments

The regression is introduced by 0fd322f812211e653a83492c0c114b933f8b6bc5. Now I have to figure out if it is a bug in the commit, or if the v0.5 compressor is producing buggy data.

I was able to decompress the file using previous versions - it's part of the zstd-jni test suite.

Yeah, I am able to decompress with earlier versions too. But, I'm think that maybe the zstd-0.5 compressor produced bad output that we also happened to decompress successfully.

CC @Cyan4973, since you're more familiar with the code. I commented the line (L3194) that I can revert to get the original data back. Do you see anything wrong with the code?

Is it possible that dumpsEnd is meant to be inclusive inclusive? So *dumpsEnd is the last byte of the dumps?

I've found the bug, and I will put up a PR to fix it shortly. It only effects version 0.5, both earlier versions and later versions are not effected.

In the meantime, we have the zstd versions test, but it seems to be disabled in our CI. I just ran it, but it didn't catch the bug.

@felixhandte can you look into running the zstd versions test in at least our master branch CI, and also add it to our manual tests in the release checklist?

@Cyan4973, @felixhandte should we make another release for this bug? I'm thinking yes.

That's a good question.
I guess it partially depends how frequently users decompress v0.5 legacy format.
Unfortunately, we have no stats on such event :(

I intend to patch zstd-jni-1.4.1 (still unreleased), but I think it's good idea to release it as 1.4.2.

Was this page helpful?
0 / 5 - 0 ratings