Zstd: streaming decompression API generates wrong data without reporting an error

Created on 19 Sep 2016 · 15 Comments · Source: facebook/zstd

I have a file compressed with zstd and use the following code to decompress it:

#define BUFF_IN_SIZE (32 * 1024)

static char *decompressFile_orDie2(const char* fname)
{
    FILE* const fin = fopen_orDie(fname, "rb");
    char *fname2 = createOutFilename_orDie(fname);
    FILE* const fout = fopen_orDie(fname2, "wb");
    size_t const buffInSize = BUFF_IN_SIZE;
    char buffIn[BUFF_IN_SIZE];
    size_t const buffOutSize = ZSTD_DStreamOutSize();
    void*  const buffOut = malloc_orDie(buffOutSize);
    size_t read, toRead = buffInSize;

    ZSTD_DStream* const dstream = ZSTD_createDStream();
    if (dstream == NULL) { fprintf(stderr, "ZSTD_createDStream() error \n"); exit(10); }
    size_t const initResult = ZSTD_initDStream(dstream);
    if (ZSTD_isError(initResult)) { fprintf(stderr, "ZSTD_initDStream() error \n"); exit(11); }

    while ((read = fread_orDie(buffIn, toRead, fin))) {
        ZSTD_inBuffer input = { buffIn, read, 0 };
        while (input.pos < input.size) {
            ZSTD_outBuffer output = { buffOut, buffOutSize, 0 };
            ZSTD_decompressStream(dstream, &output, &input);
            if (output.pos > 0) {
                fwrite_orDie(buffOut, output.pos, fout);
            }
        }
    }

    fclose_orDie(fin);
    fflush(fout);
    fclose_orDie(fout);
    free(buffOut);

    return fname2;
}

I tested with BUFF_IN_SIZE defined as 4K, 8K, 16K, 32K, 64K and 128K. The decompressed file is identical to the original with 4K, 8K, 16K and 32K, but differs with 64K and 128K. So whether the streaming decompression API works correctly depends on the size of the input buffer.

azurelaker

Most helpful comment

Which version of the API are you using: 1.0.0 or the dev branch?
ZSTD_decompressStream(dstream, &output, &input);
In version 1.0.0, you need to check the return code of ZSTD_decompressStream. Only 0 indicates that a frame has been completely decompressed.

There are cases where input.pos == input.size, but the decompression process is not finished.

advancedxy on 19 Sep 2016

👍3

All 15 comments

The latest dev branch.

azurelaker on 20 Sep 2016

Check the result of ZSTD_decompressStream(), maybe it is returning an error code.

terrelln on 20 Sep 2016

int errCode = ZSTD_decompressStream(dstream, &output, &input);
if (errCode < 0) {
    printf("decompress error:%d\n", errCode);
}

No error is reported.

I think the return value of ZSTD_decompressStream() (nextSrcSizeHint) should not be a mandatory requirement on the caller's input buffer size. The caller should not have to care about the internal state of the decompression API.

azurelaker on 20 Sep 2016

@azurelaker the return type of ZSTD_decompressStream() is size_t, not int. To check for an error, you must use if (ZSTD_isError(errCode)).
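
As a rough illustration of why the result should stay in a size_t: as far as I know, zstd encodes an error as a small code negated and cast to size_t, so error values sit at the very top of the size_t range. The sketch below models that convention with hypothetical names (toy_make_error, toy_is_error, TOY_MAX_ERROR_CODE). It is not the library's code, and real callers should only rely on ZSTD_isError().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bound standing in for the library's private maximum error code. */
#define TOY_MAX_ERROR_CODE 120

/* Model of the assumed convention: an error is a small code negated as size_t. */
static size_t toy_make_error(unsigned code)
{
    assert(code >= 1 && code < TOY_MAX_ERROR_CODE);
    return (size_t)0 - code;   /* wraps around to a value near SIZE_MAX */
}

/* Anything above the reserved top band is treated as an error;
 * everything else is an ordinary size or hint. */
static int toy_is_error(size_t result)
{
    return result > (size_t)0 - TOY_MAX_ERROR_CODE;
}
```

In this model an ordinary hint such as 131075 is not an error, while toy_make_error(1) is. Both are valid size_t values, which is why a dedicated predicate is used instead of a sign test.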

terrelln on 20 Sep 2016

@azurelaker Is createOutFilename_orDie() returning a different name than fname?

terrelln on 20 Sep 2016

@terrelln The decompressed output file name is not the same as the input file name.

azurelaker on 20 Sep 2016

size_t errCode = ZSTD_decompressStream(dstream, &output, &input);
if (ZSTD_isError(errCode)) {
    /* errCode is a size_t, so print the error name rather than using %d */
    printf("decompress error: %s\n", ZSTD_getErrorName(errCode));
}

Still no error is reported.

azurelaker on 20 Sep 2016

Can you decompress the file using the zstd command line utility? If you could give me the input file + the code you used for compression, I can try to reproduce the failure.

terrelln on 20 Sep 2016

I think the return value of ZSTD_decompressStream() (nextSrcSizeHint) should not be a mandatory requirement on the caller's input buffer size. The caller should not have to care about the internal state of the decompression API.

It's not. Maybe you are hitting a corner case.

As terrelln said, steps to reproduce would be much appreciated.
Alternatively, could you provide the read size and nextSrcSizeHint for every iteration, and the compressed file if possible?

advancedxy on 20 Sep 2016

@azurelaker :

The methodology you selected is to declare "end of decompression" upon reaching the end of the compressed file. This methodology is not supported in v0.8.0 ... v1.0.0.

The reason: it's possible for the decoder to read the entire compressed file, decompress the last block into its internal buffer, and then try to push the data into the external output buffer, which may be too small to finish the job.
In that case, some decompressed data remains in the internal buffers, waiting to be flushed.

In your case, you have an output buffer of size ZSTD_DStreamOutSize(), which guarantees that it's always possible to flush at least one full block. If your application followed nextSrcSizeHint, it would load data one block at a time, so each flush would always succeed.
But since your application loads compressed input in fixed-size chunks of buffInSize, the decoder can end up loading more than one block. It will then try to flush more than one block, but the output buffer is not large enough to flush everything in one pass.

This situation should be properly detected by a ZSTD_decompressStream() return code which is != 0. You don't need to follow the size hint, but you should at least verify that decompression is completely finished by checking that the return code is == 0.
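
To make the flushing problem concrete, here is a toy model of it. It is not zstd: ToyDecoder, toy_decompress and TOY_EXPAND are hypothetical names, each input byte simply "decodes" to four output bytes, and the return value is simplified to "bytes still waiting to be flushed" (zstd returns a hint instead, but 0 means "done" in both). The first loop stops when the input is consumed, like the code in the original post. The second also keeps calling until the return value is 0.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a streaming decoder; NOT zstd. */
enum { TOY_EXPAND = 4 };                  /* output bytes per input byte */
typedef struct { size_t pending; } ToyDecoder;

/* Swallows all offered input into an internal buffer, then flushes at most
 * out_cap bytes. Returns the number of bytes still pending (0 = all flushed). */
static size_t toy_decompress(ToyDecoder* d, size_t out_cap,
                             size_t in_size, size_t* in_pos, size_t* out_pos)
{
    assert(out_cap > 0);
    d->pending += (in_size - *in_pos) * TOY_EXPAND;
    *in_pos = in_size;                    /* all input is now "read" */
    *out_pos = d->pending < out_cap ? d->pending : out_cap;
    d->pending -= *out_pos;
    return d->pending;
}

/* Pattern from the original post: stop as soon as the input is consumed. */
static size_t run_until_input_consumed(size_t in_size, size_t out_cap)
{
    ToyDecoder d = { 0 };
    size_t in_pos = 0, total = 0;
    while (in_pos < in_size) {
        size_t out_pos = 0;
        toy_decompress(&d, out_cap, in_size, &in_pos, &out_pos);
        total += out_pos;
    }
    return total;                         /* may be short: pending data is lost */
}

/* Suggested pattern: also keep calling while the return value is non-zero. */
static size_t run_until_flushed(size_t in_size, size_t out_cap)
{
    ToyDecoder d = { 0 };
    size_t in_pos = 0, total = 0, remaining;
    do {
        size_t out_pos = 0;
        remaining = toy_decompress(&d, out_cap, in_size, &in_pos, &out_pos);
        total += out_pos;
    } while (in_pos < in_size || remaining != 0);
    return total;
}
```

With 8 input bytes and a 16-byte output buffer, the model should produce 32 bytes. In this toy, run_until_input_consumed(8, 16) writes only 16 of them, while run_until_flushed(8, 16) writes all 32. With 4 input bytes both loops agree, which mirrors the "works with small input buffers, fails with large ones" symptom.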

One last detail: I'm wondering which version of the library you are using.
You stated "latest dev", but the behaviour doesn't match what I'd expect from it.
Can you check which version of the library you are actually linking against?
A simple call to ZSTD_versionNumber() should be enough.
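
A small sketch of how that check could look. ZSTD_versionNumber() returns a single number built as major*100*100 + minor*100 + release, so it needs decoding to be readable. The helper names below (LibVersion, decode_version, print_version) are made up for this example, and the block itself does not depend on zstd so that it can be compiled on its own.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper type: the three components of a zstd version number. */
typedef struct { unsigned vmajor, vminor, vrelease; } LibVersion;

/* Splits a number of the form major*100*100 + minor*100 + release. */
static LibVersion decode_version(unsigned number)
{
    LibVersion v;
    v.vmajor   = number / 10000;
    v.vminor   = (number / 100) % 100;
    v.vrelease = number % 100;
    return v;
}

/* Prints the decoded version, e.g. 10000 is printed as 1.0.0. */
static void print_version(FILE* out, unsigned number)
{
    LibVersion const v = decode_version(number);
    assert(out != NULL);
    fprintf(out, "linked zstd version: %u.%u.%u\n",
            v.vmajor, v.vminor, v.vrelease);
}
```

With zstd.h included and libzstd linked, calling print_version(stderr, ZSTD_versionNumber()); would report the version of the library found at run time, which can differ from the headers used at compile time.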

Cyan4973 on 20 Sep 2016

The code I used is the latest master (not the dev branch, sorry for misstating it). Is that the recommended version?

azurelaker on 21 Sep 2016

@azurelaker 1.0.0 is recommended. But as @Cyan4973 said, that methodology is not supported in 1.0.

You can test your code against the dev branch; it should work as expected.
In v1.0.0, maybe you can do something like this:

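size_t sizeHint = 0;  // stays 0 if the input file is empty
while ((read = fread_orDie(buffIn, toRead, fin))) {
  ZSTD_inBuffer input = { buffIn, read, 0 };
  while (input.pos < input.size) {
    ZSTD_outBuffer output = { buffOut, buffOutSize, 0 };
    sizeHint = ZSTD_decompressStream(dstream, &output, &input);
    if (ZSTD_isError(sizeHint)) { fprintf(stderr, "ZSTD_decompressStream() error : %s \n", ZSTD_getErrorName(sizeHint)); exit(12); }
    if (output.pos > 0) {
      fwrite_orDie(buffOut, output.pos, fout);
    }
  }
}

// All input has been consumed; keep flushing until the decoder returns 0.
// This should handle buffOutSizes different from DStreamOutSize()
// Note: with a truncated file sizeHint never reaches 0, so real code
// should also bail out when a call makes no progress.
while (sizeHint) {
  ZSTD_inBuffer input = { buffIn, 0, 0 };
  ZSTD_outBuffer output = { buffOut, buffOutSize, 0 };
  sizeHint = ZSTD_decompressStream(dstream, &output, &input);
  if (ZSTD_isError(sizeHint)) { fprintf(stderr, "ZSTD_decompressStream() error : %s \n", ZSTD_getErrorName(sizeHint)); exit(12); }
  if (output.pos > 0) {
    fwrite_orDie(buffOut, output.pos, fout);
  }
}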

The code above is untested, but you get the idea.

advancedxy on 21 Sep 2016

I ran the same test code against the latest dev branch. It works no matter how large the fixed input buffer is.
Thanks :+1:

azurelaker on 21 Sep 2016

v1.1.0 now fakes not reading the last byte while output remains to be flushed,
so that users relying on the assumption that "if all input is read, then decompression is finished"
can keep relying on it and see correct behaviour.

Cyan4973 on 28 Sep 2016
