Zstd: Corrupted ZSTD output under very repetitive input

Created on 29 Feb 2020 · 7 Comments · Source: facebook/zstd

Build type: 32-bit
Build system & version: Visual Studio 2017 Version 15.9.19
ZSTD version: Release, commit 8974906 (latest version, also built with Visual Studio 2017)
Language standard: C++17

So I was trying to compress raw libbson data; however, ZSTD logged an error (DEBUG_LEVEL is 4), and ZSTD_compress didn't return an error code of any kind, only the size of the compressed output:
D:\Reekpie\zstd\lib\compress\zstd_compress_sequences.c:101: ERROR!: check ZSTD_getFSEMaxSymbolValue(ctable) < max failed, returning ERROR(GENERIC): Repeat FSE_CTable has maxSymbolValue 51 < 52
The compressed buffer is corrupted and cannot be decompressed: the decompression function returns a -20 error (corrupted block).

Code for reproduction:

#include <zstd.h>
#include <fstream>

int main()
{
    std::ifstream input("0.0.zchk_raw", std::fstream::in | std::fstream::ate);
    const size_t inputSize = input.tellg();
    input.seekg(0);
    char* inputBuffer = new char[inputSize];
    input.read(inputBuffer, inputSize);
    const size_t approxCompressSize = ZSTD_compressBound(inputSize);
    char* compressBuffer = new char[approxCompressSize];
    const size_t compressSize = ZSTD_compress(compressBuffer, approxCompressSize, inputBuffer, inputSize, 22);

    return 0;
}

0.0.zchk_raw is attached to this post in a ZIP file.

0.0.zip

bug

All 7 comments

Was able to reproduce this. The full log:

C:\Users\bimbashrestha\Downloads\zstd\lib\compress\zstd_compress.c: ZSTD_compressCCtx (srcSize=2359445)
C:\Users\bimbashrestha\Downloads\zstd\lib\compress\zstd_compress.c: ZSTD_compress_advanced_internal (srcSize:2359445)
C:\Users\bimbashrestha\Downloads\zstd\lib\compress\zstd_compress.c: ZSTD_compressBegin_internal: wlog=22
C:\Users\bimbashrestha\Downloads\zstd\lib\compress\zstd_compress.c: ZSTD_resetCCtx_internal: pledgedSrcSize=2359445, wlog=22
C:\Users\bimbashrestha\Downloads\zstd\lib\compress\zstd_compress.c: chainSize: 8388608 - hSize: 8388608 - h3Size: 131072
C:\Users\bimbashrestha\Downloads\zstd\lib\compress\zstd_compress.c: Need 66805KB workspace, including 66193KB for match state, and 0KB for buffers
C:\Users\bimbashrestha\Downloads\zstd\lib\compress\zstd_compress.c: windowSize: 2359445 - blockSize: 131072
C:\Users\bimbashrestha\Downloads\zstd\lib\compress\zstd_compress.c: Resize workspaceSize from 0KB to 66805KB
C:\Users\bimbashrestha\Downloads\zstd\lib\compress\zstd_cwksp.h: cwksp: freeing workspace
C:\Users\bimbashrestha\Downloads\zstd\lib\compress\zstd_cwksp.h: cwksp: creating new workspace with 68409174 bytes
C:\Users\bimbashrestha\Downloads\zstd\lib\compress\zstd_cwksp.h: cwksp: init'ing workspace with 68409174 bytes
C:\Users\bimbashrestha\Downloads\zstd\lib\compress\zstd_cwksp.h: cwksp: clearing!
C:\Users\bimbashrestha\Downloads\zstd\lib\compress\zstd_cwksp.h: cwksp: clearing!
C:\Users\bimbashrestha\Downloads\zstd\lib\compress\zstd_compress.c: pledged content size : 2359445 ; flag : 1
C:\Users\bimbashrestha\Downloads\zstd\lib\compress\zstd_compress.c: reset indices : 1
C:\Users\bimbashrestha\Downloads\zstd\lib\compress\zstd_cwksp.h: cwksp: ZSTD_cwksp_mark_tables_dirty
C:\Users\bimbashrestha\Downloads\zstd\lib\compress\zstd_cwksp.h: cwksp: clearing tables!
C:\Users\bimbashrestha\Downloads\zstd\lib\compress\zstd_compress.c: reset table : 1
C:\Users\bimbashrestha\Downloads\zstd\lib\compress\zstd_cwksp.h: cwksp: ZSTD_cwksp_clean_tables
C:\Users\bimbashrestha\Downloads\zstd\lib\compress\zstd_cwksp.h: cwksp: ZSTD_cwksp_mark_tables_clean
C:\Users\bimbashrestha\Downloads\zstd\lib\compress\zstd_compress.c: reserving optimal parser space
C:\Users\bimbashrestha\Downloads\zstd\lib\compress\zstd_compress.c: wksp: finished allocating, 0 bytes remain available
C:\Users\bimbashrestha\Downloads\zstd\lib\compress\zstd_compress.c: ZSTD_compress_insertDictionary (dictSize=0)
C:\Users\bimbashrestha\Downloads\zstd\lib\compress\zstd_compress.c: ZSTD_writeFrameHeader : dictIDFlag : 1 ; dictID : 0 ; dictIDSizeCode : 0
C:\Users\bimbashrestha\Downloads\zstd\lib\compress\zstd_opt.c: ZSTD_initStats_ultra (srcSize=131072)
C:\Users\bimbashrestha\Downloads\zstd\lib\compress\zstd_compress_sequences.c:101: ERROR!: check ZSTD_getFSEMaxSymbolValue(ctable) < max failed, returning ERROR(GENERIC): Repeat FSE_CTable has maxSymbolValue 16 < 18
C:\Users\bimbashrestha\Downloads\zstd\lib\compress\zstd_compress_sequences.c:101: ERROR!: check ZSTD_getFSEMaxSymbolValue(ctable) < max failed, returning ERROR(GENERIC): Repeat FSE_CTable has maxSymbolValue 51 < 52
C:\Users\bimbashrestha\Downloads\zstd\lib\compress\zstd_compress.c: ZSTD_writeEpilogue
C:\Users\bimbashrestha\Downloads\zstd\lib\compress\zstd_compress.c: end of frame : controlling src size
C:\Users\bimbashrestha\Downloads\zstd\lib\compress\zstd_cwksp.h: cwksp: freeing workspace
C:\Users\bimbashrestha\Downloads\zstd\lib\decompress\zstd_decompress.c: ZSTD_createDCtx
C:\Users\bimbashrestha\Downloads\zstd\lib\decompress\zstd_decompress.c: reading magic number FD2FB528 (expecting FD2FB528)
C:\Users\bimbashrestha\Downloads\zstd\lib\decompress\zstd_decompress.c: ZSTD_decompressFrame (srcSize:2368661)
C:\Users\bimbashrestha\Downloads\zstd\lib\decompress\zstd_decompress.c: reading magic number CD000000 (expecting FD2FB528)
C:\Users\bimbashrestha\Downloads\zstd\lib\decompress\zstd_decompress.c: ZSTD_decompressFrame (srcSize:2368101)
C:\Users\bimbashrestha\Downloads\zstd\lib\decompress\zstd_decompress.c:259: ERROR!: unconditional check failed, returning ERROR(prefix_unknown):
C:\Users\bimbashrestha\Downloads\zstd\lib\decompress\zstd_decompress.c:627: ERROR!: forwarding error in ZSTD_decodeFrameHeader(dctx, ip, frameHeaderSize): Unknown frame descriptor:
C:\Users\bimbashrestha\Downloads\zstd\lib\decompress\zstd_decompress.c:767: ERROR!: check (ZSTD_getErrorCode(res) == ZSTD_error_prefix_unknown) && (moreThan1Frame==1) failed, returning ERROR(srcSize_wrong): at least one frame successfully completed, but following bytes are garbage: it's more likely to be a srcSize error, specifying more bytes than compressed size of frame(s). This error message replaces ERROR(prefix_unknown), which would be confusing, as the first header is actually correct. Note that one could be unlucky, it might be a corruption error instead, happening right at the place where we expect zstd magic bytes. But this is _much_ less likely than a srcSize field error.
inputSize: 2359445 compressSize: 560 approxCompressSize: 2368661 reconSize: 4294967224

C:\Users\bimbashrestha\Downloads\zstd\build\VS2010\Debug\CorruptBlock.exe (process 6076) exited with code 0.

With:

#include <zstd.h>
#include <fstream>
#include <iostream>

int main()
{
    std::ifstream input("C:/Users/bimbashrestha/Downloads/zstd/build/VS2010/bin/Win32_Debug/0.0.zchk_raw", std::fstream::in | std::fstream::ate);
    const size_t inputSize = input.tellg();
    input.seekg(0);
    char* inputBuffer = new char[inputSize];
    char* reconBuffer = new char[inputSize];
    input.read(inputBuffer, inputSize);
    const size_t approxCompressSize = ZSTD_compressBound(inputSize);
    char* compressBuffer = new char[approxCompressSize];
    const size_t compressSize = ZSTD_compress(compressBuffer, approxCompressSize, inputBuffer, inputSize, 22);
    const size_t reconSize = ZSTD_decompress(reconBuffer, inputSize, compressBuffer, approxCompressSize);

    std::cout << "inputSize: " << inputSize;
    std::cout << " compressSize: " << compressSize;
    std::cout << " approxCompressSize: " << approxCompressSize;
    std::cout << " reconSize: " << reconSize << std::endl;

    return 0;
}

Also, it looks like this is a generic issue, not specific to Visual Studio: I can reproduce this with the CLI:

ERROR!: check ZSTD_getFSEMaxSymbolValue(ctable) < max failed, returning ERROR(GENERIC): Repeat FSE_CTable has maxSymbolValue 51 < 52

This is not an error, just an artifact. We should fix this function so it doesn't log error messages. I haven't been able to reproduce the issue. Can you please include how you call the decompression function in your repro?

Can you please include how you call the decompression function in your repro?

It's like this, the same as what @bimbashrestha did:
const size_t reconSize = ZSTD_decompress(reconBuffer, inputSize, compressBuffer, compressSize);
And it doesn't seem to be an artifact since reconSize is always set to an error code.

There was a typo in the script I posted above. I actually can't reproduce the decompression failure either.

const size_t reconSize = ZSTD_decompress(reconBuffer, inputSize, compressBuffer, approxCompressSize);

should be

const size_t reconSize = ZSTD_decompress(reconBuffer, inputSize, compressBuffer, compressSize);

I'm using Visual Studio 2019 with Win32 as the target and C++17.

EDIT: @ivanka2012 Can you try with the change to the script and see whether you still get corruption?

Corrected script

#include <zstd.h>
#include <fstream>
#include <iostream>

int main()
{
    std::ifstream input("C:/Users/bimbashrestha/Downloads/zstd/build/VS2010/bin/Win32_Debug/0.0.zchk_raw", std::fstream::in | std::fstream::ate);
    const size_t inputSize = input.tellg();
    input.seekg(0);
    char* inputBuffer = new char[inputSize];
    char* reconBuffer = new char[inputSize];
    input.read(inputBuffer, inputSize);
    const size_t approxCompressSize = ZSTD_compressBound(inputSize);
    char* compressBuffer = new char[approxCompressSize];
    const size_t compressSize = ZSTD_compress(compressBuffer, approxCompressSize, inputBuffer, inputSize, 22);
    const size_t reconSize = ZSTD_decompress(reconBuffer, inputSize, compressBuffer, compressSize);

    std::cout << "inputSize: " << inputSize;
    std::cout << " compressSize: " << compressSize;
    std::cout << " approxCompressSize: " << approxCompressSize;
    std::cout << " reconSize: " << reconSize << std::endl;

    return 0;
}

And it doesn't seem to be an artifact since reconSize is always set to an error code.

I mean that this particular log line is not an error, not that the decompression failure isn't real. I'm fixing it in #2023.

Alright, I figured this out.
Apparently VC is the culprit, as it appends some extra bytes when writing with std::ofstream. This made the compressed output file 2 bytes larger than it's supposed to be.
I apologize for wasting your time 😛
