Zstd: Provide ZSTD_boundDecompressedSize()

Created on 21 Feb 2019 · 11Comments · Source: facebook/zstd

When the decompressed size isn't written into the frame, and you want to use the one pass function, you are forced to guess the decompressed size. We could estimate the decompressed size using the number of blocks. It will be no larger than 128 KB larger than the decompressed size if flush isn't used. If flush is used, or some other block level API, then we could estimate way too large.

Example usage (without error checking):

std::string decompress(void* src, size_t size, size_t maxAlloc) {
    size_t const bound = std::min(maxAlloc, ZSTD_boundDecompressedSize(src, size));
    std::string out;
    out.resize(bound);
    out.resize(ZSTD_decompress(&out[0], out.size(), src, size));
    return out;
}

Edit: Implemented by @shakeelrao in PR #1543. This issue is left open to add support for legacy mode before the release.

feature request good first issue

Source

terrelln

Most helpful comment

There isn't explicit documentations, but they are all predecessors to zstd, so they are very similar.

Check out ZSTD_findFrameCompressedSizeLegacy, and each implementation like ZSTDv01_findFrameCompressedSize.

Lets start by using the macro BLOCKSIZE, instead of inspecting the frame header, since it is a bit simpler, and I'm not 100% confident that blockSize <= windowSize for every legacy version. @Cyan4973 may be able to help here. The function ZSTD_getFrameParams() is available for versions >= 0.4. However, we only need legacy support so we can say that the bound always works, it isn't super valuable to be as tight as possible for legacy formats, so I don't think the complexity is worth the gains.

terrelln on 12 Mar 2019

👍2

All 11 comments

Hi @terrelln,

I'm new to the project, but would be interested in implementing this feature. Do you have any reference points for understanding how to compute the upper bound? Should I read the RFC?

shakeelrao on 26 Feb 2019

👍2

Read the block header section of the format spec. Each block has at most Block_Maximum_Decompressed_Size bytes.
We will loop over each frame in the input and either use the content size if available, or count the number of blocks and multiple by Block_Maximum_Decompressed_Size.
We can reuse the code in ZSTD_findFrameCompressedSize(), which loops over every block in a frame to find the compressed size. We can write a helper function which finds both the compressed size, and bounds the decompress size. Then ZSTD_findFrameCompressedSize() can return only the compressed size.
We should support multiple appended frames like ZSTD_findDecompressedSize().

terrelln on 26 Feb 2019

🚀1

Thanks, @terrelln! Is this the rough idea?

bound = 0
for each frame in src:
  if Frame_Content_Size of frame exists:
    bound += Frame_Content_Size of frame
  else
    for each block in frame:
       bound += Block_Maximum_Decompressed_Size
return bound

Also, is Block_Maximum_Decompressed_Size a predefined constant?

shakeelrao on 27 Feb 2019

Actually, now that I have read more code, is it accurate to say this function is an adaption of ZSTD_findDecompressedSize, but instead of returning an error if ZSTD_getFrameContentSize returns ZSTD_CONTENTSIZE_UNKNOWN, we calculate the block-based upper-bound?

shakeelrao on 27 Feb 2019

👍1

Block_Maximum_Decompressed_Size is defined in the spec (you can search for the phrase).
That pseudocode looks right. Note that we'll want to handle legacy frames. We can start by returning an error, and then add in support later.
Yeah, it is exactly that. We can probably roll the block-based upper-bound into a helper also used by ZSTD_findFrameCompressedSize() so we avoid some code duplication.

terrelln on 27 Feb 2019

👍1

Should we leave this open until legacy mode is implemented or does it make sense to create a new issue?

shakeelrao on 4 Mar 2019

Lets leave it open, I'll edit the message.

terrelln on 5 Mar 2019

@terrelln Is there documentation on the legacy format? I'd be interested continuing my work on this feature by implementing legacy support.

shakeelrao on 9 Mar 2019

There isn't explicit documentations, but they are all predecessors to zstd, so they are very similar.

Check out ZSTD_findFrameCompressedSizeLegacy, and each implementation like ZSTDv01_findFrameCompressedSize.

terrelln on 12 Mar 2019

👍2

Is this issue considered done now? Thanks again for all the help!

shakeelrao on 24 Mar 2019

Thanks @shakeelrao for implementing the decompress bound function!

terrelln on 24 Mar 2019

🎉1

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Zstd multithreaded output can depend on number of threads

terrelln · 3Comments

ZSTD on iOS - alignment to 32 bits required?

vade · 3Comments

Multi-threading / multi-threaded support for multi-core system

Hedda · 4Comments

Files only, not directories?

itsnotvalid · 3Comments

Documentation for ZSTD_CCtx_reset() is misleading about parameters

indygreg · 3Comments