When the decompressed size isn't written into the frame and you want to use the one-pass function, you are forced to guess the decompressed size. We could instead estimate an upper bound from the number of blocks. If flush isn't used, the estimate will exceed the actual decompressed size by at most 128 KB. If flush or some other block-level API is used, the estimate could be far too large.
Example usage (without error checking):
std::string decompress(const void* src, size_t size, size_t maxAlloc) {
    // Cap the proposed bound at maxAlloc to limit the allocation.
    size_t const bound = std::min(maxAlloc, ZSTD_boundDecompressedSize(src, size));
    std::string out;
    out.resize(bound);
    // Shrink to the number of bytes actually written.
    out.resize(ZSTD_decompress(&out[0], out.size(), src, size));
    return out;
}
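To put rough numbers on how tight or loose a block-count estimate can be, here is a small arithmetic sketch. It assumes every block is charged the 128 KB maximum and that the compressor closes a block after a fixed number of input bytes; `blockCount` and `blockBasedBound` are hypothetical helpers for illustration, not zstd functions.

```cpp
#include <cstdint>

// 128 KB: the most a single block can decompress to.
constexpr uint64_t kBlockMax = 128 * 1024;

// Number of blocks if a block is closed every `bytesPerBlock` input bytes
// (rounded up). Assumes contentSize > 0.
constexpr uint64_t blockCount(uint64_t contentSize, uint64_t bytesPerBlock) {
    return (contentSize + bytesPerBlock - 1) / bytesPerBlock;
}

// Block-count estimate: charge every block the full 128 KB.
constexpr uint64_t blockBasedBound(uint64_t nbBlocks) {
    return nbBlocks * kBlockMax;
}
```

Under these assumptions, 1,000,000 bytes split into full blocks gives 8 blocks and an estimate of 1,048,576 bytes, which is within 128 KB of the real size. The same data flushed every 1,000 bytes gives 1,000 blocks and an estimate of 131,072,000 bytes.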
Edit: Implemented by @shakeelrao in PR #1543. This issue is left open to add support for legacy mode before the release.
Hi @terrelln,
I'm new to the project, but would be interested in implementing this feature. Do you have any reference points for understanding how to compute the upper bound? Should I read the RFC?
Block_Maximum_Decompressed_Size bytes.
Block_Maximum_Decompressed_Size.
ZSTD_findFrameCompressedSize() can return only the compressed size.
Thanks, @terrelln! Is this the rough idea?
bound = 0
for each frame in src:
    if Frame_Content_Size of frame exists:
        bound += Frame_Content_Size of frame
    else
        for each block in frame:
            bound += Block_Maximum_Decompressed_Size
return bound
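The pseudocode above can be sketched as a standalone frame walker. This is only an illustration under stated assumptions, not the code from PR #1543: it parses the frame layout as described in the Zstandard format spec (RFC 8878), charges every block a flat 128 KB instead of reading the window size, and ignores legacy frames. `decompressBoundSketch` and `readLE` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

constexpr uint32_t kZstdMagic = 0xFD2FB528u;
constexpr uint32_t kSkippableMagicBase = 0x184D2A50u;  // low 4 bits vary
constexpr uint64_t kBlockMax = 128 * 1024;             // assumed flat per-block ceiling

// Read an n-byte little-endian integer (n <= 8).
static uint64_t readLE(const uint8_t* p, size_t n) {
    uint64_t v = 0;
    for (size_t i = 0; i < n; ++i) v |= static_cast<uint64_t>(p[i]) << (8 * i);
    return v;
}

// Hypothetical helper: upper bound on the decompressed size of all frames in
// `src`, or std::nullopt if the input is truncated or malformed.
std::optional<uint64_t> decompressBoundSketch(const uint8_t* src, size_t size) {
    static const size_t kDictBytes[4] = {0, 1, 2, 4};
    static const size_t kFcsBytes[4] = {0, 2, 4, 8};
    uint64_t bound = 0;
    size_t pos = 0;
    while (pos < size) {
        if (size - pos < 4) return std::nullopt;
        const uint32_t magic = static_cast<uint32_t>(readLE(src + pos, 4));
        pos += 4;
        if ((magic & 0xFFFFFFF0u) == kSkippableMagicBase) {
            // Skippable frame: 4-byte length, then payload; produces no output.
            if (size - pos < 4) return std::nullopt;
            const uint64_t skip = readLE(src + pos, 4);
            pos += 4;
            if (size - pos < skip) return std::nullopt;
            pos += static_cast<size_t>(skip);
            continue;
        }
        if (magic != kZstdMagic) return std::nullopt;

        // Frame_Header_Descriptor tells us which optional header fields follow.
        if (size - pos < 1) return std::nullopt;
        const uint8_t fhd = src[pos++];
        const unsigned fcsFlag = fhd >> 6;
        const bool singleSegment = ((fhd >> 5) & 1) != 0;
        const bool hasChecksum = ((fhd >> 2) & 1) != 0;
        const size_t dictBytes = kDictBytes[fhd & 3];
        const size_t fcsBytes = (fcsFlag == 0 && singleSegment) ? 1 : kFcsBytes[fcsFlag];
        const size_t windowBytes = singleSegment ? 0 : 1;
        if (size - pos < windowBytes + dictBytes + fcsBytes) return std::nullopt;
        const uint8_t* fcsPtr = src + pos + windowBytes + dictBytes;
        pos += windowBytes + dictBytes + fcsBytes;

        // Walk the 3-byte block headers to count blocks and find the frame end.
        uint64_t nbBlocks = 0;
        bool last = false;
        while (!last) {
            if (size - pos < 3) return std::nullopt;
            const uint32_t bh = static_cast<uint32_t>(readLE(src + pos, 3));
            pos += 3;
            last = (bh & 1) != 0;
            const unsigned type = (bh >> 1) & 3;
            if (type == 3) return std::nullopt;                 // reserved block type
            const size_t stored = (type == 1) ? 1 : (bh >> 3);  // RLE stores one byte
            if (size - pos < stored) return std::nullopt;
            pos += stored;
            ++nbBlocks;
        }
        if (hasChecksum) {
            if (size - pos < 4) return std::nullopt;
            pos += 4;
        }

        if (fcsBytes != 0) {
            // Frame_Content_Size exists: use it (2-byte form is offset by 256).
            bound += readLE(fcsPtr, fcsBytes) + (fcsBytes == 2 ? 256 : 0);
        } else {
            bound += nbBlocks * kBlockMax;
        }
    }
    return bound;
}
```

Charging a flat 128 KB per block keeps the sketch short. Reading the window size from the frame header could give a tighter bound for frames with small windows, at the cost of more header parsing.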
Also, is Block_Maximum_Decompressed_Size a predefined constant?
Actually, now that I have read more code, is it accurate to say this function is an adaptation of ZSTD_findDecompressedSize, but instead of returning an error if ZSTD_getFrameContentSize returns ZSTD_CONTENTSIZE_UNKNOWN, we calculate the block-based upper bound?
Block_Maximum_Decompressed_Size is defined in the spec (you can search for the phrase).
ZSTD_findFrameCompressedSize() so we avoid some code duplication.
Should we leave this open until legacy mode is implemented or does it make sense to create a new issue?
Let's leave it open, I'll edit the message.
@terrelln Is there documentation on the legacy format? I'd be interested in continuing my work on this feature by implementing legacy support.
There isn't explicit documentation, but they are all predecessors to zstd, so they are very similar.
Check out ZSTD_findFrameCompressedSizeLegacy, and each implementation like ZSTDv01_findFrameCompressedSize.
Let's start by using the macro BLOCKSIZE instead of inspecting the frame header, since it is a bit simpler, and I'm not 100% confident that blockSize <= windowSize for every legacy version. @Cyan4973 may be able to help here. The function ZSTD_getFrameParams() is available for versions >= 0.4. However, we only need legacy support so that we can say the bound always works. A tight bound isn't very valuable for legacy formats, so I don't think the complexity is worth the gains.
Is this issue considered done now? Thanks again for all the help!
Thanks @shakeelrao for implementing the decompress bound function!