Zstd: Please support setting estimatedSrcSize by a non-experimental API

Created on 17 Sep 2019 · 5 Comments · Source: facebook/zstd

The estimated size of the data to be compressed, which is used to tune the compressor before streaming compression starts, can currently be passed by the calling code only through the experimental API (with ZSTD_STATIC_LINKING_ONLY).

The code is quite verbose, involving a heap allocation of a temporary ZSTD_CCtx_params:
https://github.com/google/riegeli/blob/8090178002b7f4e7fe5eba7cb756c526261921f0/riegeli/bytes/zstd_writer.cc#L115-L140

It would be nice if estimatedSrcSize was available as a regular compressor parameter, e.g. with ZSTD_CCtx_setEstimatedSrcSize, similar to ZSTD_CCtx_setPledgedSrcSize.

It could also explicitly use ZSTD_CONTENTSIZE_UNKNOWN instead of 0 for unknown. It seems that both values are equivalent for ZSTD_getParams if dictSize is 0, but they are different if dictSize is not 0; I do not understand this comment about intentional overflow:
https://github.com/facebook/zstd/blob/a10c1916130625b9a9cb2534a0a09109bcdbd2a9/lib/compress/zstd_compress.c#L3945-L3947

This would also ease the feeling that setting ZSTD_parameters overrides internal zstd tuning logic: https://github.com/facebook/zstd/issues/1787#issuecomment-531352770

feature request

All 5 comments

This is present in the latest dev branch, added by PR #1733.

ZSTD_CCtx_setParameter(cctx, ZSTD_c_srcSizeHint, srcSizeHint);

This size hint is used to select (when not using explicit parameters) and tune (always) the parameters when the pledgedSrcSize is not set. The size hint is a regular parameter, so it is sticky across compression calls (whereas pledgedSrcSize is not).

https://github.com/facebook/zstd/blob/a10c1916130625b9a9cb2534a0a09109bcdbd2a9/lib/compress/zstd_compress.c#L1057-L1059

Please let me know if this doesn't solve your problem.

The ZSTD_c_srcSizeHint is still experimental, since it is new. But this one will be stabilized in a few releases, if users (you) find it useful.

Thank you. Yes, this will be useful.

Why is the value limited to 1e9 and not INT_MAX? This can be surprising for code calling the library (if the size hint is computed, it needs to be artificially saturated by a stricter limit than obviously follows from the type).

I believe that the maximum value is INT_MAX. The first commit had it at 1e9, but that was changed in code review.
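If the hint is computed from a wider type, it still has to fit the int argument of ZSTD_CCtx_setParameter. A small sketch of that saturation, assuming the upper bound is INT_MAX as stated above; `saturate_src_size_hint` is a hypothetical helper name, not something zstd provides:

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: clamps a computed 64-bit size estimate into the
 * range [0, INT_MAX] so it can be passed as an int parameter value. */
int saturate_src_size_hint(uint64_t estimatedSize)
{
    return estimatedSize > (uint64_t)INT_MAX ? INT_MAX : (int)estimatedSize;
}
```

The result could then be passed as the value for ZSTD_c_srcSizeHint.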

Ah, indeed I looked at the first commit, I am sorry.

So we only have to wait until this matures to a non-experimental API. I thought that more needs to be done when I filed this issue.

Assuming that this matures at the right time, we can close this issue. Thank you!
