Hi,
I am trying to port existing code (zstd-jni) to the new Advanced API but it looks like there is something that I don't understand. I have already ported the decompression side allowing to attach dictionaries (fast of plain) using the new API but there is something missing compression side. Here is minimal change that I was expecting to work:
ZSTD_CCtx_setParameter(stream, ZSTD_c_checksumFlag, checksum);
- result = ZSTD_initCStream_usingDict(stream, dict_buff, dict_size, level);
+ result = ZSTD_CCtx_loadDictionary(stream, dict_buff, dict_size);
+ if (result != 0) goto E2;
+ result = ZSTD_initCStream(stream, level);
With this change we can round-trip compression/decompression but the dictionary Id is lost from the compressed frame header - it is always 0.
Is it an omission on the library side or I am doing something wrong?
Hi @luben ,
a few things I can think of :
ZSTD_initCStream() is not supposed to be necessary. In the new advanced API, one just set the compression level with ZSTD_CCtx_setParameter(), and that's fine, sending data afterwards triggers an automatic init if they are the first bytes of the frames.
ZSTD_initCStream() is still useful for support of legacy code, obviously.
It can also be used to start a new frame without finishing properly the previous one, though this functionality is in competition with ZSTD_CCtx_reset(), which is now preferable.
In theory, using ZSTD_initCStream() is expected to be neither necessary nor detrimental. It should merely reset the context for a new frame and set the compression level. But I believe it's a case of bad side effects when mixing legacy and new API entry points. ZSTD_initCStream() enforces its own frame parameters, and one of them is removing the dictionary ID, which wasn't necessary before. I think we should update this entry point, so that it play nicer with the new API.
Short term :
remove ZSTD_initCStream() in your code, just set the compression level with ZSTD_CCtx_setParameter().
Medium term (for us) :
update ZSTD_initCStream() internal so that it plays nicer with the new advanced API.
We should also make clear what's the expected behavior in the documentation.
That will be easier when advanced API get promoted to stable status, as we will be able to shuffle presentation order in a more logical way.
Thanks, that solved the problem. The lazy initialization will also allow to simplify the bindings. Does it apply also for the decompression context?
Yes, it does.
A lot of people are going to be asking questions like "will my compression level take effect" with increasing API complexity. I had to go read the zstd sources to answer which order I should call ZSTD_initCStream and ZSTD_CCtx_setParameter, which I am now using to enable checksums. (Note that I am very adverse to rewriting my code to not use ZSTD_initCStream). I would think about sticking with existing names for functions and providing "Ex"(tended) version that take more arguments so that the basic use pattern remains the same.
Making the lifetime of things (e.g. context/stream/parameters/frames)--and how they are nested--obvious should considered be one of the top concerns in the API design.
Why are you adverse to rewriting your code to not use ZSTD_initCStream()?
The API we're moving to is used like:
ZSTD_CCtx* cctx = ZSTD_createCCtx();
ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 1);
ZSTD_CCtx_setParameter(cctx, ZSTD_c_checksumFlag, 1);
// Use ZSTD_compressStream() and ZSTD_compressEnd() like normal.
// You may switch to using ZSTD_compressStream2(), which enables one
// extra optimization opportunity, but is otherwise equivalent.
// If you're reusing a context for many compressions you may call
ZSTD_CCtx_reset(cctx, ZSTD_reset_session_only);
// to reset the context state without resetting the parameters.
ZSTD_freeCCtx(cctx);
All of the ZSTD_initCStream*() functions will eventually be deprecated, though we won't be making any breaking changes to the stable API for a long long time, if ever.
I understand that having to migrate to a new API is a pain. We guarantee that we won't break existing usage of the stable API, but if you want to use new features, they will only be exposed through the new API, so you'll have to transition.
The header file will be updated by the next release to move the new API towards the top above the ZSTD_initCStream() API, which should help clarify this. Additionally, I'll update all the example files to use the new API. Would it be helpful to have a file with examples of how to transition from the old API to the new one?
If you have any further suggestions, please let me know before the next release, since once we move the API to the stable section it cannot be changed.
I moved all my use to the new API and I find it easier to work in general. Please, put some docs that init* functions are deprecated and don't work with the new API, as this caused me a lot of wasted time, hence this issue that put me on the right track.
Thanks for the feedback! I'll be rearranging the header and docs this month in preparation for the release. I'm tracking changes in Issue #1533.
I've reordered documentation in the zstd header, and explicitly marked the deprecated streaming functions, and what their equivalents are in the new advanced API.
Please let me know if you see any other improvements!
Most helpful comment
I moved all my use to the new API and I find it easier to work in general. Please, put some docs that init* functions are deprecated and don't work with the new API, as this caused me a lot of wasted time, hence this issue that put me on the right track.