Zstd: How to use streaming compression with multi-thread?

Created on 13 Apr 2019 · 2 Comments · Source: facebook/zstd

I am trying to add multithreading to examples/streaming_compression.c. I added a few lines to compressFile_orDie() (full code here):

#ifdef ZSTD_MULTITHREAD
    assert(!ZSTD_isError(ZSTD_CCtx_setParameter(cstream, ZSTD_c_nbWorkers, 2)));
#endif

    // size_t const initResult = ZSTD_initCStream(cstream, cLevel);
    // if (ZSTD_isError(initResult)) {
    //     fprintf(stderr, "ZSTD_initCStream() error : %s \n",
    //                 ZSTD_getErrorName(initResult));
    //     exit(11);
    // }

It seems that https://github.com/facebook/zstd/blob/470344d33e1d52a2ada75d278466da8d4ee2faf6/lib/compress/zstd_compress.c#L4021 creates the mtctx only if the cctx is not yet initialized (i.e. cctx->streamStage == zcss_init). So I commented out the ZSTD_initCStream call and set ZSTD_c_nbWorkers to 2.

But this modified version of streaming_compression dies after calling ZSTD_endStream and exits with the message "not fully flushed". This can be reproduced with a large enough input, for example silesia.tar.

I think I may have misunderstood how to use ZSTD. So how do I use streaming compression with multithreading?

question

All 2 comments

Hey @bennyyip, there are two things that should help you here:

  1. I updated streaming_compression.c to use the new API a week ago. Now you would just insert your call to ZSTD_CCtx_setParameter(cctx, ZSTD_c_nbWorkers, 2) where the other parameters are set.
  2. The reason you were seeing the "not fully flushed" error is that, in multithreaded mode, ZSTD_endStream() (or equivalently ZSTD_compressStream2() with ZSTD_e_end) doesn't make maximum forward progress, just some progress, so you need to call it until it returns 0. We realized that this behavior was confusing, so in commit 48a6427d22f290157b8acc3f7c03c0f762a768be we made ZSTD_endStream() block until either its output buffer is full or the stream is complete.

We'll be releasing a new version of zstd on Monday with a focus on the advanced API. Please let me know if you find any of the new documentation confusing, so I can improve it.

@terrelln Thanks for your quick reply! I will open an issue if I find anything confusing in the new documentation.
