Zstd: 1.3.4 multithreaded compression is nondeterministic

Created on 28 Mar 2018 · 4 comments · Source: facebook/zstd

In previous zstd versions, if you made a copy of file X (call the copy X'), and then compressed X and X' with the CLI using the same number of threads, the compressed files would be byte-for-byte identical. This is no longer true with zstd 1.3.4. My application using ZSTD_compress_generic() exhibits the same behavior change.
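The determinism check described above can be sketched as a small script. Since the Python standard library has no zstd bindings, `zlib.compress` stands in for the zstd call here; the point is the byte-for-byte comparison of two compressions of identical data, which held for zstd before 1.3.4 but can fail with multithreaded zstd 1.3.4.

```python
import hashlib
import zlib


def compress_digest(data: bytes, compress) -> str:
    """Hash the compressed output so two runs can be compared cheaply."""
    return hashlib.sha256(compress(data)).hexdigest()


data = b"example payload " * 4096  # stands in for file X; X' is an identical copy

# zlib is a stand-in for zstd (no zstd module in the stdlib).
# With zstd 1.3.4 and nbWorkers > 0, these two digests could differ.
d1 = compress_digest(data, zlib.compress)
d2 = compress_digest(data, zlib.compress)
print(d1 == d2)
```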

I can live with this, but it would be nice to at least have a way to request deterministic multithreaded compression when it would be useful.

All 4 comments

I'm not sure whether this is a "bug" or not; I guess it depends on what is causing the non-determinism.

I'll look into what is causing the non-determinism and post back here once I understand exactly what is going on.

It is caused by commit 9a211d1f05bd8cb6cb80780bf1880c23204b7b68. zstdmt reuses contexts without resetting the tables, relying on other mechanisms to invalidate the entries. This means that the segment that the context previously compressed can slightly change the compressed result. However, single threaded compression should still be deterministic.
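The mechanism above can be illustrated with a toy (this is not zstd code; the match table and "compressor" are invented for illustration): when a context's table survives from one job to the next without being cleared, the output for a given input depends on what the context compressed before, so a reused context and a fresh one disagree on identical input.

```python
class ToyCtx:
    """Toy 'compressor' whose match table persists across jobs,
    mimicking the context reuse introduced by 9a211d1."""

    def __init__(self):
        self.table = {}  # hypothetical match table, never reset between jobs

    def compress(self, data: bytes) -> list:
        out = []
        for i in range(0, len(data), 4):
            chunk = data[i:i + 4]
            if chunk in self.table:
                out.append(("ref", self.table[chunk]))  # stale entry may match
            else:
                self.table[chunk] = i
                out.append(("lit", chunk))
        return out


ctx = ToyCtx()
ctx.compress(b"AAAABBBB")            # first job fills the table
second = ctx.compress(b"AAAACCCC")   # "AAAA" hits a leftover entry
fresh = ToyCtx().compress(b"AAAACCCC")
print(second != fresh)               # same input, different output
```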

On one hand, we don't guarantee that the compressed output will be the same between versions, or even between runs. On the other hand, I understand that deterministic results can be helpful, especially for benchmarking.

Switching to a level that uses the ZSTD_greedy strategy or stronger (level 5+ for inputs >= 256 KB) should be deterministic in zstd-1.3.4.

@cyan4973, what do you think?

Wow, that's the result of a quite complex interaction...

That being said, I believe generating a deterministic output is valuable.

I also believe that 9a211d1 is beneficial when creating a CDict, since it ensures that the table is optimally filled.
But in the context of prefix loading, as in zstdmt, skipping the second pass for improved speed makes sense, given the very low compression gains it produces.

Hence, I believe the dictionary loading function should have 2 variants, one "full", when building CDict, and one "fast", when loading prefixes (which are only used once). The difference is merely the second pass.

I guess it's easier said than done: selecting the right variant requires passing or deducing the context, so I suspect it's going to impact some internal APIs.
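The proposed split could look roughly like this (a hypothetical sketch; the function names and the toy 4-byte table are illustrative, not real zstd internals). The only difference between the two loaders is whether the second pass runs:

```python
def index_content(table: dict, content: bytes) -> None:
    """First pass: record the first position of each 4-byte sequence."""
    for i in range(len(content) - 3):
        table.setdefault(content[i:i + 4], i)


def refill_table(table: dict, content: bytes) -> None:
    """Second pass: overwrite entries so the table is optimally filled."""
    for i in range(len(content) - 3):
        table[content[i:i + 4]] = i


def load_dict_full(table: dict, content: bytes) -> None:
    """'Full' variant, for building a CDict (reused many times)."""
    index_content(table, content)
    refill_table(table, content)


def load_dict_fast(table: dict, content: bytes) -> None:
    """'Fast' variant, for prefixes that are only used once."""
    index_content(table, content)  # the second pass is simply skipped
```

Usage: the two variants produce different tables only when a sequence repeats, which is exactly where the second pass buys its small compression gain.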

@chrchang It should be fixed now, but let me know if you are still seeing any non-determinism on the dev branch.
