Zstd: When to recycle ZSTD_CCtx objects?

Created on 17 Sep 2019  ·  4 Comments  ·  Source: facebook/zstd

Experiments indicate that recycling ZSTD_CCtx objects between separate streaming compression sessions (e.g. with ZSTD_CCtx_reset), instead of creating them anew, is much faster: compression takes microseconds instead of milliseconds.

It looks like this is because clearing the existing entries of a hash table is much faster than zero-filling a newly allocated table. The most expensive part is a memset in ZSTD_reset_matchState, whose size can exceed ten megabytes.

But recycling ZSTD_CCtx objects is helpful only if compression options match. If compression options differ, ZSTD_CCtx performs a costly reset of internal tables, so recycling is as expensive as creating the object anew, while also losing the opportunity to reuse the object later with matching compression options.

Our library does the recycling automatically, maintaining a pool of idle objects, and looking for an existing object with matching options instead of creating it anew: https://github.com/google/riegeli/blob/master/riegeli/base/recycling_pool.h. It is also used for other kinds of objects, but ZSTD_CCtx is one where proper recycling matters the most.
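
Such a keyed pool can be sketched roughly as follows (a simplified illustration, not riegeli's actual RecyclingPool API; the class name and methods here are assumptions):

```cpp
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Illustrative sketch of a keyed recycling pool: idle objects are stored
// under a key describing their compression options; a lookup miss falls
// back to creating a new object via the supplied factory.
template <typename T>
class RecyclingPool {
 public:
  // Returns an idle object stored under `key`, or the result of
  // `factory()` if no matching idle object is available.
  template <typename Factory>
  std::unique_ptr<T> Get(const std::string& key, Factory factory) {
    auto it = idle_.find(key);
    if (it != idle_.end()) {
      std::unique_ptr<T> object = std::move(it->second);
      idle_.erase(it);
      return object;
    }
    return factory();
  }

  // Returns an idle object to the pool for future Get() calls.
  void Put(const std::string& key, std::unique_ptr<T> object) {
    idle_.emplace(key, std::move(object));
  }

 private:
  std::unordered_multimap<std::string, std::unique_ptr<T>> idle_;
};
```

A multimap is used so that several idle objects with identical options can wait in the pool at once.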

The problem is that determining what it means for compression options to match, for the purpose of recycling ZSTD_CCtx objects, is not trivial. For example:

  • compressionLevel and windowLog participate in this;
  • checksumFlag does not;
  • estimatedSrcSize (or pledgedSrcSize if set) partially participates:

    • ignoring them would trigger expensive recycling with different compression options,

    • checking whether they match exactly would miss meaningful opportunities for recycling,

    • the proper way is to mimic the tableID computation in ZSTD_getCParams and split them into 4 equivalence classes: https://github.com/google/riegeli/blob/8090178002b7f4e7fe5eba7cb756c526261921f0/riegeli/bytes/zstd_writer.cc#L63-L73
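
The bucketing can be sketched like this, mimicking the tableID computation in zstd's ZSTD_getCParams. The boundary values below are taken from zstd's sources and may change between releases; treating 0 as "size unknown" is an assumption of this sketch:

```cpp
#include <cstdint>

// Sketch of the 4 equivalence classes of (estimated or pledged) source
// size, mirroring zstd's parameter-table selection:
//   tableID = (rSize <= 256 KB) + (rSize <= 128 KB) + (rSize <= 16 KB)
// Sizes in the same class select the same parameter table, so CCtx
// objects could be recycled across them. In this sketch 0 stands for
// "unknown", which zstd treats like a large size.
int SrcSizeClass(uint64_t size_hint) {
  constexpr uint64_t kUnknown = 0;
  if (size_hint == kUnknown || size_hint > 256 * 1024) return 0;
  if (size_hint > 128 * 1024) return 1;
  if (size_hint > 16 * 1024) return 2;
  return 3;
}
```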

Extracting the key for comparing options is done here: https://github.com/google/riegeli/blob/8090178002b7f4e7fe5eba7cb756c526261921f0/riegeli/bytes/zstd_writer.h#L213-L233; https://github.com/google/riegeli/blob/8090178002b7f4e7fe5eba7cb756c526261921f0/riegeli/bytes/zstd_writer.cc#L95-L97.

This is fragile.

Recently I discovered that if the first ZSTD_compressStream2 call is with ZSTD_e_end, then the input size counts as specifying pledgedSrcSize for the purpose of recycling ZSTD_CCtx objects. This prompted a change to create the compressor lazily, so that when it is known whether the first compressed fragment is also the last one, parameters for looking up an idle ZSTD_CCtx object to recycle can be appropriately adjusted: https://github.com/google/riegeli/commit/8090178002b7f4e7fe5eba7cb756c526261921f0

Can this be done better somehow?

I am not sure how this should be designed properly. Here are two possibilities:

  1. Maybe zstd could make that memset less costly, which would make the whole problem less important. Maybe there is an OS-specific way to obtain zeroed memory more efficiently. Maybe zstd could maintain an internal pool of these tables.

  2. If the existing basic design is kept (i.e. recycling ZSTD_CCtx objects is the responsibility of a library at a higher level than zstd), what I need is to extract a “key” of the compression parameters which matter for this purpose, a key with equality and hashing defined. Maybe zstd could offer a more reliable way to do this.
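
For possibility 2, such a key could look roughly like this. Which fields belong in the key is exactly the fragile part the issue is about, so the field set below is an assumption for illustration:

```cpp
#include <cstddef>
#include <functional>
#include <tuple>

// Illustrative key of the compression parameters assumed to matter for
// recycling ZSTD_CCtx objects, with equality and hashing defined.
struct CCtxKey {
  int compression_level = 0;
  int window_log = 0;
  int src_size_class = 0;  // one of the 4 equivalence classes

  friend bool operator==(const CCtxKey& a, const CCtxKey& b) {
    return std::tie(a.compression_level, a.window_log, a.src_size_class) ==
           std::tie(b.compression_level, b.window_log, b.src_size_class);
  }
};

// Hash combining the fields, usable with std::unordered_map and friends.
struct CCtxKeyHash {
  size_t operator()(const CCtxKey& key) const {
    size_t hash = std::hash<int>()(key.compression_level);
    hash = hash * 31 + std::hash<int>()(key.window_log);
    hash = hash * 31 + std::hash<int>()(key.src_size_class);
    return hash;
  }
};
```

A zstd-provided equivalent would only need to guarantee that two keys compare equal exactly when recycling avoids the expensive reset.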

question

All 4 comments

Hey!

Great write-up! We have the same pattern in a few places at facebook (e.g.: https://github.com/facebook/hhvm/blob/master/hphp/util/compression-ctx-pool.h), with exactly the same problem.

My changesets #1712 and #1780 address this scenario. When possible (i.e., when there's enough room in the existing workspace buffer), the CCtx avoids the memset even when compression parameters change. This functionality will be in the next zstd release.

I hope that's useful. (In fact, if you tried your use case on the dev branch, it would be great to get confirmation that you're seeing the expected speed up.)

Yeah, this is _exactly_ the kind of topic we want to address for our next release, feedback will be highly appreciated.
@felixhandte made a big part of the required changes recently in dev branch. We have another set of patches coming for some complementary scenarios, but it seems that the scenarios you care about are already covered by Felix's merged work.

Funny: https://github.com/facebook/hhvm/blob/master/hphp/util/compression-ctx-pool.h was submitted one day after https://github.com/google/riegeli/blob/master/riegeli/base/recycling_pool.h

I verified with the dev branch and indeed, changing our code to recycle ZSTD_CCtx objects unconditionally, without verifying compression options, makes microbenchmarks of performing tiny compression tasks about 3% faster. In contrast, with v1.4.0 the same change makes some of these microbenchmarks 350 times slower.

The plan after https://github.com/facebook/zstd/pull/1780 is clear: always recycle, ignoring compression options. I am closing the issue. Thank you!

(Now I only have to wait for a zstd release, and then for us to update the version of our zstdlib dependency.)
