zstd_btultra corner case

Created on 7 Jan 2021 · 7 comments · Source: facebook/zstd

zstd_btultra performs nearly 2x worse than zstd_opt on an edge-case file.

> seq 100000 999999 > seqs
> zstd seqs -b16e19i0            
16#seqs              :   6300000 ->    321618 (19.59),  15.0 MB/s , 817.0 MB/s 
17#seqs              :   6300000 ->    323943 (19.45),  10.6 MB/s , 831.3 MB/s 
18#seqs              :   6300000 ->    618382 (10.19),  8.60 MB/s , 778.1 MB/s 
19#seqs              :   6300000 ->    618096 (10.19),  7.31 MB/s , 779.3 MB/s

All 7 comments

One difference I can think of is using fractional-bit evaluation (btultra) instead of whole-bit approximation (btopt).
It could end up triggering a different optimization gradient, which ends up being incorrect (?)

> One difference I can think of is using fractional bit evaluation

That was my thought too, since the regression comes from btultra emitting 2x more literals. But I disabled fractional bits and didn't see a difference.

I'm hunting down every difference between btopt and btultra to find out what is causing it.

These are the minimal changes that fix the issue. Making one change without the other doesn't do anything; you need both.

diff --git a/lib/compress/zstd_opt.c b/lib/compress/zstd_opt.c
index e55c459d..995a8251 100644
--- a/lib/compress/zstd_opt.c
+++ b/lib/compress/zstd_opt.c
@@ -24,10 +24,10 @@
 *  Price functions for optimal parser
 ***************************************/

-#if 0    /* approximation at bit level */
+#if 1    /* approximation at bit level */
 #  define BITCOST_ACCURACY 0
 #  define BITCOST_MULTIPLIER (1 << BITCOST_ACCURACY)
-#  define WEIGHT(stat)  ((void)opt, ZSTD_bitWeight(stat))
+#  define WEIGHT(stat,opt)  ((void)opt, ZSTD_bitWeight(stat))
 #elif 0  /* fractional bit accuracy */
 #  define BITCOST_ACCURACY 8
 #  define BITCOST_MULTIPLIER (1 << BITCOST_ACCURACY)
@@ -1076,7 +1076,7 @@ ZSTD_compressBlock_opt_generic(ZSTD_matchState_t* ms,

             if (cur == last_pos) break;

-            if ( (optLevel==0) /*static_test*/
+            if ( (1||optLevel==0) /*static_test*/
               && (opt[cur+1].price <= opt[cur].price + (BITCOST_MULTIPLIER/2)) ) {
                 DEBUGLOG(7, "move to next rPos:%u : price is <=", cur+1);
                 continue;  /* skip unpromising positions; about ~+6% speed, -0.01 ratio */

With this diff it is:

16#seqs              :   6300000 ->    321644 (19.59),  15.6 MB/s , 865.0 MB/s 
17#seqs              :   6300000 ->    323967 (19.45),  11.6 MB/s , 885.4 MB/s 
18#seqs              :   6300000 ->    183556 (34.32), 10.33 MB/s , 889.6 MB/s 
19#seqs              :   6300000 ->    182603 (34.50),  8.37 MB/s , 891.8 MB/s 

So there seems to be some sort of pricing issue. My guess is literal pricing, since the alphabet size is 10.

I think the root cause is actually a case of not being able to evaluate repcode choice.

At some point, zstd_btultra decides to take an offset instead of emitting 2 literals + a repcode. I have to figure out why it made that choice.

That offset messes up the repcode history (rep0 = 2 lits + ml 5, rep1 = 1 lit + ml 6). But from that point on, "repcode 0" is priced so favorably that repcode0 + 2 lits wins over repcode1 + 1 lit.

The optimal parser doesn't know that using repcode1 puts it back into repcode0.

Running my multi-arrival branch, which stores the 8 best arrivals for each position, checks non-repcodes for the best arrival, and checks repcodes for every arrival. It is very slow, but does much better in this corner case.

> ./zstd -b16e22 ~/datasets/seqs
16#seqs              :   6300000 ->    227770 (27.66),  3.25 MB/s , 781.7 MB/s 
17#seqs              :   6300000 ->    233354 (27.00),  3.03 MB/s , 781.2 MB/s 
18#seqs              :   6300000 ->    149662 (42.09),  1.86 MB/s , 789.2 MB/s 
19#seqs              :   6300000 ->    149389 (42.17),  1.85 MB/s , 788.8 MB/s

I was testing this use case with a smaller sequence set, from 10000 to 99999,
and was expecting roughly the same outcome, with just a size difference.

It doesn't happen: the ranking of btopt and btultra is reversed (compared to sequences 100000 to 999999)

| Levels | Ratio | Note |
| --- | --- | --- |
| 1-2 | x2.3 | does not find the match |
| 3-7 | x22 | good! (greedy and lazy) |
| 8-15 | x10 | worse :( (lazy2 and btlazy2) |
| 16-17 | x7 | even worse... (btopt) |
| 18-22 | x19 | better, but not "best" (btultra) |

It shows that, depending on gradient descent, itself dependent on initialization and, well, almost random luck, one or the other gets stuck in a "good" or "bad" state with regards to repcode selection, and then just stays there.

None of them is excellent though, and your multi-arrival experiment shows that it could be much better.

> It shows that, depending on gradient descent, itself dependent on initialization and, well, almost random luck, one or the other gets stuck in a "good" or "bad" state with regards to repcode selection, and then just stays there.

Yeah, it seems to be just getting lucky or not lucky.

I don't remember why I stopped working on the multi-arrival branch. Probably just got busy & it was still very slow. I'll try to pick it up at some point and work on speeding it up (or reserve it as its own strategy above zstd_btultra & use it at level 19+).

