I've opened this issue to discuss some changes we could make to the determination of jobLog (the value returned by ZSTDMT_computeTargetJobLog()) when multithreading is used.
When compressing silesia.tar (~200MB) with -T0 -19 --long=23, @Cyan4973 discovered that as of version 1.3.5, multithreading is not actually engaged with -T0. This has led to some discussion about better multithreading settings when LDM is in use, and in particular about reducing job sizes across the board.
Currently, LDM uses chainLog exclusively to compute the job size. Below is a table of some compression configurations showing how jobLog is currently derived with LDM, along with two possible changes we could make. I'm in favor of approach 2 and of being more liberal with smaller job sizes. On silesia.tar, compared to no multithreading, we can compress nearly twice as fast in the -19 wlog=23, -19 wlog=27, and -22 wlog=27 cases in particular.
With -T0, I generally expect the program to err on the side of parallelizing more rather than less, and the existing 256MB default job size at -19 -T0 --long seems too large.
| Conf | dev, no --long | dev, --long: MAX(21, chainLog+4) | proposal 1, --long: MAX(21, ZSTD_cycleLog(hashLog, strategy)+4) | proposal 2, --long: MAX(21, ZSTD_cycleLog(hashLog, strategy)+3) |
|-------------|------------------|--------------------------------------|---------------------------------------------------------------------|---------------------------------------------------------------------|
| -22 wlog=27 | 29 | 30 | 28 | 27 |
| -19 wlog=27 | 29 | 28 | 25 | 24 |
| -19 wlog=23 | 25 | 28 | 25 | 24 |
| -16 wlog=23 | 25 | 26 | 25 | 24 |
| -13 wlog=22 | 24 | 25 | 25 | 24 |
| -11 wlog=22 | 24 | 25 | 26 | 25 |
| -11 wlog=27 | 29 | 25 | 26 | 25 |
| -9 wlog=21 | 23 | 23 | 24 | 23 |
| -7 wlog=21 | 23 | 23 | 23 | 22 |
| -3 wlog=21 | 23 | 21 | 21 | 21 |
| -3 wlog=27 | 29 | 21 | 21 | 21 |
| -1 wlog=19 | 21 | 21 | 21 | 21 |
| -1 wlog=22 | 25 | 21 | 21 | 21 |
| -1 wlog=27 | 29 | 21 | 21 | 21 |
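For concreteness, here is a minimal sketch of the three rules compared in the table. The enum, helper names, and MAX macro are local stand-ins, not zstd's actual declarations; the strategy ordering mirrors zstd's (bt* strategies come last), and the local cycleLog mimics ZSTD_cycleLog's behavior of subtracting one log for binary-tree strategies.

```c
/* Hypothetical stand-in for a subset of ZSTD_strategy; values follow
 * zstd's ordering, where strategies from btlazy2 upward use a binary tree. */
typedef enum { STRAT_FAST = 1, STRAT_GREEDY = 3, STRAT_BTLAZY2 = 6, STRAT_BTOPT = 7 } strat_e;

#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Mimics ZSTD_cycleLog(): bt* strategies effectively lose one log of
 * cyclic window, so one is subtracted; other strategies pass through. */
static unsigned cycleLog(unsigned log, strat_e strat)
{
    unsigned const btScale = (strat >= STRAT_BTLAZY2);
    return log - btScale;
}

/* dev behavior with --long: MAX(21, chainLog + 4) */
static unsigned jobLogDev(unsigned chainLog)
{
    return MAX(21, chainLog + 4);
}

/* proposal 1: MAX(21, cycleLog + 4) */
static unsigned jobLogP1(unsigned log, strat_e strat)
{
    return MAX(21, cycleLog(log, strat) + 4);
}

/* proposal 2: MAX(21, cycleLog + 3), i.e. half the job size of proposal 1 */
static unsigned jobLogP2(unsigned log, strat_e strat)
{
    return MAX(21, cycleLog(log, strat) + 3);
}
```

For example, with a log of 24 and a binary-tree strategy, cycleLog is 23, so proposal 1 yields MAX(21, 27) = 27 while proposal 2 yields 26. The exact table entries above depend on the real cParams for each level, which I haven't reproduced here.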
I like the results of proposal 1,
though something seems off with regard to the calculation.
For example, I expect the cycleLog of level 19 to be 23,
and therefore a jobLog of 25 would actually require a rule like cycleLog + 2.
I then expect the cycleLog of level 16 to be 21,
and the cycleLog of level 13 to be 20.
Yet, in the table above, levels 19, 16 and 13 all result in the same jobSize.
I recommend adding a cycleLog column so that we can all see the calculation base.
Shouldn't it be cycleLog(chainLog, strategy)?
Yes, indeed.
> Shouldn't it be cycleLog(chainLog, strategy)?
Yeah, that would make sense. I've seen cycleLog used for both cycleLog(cParams.hashLog, strategy) and cycleLog(cParams.chainLog, strategy) in the codebase. Is there an actual meaning to cycleLog when it's applied to one versus the other, seeing as it basically just returns the value itself (minus 1 at higher compression settings), or is this mostly just a heuristic correction?
cycleLog() is only supposed to have a meaning with regard to chainLog,
and even then, only for strategies greedy and above.
I'm surprised that it could be used in a statement like cycleLog(cParams.hashLog, ...).
Yeah, the one place it shows up is in FIO_adjustParamsForPatchFromMode(), at line 900: `if (fileWindowLog > ZSTD_cycleLog(cParams.hashLog, cParams.strategy))`
Is this a bug that should get fixed too?
Yes, it seems incorrect.
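If so, the one-line change would presumably be to feed chainLog rather than hashLog to the helper (untested sketch of the fix, not a verified patch):

```diff
-    if (fileWindowLog > ZSTD_cycleLog(cParams.hashLog, cParams.strategy))
+    if (fileWindowLog > ZSTD_cycleLog(cParams.chainLog, cParams.strategy))
```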
Also, your work on refactoring parameters for --long mode, and on enabling it automatically for large windowLog values, could have an impact on FIO_adjustParamsForPatchFromMode() anyway.