Our application uses multi-threaded compression with the default compression level (3) and sets nbWorkers to the number of logical cores on the machine. We create the compression context on first use, reuse it throughout the session, and free it on termination. All our compression calls are issued from the main thread only.
We're getting crash reports from the field showing that some users hit a crash in the main thread during this cleanup (freeing the context), inside the function POOL_join, at the second call to ZSTD_pthread_cond_broadcast (WakeAllConditionVariable on Windows). We haven't been able to find steps to reproduce this problem.
The crash stack from the dump looks like this:
ntdll.dll!00007ffcd1b6cc37() Unknown
[Inline Frame] AppName.exe!POOL_join(POOL_ctx_s *) Line 168 C
AppName.exe!POOL_free(POOL_ctx_s * ctx) Line 178 C
AppName.exe!ZSTDMT_freeCCtx(ZSTDMT_CCtx_s * mtctx) Line 966 C
AppName.exe!ZSTD_freeCCtxContent(ZSTD_CCtx_s * cctx) Line 138 C
AppName.exe!ZSTD_freeCCtx(ZSTD_CCtx_s * cctx) Line 149 C
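For context on the frames above: POOL_join is the pool's shutdown step, and the usual shape of such a step is to set a shutdown flag under the queue mutex, broadcast the condition variables the workers sleep on, and then join every worker thread. The sketch below is a minimal, hypothetical pool showing that sequence. The names (demo_pool, demo_worker, demo_pool_create, demo_pool_free) are made up, it uses POSIX threads rather than the Win32 condition variables zstd's threading wrapper maps to on Windows, and it is not zstd's pool.c. Note that broadcasting on a condition variable with no waiters has no effect, and joining a worker that has already returned simply reaps it.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical minimal pool; illustrates a shutdown sequence only. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;      /* workers sleep here until shutdown */
    int             shutdown;  /* guarded by mutex */
    size_t          nbThreads; /* number of threads actually started */
    size_t          nbExited;  /* guarded by mutex; for demonstration */
    pthread_t      *threads;
} demo_pool;

static void *demo_worker(void *arg) {
    demo_pool *pool = (demo_pool *)arg;
    pthread_mutex_lock(&pool->mutex);
    /* Loop guards against spurious wakeups; the flag is checked under
       the mutex, so a broadcast sent after it is set cannot be missed. */
    while (!pool->shutdown)
        pthread_cond_wait(&pool->cond, &pool->mutex);
    pool->nbExited++;
    pthread_mutex_unlock(&pool->mutex);
    return NULL;
}

static demo_pool *demo_pool_create(size_t nbThreads) {
    demo_pool *pool = calloc(1, sizeof *pool);
    if (!pool) return NULL;
    pool->threads = calloc(nbThreads, sizeof *pool->threads);
    if (!pool->threads) { free(pool); return NULL; }
    pthread_mutex_init(&pool->mutex, NULL);
    pthread_cond_init(&pool->cond, NULL);
    for (size_t i = 0; i < nbThreads; ++i) {
        /* Stop at the first failure; only started threads are joined. */
        if (pthread_create(&pool->threads[i], NULL, demo_worker, pool) != 0)
            break;
        pool->nbThreads++;
    }
    return pool;
}

/* Shut down: flag under the lock, wake all waiters, join, then free.
   The mutex and condition variable must stay valid until every join
   has returned. Returns how many workers observed the shutdown. */
static size_t demo_pool_free(demo_pool *pool) {
    size_t exited;
    pthread_mutex_lock(&pool->mutex);
    pool->shutdown = 1;
    pthread_mutex_unlock(&pool->mutex);
    pthread_cond_broadcast(&pool->cond);
    for (size_t i = 0; i < pool->nbThreads; ++i) {
        int rc = pthread_join(pool->threads[i], NULL);
        assert(rc == 0);
        (void)rc;
    }
    exited = pool->nbExited; /* all workers joined; no lock needed */
    pthread_cond_destroy(&pool->cond);
    pthread_mutex_destroy(&pool->mutex);
    free(pool->threads);
    free(pool);
    return exited;
}
```

For example, creating a pool with demo_pool_create(4) and passing it to demo_pool_free returns 4 once every worker has observed the flag, exited, and been joined.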
We can't share the dump files due to legal and privacy concerns, but below are the debugger views of the source and disassembly for the main thread and the pool thread.
[Screenshots not reproduced here: Main Thread source, Main Thread Disassembly, Pool Thread source, Pool Thread Disassembly.]
We are on zstd release 1.4.4 and using Visual Studio 2017 (15.9) on Windows and building for x86_64.
One observation holds for every dump we've examined so far: some of the pool threads seem to have exited by the time the crash happens.
For example, in the dump above, nbWorkers was set to 4, but at the time of the crash only one pool thread was still active; the other 3 had exited.
Any insights on how to resolve this would be highly appreciated. Thanks.
I suspect this is a Windows-specific issue, though it's not obvious what is happening.
One thing to try is compiling zstd with asserts enabled by defining DEBUGLEVEL as 1 (or higher to get progressively more logging).
@terrelln, if we could reproduce this issue on our end, increased logging would help. However, we're only seeing it in the field. We do set DEBUGLEVEL to 1 in our debug builds, but have never encountered this in-house.
Also, the number of crashes is almost equal to the number of affected users, so the same user hardly ever encounters (or at least reports) it twice.
And yes, this does appear to be a Windows-specific issue. We have not seen any similar reports on Mac.
@vnair81, do you know which Visual Studio version is being used to compile, and which versions of Windows this error occurs on (and any other potentially relevant variables)?
Visual Studio 2017 (version 15.9.15) was used for building this. OS on which crash occurred was Windows 10 Pro (version 10.0.18363).
I have the full memory dump, so we can examine any variable that wasn't optimized away (the build used /O2).
The ctx in POOL_join is as follows:
[Debugger view of ctx not reproduced here.]
@terrelln does this help?
Not sure if it's relevant, but we previously found an issue (#1830) where the mutex and condition variable were being overwritten by a memset call.
That doesn't appear to be the case here, though.
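The class of bug mentioned for #1830 is easy to hit when a struct that embeds synchronization primitives is reset with a single memset over the whole struct. One common way to avoid it, shown here as a sketch and not as the fix that was applied for #1830, is to group the plain data fields into a sub-struct and zero only that, leaving the mutex and condition variable initialized. The types and functions below (demo_job, demo_job_state, demo_job_init, demo_job_reset, demo_job_destroy) are hypothetical and use POSIX threads.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical plain-data fields that are safe to zero between uses. */
typedef struct {
    size_t consumed;
    size_t produced;
    int    done;
} demo_job_state;

typedef struct {
    pthread_mutex_t mutex; /* initialized once; never memset */
    pthread_cond_t  cond;  /* initialized once; never memset */
    demo_job_state  state; /* resettable data, guarded by mutex */
} demo_job;

static void demo_job_init(demo_job *job) {
    int rc = pthread_mutex_init(&job->mutex, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&job->cond, NULL);
    assert(rc == 0);
    (void)rc;
    memset(&job->state, 0, sizeof job->state);
}

/* Unsafe alternative: memset(job, 0, sizeof *job) would also wipe the
   initialized mutex and condition variable, leaving them invalid, which
   is especially dangerous if another thread is waiting on them. */
static void demo_job_reset(demo_job *job) {
    pthread_mutex_lock(&job->mutex);
    memset(&job->state, 0, sizeof job->state); /* zero only the data */
    pthread_mutex_unlock(&job->mutex);
}

static void demo_job_destroy(demo_job *job) {
    pthread_cond_destroy(&job->cond);
    pthread_mutex_destroy(&job->mutex);
}
```

After demo_job_reset, the data fields read as zero while the mutex and condition variable remain usable without re-initialization.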