One of my coworkers made the following observation:
I noticed the Julia profiler samples all the threads in a deterministic order, which can skew the profile for multithreaded code. When you stop one thread that is holding an important mutex, you can get a pileup of threads blocking on that mutex, which can make it seem like there's lots of mutex contention, but really that contention is caused by the profiler.
We talked about this with @JeffBezanson last month, and I'm opening an issue about this so we don't forget. :)
We guessed that the best thing to do is probably to select one thread at random and record a sample for just that thread. But it'd be best to consult an expert on multithreaded profiling.
CC: @vtjnash
What do sampling profilers for other multithreaded systems do?
xref #20687 (not the same thing but similar in spirit)
I had a look at gperftools and they have an option for a separate timer per thread, which would make sense if you set different sampling intervals for each thread and adjusted the distribution accordingly. The other approach that seems unbiased is to have the signal handler select one thread at random to sample each time, and ensure the sampling frequency is low enough that there's a little pause between restarting one thread and stopping another, so the process has time to return to the stationary distribution (sampling interval >> mixing time).
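To make the "select one thread at random" idea concrete, here is a minimal sketch in Python. It is not how the Julia profiler or gperftools is implemented; `pick_thread` and `simulate_ticks` are hypothetical names, and the thread ids are made up. It only shows the selection step: on each timer tick exactly one thread is chosen uniformly at random, so no thread is systematically stopped before the others. It does not model the pause between restarting one thread and stopping the next.

```python
import random
from collections import Counter

def pick_thread(thread_ids, rng):
    """Choose the single thread to stop and sample on this profiler tick.

    Hypothetical helper: a real profiler would do this in its signal handler.
    """
    return rng.choice(thread_ids)

def simulate_ticks(thread_ids, n_ticks, seed=0):
    """Count how often each thread is chosen over n_ticks timer ticks."""
    rng = random.Random(seed)
    return Counter(pick_thread(thread_ids, rng) for _ in range(n_ticks))

if __name__ == "__main__":
    # Four made-up thread ids; each should get roughly a quarter of the samples.
    counts = simulate_ticks([1, 2, 3, 4], n_ticks=40_000)
    for tid in sorted(counts):
        print(f"thread {tid}: {counts[tid]} samples")
```

Because only one thread is stopped per tick, the per-thread sampling rate drops by a factor of the thread count, so the tick rate or the total run time would need to go up to collect the same number of samples per thread.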
Duplicate of #9224