The issue is discussed briefly in the following comment:
https://discourse.julialang.org/t/julia-threads-vs-blas-threads/8914/14?u=jlapeyre
The OP on the same thread mentions the issue.
The timings have improved since the post, but the basic problem remains:
julia> using BenchmarkTools, LinearAlgebra
julia> @btime BLAS.set_num_threads(3)
  1.243 μs (2 allocations: 32 bytes)
julia> my_BLAS_set_num_threads(n) =
           ccall((:openblas_set_num_threads64_, Base.libblas_name), Cvoid, (Int32,), n);
julia> @btime my_BLAS_set_num_threads(3)
  4.898 ns (0 allocations: 0 bytes)
I wrote that the function that "...determines the BLAS vendor is not optimized away by the compiler". This may be misleading. The call survives not because the compiler needs improvement, but because the BLAS library is linked dynamically, so the vendor is not known at compile time.
That's because the internal call checks vendor(), which does a dlopen and then sets threads based on the vendor.
So it's best to make the call yourself if this matters for your code, but Base probably needs to stay general.
We never thought the time it takes to change the number of threads would matter. In fact, I'm not sure the library is intended to be used that way. But it seems we could fix this by setting vendor once in __init__.
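The "set vendor once in __init__" idea could look roughly like the sketch below. Everything in it is a hypothetical illustration, not the actual Base code: the module name `BlasVendorCache`, the `detect_vendor` stand-in (which only counts its calls instead of doing a real dlopen), and the `:openblas64` result are all assumptions.

```julia
module BlasVendorCache

# Counts how often the (pretend) expensive lookup runs.
const DETECT_CALLS = Ref(0)

# Hypothetical stand-in for the dlopen-based vendor check.
function detect_vendor()
    DETECT_CALLS[] += 1
    return :openblas64
end

# Placeholder value; filled in once when the module is loaded.
const VENDOR = Ref{Symbol}(:unknown)

function __init__()
    VENDOR[] = detect_vendor()
end

# Cheap accessor: reads the cached value without repeating the lookup.
vendor() = VENDOR[]

end # module

# Repeated queries reuse the cached value.
for _ in 1:1000
    BlasVendorCache.vendor()
end
```

With this layout the lookup cost is paid once at load time, and a function such as set_num_threads would only need to read the cached value.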
@Jutho You may want to read this issue in response to your discourse post. Also: https://github.com/JuliaLang/julia/issues/10028
I don't have a use case. I was responding to the OP, @Jutho. But, looking at the post more closely, I don't see a clear case for the need to set the number of threads quickly. Unless there is a specific need, it doesn't make sense to change anything.
Agree - but if someone wants to submit a PR to improve this, I wouldn't be opposed to it.
I'm not sure how the threading works, e.g. can you have a Julia thread pool and a BLAS thread pool and quickly switch between them? The call does not seem to do anything but set an int. Maybe there is a pool and this number is read dynamically to determine how many threads to use?
Related to #10028, I thought of writing a function to check if both Julia and BLAS multithreading are enabled. The subject can be very confusing to inexperienced users.
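A check of this kind might be sketched as follows. The helper names `both_multithreaded` and `check_threading` are hypothetical, and the sketch assumes a Julia version that provides `BLAS.get_num_threads` (1.6 or later), which postdates this discussion.

```julia
using LinearAlgebra

# Hypothetical helper: true when both thread counts exceed one.
both_multithreaded(julia_threads::Integer, blas_threads::Integer) =
    julia_threads > 1 && blas_threads > 1

# Reports the two thread counts and warns if both kinds of
# multithreading are active at the same time.
function check_threading()
    jt = Threads.nthreads()
    bt = BLAS.get_num_threads()  # assumed available (Julia 1.6+)
    if both_multithreaded(jt, bt)
        @warn "Julia and BLAS multithreading are both enabled" jt bt
    end
    return (julia = jt, blas = bt)
end
```

Calling `check_threading()` at the start of a session returns a named tuple with both counts, which may be enough to spot an unintended combination.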
No, what I suggested above is not possible (at least at the moment).
This comment
https://github.com/JuliaLang/julia/blob/ae8e95f249732ce4555584e8d0d8e469c4119edb/base/threadingconstructs.jl#L84
claims that @threads causes threads to be spawned. And MATLAB or C code that calls BLAS routines can be very inefficient due to spawning a group of threads repeatedly to do a fast operation.
Much of this will have to wait for the new threading work. Until then Julia has its own thread pool and BLAS has its own. Most programs should either use Julia threads or be single threaded and use BLAS threads.
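One way to follow that advice when using Julia threads is to pin BLAS to a single thread first. This is only a sketch: the matrix size, the loop, and the `results` vector are illustrative assumptions, not taken from the discussion.

```julia
using LinearAlgebra

# Keep BLAS single threaded so its pool does not compete
# with the Julia threads used in the loop below.
BLAS.set_num_threads(1)

results = Vector{Float64}(undef, 8)

# Each iteration does a small matrix product on a Julia thread
# and writes to its own slot, so there are no data races.
Threads.@threads for i in 1:8
    A = fill(Float64(i), 4, 4)
    results[i] = sum(A * A)
end
```

The opposite arrangement, a single threaded Julia program that leaves BLAS multithreaded, needs no special setup beyond choosing the BLAS thread count.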
@anton-malakhov plans to integrate the new PARTR stuff that @kpamnany is doing into MKL to have a seamless integration between Julia and MKL BLAS.
Right, though it's not directly related to the time spent in the set_num_threads() function.
"Much of this will have to wait..."
Oh! Of course, since there is no known use case, and the fix is non-breaking, we shouldn't even be talking about this till v1.0 is out the door. My intention was to record it as an issue for the future.
I think this should be fixed by @JeffBezanson's PR above.