From discourse, I see a big performance regression compared to Julia 1.0 in a simple map!
call starting in Julia 1.2:
using BenchmarkTools
test8!(D, A, B, C) = map!((a, b, c) -> a + b + c, D, A, B, C)
A = rand(1000,1000); B = rand(1000,1000); C = rand(1000,1000); D = zeros(1000,1000);
@btime test8!($D,$A,$B,$C);
Julia 1.0.4 gives 1.817 ms and Julia 1.1.0 gives 1.961 ms, but Julia 1.2.0 gives 3.004 ms, Julia 1.3.0 gives 3.091 ms, and Julia 1.4.0 gives 3.006 ms.
Hi, I did some benchmarks and the cause is this line. If you remove it and rebuild Julia, the test runs in 1.490 ms as compared to 2.927 ms with it. Adding the @inbounds
macro didn't help and the runtime was about 3 ms as before. Adding the --check-bounds=no
flag speeds up the code to 1.487 ms.
I'm currently investigating why @inbounds
didn't elide the @boundscheck
block in map_n!
. Should be something about inlines or propagating inbounds.
Hah, wow, that's quite the literal speed bump. There are three issues here:
@boundscheck
will never be elided as that function doesn't inline (and isn't called with @inbounds
anyways... and it's a dimension check, not a bounds check.==
and doesn't actually throw a DimensionMismatch if they're notLinearIndices
doesn't have a specialized ==
method so it's using the fallback O(n) algorithm from AbstractArray
A fourth issue is that it assumes fast linear indexing, i.e. it is assuming IndexStyle(args...) isa IndexLinear
I think a generic and composable approach to implement IndexStyle-generic map!
would be to implement it in terms of foldl
. See https://github.com/JuliaLang/julia/pull/35036 for how I used it for foreach(f, arrays...)
. It's generally better than raw for
loops. (But it's more like a possible future improvement rather than a performance bug fix.)
Most helpful comment
Hi, I did some benchmarks and the cause is this line. If you remove it and rebuild Julia, the test runs in 1.490 ms as compared to 2.927 ms with it. Adding the
@inbounds
macro didn't help and the runtime was about 3 ms as before. Adding the--check-bounds=no
flag speeds up the code to 1.487 ms.I'm currently investigating why
@inbounds
didn't elide the@boundscheck
block inmap_n!
. Should be something about inlines or propagating inbounds.