Julia: huge performance regression in vector math

Created on 3 Aug 2016 · 20 Comments · Source: JuliaLang/julia

There has been a huge performance regression in simple vector-math operations like +, e.g.

x = rand(10^7); y = rand(10^7);
@time x + y;
@time x + y;

gives 0.495843 seconds (20.00 M allocations: 381.463 MB, 20.92% gc time) ... notice the 20 M allocations, which are indicative of a type instability in an inner loop.
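For readers trying to reproduce this, a minimal sketch of how such a regression shows up and how one might localize it. The timings and the internal `Base._elementwise` name are specific to 0.5-era builds; the `@code_warntype` line is a suggested diagnostic, not something from the original report:

```julia
x = rand(10^7); y = rand(10^7)

@time x + y   # first call includes compilation time
@time x + y   # a healthy build allocates essentially one result array
              # (~76 MB for 10^7 Float64s), not millions of small allocations

# When millions of tiny allocations appear, inspecting the inferred types of
# the inner kernel usually exposes the instability, e.g. (0.5-era source):
# @code_warntype Base._elementwise(+, Float64, x, y)
```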

The x + y call devolves into a call to Base._elementwise(+, Float64, x, y) in arraymath.jl, which was most recently touched by #17389 (@pabloferz) and #17313 (@martinholters).

Since @nanosoldier didn't detect any performance regressions in #17313, I'm guessing #17389 is the problem here?

performance regression

All 20 comments

I cannot reproduce this problem, comparing Version 0.4.5 (2016-03-18 00:58 UTC)
and Version 0.5.0-rc0+0 (2016-07-26 20:22 UTC) on Linux (Ubuntu 16.04).
What did you compare?

0.5.0-rc0+174

Adding the type parameter seems to fix it. I was trying to take some type parameters out, but I took out too much:

function Base._elementwise{T}(op, ::Type{T}, A::AbstractArray, B::AbstractArray)
    F = similar(A, T, promote_shape(A, B))
    for (iF, iA, iB) in zip(eachindex(F), eachindex(A), eachindex(B))
        @inbounds F[iF] = op(A[iA], B[iB])
    end
    return F
end

Yes, the Type{T} ones tend to be important, since we try to avoid specializing on every type argument.

yikes, we really should have run nanosoldier there before merging. whoops.

#17798

@JeffBezanson Is there documentation / a rough design document / a blog entry / a certain set of functions that one could look at to get a feeling for the rules that govern specialization?

While #17798 helped, it didn't fix everything (see https://github.com/JuliaCI/BaseBenchmarkReports/blob/f42bed6fb5e9d16970da9b58cf24755de6dc7d0f/daily_2016_8_4/report.md). I think what I'm going to do is revert #17389 on the release-0.5 branch for rc1.

I think that for the rest of the problems, another type parameter (again) does the trick:

function promote_op{S}(f, ::Type{S})
    T = _promote_op(f, _default_type(S))
    return isleaftype(S) ? T : typejoin(S, T)
end
function promote_op{R,S}(f, ::Type{R}, ::Type{S})
    T = _promote_op(f, _default_type(R), _default_type(S))
    isleaftype(R) && return isleaftype(S) ? T : typejoin(S, T)
    return isleaftype(S) ? typejoin(R, T) : typejoin(R, S, T)
end

(currently these two do not have type parameters).
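To see why the leaf-type checks matter here, a quick sketch of the two helpers involved. `typejoin` is still part of Julia today; `isleaftype` was renamed `isconcretetype` in Julia 1.0:

```julia
# typejoin computes the narrowest common supertype of its arguments:
typejoin(Int64, Float64)   # Real
typejoin(Int64, Int64)     # Int64

# isleaftype (isconcretetype on modern Julia) distinguishes concrete types,
# whose inferred result can be used directly, from abstract ones, where the
# code above widens the result via typejoin:
# isleaftype(Int64)  -> true
# isleaftype(Real)   -> false
```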

So, echoing @eschnett, a guideline here would be helpful (maybe even worth a place in the performance tips).

My mental model has been that function argument types are only for dispatch and have nothing to do with performance (except for ANY). That is apparently wrong, so I would also be interested in that.

As far as I can see, they also seem to matter when dispatching on Type{T}s; for the rest, it seems that your mental model (which is the endorsed one) still works.

Alright, good to know. Thanks.

@pabloferz, my understanding is that if you write a function f(T::Type), then T is just a value that is determined at runtime (the same version of f is compiled for all values of T), whereas if you write f{T}(::Type{T}) then T is part of the type signature of the function — hence it is known at compile time and a specialized version of f is compiled for every T.
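A small sketch of that distinction, written with the `where` syntax that later replaced the `f{T}(::Type{T})` form; the function names here are hypothetical, and per the discussion above it was the 0.5-era compiler for which only the second form was reliably specialized:

```julia
# Here T is just a runtime value: it does not appear in the method's type
# signature, so (in the 0.5-era compiler) one generic method served all T
# and the result's element type was not known at compile time.
zeros_dynamic(T::Type, n) = [zero(T) for _ in 1:n]

# Here T is a static parameter of the signature: Julia compiles a
# specialized version per T, so the element type is known at compile time.
zeros_static(::Type{T}, n) where {T} = [zero(T) for _ in 1:n]
```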

@stevengj That was my understanding too. But, I forgot for a moment while writing the changes on #17389.

Now I believe that a bad interaction between the object_id hashing, changes to cached type information, and using inference to try to find the return type (plus some missing static type parameters) is causing most of these problems. But I'm not actually sure.

@pabloferz Could we have a PR with the more complete fix?

I can work on that, but I won't be able to get to it until Tuesday.

in that case we'll probably have to put rc2 out without reinstating https://github.com/JuliaLang/julia/pull/17389

IIUC, this performance regression no longer exists on the release branch, so this is not blocking anymore.

I think this is fixed by #17929. Reopen or leave a comment if you think otherwise.

LGTM.
