Julia: huge performance regression in vector math

Created on 3 Aug 2016 · 20 Comments · Source: JuliaLang/julia

There has been a huge performance regression in simple vector-math operations like +, e.g.

x = rand(10^7); y = rand(10^7);
@time x + y;
@time x + y;

gives 0.495843 seconds (20.00 M allocations: 381.463 MB, 20.92% gc time) ... notice the 20 M allocations, which are indicative of a type instability in an inner loop.
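For readers trying to reproduce this, a minimal sketch of how such a regression shows up and how one might localize it. The timings and the internal `Base._elementwise` name are specific to 0.5-era builds; the `@code_warntype` line is a suggested diagnostic, not something from the original report:

```julia
x = rand(10^7); y = rand(10^7)

@time x + y   # first call includes compilation time
@time x + y   # a healthy build allocates essentially one result array
              # (~76 MB for 10^7 Float64s), not millions of small allocations

# When millions of tiny allocations appear, inspecting the inferred types of
# the inner kernel usually exposes the instability, e.g. (0.5-era source):
# @code_warntype Base._elementwise(+, Float64, x, y)
```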

The x + y call devolves into a call to Base._elementwise(+, Float64, x, y) in arraymath.jl, which was most recently touched by #17389 (@pabloferz) and #17313 (@martinholters).

Since @nanosoldier didn't detect any performance regressions in #17313, I'm guessing #17389 is the problem here?

performance regression

All 20 comments

I cannot reproduce this problem, comparing Version 0.4.5 (2016-03-18 00:58 UTC)
and Version 0.5.0-rc0+0 (2016-07-26 20:22 UTC) on Linux (Ubuntu 16.04).
What did you compare?

0.5.0-rc0+174

Adding the type parameter seems to fix it. I was trying to take some type parameters out, but I took out too much:

function Base._elementwise{T}(op, ::Type{T}, A::AbstractArray, B::AbstractArray)
    F = similar(A, T, promote_shape(A, B))
    for (iF, iA, iB) in zip(eachindex(F), eachindex(A), eachindex(B))
        @inbounds F[iF] = op(A[iA], B[iB])
    end
    return F
end

Yes, the Type{T} ones tend to be important, since we try to avoid specializing on every type argument.

yikes, we really should have run nanosoldier there before merging. whoops.

#17798

@JeffBezanson Is there documentation / a rough design document / a blog entry / a certain set of functions that one could look at to get a feeling for the rules that govern specialization?

While #17798 helped, it didn't fix everything (see https://github.com/JuliaCI/BaseBenchmarkReports/blob/f42bed6fb5e9d16970da9b58cf24755de6dc7d0f/daily_2016_8_4/report.md). I think what I'm going to do is revert #17389 on the release-0.5 branch for rc1.

I think that for the rest of the problems, another type parameter (again) does the trick:

function promote_op{S}(f, ::Type{S})
    T = _promote_op(f, _default_type(S))
    return isleaftype(S) ? T : typejoin(S, T)
end
function promote_op{R,S}(f, ::Type{R}, ::Type{S})
    T = _promote_op(f, _default_type(R), _default_type(S))
    isleaftype(R) && return isleaftype(S) ? T : typejoin(S, T)
    return isleaftype(S) ? typejoin(R, T) : typejoin(R, S, T)
end

(currently these two do not have type parameters).
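To see why the leaf-type checks matter here, a quick sketch of the two helpers involved. `typejoin` is still part of Julia today; `isleaftype` was renamed `isconcretetype` in Julia 1.0:

```julia
# typejoin computes the narrowest common supertype of its arguments:
typejoin(Int64, Float64)   # Real
typejoin(Int64, Int64)     # Int64

# isleaftype (isconcretetype on modern Julia) distinguishes concrete types,
# whose inferred result can be used directly, from abstract ones, where the
# code above widens the result via typejoin:
# isleaftype(Int64)  -> true
# isleaftype(Real)   -> false
```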

So, echoing @eschnett, a guideline here would be helpful (maybe even worth a place in the performance tips).

My mental model has been that function argument types are only for dispatch and have nothing to do with performance (except for ANY). That is apparently wrong, so I would also be interested in that.

As far as I can see, they also seem to matter when dispatching on Type{T}s; for the rest, it seems that your mental model (which is the endorsed one) still works.

Alright, good to know. Thanks.

@pabloferz, my understanding is that if you write a function f(T::Type), then T is just a value that is determined at runtime (the same version of f is compiled for all values of T), whereas if you write f{T}(::Type{T}) then T is part of the type signature of the function — hence it is known at compile time and a specialized version of f is compiled for every T.
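A small sketch of that distinction, written with the `where` syntax that later replaced the `f{T}(::Type{T})` form; the function names here are hypothetical, and per the discussion above it was the 0.5-era compiler for which only the second form was reliably specialized:

```julia
# Here T is just a runtime value: it does not appear in the method's type
# signature, so (in the 0.5-era compiler) one generic method served all T
# and the result's element type was not known at compile time.
zeros_dynamic(T::Type, n) = [zero(T) for _ in 1:n]

# Here T is a static parameter of the signature: Julia compiles a
# specialized version per T, so the element type is known at compile time.
zeros_static(::Type{T}, n) where {T} = [zero(T) for _ in 1:n]
```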

@stevengj That was my understanding too. But, I forgot for a moment while writing the changes on #17389.

Now I believe that a bad interaction between the object_id hashing, changes to cached type information, and using inference to try to find the return type (plus some missing static type parameters) is causing most of these problems. But I'm not actually sure.

@pabloferz Could we have a PR with the more complete fix?

I can work on that, but I won't be able to get to it until Tuesday.

in that case we'll probably have to put rc2 out without reinstating https://github.com/JuliaLang/julia/pull/17389

IIUC, this performance regression no longer exists on the release branch, so this is not blocking anymore.

I think this is fixed by #17929. Reopen or leave a comment if you think otherwise.

LGTM.
