Julia: sum(function, tuple) is slow

Created on 20 Dec 2018 · 19 Comments · Source: JuliaLang/julia

julia> hsum(x...) = sum(abs2.(float.(x)));
julia> gsum(x...) = sum(y -> abs2(float(y)), x);

julia> @btime gsum(1.,2.,3.,4.,5.,6.,7.,8.)
  10.907 ns (0 allocations: 0 bytes)
204.0
julia> @btime hsum(1.,2.,3.,4.,5.,6.,7.,8.)
  2.436 ns (0 allocations: 0 bytes)
204.0

gsum should not be so slow compared to hsum, right?

performance

All 19 comments

See https://github.com/JuliaLang/julia/issues/30421; the two calls use different summation implementations.

Maybe we should have a specialized sum(function, tuple) method similar to our specialized sum(tuple)?

@KristofferC Here I don't have a sum of a generator, but of a Tuple. I'm not sure it's the same as the other issue.

I think that when you do sum(f, tuple) it actually constructs a generator?

I think that when you do sum(f, tuple) it actually constructs a generator?

No, it calls mapfoldl(y -> abs2(float(y)), Base.add_sum, x). The difference is that we have a special sum method for tuples:

sum(x::Tuple{Any, Vararg{Any}}) = +(x...)
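
A quick way to confirm that the two calls dispatch to different methods (a minimal check, not from the original thread; the reported source locations vary by Julia version):

julia> @which sum((1., 2., 3.))        # the tuple-specific method, sum(x::Tuple{Any, Vararg{Any}})
julia> @which sum(abs2, (1., 2., 3.))  # the generic sum(f, a) fallback, which reduces via mapfoldl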

What about defining:

sum(f, x::Tuple{Any, Vararg{Any}}) = +(f.(x)...)

This allocates for some reason. Any ideas?

julia> sum1(f, x::Tuple) = +(f.(x)...)
julia> @btime sum1(sin, (1.,2.,3.,4.,5.))
  562.674 ns (10 allocations: 288 bytes)
0.1761616497223787
julia> @btime sum(sin, (1.,2.,3.,4.,5.))
  56.020 ns (0 allocations: 0 bytes)
0.1761616497223787

Probably sufficiently large tuples get heap-allocated?

Probably sufficiently large tuples get heap-allocated?

julia> @btime sum1(sin, (1.,2.))
  20.082 ns (0 allocations: 0 bytes)
1.7507684116335782

julia> @btime sum1(sin, (1.,2.,3.))
  30.955 ns (0 allocations: 0 bytes)
1.8918884196934453

julia> @btime sum1(sin, (1.,2.,3.,4.))
  40.153 ns (0 allocations: 0 bytes)
1.1350859243855171

julia> @btime sum1(sin, (1.,2.,3.,4.,5.))
  474.749 ns (10 allocations: 288 bytes)
0.1761616497223787

I don't understand this.

I don't know why the threshold is a 5-tuple, but suppose you have a tuple of 1000 elements: where would the result of f.(x) be stored? Clearly there must be some threshold beyond which the temporary tuple is heap-allocated, no?

I think there should be no temporary tuple at all. The operation can be done in-place.

It seems too optimistic to hope that the compiler should always be able to figure out that it can eliminate the tuple construction.

The following is based on the Base.afoldl code used in Base.+(a,b,c,d...), and seems to always be fast without allocating:

mapafoldl(F,op,a) = F(a)
mapafoldl(F,op,a,b) = op(F(a),F(b))
mapafoldl(F,op,a,b,c...) = op(op(F(a),F(b)), mapafoldl(F, op, c...))
function mapafoldl(F,op,a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,qs...)
    y = op(op(op(op(op(op(op(op(op(op(op(op(op(op(op(F(a),F(b)),F(c)),F(d)),F(e)),F(f)),F(g)),F(h)),F(i)),F(j)),F(k)),F(l)),F(m)),F(n)),F(o)),F(p))
    for x in qs; y = op(y,F(x)); end
    y
end
mysum(f, x::Tuple) = mapafoldl(f, +, x...)
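
For a quick sanity check of mysum (a minimal example, not from the thread; mapafoldl and mysum are the definitions above):

julia> mysum(abs2, (1.0, 2.0, 3.0))   # 1 + 4 + 9
14.0

julia> mysum(abs2, (1.0, 2.0, 3.0)) == sum(abs2, [1.0, 2.0, 3.0])
true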

(Indeed, it seems like we could use the above to define an efficient mapfoldl for tuples, which would give us a fast mapreduce, sum, etcetera.)

(Note that the above is not quite right for a general mapfoldl because it is not left-associative. I'll put together a PR with a proper version.)
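
For illustration, a left-associative variant could look like the following (a hedged sketch with hypothetical names, not the actual PR; it assumes a non-empty tuple, like Base's sum(::Tuple)):

# Accumulate strictly left to right: the accumulator becomes op(acc, F(x)) one element at a time.
_mapfoldl_t(F, op, acc) = acc
_mapfoldl_t(F, op, acc, x, rest...) = _mapfoldl_t(F, op, op(acc, F(x)), rest...)
mymapfoldl(F, op, t::Tuple) = _mapfoldl_t(F, op, F(first(t)), Base.tail(t)...)

The associativity difference shows up with a non-commutative op: mymapfoldl(identity, -, (1, 2, 3, 4)) gives ((1 - 2) - 3) - 4 == -8, while mapafoldl(identity, -, 1, 2, 3, 4) above groups as (1 - 2) - (3 - 4) == 0.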

sum(f, x::Tuple{Any, Vararg{Any}}) = +(f.(x)...)

should probably be

sum(f, x::Tuple{Any, Vararg{Any}}) = sum(f, x for x in x)

to avoid the unnecessary materialization.

I don't think it matters for tuples, no allocation happens anyway.
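
Either way, both spellings compute the same value; a quick check (t is just an example tuple, not from the thread):

julia> t = (1.0, 2.0, 3.0);

julia> +(abs2.(t)...) == sum(abs2, y for y in t) == 14.0
true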

This allocates for some reason. Any ideas?

You have an argument of type <:Function that does not appear in head position in the body of the function definition (f is only passed on to broadcast via f.(x), never called directly), so the method does not get specialized on it. (@code_native sum1(...) is misleading here: it shows the fully specialized code that would be generated, not the unspecialized code that actually runs.)

julia> using BenchmarkTools
julia> sum1(f, x::Tuple) = +(f.(x)...)
julia> sum2(f::FF, x::Tuple) where FF = +(f.(x)...)

julia> @btime sum1(sin, (1.,2.,3.,4.,5.))
  597.669 ns (10 allocations: 288 bytes)
0.1761616497223787

julia> @btime sum2(sin, (1.,2.,3.,4.,5.))
  62.396 ns (0 allocations: 0 bytes)
0.1761616497223787

@chethega I don't understand the difference. I thought ::F where F was equivalent to ::Any.

It works the same w.r.t dispatch but it forces specialization on the argument.
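
In other words (a sketch of the general rule, with hypothetical names g1 and g2): a Function argument that is only passed through to another call may not be specialized on, while adding a type parameter forces specialization without changing which arguments the method accepts.

g1(f, x) = map(f, x)                 # f is only passed along; Julia may not specialize on it
g2(f::FF, x) where {FF} = map(f, x)  # identical dispatch behavior, but specialization on f is forced

Both return identical results for the same inputs, e.g. g1(sin, (1.0, 2.0)) == g2(sin, (1.0, 2.0)).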
