Julia: sum(function, tuple) is slow

Created on 20 Dec 2018 · 19 Comments · Source: JuliaLang/julia

julia> hsum(x...) = sum(abs2.(float.(x)));
julia> gsum(x...) = sum(y -> abs2(float(y)), x);

julia> @btime gsum(1.,2.,3.,4.,5.,6.,7.,8.)
  10.907 ns (0 allocations: 0 bytes)
204.0
julia> @btime hsum(1.,2.,3.,4.,5.,6.,7.,8.)
  2.436 ns (0 allocations: 0 bytes)
204.0

gsum should not be so slow compared to hsum, right?

performance

All 19 comments

See https://github.com/JuliaLang/julia/issues/30421; the two calls use different summation implementations.

Maybe we should have a specialized sum(function, tuple) method similar to our specialized sum(tuple)?

@KristofferC Here I don't have a sum of a generator, but of a Tuple. I'm not sure it's the same as the other issue.

I think that when you do sum(f, tuple) it actually constructs a generator?

I think that when you do sum(f, tuple) it actually constructs a generator?

No, it calls mapfoldl(y -> abs2(float(y)), Base.add_sum, x). The difference is that we have a special sum method for tuples:

sum(x::Tuple{Any, Vararg{Any}}) = +(x...)
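
A quick way to confirm that the two calls dispatch to different methods (a minimal check, not from the original thread; the reported source locations vary by Julia version):

julia> @which sum((1., 2., 3.))        # the tuple-specific method, sum(x::Tuple{Any, Vararg{Any}})
julia> @which sum(abs2, (1., 2., 3.))  # the generic sum(f, a) fallback, which reduces via mapfoldl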

What about defining:

sum(f, x::Tuple{Any, Vararg{Any}}) = +(f.(x)...)

This allocates for some reason. Any ideas?

julia> sum1(f, x::Tuple) = +(f.(x)...)
julia> @btime sum1(sin, (1.,2.,3.,4.,5.))
  562.674 ns (10 allocations: 288 bytes)
0.1761616497223787
julia> @btime sum(sin, (1.,2.,3.,4.,5.))
  56.020 ns (0 allocations: 0 bytes)
0.1761616497223787

Probably sufficiently large tuples get heap-allocated?

Probably sufficiently large tuples get heap-allocated?

julia> @btime sum1(sin, (1.,2.))
  20.082 ns (0 allocations: 0 bytes)
1.7507684116335782

julia> @btime sum1(sin, (1.,2.,3.))
  30.955 ns (0 allocations: 0 bytes)
1.8918884196934453

julia> @btime sum1(sin, (1.,2.,3.,4.))
  40.153 ns (0 allocations: 0 bytes)
1.1350859243855171

julia> @btime sum1(sin, (1.,2.,3.,4.,5.))
  474.749 ns (10 allocations: 288 bytes)
0.1761616497223787

I don't understand this.

I don't know why the threshold is a 5-tuple, but suppose you have a tuple of 1000 elements: where would the result of f.(x) be stored? Clearly there must be some threshold beyond which the temporary tuple is heap-allocated, no?

I think there should be no temporary tuple at all. The operation can be done in-place.

It seems too optimistic to hope that the compiler should always be able to figure out that it can eliminate the tuple construction.

The following is based on the Base.afoldl code used in Base.+(a,b,c,d...), and seems to always be fast without allocating:

mapafoldl(F,op,a) = F(a)
mapafoldl(F,op,a,b) = op(F(a),F(b))
mapafoldl(F,op,a,b,c...) = op(op(F(a),F(b)), mapafoldl(F, op, c...))
function mapafoldl(F,op,a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,qs...)
    y = op(op(op(op(op(op(op(op(op(op(op(op(op(op(op(F(a),F(b)),F(c)),F(d)),F(e)),F(f)),F(g)),F(h)),F(i)),F(j)),F(k)),F(l)),F(m)),F(n)),F(o)),F(p))
    for x in qs; y = op(y,F(x)); end
    y
end
mysum(f, x::Tuple) = mapafoldl(f, +, x...)
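
For a quick sanity check of mysum (a minimal example, not from the thread; mapafoldl and mysum are the definitions above):

julia> mysum(abs2, (1.0, 2.0, 3.0))   # 1 + 4 + 9
14.0

julia> mysum(abs2, (1.0, 2.0, 3.0)) == sum(abs2, [1.0, 2.0, 3.0])
true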

(Indeed, it seems like we could use the above to define an efficient mapfoldl for tuples, which would give us a fast mapreduce, sum, etcetera.)

(Note that the above is not quite right for a general mapfoldl because it is not left-associative. I'll put together a PR with a proper version.)
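
For illustration, a left-associative variant could look like the following (a hedged sketch with hypothetical names, not the actual PR; it assumes a non-empty tuple, like Base's sum(::Tuple)):

# Accumulate strictly left to right: the accumulator becomes op(acc, F(x)) one element at a time.
_mapfoldl_t(F, op, acc) = acc
_mapfoldl_t(F, op, acc, x, rest...) = _mapfoldl_t(F, op, op(acc, F(x)), rest...)
mymapfoldl(F, op, t::Tuple) = _mapfoldl_t(F, op, F(first(t)), Base.tail(t)...)

The associativity difference shows up with a non-commutative op: mymapfoldl(identity, -, (1, 2, 3, 4)) gives ((1 - 2) - 3) - 4 == -8, while mapafoldl(identity, -, 1, 2, 3, 4) above groups as (1 - 2) - (3 - 4) == 0.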

sum(f, x::Tuple{Any, Vararg{Any}}) = +(f.(x)...)

should probably be

sum(f, x::Tuple{Any, Vararg{Any}}) = sum(f, x for x in x)

to avoid the unnecessary materialization.

I don't think it matters for tuples, no allocation happens anyway.
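
Either way, both spellings compute the same value; a quick check (t is just an example tuple, not from the thread):

julia> t = (1.0, 2.0, 3.0);

julia> +(abs2.(t)...) == sum(abs2, y for y in t) == 14.0
true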

This allocates for some reason. Any ideas?

You have an argument of type <:Function that does not appear in head position in the body of the function definition (f is only passed on to broadcast via f.(x), never called directly), so the method does not get specialized on it. (@code_native sum1(...) is misleading here: it shows the fully specialized code that would be generated, not the unspecialized code that actually runs.)

julia> using BenchmarkTools
julia> sum1(f, x::Tuple) = +(f.(x)...)
julia> sum2(f::FF, x::Tuple) where FF = +(f.(x)...)

julia> @btime sum1(sin, (1.,2.,3.,4.,5.))
  597.669 ns (10 allocations: 288 bytes)
0.1761616497223787

julia> @btime sum2(sin, (1.,2.,3.,4.,5.))
  62.396 ns (0 allocations: 0 bytes)
0.1761616497223787

@chethega I don't understand the difference. I thought ::F where F was equivalent to ::Any.

It works the same w.r.t dispatch but it forces specialization on the argument.
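
In other words (a sketch of the general rule, with hypothetical names g1 and g2): a Function argument that is only passed through to another call may not be specialized on, while adding a type parameter forces specialization without changing which arguments the method accepts.

g1(f, x) = map(f, x)                 # f is only passed along; Julia may not specialize on it
g2(f::FF, x) where {FF} = map(f, x)  # identical dispatch behavior, but specialization on f is forced

Both return identical results for the same inputs, e.g. g1(sin, (1.0, 2.0)) == g2(sin, (1.0, 2.0)).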
