julia> hsum(x...) = sum(abs2.(float.(x)));
julia> gsum(x...) = sum(y -> abs2(float(y)), x);
julia> @btime gsum(1.,2.,3.,4.,5.,6.,7.,8.)
10.907 ns (0 allocations: 0 bytes)
204.0
julia> @btime hsum(1.,2.,3.,4.,5.,6.,7.,8.)
2.436 ns (0 allocations: 0 bytes)
204.0
gsum should not be so slow compared to hsum, right?
See https://github.com/JuliaLang/julia/issues/30421: they use different summation implementations.
Maybe we should have a specialized sum(function, tuple) method, similar to our specialized sum(tuple)?
@KristofferC Here I don't have a sum of a generator, but of a Tuple. I'm not sure it's the same as the other issue.
I think that when you do sum(f, tuple) it actually constructs a generator?
> I think that when you do sum(f, tuple) it actually constructs a generator?
No, it calls mapfoldl(y -> abs2(float(y)), Base.add_sum, x). The difference is that we have a special sum method for tuples:
sum(x::Tuple{Any, Vararg{Any}}) = +(x...)
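As an aside (my check, not part of the thread), @which confirms the dispatch difference:
@which sum((1.0, 2.0, 3.0))        # hits the specialized sum(x::Tuple{Any, Vararg{Any}}) method
@which sum(abs2, (1.0, 2.0, 3.0))  # hits the generic sum(f, a) fallback that goes through mapfoldl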
What about defining:
sum(f, x::Tuple{Any, Vararg{Any}}) = +(f.(x)...)
This allocates for some reason. Any ideas?
julia> sum1(f, x::Tuple) = +(f.(x)...)
julia> @btime sum1(sin, (1.,2.,3.,4.,5.))
562.674 ns (10 allocations: 288 bytes)
0.1761616497223787
julia> @btime sum(sin, (1.,2.,3.,4.,5.))
56.020 ns (0 allocations: 0 bytes)
0.1761616497223787
Probably sufficiently large tuples get heap-allocated?
> Probably sufficiently large tuples get heap-allocated?
julia> @btime sum1(sin, (1.,2.))
20.082 ns (0 allocations: 0 bytes)
1.7507684116335782
julia> @btime sum1(sin, (1.,2.,3.))
30.955 ns (0 allocations: 0 bytes)
1.8918884196934453
julia> @btime sum1(sin, (1.,2.,3.,4.))
40.153 ns (0 allocations: 0 bytes)
1.1350859243855171
julia> @btime sum1(sin, (1.,2.,3.,4.,5.))
474.749 ns (10 allocations: 288 bytes)
0.1761616497223787
I don't understand this.
I don't know why the threshold is a 5-tuple, but suppose you have a tuple of 1000 elements: where would the result of f.(x) be stored? Clearly there must be some threshold beyond which the temporary tuple is heap-allocated, no?
I think there should be no temporary tuple at all. The operation can be done in-place.
It seems too optimistic to hope that the compiler should always be able to figure out that it can eliminate the tuple construction.
The following is based on the Base.afoldl code used in Base.+(a,b,c,d...), and it seems to always be fast without allocating:
# Map F over the arguments and fold with op, unrolled for small arities:
mapafoldl(F, op, a) = F(a)
mapafoldl(F, op, a, b) = op(F(a), F(b))
mapafoldl(F, op, a, b, c...) = op(op(F(a), F(b)), mapafoldl(F, op, c...))
# Hand-unrolled 16-argument case; any remaining arguments are folded in a loop:
function mapafoldl(F, op, a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, qs...)
    y = op(op(op(op(op(op(op(op(op(op(op(op(op(op(op(F(a),F(b)),F(c)),F(d)),F(e)),F(f)),F(g)),F(h)),F(i)),F(j)),F(k)),F(l)),F(m)),F(n)),F(o)),F(p))
    for x in qs; y = op(y, F(x)); end
    return y
end
mysum(f, x::Tuple) = mapafoldl(f, +, x...)
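A quick sanity check (my own example, not from the thread) that mysum computes the expected value:
mysum(sin, (1.0, 2.0, 3.0)) ≈ sin(1.0) + sin(2.0) + sin(3.0)  # holds up to floating-point rounding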
(Indeed, it seems like we could use the above to define an efficient mapfoldl for tuples, which would give us a fast mapreduce, sum, etcetera.)
(Note that the above is not quite right for a general mapfoldl, because it is not left-associative. I'll put together a PR with a proper version.)
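For illustration, a left-associative variant would thread an explicit accumulator through the recursion; the following is a minimal sketch with names of my choosing, not the PR's actual code:
_mapfoldl(f, op, acc) = acc
_mapfoldl(f, op, acc, x, rest...) = _mapfoldl(f, op, op(acc, f(x)), rest...)
# Strictly left-to-right: op(op(f(t[1]), f(t[2])), f(t[3])), and so on.
mapfoldl_tuple(f, op, t::Tuple) = _mapfoldl(f, op, f(t[1]), Base.tail(t)...)  # errors on the empty tuple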
sum(f, x::Tuple{Any, Vararg{Any}}) = +(f.(x)...)
should probably be
sum(f, x::Tuple{Any, Vararg{Any}}) = sum(f, x for x in x)
to avoid the unnecessary materialization.
I don't think it matters for tuples, no allocation happens anyway.
> This allocates for some reason. Any ideas?
You have an argument of type <:Function that does not appear in head position (i.e. is never called directly) in the body of the function definition. Therefore it does not get specialized. (This is easy to miss because @code_native sum1(...) shows you the fully specialized code instead of the code that actually runs.)
julia> using BenchmarkTools
julia> sum1(f, x::Tuple) = +(f.(x)...)
julia> sum2(f::FF, x::Tuple) where FF = +(f.(x)...)
julia> @btime sum1(sin, (1.,2.,3.,4.,5.))
597.669 ns (10 allocations: 288 bytes)
0.1761616497223787
julia> @btime sum2(sin, (1.,2.,3.,4.,5.))
62.396 ns (0 allocations: 0 bytes)
0.1761616497223787
@chethega I don't understand the difference. I thought ::F where F was equivalent to ::Any.
It works the same w.r.t dispatch but it forces specialization on the argument.
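For reference, here is the same pattern applied to sum1 (the name sum3 and the comments are mine, not from the thread):
sum3(f::F, x::Tuple) where {F} = +(f.(x)...)
# The `where {F}` type parameter makes Julia compile a specialized method
# instance per concrete function type (e.g. typeof(sin)), so the broadcast
# f.(x) can be inlined; as the sum2 benchmark above shows, the allocations
# then disappear.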