The reduce
function should, according to its docstring, "reduce the given collection with the given binary operator." I believe this is also what most programmers would expect a reduce
function to do.
However, this is not always what it does. Some input types, e.g. Int32
are converted before the operator is applied. This can lead to unexpected results.
julia> my_op(a::Int32,b::Int32) = max(a,b)
my_op (generic function with 1 method)
julia> reduce(my_op, ones(Int32,42))
ERROR: MethodError: no method matching my_op(::Int64, ::Int64)
in mapreduce_impl(::Base.#identity, ::#my_op, ::Array{Int32,1}, ::Int64, ::Int64, ::Int64) at ./reduce.jl:101
in _mapreduce(::Base.#identity, ::#my_op, ::Base.LinearFast, ::Array{Int32,1}) at ./reduce.jl:168
in mapreduce(::Function, ::Function, ::Array{Int32,1}) at ./reduce.jl:175
in reduce(::Function, ::Array{Int32,1}) at ./reduce.jl:179
The exact same problem occurs with mapreduce(identity, my_op, ones(Int32,42))
.
I realize that using an accumulator that is wider than the input could be beneficial in some special circumstances, such as if the sum
function didn't exist and one wanted to create it using reduce(+, ...)
without risking integer overflow. But in those cases one could supply a v0
argument of the desired return type (or there could be an optional accumulator_type
argument).
Duplicate of #20560
Hmm, I guess it is not quite a duplicate of that issue, but it is related. The resolution (#20607) should try to fix both problems.
Issue #20560 is mainly about sum
. Currently sum
implemented as something like sum(a) = reduce(+, a)
which makes the two issues related. But if we instead had something like sum(a) = reduce(+, zero(widen(eltype(a))), a)
then the two issues would be disjoint.
To me, sum(a)
means "the sum of the elements of a
". This is a mathematical concept, not a programming one, and I would not necessarily expect the return type to be the same as the element type.
But reduce(+, a)
means "apply +
repeatedly". This is a programming concept, so for the return type to be something different than the return type of +
makes no sense (to me).
It would actually be a nice feature if reduce(+,a)
used a smaller accumulator than sum(a)
and ran faster when SIMD instructions are available. My guess is that any programmer advanced enough to use mapreduce
or reduce
is also advanced enough to know about integer overflow. Less advanced programmers would use sum
in any case, and the documentation should encourage them to do so.
Well stated. I too have never been a fan of reduce
's auto-widening. In my opinion, it violates the principle of least surprise.
+100 @perrutquist
Another benefit of making the distinction sum
= "mathematical sum"; reduce
= "iteration of +
operator" is that it would be natural for sum
to throw an OverflowError
on integer overflow, sacrificing some speed for mathematical correctness.
If an M-bit accumulator is used to sum N-bit elements, then partial sums of 2^(M-N) elements can be computed without checking, so there would not be much of a performance penalty in common cases like M=64, N=32.
Advanced users could still use mapreduce(Int64, +, arr)
to get the 64-bit sum of a 32-bit array without overflow checking.
Most helpful comment
Issue #20560 is mainly about
sum
. Currentlysum
implemented as something likesum(a) = reduce(+, a)
which makes the two issues related. But if we instead had something likesum(a) = reduce(+, zero(widen(eltype(a))), a)
then the two issues would be disjoint.To me,
sum(a)
means "the sum of the elements ofa
". This is a mathematical concept, not a programming one, and I would not necessarily expect the return type to be the same as the element type.But
reduce(+, a)
means "apply+
repeatedly". This is a programming concept, so for the return type to be something different than the return type of+
makes no sense (to me).It would actually be a nice feature if
reduce(+,a)
used a smaller accumulator thansum(a)
and ran faster when SIMD instructions are available. My guess is that any programmer advanced enough to usemapreduce
orreduce
is also advanced enough to know about integer overflow. Less advanced programmers would usesum
in any case, and the documentation should encourage them to do so.