Julia: Unexpected type promotion in reduce and mapreduce

Created on 24 Apr 2017  路  6Comments  路  Source: JuliaLang/julia

The reduce function should, according to its docstring, "reduce the given collection with the given binary operator." I believe this is also what most programmers would expect a reduce function to do.

However, this is not always what it does. Some input types, e.g. Int32 are converted before the operator is applied. This can lead to unexpected results.

julia> my_op(a::Int32,b::Int32) = max(a,b)
my_op (generic function with 1 method)

julia> reduce(my_op, ones(Int32,42))
ERROR: MethodError: no method matching my_op(::Int64, ::Int64)
 in mapreduce_impl(::Base.#identity, ::#my_op, ::Array{Int32,1}, ::Int64, ::Int64, ::Int64) at ./reduce.jl:101
 in _mapreduce(::Base.#identity, ::#my_op, ::Base.LinearFast, ::Array{Int32,1}) at ./reduce.jl:168
 in mapreduce(::Function, ::Function, ::Array{Int32,1}) at ./reduce.jl:175
 in reduce(::Function, ::Array{Int32,1}) at ./reduce.jl:179

The exact same problem occurs with mapreduce(identity, my_op, ones(Int32,42)).

I realize that using an accumulator that is wider than the input could be beneficial in some special circumstances, such as if the sum function didn't exist and one wanted to create it using reduce(+, ...) without risking integer overflow. But in those cases one could supply a v0 argument of the desired return type (or there could be an optional accumulator_type argument).

types and dispatch

Most helpful comment

Issue #20560 is mainly about sum. Currently sum implemented as something like sum(a) = reduce(+, a) which makes the two issues related. But if we instead had something like sum(a) = reduce(+, zero(widen(eltype(a))), a) then the two issues would be disjoint.

To me, sum(a) means "the sum of the elements of a". This is a mathematical concept, not a programming one, and I would not necessarily expect the return type to be the same as the element type.

But reduce(+, a) means "apply + repeatedly". This is a programming concept, so for the return type to be something different than the return type of + makes no sense (to me).

It would actually be a nice feature if reduce(+,a) used a smaller accumulator than sum(a) and ran faster when SIMD instructions are available. My guess is that any programmer advanced enough to use mapreduce or reduce is also advanced enough to know about integer overflow. Less advanced programmers would use sum in any case, and the documentation should encourage them to do so.

All 6 comments

Duplicate of #20560

Hmm, I guess it is not quite a duplicate of that issue, but it is related. The resolution (#20607) should try to fix both problems.

Issue #20560 is mainly about sum. Currently sum implemented as something like sum(a) = reduce(+, a) which makes the two issues related. But if we instead had something like sum(a) = reduce(+, zero(widen(eltype(a))), a) then the two issues would be disjoint.

To me, sum(a) means "the sum of the elements of a". This is a mathematical concept, not a programming one, and I would not necessarily expect the return type to be the same as the element type.

But reduce(+, a) means "apply + repeatedly". This is a programming concept, so for the return type to be something different than the return type of + makes no sense (to me).

It would actually be a nice feature if reduce(+,a) used a smaller accumulator than sum(a) and ran faster when SIMD instructions are available. My guess is that any programmer advanced enough to use mapreduce or reduce is also advanced enough to know about integer overflow. Less advanced programmers would use sum in any case, and the documentation should encourage them to do so.

Well stated. I too have never been a fan of reduce's auto-widening. In my opinion, it violates the principle of least surprise.

+100 @perrutquist

Another benefit of making the distinction sum = "mathematical sum"; reduce= "iteration of + operator" is that it would be natural for sum to throw an OverflowError on integer overflow, sacrificing some speed for mathematical correctness.
If an M-bit accumulator is used to sum N-bit elements, then partial sums of 2^(M-N) elements can be computed without checking, so there would not be much of a performance penalty in common cases like M=64, N=32.
Advanced users could still use mapreduce(Int64, +, arr) to get the 64-bit sum of a 32-bit array without overflow checking.

Was this page helpful?
0 / 5 - 0 ratings