Julia: stop being clever with reduction result types

Created on 10 Feb 2017 · 21 Comments · Source: JuliaLang/julia

Let's play "guess the result type":

julia> typeof(sum(rand(Bool, 4)))
Int64

julia> typeof(sum(rand(Int8, 4)))
Int32

julia> typeof(sum(rand(UInt8, 4)))
UInt32

julia> typeof(sum(rand(Int16, 4)))
Int32

julia> typeof(sum(rand(UInt16, 4)))
UInt32

julia> typeof(sum(rand(Int32, 4)))
Int64

julia> typeof(sum(rand(UInt32, 4)))
UInt64

julia> typeof(sum(rand(Int64, 4)))
Int64

julia> typeof(sum(rand(UInt64, 4)))
UInt64

julia> typeof(sum(rand(Int128, 4)))
Int128

julia> typeof(sum(rand(UInt128, 4)))
UInt128

This isn't entirely inconsistent, but it's not super simple either. It's not entirely obvious why 32 bits is some kind of special threshold on a 64-bit machine. I propose that we simply return the type that + would have produced instead and if people want a different result type for their reduction, we should allow them to choose it by specifying a zero element of that type. This applies to prod and other built-in reductions as well.
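The proposed rule can be sketched in a couple of lines (`naivesum` is a hypothetical name for illustration, not a Base function): the result type is simply whatever repeated `+` produces on the element type, with no widening magic.

```julia
# Hypothetical sketch of the proposal: reduce with plain `+`, so the
# result type is just what `+` returns for the eltype.
naivesum(xs) = foldl(+, xs)

typeof(naivesum(Int8[1, 2, 3]))   # Int8, since Int8 + Int8 isa Int8
typeof(naivesum(Int32[1, 2, 3]))  # Int32
```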

Labels: decision, maths

Most helpful comment

I do generally agree that less magic is better, but in the case of really small types I worry that making the default sum(Int8[...]) -> Int8 is effectively useless.

All 21 comments

So should typeof(sum(xs)) == typeof(xs[1] + xs[2] + ... + xs[N]) but typeof(sum(xs, v0)) == typeof(v0)? Or should typeof(sum(xs)) == typeof(zero(eltype(xs)) + zero(eltype(xs)))?

I like the former, because it generalizes to reduce for any weird type-unstable op. (I guess that's what we do if we don't hit a special case?) But this generalization would also suggest that typeof(sum(xs, v0)) == typeof(v0 + xs[1] + xs[2] + ... + xs[N]). Then the zero element could no longer be used to choose the result type, only to influence it (especially by ensuring a minimum bit size). But that's probably sufficient in practice?

I think the zero element should dictate the result type if given, not influence it. I guess I feel like the former behavior is better but I don't think there should be any difference for reasonable types, so it probably doesn't matter much.

The issue with returning just the result of x[1] + x[2] + ... is that many people will suffer from overflow without knowing it. Moments ago I just submitted something rather similar over at https://github.com/JuliaImages/ImageFiltering.jl/pull/22. The strategy I took was to accumulate using a wider type, hoping for the InexactError when storage didn't work out. If that's viewed as desirable, we could make the last line of mapreduce be oftype(v0, result) and get similar behavior.
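The "accumulate wide, convert back" strategy described above can be sketched as follows (`sum_into` is a hypothetical helper for illustration, not the actual ImageFiltering code): accumulation happens in `widen(typeof(v0))`, and the final `oftype` call restores the user-visible type, throwing `InexactError` on any overflow the wider type captured.

```julia
# Hedged sketch: accumulate in a wider type, convert back at the end.
function sum_into(v0, xs)
    acc = convert(widen(typeof(v0)), v0)  # wider accumulator
    for x in xs
        acc += x
    end
    return oftype(v0, acc)  # e.g. oftype(Int8(0), 300) throws InexactError
end
```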

We don't really coddle people when it comes to overflow anywhere else in the language, although this is a bit different since having a native-size accumulator is pretty efficient. It would be better not to make the return type user-visible. I still worry that we won't be guaranteed to catch all overflows – we'll just catch some of them.

I do generally agree that less magic is better, but in the case of really small types I worry that making the default sum(Int8[...]) -> Int8 is effectively useless.

I'm fine with it being something "useful" but the current guessing game of result type is not ok. If it was Int or UInt for every small integer type, I'd be fine with that, and it's way easier to explain/predict than what we're doing now.

+1 for defaulting to Int (or UInt for unsigned) for every integer type narrower than Int.
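One way this default could be expressed is a widening addition operator used only by the reduction (a hedged sketch; Base's actual mechanism is an internal function, `Base.add_sum`, in this spirit, and details may differ):

```julia
# Hypothetical sketch: widen sub-word integers to Int/UInt on the first add.
widen_add(x::Union{Int8,Int16,Int32}, y) = Int(x) + y
widen_add(x::Union{UInt8,UInt16,UInt32}, y) = UInt(x) + y
widen_add(x, y) = x + y  # everything else: plain +

typeof(reduce(widen_add, Int8[1, 2, 3]))  # Int
```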

I like this as a default, but it would be great if there was a way for users to override this behavior.

Otherwise things like adding an array of elements close to zero to each element of the array (or anything similar) would allocate a ton.

The override would be by providing a "neutral" element (zero) of the desired type. So if you wanted the useless sum(Int8[...]) => Int8 behavior, you could write sum(zero(Int8), v).
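For reference, in current Julia this neutral-element override is spelled with the `init` keyword (the positional `sum(zero(Int8), v)` form discussed above is the older reduce-style signature):

```julia
v = Int8[1, 2, 3]
typeof(reduce(+, v; init = zero(Int8)))  # Int8: init fixes the accumulator type
typeof(sum(v))                           # Int: the widened default
```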

Otherwise things like adding an array of elements close to zero to each element of the array (or anything similar) would allocate a ton.

I don't understand this. What do you mean?

I'm also fine with defaulting to Int and UInt.

Ok, seems like we have a decision here and this just needs a PR.

I've run into the very same problem with JuliaMath/FixedPointNumbers.jl#41, where we discussed widening Normed{T,f} to Fixed{Int64, f} for all math operations to protect against over/underflow errors. From my point of view, there is no use case for overflow math in reduction operations, and I fully support widening UInt8 directly to UInt64, or even better to Int64.

From my point of view, there is no use case for overflow math in reduction operations and I fully support widening of UInt8 directly to UInt64 or even better to Int64.

There are other ways to do this in Julia, but what about reducing to find the max along a particular axis?

Not all reductions have to behave identically. For minimum, maximum and extrema it seems sensible to return the actual values from the input.
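As a concrete illustration of the distinction (current Julia behavior):

```julia
typeof(maximum(Int8[1, 5, 3]))  # Int8: an actual input element, no widening
typeof(sum(Int8[1, 5, 3]))      # Int: sum widens by default
```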

I think this is reasonable as long as there's a way to dictate the return type. We have too many cases in my package where memory is at a premium, and we're introducing abstractions specifically to allow larger data structures with fewer allocations.

@TotalVerb, would you be willing to tackle this issue and #21523?

Certainly. So the consensus is for the mathy operations that make sense to widen (sum, prod) to widen to Int/UInt always, and for reduce to never widen, right?

Awesome, thanks. Yes I think that's the consensus.
