Julia: stop being clever with reduction result types

Created on 10 Feb 2017 · 21 Comments · Source: JuliaLang/julia

Let's play "guess the result type":

julia> typeof(sum(rand(Bool, 4)))
Int64

julia> typeof(sum(rand(Int8, 4)))
Int32

julia> typeof(sum(rand(UInt8, 4)))
UInt32

julia> typeof(sum(rand(Int16, 4)))
Int32

julia> typeof(sum(rand(UInt16, 4)))
UInt32

julia> typeof(sum(rand(Int32, 4)))
Int64

julia> typeof(sum(rand(UInt32, 4)))
UInt64

julia> typeof(sum(rand(Int64, 4)))
Int64

julia> typeof(sum(rand(UInt64, 4)))
UInt64

julia> typeof(sum(rand(Int128, 4)))
Int128

julia> typeof(sum(rand(UInt128, 4)))
UInt128

This isn't entirely inconsistent, but it's not super simple either. It's not entirely obvious why 32 bits is some kind of special threshold on a 64-bit machine. I propose that we simply return the type that + would have produced instead and if people want a different result type for their reduction, we should allow them to choose it by specifying a zero element of that type. This applies to prod and other built-in reductions as well.
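The proposed rule can be sketched in a couple of lines (`naivesum` is a hypothetical name for illustration, not a Base function): the result type is simply whatever repeated `+` produces on the element type, with no widening magic.

```julia
# Hypothetical sketch of the proposal: reduce with plain `+`, so the
# result type is just what `+` returns for the eltype.
naivesum(xs) = foldl(+, xs)

typeof(naivesum(Int8[1, 2, 3]))   # Int8, since Int8 + Int8 isa Int8
typeof(naivesum(Int32[1, 2, 3]))  # Int32
```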

Labels: decision, maths

Most helpful comment

I do generally agree that less magic is better, but in the case of really small types I worry that making the default sum(Int8[...]) -> Int8 is effectively useless.

All 21 comments

So should typeof(sum(xs)) == typeof(xs[1] + xs[2] + ... + xs[N]) but typeof(sum(xs, v0)) == typeof(v0)? Or should typeof(sum(xs)) == typeof(zero(eltype(xs)) + zero(eltype(xs)))?

I like the former, because it generalizes to reduce for any weird type-unstable op. (I guess that's what we do if we don't hit a special case?) But this generalization would also suggest that typeof(sum(xs, v0)) == typeof(v0 + xs[1] + xs[2] + ... + xs[N]). Then the zero element could no longer be used to choose the result type, only to influence it (especially by ensuring a minimum bit size). But that's probably sufficient in practice?

I think the zero element should dictate the result type if given, not influence it. I guess I feel like the former behavior is better but I don't think there should be any difference for reasonable types, so it probably doesn't matter much.

The issue with returning just the result of x[1] + x[2] + ... is that many people will suffer from overflow without knowing it. Moments ago I just submitted something rather similar over at https://github.com/JuliaImages/ImageFiltering.jl/pull/22. The strategy I took was to accumulate using a wider type, hoping for the InexactError when storage didn't work out. If that's viewed as desirable, we could make the last line of mapreduce be oftype(v0, result) and get similar behavior.
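The "accumulate wide, convert back" strategy described above can be sketched as follows (`sum_into` is a hypothetical helper for illustration, not the actual ImageFiltering code): accumulation happens in `widen(typeof(v0))`, and the final `oftype` call restores the user-visible type, throwing `InexactError` on any overflow the wider type captured.

```julia
# Hedged sketch: accumulate in a wider type, convert back at the end.
function sum_into(v0, xs)
    acc = convert(widen(typeof(v0)), v0)  # wider accumulator
    for x in xs
        acc += x
    end
    return oftype(v0, acc)  # e.g. oftype(Int8(0), 300) throws InexactError
end
```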

We don't really coddle people when it comes to overflow anywhere else in the language, although this is a bit different since having a native-size accumulator is pretty efficient. It would be better not to make the return type user-visible. I still worry that we won't be guaranteed to catch all overflows – we'll just catch some of them.

I do generally agree that less magic is better, but in the case of really small types I worry that making the default sum(Int8[...]) -> Int8 is effectively useless.

I'm fine with it being something "useful" but the current guessing game of result type is not ok. If it was Int or UInt for every small integer type, I'd be fine with that, and it's way easier to explain/predict than what we're doing now.

+1 for defaulting to Int (or UInt for unsigned) for every integer type narrower than Int.
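One way this default could be expressed is a widening addition operator used only by the reduction (a hedged sketch; Base's actual mechanism is an internal function, `Base.add_sum`, in this spirit, and details may differ):

```julia
# Hypothetical sketch: widen sub-word integers to Int/UInt on the first add.
widen_add(x::Union{Int8,Int16,Int32}, y) = Int(x) + y
widen_add(x::Union{UInt8,UInt16,UInt32}, y) = UInt(x) + y
widen_add(x, y) = x + y  # everything else: plain +

typeof(reduce(widen_add, Int8[1, 2, 3]))  # Int
```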

I like this as a default, but it would be great if there was a way for users to override this behavior.

Otherwise things like adding an array of elements close to zero to each element of the array (or anything similar) would allocate a ton.

The override would be by providing a "neutral" element (zero) of the desired type. So if you wanted the useless sum(Int8[...]) => Int8 behavior, you could write sum(zero(Int8), v).
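For reference, in current Julia this neutral-element override is spelled with the `init` keyword (the positional `sum(zero(Int8), v)` form discussed above is the older reduce-style signature):

```julia
v = Int8[1, 2, 3]
typeof(reduce(+, v; init = zero(Int8)))  # Int8: init fixes the accumulator type
typeof(sum(v))                           # Int: the widened default
```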

Otherwise things like adding an array of elements close to zero to each element of the array (or anything similar) would allocate a ton.

I don't understand this. What do you mean?

I'm also fine with defaulting to Int and UInt.

Ok, seems like we have a decision here and this just needs a PR.

I've run into the very same problem with JuliaMath/FixedPointNumbers.jl#41, where we discussed widening Normed{T,f} to Fixed{Int64, f} for all math operations to protect against over/underflow errors. From my point of view, there is no use case for overflow math in reduction operations, and I fully support widening UInt8 directly to UInt64, or even better to Int64.

From my point of view, there is no use case for overflow math in reduction operations and I fully support widening of UInt8 directly to UInt64 or even better to Int64.

There are other ways to do this in Julia, but what about reducing to find the max along a particular axis?

Not all reductions have to behave identically. For minimum, maximum and extrema it seems sensible to return the actual values from the input.
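As a concrete illustration of the distinction (current Julia behavior):

```julia
typeof(maximum(Int8[1, 5, 3]))  # Int8: an actual input element, no widening
typeof(sum(Int8[1, 5, 3]))      # Int: sum widens by default
```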

I think this is reasonable as long as there's a way to dictate the return type. We have too many cases in my package where memory is at a premium, and we're introducing abstractions specifically to allow larger data structures with fewer allocations.

@TotalVerb, would you be willing to tackle this issue and #21523?

Certainly. So the consensus is for the mathy operations that make sense to widen (sum, prod) to widen to Int/UInt always, and for reduce to never widen, right?

Awesome, thanks. Yes I think that's the consensus.
