Julia: sum(f, itr) when itr is an empty collection should return zero

Created on 13 Jun 2020  路  11Comments  路  Source: JuliaLang/julia

The sum of an empty, typed array is zero:

julia> sum(Int[])
0

julia> sum(Bool[])
0

Broadcasting over an empty array returns an empty array:

julia> (x -> 2x).(Int[])
0-element Array{Int64,1}

julia> Int[] .== 1
0-element BitArray{1}

So, as expected, summing over the above broadcasted operations returns zero:

julia> sum((x -> 2x).(Int[]))
0

julia> sum(Int[] .== 1)
0

However, if we perform the same operations using the sum(f, itr) method, we get an ArgumentError:

julia> sum(x -> 2x, Int[])
ERROR: ArgumentError: reducing over an empty collection is not allowed
Stacktrace:
 [1] _empty_reduce_error() at ./reduce.jl:295
 [2] mapreduce_empty(::Function, ::Function, ::Type{T} where T) at ./reduce.jl:334
 [3] _mapreduce(::var"#7#8", ::typeof(Base.add_sum), ::IndexLinear, ::Array{Int64,1}) at ./reduce.jl:392
 [4] _mapreduce_dim(::Function, ::Function, ::NamedTuple{(),Tuple{}}, ::Array{Int64,1}, ::Colon) at ./reducedim.jl:312
 [5] #mapreduce#580 at ./reducedim.jl:307 [inlined]
 [6] mapreduce at ./reducedim.jl:307 [inlined]
 [7] _sum at ./reducedim.jl:657 [inlined]
 [8] #sum#584 at ./reducedim.jl:653 [inlined]
 [9] sum(::Function, ::Array{Int64,1}) at ./reducedim.jl:653
 [10] top-level scope at REPL[8]:1

julia> sum(==(1), Int[])
ERROR: ArgumentError: reducing over an empty collection is not allowed
Stacktrace:
 [1] _empty_reduce_error() at ./reduce.jl:295
 [2] mapreduce_empty(::Function, ::Function, ::Type{T} where T) at ./reduce.jl:334
 [3] _mapreduce(::Base.Fix2{typeof(==),Int64}, ::typeof(Base.add_sum), ::IndexLinear, ::Array{Int64,1}) at ./reduce.jl:392
 [4] _mapreduce_dim(::Function, ::Function, ::NamedTuple{(),Tuple{}}, ::Array{Int64,1}, ::Colon) at ./reducedim.jl:312
 [5] #mapreduce#580 at ./reducedim.jl:307 [inlined]
 [6] mapreduce at ./reducedim.jl:307 [inlined]
 [7] _sum at ./reducedim.jl:657 [inlined]
 [8] #sum#584 at ./reducedim.jl:653 [inlined]
 [9] sum(::Function, ::Array{Int64,1}) at ./reducedim.jl:653
 [10] top-level scope at REPL[9]:1
fold

Most helpful comment

Defining sum(f, itr) for empty generic itr and arbitrary f is not possible in the current Julia infrastructure.

Well, technically it's possible if you "cheat" and use broadcasting: mysum(f, itr) = sum(f.(itr)).

Actually, no, it's not defining any concrete API. It's exposing undefined behavior of return_type rather than throwing an exception.

However, I feel that there's still a semantic difference between sum(f, itr) and sum(f.(itr)) that is not explained in the docstring.

Personally, I think it's better to make f.(itr) "more correct" by returning Array{Union{}} when the input is empty. But I know that this is not very popular opinion.

After #36188, does sum(Int[]) still return zero, or do you have to call sum(Int[]; init=0) to not get an error?

sum(Int[]) should be fine. More generally, if you have sum(xs) with IteratorEltype(xs) isa HasEltype and zero(eltype(xs)) is defined, sum(xs) should return the zero. For example, sum(Iterators.filter(>(0), Int[])) works. If not, please open an issue.

All 11 comments

What if, for example, f(x) always returns some specific type, regardless of the type of x?

For example, suppose we define f(x) = 1.0. So f(x) always returns a Float64, regardless of the type of x.

In this example, sum(f, Int[1]) will return a Float64, and sum(f, Int[1, 2]) will return a Float64, but sum(f, Int[]) will return an Int?

Won't that mean that this method of sum is not type-stable? Because you can get two different output types for the same input types.

Well, ok. I guess I shouldn't have put zero(eltype(itr)) in the title. Ideally sum(f, itr) would return the zero of the output type of f (or more precisely, the zero of the type resulting from summing an array of the type of the output of f). Because this is inconsistent:

julia> sum((x -> 1.0).(Int[]))
0.0

julia> sum(x -> 1.0, Int[])
ERROR: ArgumentError: reducing over an empty collection is not allowed
Stacktrace:
 [1] _empty_reduce_error() at ./reduce.jl:295
 [2] mapreduce_empty(::Function, ::Function, ::Type{T} where T) at ./reduce.jl:334
 [3] _mapreduce(::var"#13#14", ::typeof(Base.add_sum), ::IndexLinear, ::Array{Int64,1}) at ./reduce.jl:392
 [4] _mapreduce_dim(::Function, ::Function, ::NamedTuple{(),Tuple{}}, ::Array{Int64,1}, ::Colon) at ./reducedim.jl:312
 [5] #mapreduce#580 at ./reducedim.jl:307 [inlined]
 [6] mapreduce at ./reducedim.jl:307 [inlined]
 [7] _sum at ./reducedim.jl:657 [inlined]
 [8] #sum#584 at ./reducedim.jl:653 [inlined]
 [9] sum(::Function, ::Array{Int64,1}) at ./reducedim.jl:653
 [10] top-level scope at REPL[28]:1

However, off the top of my head, I don't know how to implement that. Is there an internal function somewhere in Base or Core that returns the inferred output type of a function?

I can't speak to the efficiency of this, but here's an implementation that has the desired behavior:

julia> mysum(f, itr) = sum(f.(itr))
mysum (generic function with 1 method)

julia> mysum(x -> 1.0, Int[])
0.0

julia> mysum(==(1), Int[])
0

What zero should it return, when the type is not inferrable?

On the other hand:

julia> sum(Array{Union{Float64, Int64},1}())
0

So if sum(x -> x%2==0 ? 1 : 2.0, Int64[]) === 0, then that's consistent.

Defining sum(f, itr) for empty generic itr and arbitrary f is not possible in the current Julia infrastructure.

But we now have init for sum https://github.com/JuliaLang/julia/pull/36188. So, as of Julia 1.6, the best practice would be to use init if you want sum to work for generic possibly empty collections.

(Note: The confusion here is that broadcasting is "cheating" by using the internal inference API return_type in the empty case. The broadcasting implementation weighs more on convenience and performance than the correctness. So, I don't think we should be using the behavior of broadcasting as a reference.)

Defining sum(f, itr) for empty generic itr and arbitrary f is not possible in the current Julia infrastructure.

Well, technically it's possible if you "cheat" and use broadcasting: mysum(f, itr) = sum(f.(itr)).

But we now have init for sum #36188. So, as of Julia 1.6, the best practice would be to use init if you want sum to work for generic possibly empty collections.

Ok. And I see that you've updated the docstrings for sum, so that helps to clarify what happens with empty collections. However, I feel that there's still a semantic difference between sum(f, itr) and sum(f.(itr)) that is not explained in the docstring.

@tkf Question:
After #36188, does sum(Int[]) still return zero, or do you have to call sum(Int[]; init=0) to not get an error?

Defining sum(f, itr) for empty generic itr and arbitrary f is not possible in the current Julia infrastructure.

Well, technically it's possible if you "cheat" and use broadcasting: mysum(f, itr) = sum(f.(itr)).

Actually, no, it's not defining any concrete API. It's exposing undefined behavior of return_type rather than throwing an exception.

However, I feel that there's still a semantic difference between sum(f, itr) and sum(f.(itr)) that is not explained in the docstring.

Personally, I think it's better to make f.(itr) "more correct" by returning Array{Union{}} when the input is empty. But I know that this is not very popular opinion.

After #36188, does sum(Int[]) still return zero, or do you have to call sum(Int[]; init=0) to not get an error?

sum(Int[]) should be fine. More generally, if you have sum(xs) with IteratorEltype(xs) isa HasEltype and zero(eltype(xs)) is defined, sum(xs) should return the zero. For example, sum(Iterators.filter(>(0), Int[])) works. If not, please open an issue.

Note: The confusion here is that broadcasting is "cheating" by using the internal inference API return_type in the empty case.

Is that the whole story?
Regarding https://github.com/JuliaLang/julia/issues/36262#issuecomment-643523278
Broadcasting is using that internal inference API to produce the result Array{Union{Float64, Int64},1}(), but that is separate from sum(Array{Union{Float64, Int64},1}()) === 0

On the other hand:

julia> sum(Array{Union{Float64, Int64},1}())
0

So if sum(x -> x%2==0 ? 1 : 2.0, Int64[]) === 0, then that's consistent.

Actually I miss why these behaviors are inconsistent. I'd expect these behaviors if using return_type is OK.

@tkf commented on Jun 13, 2020, 4:51 AM GMT+4:30:

sum(Int[]) should be fine. More generally, if you have sum(xs) with IteratorEltype(xs) isa HasEltype and zero(eltype(xs)) is defined, sum(xs) should return the zero. For example, sum(Iterators.filter(>(0), Int[])) works. If not, please open an issue.

sum(1 for i in []) doesn't work though.

I am trying to count something using a for comprehension. I tried length(nothing for i in [1,2] if i == 1), but that throws ERROR: MethodError: no method matching length(::Base.Iterators.Filter{var"#246#248",Array{Int64,1}}).

count(==(1), [1,2]) for that case. More generally, the recently introduced init keyword for sum will help out:

julia> sum(1 for i in []; init = 0)
0

Given the difficulty in determining which zero (i.e. of which type) to return in general for an empty input collection, I wonder whether init is sufficient to close this?

Was this page helpful?
0 / 5 - 0 ratings