The sum of an empty, typed array is zero:
julia> sum(Int[])
0
julia> sum(Bool[])
0
Broadcasting over an empty array returns an empty array:
julia> (x -> 2x).(Int[])
0-element Array{Int64,1}
julia> Int[] .== 1
0-element BitArray{1}
So, as expected, summing over the above broadcasted operations returns zero:
julia> sum((x -> 2x).(Int[]))
0
julia> sum(Int[] .== 1)
0
However, if we perform the same operations using the sum(f, itr) method, we get an ArgumentError:
julia> sum(x -> 2x, Int[])
ERROR: ArgumentError: reducing over an empty collection is not allowed
Stacktrace:
[1] _empty_reduce_error() at ./reduce.jl:295
[2] mapreduce_empty(::Function, ::Function, ::Type{T} where T) at ./reduce.jl:334
[3] _mapreduce(::var"#7#8", ::typeof(Base.add_sum), ::IndexLinear, ::Array{Int64,1}) at ./reduce.jl:392
[4] _mapreduce_dim(::Function, ::Function, ::NamedTuple{(),Tuple{}}, ::Array{Int64,1}, ::Colon) at ./reducedim.jl:312
[5] #mapreduce#580 at ./reducedim.jl:307 [inlined]
[6] mapreduce at ./reducedim.jl:307 [inlined]
[7] _sum at ./reducedim.jl:657 [inlined]
[8] #sum#584 at ./reducedim.jl:653 [inlined]
[9] sum(::Function, ::Array{Int64,1}) at ./reducedim.jl:653
[10] top-level scope at REPL[8]:1
julia> sum(==(1), Int[])
ERROR: ArgumentError: reducing over an empty collection is not allowed
Stacktrace:
[1] _empty_reduce_error() at ./reduce.jl:295
[2] mapreduce_empty(::Function, ::Function, ::Type{T} where T) at ./reduce.jl:334
[3] _mapreduce(::Base.Fix2{typeof(==),Int64}, ::typeof(Base.add_sum), ::IndexLinear, ::Array{Int64,1}) at ./reduce.jl:392
[4] _mapreduce_dim(::Function, ::Function, ::NamedTuple{(),Tuple{}}, ::Array{Int64,1}, ::Colon) at ./reducedim.jl:312
[5] #mapreduce#580 at ./reducedim.jl:307 [inlined]
[6] mapreduce at ./reducedim.jl:307 [inlined]
[7] _sum at ./reducedim.jl:657 [inlined]
[8] #sum#584 at ./reducedim.jl:653 [inlined]
[9] sum(::Function, ::Array{Int64,1}) at ./reducedim.jl:653
[10] top-level scope at REPL[9]:1
What if, for example, f(x) always returns some specific type, regardless of the type of x?
For example, suppose we define f(x) = 1.0. So f(x) always returns a Float64, regardless of the type of x.
In this example, sum(f, Int[1]) will return a Float64, and sum(f, Int[1, 2]) will return a Float64, but sum(f, Int[]) will return an Int?
Won't that mean that this method of sum is not type-stable? Because you can get two different output types for the same input types.
Well, ok. I guess I shouldn't have put zero(eltype(itr)) in the title. Ideally sum(f, itr) would return the zero of the output type of f (or more precisely, the zero of the type resulting from summing an array of the type of the output of f). Because this is inconsistent:
julia> sum((x -> 1.0).(Int[]))
0.0
julia> sum(x -> 1.0, Int[])
ERROR: ArgumentError: reducing over an empty collection is not allowed
Stacktrace:
[1] _empty_reduce_error() at ./reduce.jl:295
[2] mapreduce_empty(::Function, ::Function, ::Type{T} where T) at ./reduce.jl:334
[3] _mapreduce(::var"#13#14", ::typeof(Base.add_sum), ::IndexLinear, ::Array{Int64,1}) at ./reduce.jl:392
[4] _mapreduce_dim(::Function, ::Function, ::NamedTuple{(),Tuple{}}, ::Array{Int64,1}, ::Colon) at ./reducedim.jl:312
[5] #mapreduce#580 at ./reducedim.jl:307 [inlined]
[6] mapreduce at ./reducedim.jl:307 [inlined]
[7] _sum at ./reducedim.jl:657 [inlined]
[8] #sum#584 at ./reducedim.jl:653 [inlined]
[9] sum(::Function, ::Array{Int64,1}) at ./reducedim.jl:653
[10] top-level scope at REPL[28]:1
However, off the top of my head, I don't know how to implement that. Is there an internal function somewhere in Base or Core that returns the inferred output type of a function?
I can't speak to the efficiency of this, but here's an implementation that has the desired behavior:
julia> mysum(f, itr) = sum(f.(itr))
mysum (generic function with 1 method)
julia> mysum(x -> 1.0, Int[])
0.0
julia> mysum(==(1), Int[])
0
What zero should it return, when the type is not inferrable?
On the other hand:
julia> sum(Array{Union{Float64, Int64},1}())
0
So if sum(x -> x%2==0 ? 1 : 2.0, Int64[]) === 0, then that's consistent.
Defining sum(f, itr) for empty generic itr and arbitrary f is not possible in the current Julia infrastructure.
But we now have init for sum https://github.com/JuliaLang/julia/pull/36188. So, as of Julia 1.6, the best practice would be to use init if you want sum to work for generic possibly empty collections.
(Note: The confusion here is that broadcasting is "cheating" by using the internal inference API return_type in the empty case. The broadcasting implementation weighs more on convenience and performance than the correctness. So, I don't think we should be using the behavior of broadcasting as a reference.)
Defining
sum(f, itr)for empty genericitrand arbitraryfis not possible in the current Julia infrastructure.
Well, technically it's possible if you "cheat" and use broadcasting: mysum(f, itr) = sum(f.(itr)).
But we now have
initforsum#36188. So, as of Julia 1.6, the best practice would be to useinitif you wantsumto work for generic possibly empty collections.
Ok. And I see that you've updated the docstrings for sum, so that helps to clarify what happens with empty collections. However, I feel that there's still a semantic difference between sum(f, itr) and sum(f.(itr)) that is not explained in the docstring.
@tkf Question:
After #36188, does sum(Int[]) still return zero, or do you have to call sum(Int[]; init=0) to not get an error?
Defining
sum(f, itr)for empty genericitrand arbitraryfis not possible in the current Julia infrastructure.Well, technically it's possible if you "cheat" and use broadcasting:
mysum(f, itr) = sum(f.(itr)).
Actually, no, it's not defining any concrete API. It's exposing undefined behavior of return_type rather than throwing an exception.
However, I feel that there's still a semantic difference between
sum(f, itr)andsum(f.(itr))that is not explained in the docstring.
Personally, I think it's better to make f.(itr) "more correct" by returning Array{Union{}} when the input is empty. But I know that this is not very popular opinion.
After #36188, does
sum(Int[])still return zero, or do you have to callsum(Int[]; init=0)to not get an error?
sum(Int[]) should be fine. More generally, if you have sum(xs) with IteratorEltype(xs) isa HasEltype and zero(eltype(xs)) is defined, sum(xs) should return the zero. For example, sum(Iterators.filter(>(0), Int[])) works. If not, please open an issue.
Note: The confusion here is that broadcasting is "cheating" by using the internal inference API
return_typein the empty case.
Is that the whole story?
Regarding https://github.com/JuliaLang/julia/issues/36262#issuecomment-643523278
Broadcasting is using that internal inference API to produce the result Array{Union{Float64, Int64},1}(), but that is separate from sum(Array{Union{Float64, Int64},1}()) === 0
On the other hand:
julia> sum(Array{Union{Float64, Int64},1}()) 0So if
sum(x -> x%2==0 ? 1 : 2.0, Int64[]) === 0, then that's consistent.
Actually I miss why these behaviors are inconsistent. I'd expect these behaviors if using return_type is OK.
@tkf commented on Jun 13, 2020, 4:51 AM GMT+4:30:
sum(Int[])should be fine. More generally, if you havesum(xs)withIteratorEltype(xs) isa HasEltypeandzero(eltype(xs))is defined,sum(xs)should return the zero. For example,sum(Iterators.filter(>(0), Int[]))works. If not, please open an issue.
sum(1 for i in []) doesn't work though.
I am trying to count something using a for comprehension. I tried length(nothing for i in [1,2] if i == 1), but that throws ERROR: MethodError: no method matching length(::Base.Iterators.Filter{var"#246#248",Array{Int64,1}}).
count(==(1), [1,2]) for that case. More generally, the recently introduced init keyword for sum will help out:
julia> sum(1 for i in []; init = 0)
0
Given the difficulty in determining which zero (i.e. of which type) to return in general for an empty input collection, I wonder whether init is sufficient to close this?
Most helpful comment
Actually, no, it's not defining any concrete API. It's exposing undefined behavior of
return_typerather than throwing an exception.Personally, I think it's better to make
f.(itr)"more correct" by returningArray{Union{}}when the input is empty. But I know that this is not very popular opinion.sum(Int[])should be fine. More generally, if you havesum(xs)withIteratorEltype(xs) isa HasEltypeandzero(eltype(xs))is defined,sum(xs)should return the zero. For example,sum(Iterators.filter(>(0), Int[]))works. If not, please open an issue.