Julia: sum(i for i in Vector{Int}()) does not return 0

Created on 25 Jun 2018  ·  28Comments  ·  Source: JuliaLang/julia

sum(i for i in Vector{Int}()) raises an error instead of returning 0. The sum of an empty list is zero (mathematically). Moreover in this case the return type (Int) is also clear. Therefore I don't see why this should not return 0 (Int). Of course the same applies to any number type.

See also: https://discourse.julialang.org/t/sum-i-for-i-in-vector-int-raises-error-instead-of-returning-0/1705/8

Most helpful comment

This is something that I often trip over as well. +1 for resolving it.

I would be quite comfortable putting much of the burden on the user by prescribing and init keyword argument as for reduce. I.e. if I anticipate that a collection might be empty, then I will call

sum(i for i in Vector{Int}(); init=0)

All 28 comments

Related to #18695 and #18873 (generators don't have an eltype).

The sum of an empty list is the additive identity ("zero"), but the value of the additive identity may not be 0 in general. For example, if sum an empty list of 2×2 integer matrices, the result should be a 2×2 matrix of 0. But I agree that in this case, in principle, the correct type of zero could be inferred.

I guess it should return "zero" (of type T, if that is a matrix or whatever) whenever the type T can be inferred.

"For empty collections, it may throw an error or return zero(T), with T being the element type of the collection, depending on the state of the world." I don't think I like that kind of semantic. It's too likely to lure you into believing it works today just because it worked yesterday.

Since sum(Int[]) === 0 it would not be crazy for this to do the same.

The root problem here is

julia> eltype(i for i in Vector{Int}())
Any

Inference works in this particular case:

julia> g = (i for i in Vector{Int}())
Base.Generator{Array{Int64,1},getfield(Main, Symbol("##7#8"))}(getfield(Main, Symbol("##7#8"))(), Int64[])

julia> Base.return_types(g.f, Tuple{eltype(g.iter)})
1-element Array{Any,1}:
 Int64

but it may be tricky to use this information without making sum(::Generator) type-unstable in general.

Indeed. See also discussion about element type of generators at https://discourse.julialang.org/t/can-eltype-deduce-the-element-type-of-a-generator/6429.

We do generally make a best effort for empty collections based on inferred return types though.

The problem here is that we'd need zero(Any) for the case where inference doesn't succeed in finding a tight bound.

Sure but why can't the error only happen in that case?

Because that would probably make people rely on inference.

Imaginary bug report:
"I have bar(n) = sum(foo(x) for 1:n), and bar(0) used to work, but since I updated my dependency Foo to v0.34, it throws an error. Interestingly, the error goes away if I comment out using Bar in my code"
Reply:
"That's by design. Your function foo is type-unstable and calls bork with a parameter that is inferred as Any. Now, bork returns an Int, which can be inferred and all is fine. But Foo (since v0.34) and Baralso define methods of bork. These, too, return an Int, but inference doesn't know which method is called and there are too many now to infer them all, so the return type is widened to Any."

That does not sound too attractive to me. (Also, figuring all that out probably cost someone a good amount of time.)

Making the element type of empty arrays depend on inference already was a measure of last resort. Deciding whether to throw an error or not depending on inference seems way too risky to me. Although I admit it will work most of the time, encouraging people to write code that works _only most of the time_ is questionable.

That is true in every case were we use return_type to determine what to do in the empty collection case. Is the argument that this is worse because it's an error whereas choosing an element type doesn't necessarily cause an error—although it might if your code was previously inserting a value where a change to inference causes a tighter type to be inferred.

Yes. Chances are that the element type of an empty collection does not matter at all. Furthermore, the element type for the empty case may only be wider than in the non-empty case, so inserting an element should usually not be trap. An example where you may get a sudden error would be sum(foo.(x)) for empty x (or sum(collect(foo(i) for i in Int[])), if you try to workaround the issue discussed here).

So for the collect/broadcast reliance on return_type in the empty case, there is a high chance of a change in the inference to Any going unnoticed/only causing a performance penalty. For sum, that chance would be 0.

IMO, a much saner approach would be to let sum take a v0 (or init, ref #27711) parameter one can pass in situations like this: sum(i for i in Vector{Int}(), init=0).

Maybe sum can take a type parameter that does two things:
1) ensures that all elements encountered are of this type (otherwise throws an error)
2) returns zero of this type if the iterator is empty

I guess it should return "zero" (of type T, if that is a matrix or whatever) whenever the type T can be inferred.

Even if the type can be inferred, you may not be able to create an appropriate zero since array types don't carry their sizes with them. What is

out = sum(i for i in Vector{Matrix{Float64}())

? I mean, what is size(out)?

Not being able to conjure a zero of some type is a different issue than not knowing what type of zero one wants. That can be a failure of zero(T) instead of a failure of sum.

To follow up on the lines of thought hinted at by @martinholters and @StefanKarpinski: Perhaps the real problem is the lack of a generic representation of additive identity, which is a well-defined concept. Would it be crazy to have a singleton type to represent this concept in cases where the type is abstract? In relevant operations it could just promote to the appropriate concrete type.

We already have that: Boolfalse is a "strong zero" and true is a "strong one" in the sense that false * x == zero(x) and true * x == x. We could potentially return false in these cases and it wouldn't be too bad since convert(T, false) is defined to give you zero(T) in general.

The Bool has additional semantics not carried by zero (you can use the Bool in an if, but not the zero). Therefore returning false does not seem consistent.

We'd need a "strong zero" in the additive sense, and false + x == x is deprecated for arrays x and probably fails for unitful x.

The idea of a strong zero seems analogous of I, the UniformScaling representing an identity matrix of any size.

It seems inconsistent that prod(()) == 1 is fine but sum(()) is an error. Either it returns zero (following prod), or prod(()) should also be an error, IMO.

This is something that I often trip over as well. +1 for resolving it.

I would be quite comfortable putting much of the burden on the user by prescribing and init keyword argument as for reduce. I.e. if I anticipate that a collection might be empty, then I will call

sum(i for i in Vector{Int}(); init=0)

That's a nice idea, @cortner 👍

Would it be crazy to have a singleton type to represent this concept in cases where the type is abstract?

FYI I created https://github.com/tkf/InitialValues.jl to implement this approach for general foldl and reduce. I find it handy if you want to handle "zero" for generic operations.

But, in this case, I'm +1 for sum(...; init) as suggested by @martinholters and @cortner. In general, I think it makes sense to add init for any function that is a thin wrapper of foldl or reduce.

36188 adds init to sum etc.

36188 is merged so we can use init in sum now. Can this issue be closed?

Thanks @tkf! Will it make it into 1.5?

We already had 1.5 feature freeze like a few weeks ago, unfortunately. So, it'll be in 1.6.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

yurivish picture yurivish  ·  3Comments

wilburtownsend picture wilburtownsend  ·  3Comments

helgee picture helgee  ·  3Comments

omus picture omus  ·  3Comments

manor picture manor  ·  3Comments