sum(i for i in Vector{Int}())
raises an error instead of returning 0. The sum of an empty list is zero (mathematically). Moreover in this case the return type (Int
) is also clear. Therefore I don't see why this should not return 0 (Int
). Of course the same applies to any number type.
Related to #18695 and #18873 (generators don't have an eltype
).
The sum of an empty list is the additive identity ("zero"), but the value of the additive identity may not be 0
in general. For example, if sum an empty list of 2×2 integer matrices, the result should be a 2×2 matrix of 0
. But I agree that in this case, in principle, the correct type of zero could be inferred.
I guess it should return "zero" (of type T
, if that is a matrix or whatever) whenever the type T
can be inferred.
"For empty collections, it may throw an error or return zero(T)
, with T
being the element type of the collection, depending on the state of the world." I don't think I like that kind of semantic. It's too likely to lure you into believing it works today just because it worked yesterday.
Since sum(Int[]) === 0
it would not be crazy for this to do the same.
The root problem here is
julia> eltype(i for i in Vector{Int}())
Any
Inference works in this particular case:
julia> g = (i for i in Vector{Int}())
Base.Generator{Array{Int64,1},getfield(Main, Symbol("##7#8"))}(getfield(Main, Symbol("##7#8"))(), Int64[])
julia> Base.return_types(g.f, Tuple{eltype(g.iter)})
1-element Array{Any,1}:
Int64
but it may be tricky to use this information without making sum(::Generator)
type-unstable in general.
Indeed. See also discussion about element type of generators at https://discourse.julialang.org/t/can-eltype-deduce-the-element-type-of-a-generator/6429.
We do generally make a best effort for empty collections based on inferred return types though.
The problem here is that we'd need zero(Any)
for the case where inference doesn't succeed in finding a tight bound.
Sure but why can't the error only happen in that case?
Because that would probably make people rely on inference.
Imaginary bug report:
"I have bar(n) = sum(foo(x) for 1:n)
, and bar(0)
used to work, but since I updated my dependency Foo
to v0.34, it throws an error. Interestingly, the error goes away if I comment out using Bar
in my code"
Reply:
"That's by design. Your function foo
is type-unstable and calls bork
with a parameter that is inferred as Any
. Now, bork
returns an Int
, which can be inferred and all is fine. But Foo
(since v0.34) and Bar
also define methods of bork
. These, too, return an Int
, but inference doesn't know which method is called and there are too many now to infer them all, so the return type is widened to Any
."
That does not sound too attractive to me. (Also, figuring all that out probably cost someone a good amount of time.)
Making the element type of empty arrays depend on inference already was a measure of last resort. Deciding whether to throw an error or not depending on inference seems way too risky to me. Although I admit it will work most of the time, encouraging people to write code that works _only most of the time_ is questionable.
That is true in every case were we use return_type
to determine what to do in the empty collection case. Is the argument that this is worse because it's an error whereas choosing an element type doesn't necessarily cause an error—although it might if your code was previously inserting a value where a change to inference causes a tighter type to be inferred.
Yes. Chances are that the element type of an empty collection does not matter at all. Furthermore, the element type for the empty case may only be wider than in the non-empty case, so inserting an element should usually not be trap. An example where you may get a sudden error would be sum(foo.(x))
for empty x
(or sum(collect(foo(i) for i in Int[]))
, if you try to workaround the issue discussed here).
So for the collect
/broadcast
reliance on return_type
in the empty case, there is a high chance of a change in the inference to Any
going unnoticed/only causing a performance penalty. For sum
, that chance would be 0.
IMO, a much saner approach would be to let sum
take a v0
(or init
, ref #27711) parameter one can pass in situations like this: sum(i for i in Vector{Int}(), init=0)
.
Maybe sum
can take a type parameter that does two things:
1) ensures that all elements encountered are of this type (otherwise throws an error)
2) returns zero of this type if the iterator is empty
I guess it should return "zero" (of type T, if that is a matrix or whatever) whenever the type T can be inferred.
Even if the type can be inferred, you may not be able to create an appropriate zero since array types don't carry their sizes with them. What is
out = sum(i for i in Vector{Matrix{Float64}())
? I mean, what is size(out)
?
Not being able to conjure a zero of some type is a different issue than not knowing what type of zero one wants. That can be a failure of zero(T)
instead of a failure of sum
.
To follow up on the lines of thought hinted at by @martinholters and @StefanKarpinski: Perhaps the real problem is the lack of a generic representation of additive identity, which is a well-defined concept. Would it be crazy to have a singleton type to represent this concept in cases where the type is abstract? In relevant operations it could just promote to the appropriate concrete type.
We already have that: Bool
— false
is a "strong zero" and true
is a "strong one" in the sense that false * x == zero(x)
and true * x == x
. We could potentially return false
in these cases and it wouldn't be too bad since convert(T, false)
is defined to give you zero(T)
in general.
The Bool
has additional semantics not carried by zero (you can use the Bool
in an if
, but not the zero). Therefore returning false
does not seem consistent.
We'd need a "strong zero" in the additive sense, and false + x == x
is deprecated for arrays x
and probably fails for unitful x
.
The idea of a strong zero seems analogous of I
, the UniformScaling
representing an identity matrix of any size.
It seems inconsistent that prod(()) == 1
is fine but sum(())
is an error. Either it returns zero (following prod
), or prod(())
should also be an error, IMO.
This is something that I often trip over as well. +1 for resolving it.
I would be quite comfortable putting much of the burden on the user by prescribing and init
keyword argument as for reduce
. I.e. if I anticipate that a collection might be empty, then I will call
sum(i for i in Vector{Int}(); init=0)
That's a nice idea, @cortner 👍
Would it be crazy to have a singleton type to represent this concept in cases where the type is abstract?
FYI I created https://github.com/tkf/InitialValues.jl to implement this approach for general foldl
and reduce
. I find it handy if you want to handle "zero" for generic operations.
But, in this case, I'm +1 for sum(...; init)
as suggested by @martinholters and @cortner. In general, I think it makes sense to add init
for any function that is a thin wrapper of foldl
or reduce
.
init
to sum
etc.init
in sum
now. Can this issue be closed?Thanks @tkf! Will it make it into 1.5?
We already had 1.5 feature freeze like a few weeks ago, unfortunately. So, it'll be in 1.6.
Most helpful comment
This is something that I often trip over as well. +1 for resolving it.
I would be quite comfortable putting much of the burden on the user by prescribing and
init
keyword argument as forreduce
. I.e. if I anticipate that a collection might be empty, then I will call