Currently, the expression sum([])
leads to an error, since there is no way to know the intended neutral element ("zero") for an array of type Array{Any}
. I suggest to add a method to sum
(or to provide a new function sum0
) that accepts such a zero element. This would correspond to the existing method reduce(op, v0, itr)
.
The implementation would be straightforward:
function sum0(itr, v0)
isempty(itr) && return v0
sum(itr)
end
Equivalent methods should be added for prod
, minimum
, and maximum
as well.
Whether we can use the name sum
for this, or need to use a new name sum0
, depends on what methods currently exist for sum
and whether there would be a conflict.
What's wrong with calling it reduce(+, itr, v0)
?
There's a discussion elsewhere where people complain about / are puzzled about / don't like that sum([])
doesn't return 1
, or true
, or a similar "generic" value for zero. If people feel strongly enough about sum
doing the right thing, then such a method would help.
Otherwise, we need to update the documentation of sum
pointing out that one either needs to ensure the collection is not empty, or needs to ensure that it has an eltype that isn't Any
. Otherwise, one needs to fall back to using reduce(+)
instead. And since there is already a function sum
that doesn't do much more than call reduce(+)
, I'd argue that our beginner-friendliness needs to be extended here.
We could potentially handle this the same way we handle inferring the element type of empty comprehensions. Of course, that approach has its own predictability issues, but in the future, if we decide that the approach is acceptable we could handle empty reductions the same way – i.e. by calling zero(T)
or one(T)
when the element type of the result can be inferred as concrete.
@StefanKarpinski, we already call zero(T)
in sum
for an empty Array{T}
. The problem is what to do with T=Any
and similar cases where it is impossible to define zero
in a completely general way.
I'm inclined to think this is not a practical problem. If you are summing an Any
array, and it throws an exception on the case of an empty array, what should you do:
reduce(+, itr, v0)
?sum
that lets you supply an extra element?In 95% of such cases, you should probably be using a more concretely typed array. In the remaining 5% of cases, I don't see the problem with calling reduce
. We could add a line to the sum
documentation mentioning this possibility.
Adding an extra optional argument to sum
won't really solve anything because it will be optional. The people who are seeing exceptions now for sum([])
are precisely the people who wouldn't supply the optional argument to sum
until after they encounter the exception, at which point they might as well change the array type or call reduce
.
See also #5311
See #18367 for some relevant discussion as well
It's possible to just call reduce(+, itr, v0)
, and this is the workaround that made its way into Iterators.jl
. But I still don't understand what's desirable about throwing an error as opposed to returning an additive identity of the simplest possible type. Isn't the whole point of providing sum
, prod
, etc. to provide a better interface?
sum
is defined as a generic function — it's not just const sum = xs -> reduce(+, xs)
. It comes with features like finding a meaningful zero type, which reduce
can't do. So why does it just give up on the empty case? Sure, there's no "universal zero" (although there's nothing preventing that from being added, as @andyferris has suggested). But there's a value that's good enough as universal zero in 95% of cases: currently, false
.
There is no precedent for "if this doesn't hold in full generality, bail out". Imagine if vect
did this:
julia> Base.vect(1, 2)
2-element Array{Int64,1}:
1
2
julia> Base.vect(1.0, 2.0)
2-element Array{Float64,1}:
1.0
2.0
julia> Base.vect()
ERROR: Cannot figure out type.
That's ridiculous, so Base.vect()
picks the array type most likely to work in most situations — Any
.
Same goes for Base.map()
:
julia> map(x -> 0, 1)
0
julia> map(x -> 0, [1])
1-element Array{Int64,1}:
0
julia> map(() -> 0)
0
There's no correct return type here either. But a scalar is the sane choice, because it will work properly in most cases. The remaining 5% of cases isn't worth throwing an error for.
Similarly, I think Base.sum([])
and Base.prod([])
ought to pick the value that will work in most cases. Since it's an Any
array, there's no chance of its return value being type stable, so there's no disadvantage to picking something that will promote to any type: Bool
at present, possibly UInt1
or Zero
if desired.
@stevengj I disagree that it's not a practical problem; it's shown up in Iterators.jl
, so clearly it's not merely academic. I agree with your point that it's easy to work around — reduce(+, itr, v0)
works as well — but I'm simply not convinced that this workaround is necessary. Why can't sum
just do the right thing?
Because the empty sum has a well defined answer — the additive identity — but there is no single additive identity that works for any type defining +
, hence there is _no possible_ correct answer for sum(Any[])
. If there were _some_ possible correct answer, even if there were multiple choices, I wouldn't mind us picking one arbitrarily.
In contrast, if Base.vect(args...)
is defined as a container that holds all of its arguments Base.vect() --> Any[]
conforms with that definition, hence it is a _possible_ correct answer.
Similarly, map(f) -> f()
is consistent with the definition of map
. It is true that there are other possible consistent definitions, but when have multiple correct choices I don't think it's a problem to pick the one must likely to be useful.
I disagree that false
is not a correct answer. Sure, it is not _always_ correct, but it is consistent with the definition of sum
the same way map(f) -> f()
is consistent with the definition of map
. Why does it have to work for any type defining +
? false
works for a large number of types; certainly more types than MethodError
does. If we want it to work on all types, then something like a singleton ZeroType
might make sense.
This issue has been marked with the doc tag. If we aren't going to fix sum
and prod
to work properly, then I would be fine with including a warning of that in the documentation, recommending reduce(+, itr, false)
instead when itr
does not have concrete eltype
. I don't think a neutral element is necessary.
The definition for sum
of an empty collection of type T
is the additive identity of type T
. Since there is no universal additive identity of type Any
, then sum
must throw an error. false
is _never_ the correct answer to the question "what is the additive identity for Any
(i.e. for _all_ types)."
The documentation should definitely suggest reduce(+, itr, false)
if one must use a collection of a type that does not have a zero, or alternatively using a more strongly typed collection.
However, the type doesn't have to be concrete. It would be perfectly reasonable to define zero(::Type{Number}) = false
, for example.
(Currently, zero(Number) --> Int(0)
, which is questionable, although this is an additive identity in the sense of ==
if not isequal
.)
That sounds like a fair solution to me. 0
being an additive identity for zero(Number)
is also not true in general, at least in package land: Unitful.jl makes dimensioned quantities subtype Number
also. (Though, arguably, this is the fault of Unitful — Base.Dates
does not do this.)
To follow up on the previous comment, I'm still open to changing Unitful, ideally before I register it as a package and people start relying on it to behave a certain way. I had a productive discussion with @timholy about related issues here: https://github.com/ajkeller34/Unitful.jl/issues/9. To avoid derailing this thread, I'm happy to engage you there if you have thoughts specific to whether or not physical quantities should be a subtype of Number
.
Because the empty sum has a well defined answer — the additive identity — but there is no single additive identity that works for any type defining
+
, hence there is no possible correct answer forsum(Any[])
. If there were some possible correct answer, even if there were multiple choices, I wouldn't mind us picking one arbitrarily.
Since I've been mentioned here, I'll respond to this (highlighted by me) comment by @stevengj by saying that there seems to be an underlying demand for having access to an abstract additive identity. The fact that there isn't one suggests that we should really consider adding one (or just use the zero
function itself), along with generic definitions like +(::typeof{zero}, x) = x
. The advantages would be twofold: you can use it in cases like Vector{Any}
where code isn't type stable so the abstract type won't slow you down but will give you a correct answer for sum([])
, or you can use it in type-stable code as an optimization to remove useless operations (if you don't want to rely on LLVM to _maybe_ do that for you). One could further define other conveniences along the lines of colon(a,b) = a:one:b
. If you wanted to double-down on abstractions, perhaps "zero" as Identity{+}
and "one" as Identity{*}
could be similarly useful.
To me, the downside is there is no limit to where the abstractions end. E.g. we could make assumptions like +
is commutative and *
may or may not be (this thinking already pervades Julia, e.g. "taking string concatenation seriously" #11030). But then what? Implement the distributive law as a default fallback? And so on and so on...
Going back to the feasibility of the opening suggestion: I would love it, because it is being grok-able, type-stable and work for empties, but there are practical problems to making the nicest form of this syntax available. For me, a proposal for sum(xs, zero)
should address this design question:
julia> sum([i^2 for i in 1:5], 0)
ERROR: ArgumentError: dimension must be = 1, got 0
julia> sum(i^2 for i in 1:5, 0)
ERROR: syntax: invalid iteration specification
Vector
s clashes with the second argument giving the dimension, one has to write sum([i^2 for i in 1:5], 1, 0)
insteadfor
loops, where the comma leads the syntax parser expects something like for i in 1:5, j in ...
instead of a second argument for sum
. This can be avoided by sum((i^2 for i in 1:5), 0)
, but we don't usually force parenthesis for generators.So if the sum(xs, zero)
would force ugly notation in most(?) of the useful cases anyway, you might as well give up and write reduce(+, zero, xs)
-- unfortunately. Or is there another way?
Looking at https://github.com/JuliaLang/julia/issues/5311, I realised that the notation sum(T::Type, xs)
can be extended to achieve most of my goals for sum(xs, zero)
. It just needs this PR: https://github.com/JuliaLang/julia/pull/20561. Could that functionality be an adequate replacement for the sum(xs, zero)
proposed here?
Most helpful comment
Because the empty sum has a well defined answer — the additive identity — but there is no single additive identity that works for any type defining
+
, hence there is _no possible_ correct answer forsum(Any[])
. If there were _some_ possible correct answer, even if there were multiple choices, I wouldn't mind us picking one arbitrarily.In contrast, if
Base.vect(args...)
is defined as a container that holds all of its argumentsBase.vect() --> Any[]
conforms with that definition, hence it is a _possible_ correct answer.Similarly,
map(f) -> f()
is consistent with the definition ofmap
. It is true that there are other possible consistent definitions, but when have multiple correct choices I don't think it's a problem to pick the one must likely to be useful.