As @timholy pointed out in #35839, the meaning of init
in in-place multi-dimensional reducers like sum!
and count!
are incompatible with reduce
etc.
Now that sum(...; init)
#36188 is merged and would be shipped with Julia 1.6, it's a bit concerning that init
in sum!(r, A; init = false)
and sum(A; init = false)
means something completely different. I think it might make sense to deprecate or "soft-deprecate" init
for sum!
etc. By "soft-deprecate" I mean to add a new keyword argument that has init
as the fallback without depwarn
. This new keyword argument would be documented as a preferred way to control the initialization of the destination array. Since sum!
etc. do not actually document init
, going directly to deprecation might be OK. But, since this keyword argument exists for a long time, I'm inclined to treat this as a fully-fledged public API. (This is a bit tricky case because it's unclear if performance-breaking but semantics-preserving change should be treated as non-breaking.)
For the actual name of the new keyword argument, how about sum!(r, A; fill = true)
instead of sum!(r, A; init = true)
?
Looking at it again, now I'm not so sure about fill
as the new keyword argument. Brainstorming some other names:
initialize
initfirst
needinit
Alternatively, it also makes sense to rename init
of reduce
etc. to identity
, id
, neutral
, or something. We need something similar to init
for foldl
/foldr
/accumulate
, though.
(cc'ing other people who might be interested in this: @nalimilan @mbauman @JeffBezanson)
How about sum!(r, A; keep = false)
or sum!(r, A; discard = true)
?
I think discard
is a good candidate! I guess keep
is kind of a same idea but keep
ing some value happens only if the corresponding region of the input array is zero (or NaN
/missing
is in the destination). I think discard
is better because discard
ing old values happen always.
I also like discard
, great suggestion @mcabbott and thanks @tkf for spearheading this and so many other advances!
Doesn't the keyword argument to sum!
control whether the array gets filled first or not? This keyword isn't documented anywhere that I can see and even figuring out what it does requires perusing some fairly confusing code in reducedim.jl
. Perhaps I'm missing something but I don't get how keep
or discard
are more appropriate than fill
.
Right, init
for sum!
is not mentioned. So very strictly speaking it's not a public API. But I don't know how it's used in the wild.
In Julia 1.5, we'll have count!
which mentions init
:
I don't get how
keep
ordiscard
are more appropriate thanfill
.
Well, I'm OK with fill
, too. I thought discard
was better because fill
_could_ mean filling something else (e.g., filling missing
/NaN
; although I think the chance for such misunderstanding is very low). OK, maybe it can be said for discard
("discarding" missing
s). Naming is damn hard....
I actually think initialize
is slightly better than discard
. I think it also avoid confusion with filling missing
/NaN
.
Another option is reset
I've never used a bang reduction with init
, and — even knowing that this was a boolean flag based on the first post — my first instinct was that this flag would work oppositely as it does (that is, if true, use the existing values in the output array as the reducer's init
kwarg). So that's another motivation to change this.
In some senses, I think a pretty good API would be one in which we defaulted to using the values in the output array and if an init
kwarg is provided we fill!(output, init)
before performing the reduction. Unfortunately that's definitely and hugely breaking.
Alternatively, we could use a sentinel like nothing
or undef
to opt out of initialization and allow init
'ing arbitrary values. Again, though, that's changing the meaning of init
in the wild. It's not used much though — that's a fairly conservative search and really only flags GPUArrays (but it misses multi-line arg lists).
If we're going to use a boolean flag, reset
is the best option yet. I wish we could also convey the zeroing property, but neither zero
nor zero!
make for good kwargs. We could alternatively define this as a function that "preprocesses" the output array, with a default of out->fill!(out, zero(eltype(out)))
... but specifying preprocess=identity
doesn't feel like a good way of opting out of this behavior.
In hindsight, it seems like it would be best if the out-of-place accumulators were defined to always increment the target array: if you want to start at zero, pass in a zeroed array.
maximum!
does something a bit different than filling the identity element (it uses the first slice of the input array). I think auto-initialization is a useful API as initial value handling can be a headache sometimes.
maximum!
does something a bit different than filling the identity element (it uses the first slice of the input array)
You could still think of this as filling an identity element — the "maximative" identity is the typemin
, yeah?
Edit: in fact, this framing would allow for a non-erroring empty reduction:
julia> sum!(fill(NaN, 1,4), ones(0,4))
1×4 Matrix{Float64}:
0.0 0.0 0.0 0.0
julia> maximum!(fill(NaN, 1,4), ones(0,4))
ERROR: BoundsError: attempt to access 0×4 Matrix{Float64} at index [1:1, 1:4]
Well, I'm a bit ambivalent about using typemin
/typemax
for maximums
/minimum
. I think erroring out is a bit better. This is because the "semantic" domain of the function may not be captured by the type. For example, if you have
ys .= sin.(xs)
maximum(ys)
it's in some sense not desirable to get -Inf
when xs
is empty. It'd be nice if it's -1 but of course, that's rather impossible.
In Julia 1.5, we'll have
count!
which mentionsinit
:
Should we remove the mention of that keyword argument before people start using it?
Yeah, good point. Now that everyone agrees that rename is the way to go (and converging to reset
as the new name?), it makes sense to at least hide it in the documentation. I just opened #36305 that does that.
Is everyone OK with renaming init
to reset
?
Can someone write a sentence explaining what the artist formerly known as init
does?
f!(r, A; init)
does the following before the reduction iff init = true
:
sum!
, count!
: fill!(r, zero(eltype(r)))
prod!
: fill!(r, one(eltype(r)))
all!
: fill!(r, true)
any!
: fill!(r, false)
maximum!
, minimum!
: fill r
with the first slice of A
In the context of 0-element reductions:
init
is used as the output. If not provided, error.In the context of 1+-element reductions:
init
is used as an argument to the first call to the reducer. If not provided and there's only one element, just return that one element itself.In the context of bang (in-place) reductions, it's really hard. Here's the best I can do:
init
argument or not.The first two are the ones we're keeping the init
term for (and it fits) so I'm not worried about them. It's the last tough one I'm trying to get a handle on and I fear that reset
is no clearer, which is why I wanted to play the "write a sentence that explains what it does" game since there's usually a word in that sentence that is a good choice for the argument name. One thing that keeps occurring to me is initialized
with the opposite meaning of the current name that indicates whether the incoming argument is already initialized or not (if it isn't, then the reducer should initialize it).
I created a WIP PR for this #36332. I updated the docstring to what it does. I also added a few doctest demos. I guess it'd help the game?
Here is the canonical phrase I used in the PR:
If
reset
istrue
, values inr
are discarded. Otherwise, they are used as the initial value of the accumulation.
(reset
is just a placeholder for now)
One option would be to use the word init
but pass either _an array_ the same shape as out
or nothing
. This would allow folks to pass out
itself, and we could still easily deprecate init::Bool
.
Hm... I'm a bit confused. Do you mean sum!(A; out = r)
instead of sum!(r, A; init = false)
? What is the signature for sum!(r, A)
?
No, I mean you'd call sum!(r, A; init=r)
in cases where you want to preserve the values in r
as the init
state of the reductions. By default we'd pass init=nothing
, which would be the sentinel that flags not having an init
in each reduction. You could also pass init=zeros(size(r))
or whatever to explicitly choose what values you want (without messing with the output ahead of time).
Edit: or we could default to maximum!(r, A; init = view(A, #= whatever indexing would give you the first slice in the shape of r =#)))
If the out
array is not the one mutated, isn't it incompatible with the name out(put)? (Or at least that's how it is used in Numpy so I was confused.)
@StefanKarpinski's initialized
sounds good to me too, BTW.
Hunh? This is just determining what the output array gets pre-filled with. E.g.,
function sum!(r, A; init=zeros(size(r)))
if init === r
# do nothing
elseif init isa Bool
# depwarn
else
r .= init
end
# now proceed with reduction in-place using the values in `r` as the first argument
return r
end
Oh, I misunderstood your proposal. I thought you were suggesting to use out
as the name of the keyword argument.
Wondering about init :: AbstractArray
API, I still think flag-based interface is easier to compose. For example, I think
r = sum!(preprocess(), A; reset = false)
is much cleaner than
r = preprocess()
sum!(r, A; init = r)
Most helpful comment
In the context of 0-element reductions:
init
is used as the output. If not provided, error.In the context of 1+-element reductions:
init
is used as an argument to the first call to the reducer. If not provided and there's only one element, just return that one element itself.In the context of bang (in-place) reductions, it's really hard. Here's the best I can do:
init
argument or not.