Julia: (Soft-)deprecate `init` for `sum!` etc?

Created on 13 Jun 2020 · 28Comments · Source: JuliaLang/julia

As @timholy pointed out in #35839, the meaning of init in in-place multi-dimensional reducers like sum! and count! are incompatible with reduce etc.

Now that sum(...; init) #36188 is merged and would be shipped with Julia 1.6, it's a bit concerning that init in sum!(r, A; init = false) and sum(A; init = false) means something completely different. I think it might make sense to deprecate or "soft-deprecate" init for sum! etc. By "soft-deprecate" I mean to add a new keyword argument that has init as the fallback without depwarn. This new keyword argument would be documented as a preferred way to control the initialization of the destination array. Since sum! etc. do not actually document init, going directly to deprecation might be OK. But, since this keyword argument exists for a long time, I'm inclined to treat this as a fully-fledged public API. (This is a bit tricky case because it's unclear if performance-breaking but semantics-preserving change should be treated as non-breaking.)

For the actual name of the new keyword argument, how about sum!(r, A; fill = true) instead of sum!(r, A; init = true)?

fold

Source

tkf

👍3

Most helpful comment

In the context of 0-element reductions:

init is used as the output. If not provided, error.

In the context of 1+-element reductions:

init is used as an argument to the first call to the reducer. If not provided and there's only one element, just return that one element itself.

In the context of bang (in-place) reductions, it's really hard. Here's the best I can do:

This flag chooses whether to use the existing values in the output as the reducer's init argument or not.

mbauman on 16 Jun 2020

👍3

All 28 comments

Looking at it again, now I'm not so sure about fill as the new keyword argument. Brainstorming some other names:

initialize
initfirst
needinit

Alternatively, it also makes sense to rename init of reduce etc. to identity, id, neutral, or something. We need something similar to init for foldl/foldr/accumulate, though.

(cc'ing other people who might be interested in this: @nalimilan @mbauman @JeffBezanson)

tkf on 14 Jun 2020

How about sum!(r, A; keep = false) or sum!(r, A; discard = true)?

mcabbott on 15 Jun 2020

I think discard is a good candidate! I guess keep is kind of a same idea but keeping some value happens only if the corresponding region of the input array is zero (or NaN/missing is in the destination). I think discard is better because discarding old values happen always.

tkf on 15 Jun 2020

I also like discard, great suggestion @mcabbott and thanks @tkf for spearheading this and so many other advances!

timholy on 15 Jun 2020

Doesn't the keyword argument to sum! control whether the array gets filled first or not? This keyword isn't documented anywhere that I can see and even figuring out what it does requires perusing some fairly confusing code in reducedim.jl. Perhaps I'm missing something but I don't get how keep or discard are more appropriate than fill.

StefanKarpinski on 15 Jun 2020

Right, init for sum! is not mentioned. So very strictly speaking it's not a public API. But I don't know how it's used in the wild.

In Julia 1.5, we'll have count! which mentions init:

https://github.com/JuliaLang/julia/blob/13b07fcc57d268b37234d8b8753b4ef3447be8db/base/reducedim.jl#L393-L400

I don't get how keep or discard are more appropriate than fill.

Well, I'm OK with fill, too. I thought discard was better because fill _could_ mean filling something else (e.g., filling missing/NaN; although I think the chance for such misunderstanding is very low). OK, maybe it can be said for discard ("discarding" missings). Naming is damn hard....

I actually think initialize is slightly better than discard. I think it also avoid confusion with filling missing/NaN.

tkf on 15 Jun 2020

Another option is reset

timholy on 15 Jun 2020

👍2 ❤1

I've never used a bang reduction with init, and — even knowing that this was a boolean flag based on the first post — my first instinct was that this flag would work oppositely as it does (that is, if true, use the existing values in the output array as the reducer's init kwarg). So that's another motivation to change this.

In some senses, I think a pretty good API would be one in which we defaulted to using the values in the output array and if an init kwarg is provided we fill!(output, init) before performing the reduction. Unfortunately that's definitely and hugely breaking.

Alternatively, we could use a sentinel like nothing or undef to opt out of initialization and allow init'ing arbitrary values. Again, though, that's changing the meaning of init in the wild. It's not used much though — that's a fairly conservative search and really only flags GPUArrays (but it misses multi-line arg lists).

If we're going to use a boolean flag, reset is the best option yet. I wish we could also convey the zeroing property, but neither zero nor zero! make for good kwargs. We could alternatively define this as a function that "preprocesses" the output array, with a default of out->fill!(out, zero(eltype(out)))... but specifying preprocess=identity doesn't feel like a good way of opting out of this behavior.

mbauman on 15 Jun 2020

👍2

In hindsight, it seems like it would be best if the out-of-place accumulators were defined to always increment the target array: if you want to start at zero, pass in a zeroed array.

StefanKarpinski on 16 Jun 2020

👍1

maximum! does something a bit different than filling the identity element (it uses the first slice of the input array). I think auto-initialization is a useful API as initial value handling can be a headache sometimes.

tkf on 16 Jun 2020

maximum! does something a bit different than filling the identity element (it uses the first slice of the input array)

You could still think of this as filling an identity element — the "maximative" identity is the typemin, yeah?

Edit: in fact, this framing would allow for a non-erroring empty reduction:

julia> sum!(fill(NaN, 1,4), ones(0,4))
1×4 Matrix{Float64}:
 0.0  0.0  0.0  0.0

julia> maximum!(fill(NaN, 1,4), ones(0,4))
ERROR: BoundsError: attempt to access 0×4 Matrix{Float64} at index [1:1, 1:4]

mbauman on 16 Jun 2020

Well, I'm a bit ambivalent about using typemin/typemax for maximums/minimum. I think erroring out is a bit better. This is because the "semantic" domain of the function may not be captured by the type. For example, if you have

ys .= sin.(xs)
maximum(ys)

it's in some sense not desirable to get -Inf when xs is empty. It'd be nice if it's -1 but of course, that's rather impossible.

tkf on 16 Jun 2020

👍1

In Julia 1.5, we'll have count! which mentions init:

Should we remove the mention of that keyword argument before people start using it?

nalimilan on 16 Jun 2020

Yeah, good point. Now that everyone agrees that rename is the way to go (and converging to reset as the new name?), it makes sense to at least hide it in the documentation. I just opened #36305 that does that.

tkf on 16 Jun 2020

Is everyone OK with renaming init to reset?

tkf on 16 Jun 2020

👍2

Can someone write a sentence explaining what the artist formerly known as init does?

StefanKarpinski on 16 Jun 2020

f!(r, A; init) does the following before the reduction iff init = true:

sum!, count!: fill!(r, zero(eltype(r)))
prod!: fill!(r, one(eltype(r)))
all!: fill!(r, true)
any!: fill!(r, false)
maximum!, minimum!: fill r with the first slice of A

tkf on 16 Jun 2020

👍1

In the context of 0-element reductions:

init is used as the output. If not provided, error.

In the context of 1+-element reductions:

init is used as an argument to the first call to the reducer. If not provided and there's only one element, just return that one element itself.

In the context of bang (in-place) reductions, it's really hard. Here's the best I can do:

This flag chooses whether to use the existing values in the output as the reducer's init argument or not.

mbauman on 16 Jun 2020

👍3

The first two are the ones we're keeping the init term for (and it fits) so I'm not worried about them. It's the last tough one I'm trying to get a handle on and I fear that reset is no clearer, which is why I wanted to play the "write a sentence that explains what it does" game since there's usually a word in that sentence that is a good choice for the argument name. One thing that keeps occurring to me is initialized with the opposite meaning of the current name that indicates whether the incoming argument is already initialized or not (if it isn't, then the reducer should initialize it).

StefanKarpinski on 17 Jun 2020

I created a WIP PR for this #36332. I updated the docstring to what it does. I also added a few doctest demos. I guess it'd help the game?

Here is the canonical phrase I used in the PR:

If reset is true, values in r are discarded. Otherwise, they are used as the initial value of the accumulation.

(reset is just a placeholder for now)

tkf on 17 Jun 2020

One option would be to use the word init but pass either _an array_ the same shape as out or nothing. This would allow folks to pass out itself, and we could still easily deprecate init::Bool.

mbauman on 17 Jun 2020

Hm... I'm a bit confused. Do you mean sum!(A; out = r) instead of sum!(r, A; init = false)? What is the signature for sum!(r, A)?

tkf on 17 Jun 2020

No, I mean you'd call sum!(r, A; init=r) in cases where you want to preserve the values in r as the init state of the reductions. By default we'd pass init=nothing, which would be the sentinel that flags not having an init in each reduction. You could also pass init=zeros(size(r)) or whatever to explicitly choose what values you want (without messing with the output ahead of time).

Edit: or we could default to maximum!(r, A; init = view(A, #= whatever indexing would give you the first slice in the shape of r =#)))

mbauman on 17 Jun 2020

If the out array is not the one mutated, isn't it incompatible with the name out(put)? (Or at least that's how it is used in Numpy so I was confused.)

tkf on 17 Jun 2020

@StefanKarpinski's initialized sounds good to me too, BTW.

tkf on 17 Jun 2020

Hunh? This is just determining what the output array gets pre-filled with. E.g.,

function sum!(r, A; init=zeros(size(r)))
    if init === r
        # do nothing
    elseif init isa Bool
        # depwarn
    else
        r .= init
    end
    # now proceed with reduction in-place using the values in `r` as the first argument
    return r
end

mbauman on 17 Jun 2020

Oh, I misunderstood your proposal. I thought you were suggesting to use out as the name of the keyword argument.

tkf on 17 Jun 2020

Wondering about init :: AbstractArray API, I still think flag-based interface is easier to compose. For example, I think

r = sum!(preprocess(), A; reset = false)

is much cleaner than

r = preprocess()
sum!(r, A; init = r)

tkf on 17 Jun 2020

Was this page helpful?

0 / 5 - 0 ratings

Related issues

bundle Julia with a better terminal on Windows?

StefanKarpinski · 216Comments

API consistency review

StefanKarpinski · 131Comments

abstract types with fields

JeffBezanson · 145Comments

allow overloading of a.b field access syntax

StefanKarpinski · 249Comments

should `in` use `isequal` to test for containment?

StefanKarpinski · 138Comments