Julia: Add a 2-arg flavor, argmax(f, domain)

Created on 16 Jun 2018  Â·  84Comments  Â·  Source: JuliaLang/julia

With the rename to argmax, and dims now a kwarg, it seems very tempting to me to define a 2-arg flavor, argmax(f, itr), where itr is an iterable subset of the domain of f. This would return an x in itr that maximizes f(x), which is quite literally the mathematical definition of argmax.

(originally proposed here)

Edit: Previous wording may have been confusing, so I have attempted to improve it.

fold good first issue help wanted

Most helpful comment

I've started on a PR for this. Should be ready pretty soon.

I'd like to make the definitions articulated by @tkf official and documented (these don't require changes to any of the existing methods, as I understand it):

  • argmax(f, [domain]) returns a point in the domain
  • maximum(f, [domain]) returns a point in the codomain
  • findmax(f, [domain]) returns a pair of a point in the codomain and a point in the domain

Where domain is provided it is permitted to be any iterable.

In cases where domain is omitted, f must be an object with an implicit domain. Indexable collections will be interpreted as a mapping from domain (keys) to codomain (values) and implemented with the pairs iterator (no implementation change).

If iterators gain a definition for pairs in the future, this still allows argmax, maximum and findmax to operate lazily.

This makes the 2-arg version of findmax and argmax different to all(?) other 2-arg functions (where f is implied to be identity), but I think that's okay if the definitions are given as above.

Questions

  1. The existing definitions of findmin and findmax state that "If any data element is NaN, this element is returned.", but they don't test for NaN with isnan, they just test if the element is equal to itself. They also don't return the first NaN but the last.

    Those are both strange results, imo. I feel like I should also propagate NaN here, but I wonder if it would be best to test with isnan(f_x) (and perhaps break on the first NaN instead of searching the rest (this could cause confusing behaviour if someone is relying on side effects from f, but... you probably shouldn't do that and the NaN should be a clue that something weird is happening)).

    • Should we use isnan?
    • Should we terminate early on NaN?
  2. The existing definitions use isless, which is only defined for types with total order. I feel like we should be able to use argmax with types that only have a partial order too, so I propose just using <.

    • Should we compare with < or isless?

WIP Implementation

"""
    findmax(f, domain) -> (f(x), x)

Returns a pair of a point in the codomain and a point in the domain such that `f(x)` is maximised. If there are multiple maximal points, then the first one will be returned.

Where `domain` is omitted, `f` must be an indexable collection and is interpreted as a mapping from domain (keys) to codomain (values) and implemented with the `pairs` iterator.

Where `domain` is provided it is permitted to be any iterable.

"""
function findmax(f, domain)
    y = iterate(domain)
    if y === nothing
        throw(ArgumentError("collection must be non-empty"))
    end
    m, s = y
    f_m = f(m)
    while true
        y = iterate(domain, s)
        y === nothing && break
        x, s = y
        f_x = f(x)
        if isnan(f_x) || isless(f_m, f_x) 
            f_m, m = f_x, x
        end
    end
    return (f_m, m)
end

argmax(f, domain) = findmax(f, domain)[2]

All 84 comments

If we have #27612, then this is as simple as defining the following:

Base.argmax(f, itr) = Base.argmax(f(x) for x in itr)

(but I haven't checked what to do about dims yet)

A more correct definition would be to adapt the current definitions of findmin and findmax to accept a function argument, then simply adjust the current argmax definition to

Base.argmax(f, x::AbstractArray; dims=:) = findmax(f, x, dims=dims)[2]
Base.argmax(f, itr) = findmax(f, itr)[2]

with the addition of

Base.argmax(x::AbstractArray; dims=:) = findmax(identity, x, dims=dims)[2]
Base.argmax(itr) = findmax(identity, itr)[2]

and likewise for argmin. Whether generators are supported is somewhat orthogonal here.

Thanks very much for taking a look at this. I think you're right. Would the compiler be able to inline the identity function so there is no cost to this change in the most commonly used case?

Yep. This is the same pattern we use in a lot of places where we define methods that accept function arguments.

Are there any other functions that could be prime candidates to get the function argument improvement? Besides, argmin, argmax... extrema?

Wait, I think there are some subtleties in reconciling @ararslan's nice suggestion with my original intent. Let me try to explain, and please forgive me if I confuse things further (e.g. I'm not entirely sure how to talk about the indices of an AbstractArray, but I gave it a shot!)

One of the difficulties in thinking about this is that there are really two flavors of argmax, the one defined in array.jl and the one defined in reducedim.jl.

In reducedim.jl, argmax defines an operation on A::AbstractArray. Here, I think the domain is implicitly CartesianIndices(axes(A)) and the function is implicitly getindex(A, i), where i ∈ CartesianIndices(axes(A)). The way I was thinking about it, argmax(A) is a convenient shorthand for something like argmax(i -> getindex(A, i), CartesianIndices(axes(A))) and I am proposing to generalize this to allow argmax(f, itr) for any function f and collection itr where this makes sense. In the special case that itr::AbstractArray then this would be implemented in reducedim.jl and support the dims kwarg and otherwise we have a fallback function in array.jl as long as keys(itr) is defined.

I think @ararslan's suggestion is distinct from this. If I understand correctly, for A::AbstractArray the domain would still implicitly be CartesianIndices(axes(A)) and the result of the new 2-arg flavor argmax(f, A) would not be an element of A (as proposed above), but rather an index of A. But please, correct me if I have misunderstood!

We may be talking past each other, but I'll try to clarify.

For any iterable x, argmax(x) _always_ returns an index into x, not an element of x. That can be seen more clearly from its maiden name, indmax. To ensure consistency with how the function is supposed to work, argmax(f, x) should return the _index_ i at which f(x[i]) is maximized, not the element x[i]. To get the element, just index this back into x, e.g. x[argmax(f, x)].

I think @ararslan’s is the standard Julian extension of a 1-argument function on collections to the case with an additional function argument. But @drewrobson is proposing the traditional 2-arg mathematical argmax, which is less consistent with the existing argmax but does coexist with it in math notation-land?

Yes exactly:

I completely agree that with the former name, it was clear that indmax should apply to things that can be indexed, and it should return the index that maximizes that thing. I'm also 100 % with you that if we want to stick with that original meaning even after the rename to argmax, then your suggestion is an undoubtably useful way to add functionality that is consistent with the original meaning of indmax.

With the rename to argmax, I am raising the (perhaps heretical!) suggestion that we should change what we mean by this function. To me, argmax now applies most naturally to a function, and it should return the arg that maximizes that function. As a special case, we would preserve the existing 1-arg flavor for things that can be indexed, with the understanding that getindex is the function to be maximized.

In summary, I think this is a backwards compatible and satisfying reinterpretation of what this function should do, but I get that it feels like a radical proposal!

Ohhh okay, I see now, thanks @jekbradbury and @drewrobson for clarifying.

An unfortunate consequence of the renaming of indmax to argmax is indeed that the latter conflicts a bit with the identically named function from math when it comes to applying a function to the given collection. If we're to add a function argument to argmax I think we should ensure that the behavior is consistent with that of other Julia functions that accept function arguments, in particular that g(f, x) is equivalent to g(map(f, x)). I realize that's rather unsatisfying in this case given the more general name, but I really think making the cases behave differently will cause more confusion in the long run.

Yes, avoiding confusion is important. But then, perhaps argmax should _only_ be defined in the 2-arg form proposed above, and then indmax should be defined as indmax(A) = argmax(i -> getindex(A, i), CartesianIndices(axes(A))) (or however you write that). Those names seem closest to what each function does, but I don't know if you'd want to split it up like that.

Edit: @Nosferican's post has reminded me that in addition to the 2-arg form proposed above, we would also keep the 1-arg argmax defined in array.jl for any itr for which keys/pairs is defined. Which will include generators, if #27612 happens!

  • argmax is returning the index rather than the definition due to a historical accident (renaming of indmax to argmax) which should be fixed ASAP
  • argmax works only on pairs-implemented collections due to the implementation which restricts the function to only a subset of all cases included in the mathematical definition (should be fixed)
  • A 2-args method for passing a non identity function should be added
    My very naive implementation complies with the above,
iter = Set(1:3)
function my_argmax(f::Function, iter)
    next = iterate(iter)
    x = next[1]
    y = f(x)
    while next !== nothing
        (i, state) = next
        yáµ¢ = f(i)
        if y < yáµ¢
            x = i[1]
        end
        next = iterate(iter, state)
    end
    return x
end

My ignorant self doesn't know what is the appropriate abstract type for an iterate.

@Nosferican, note that with #27612 this becomes a 1-liner. In other words, first define

Base.keys(g::Base.Generator) = g.iter

and then define

argmax(f::Function, itr) = argmax(f(x) for x in itr)

argmax is returning the index rather than the definition due to a historical accident (renaming of indmax to argmax) which should be fixed ASAP

Can you elaborate on what you mean by that?

An unfortunate consequence of the renaming of indmax to argmax

I may have misinterpreted the above as historical background on why argmax is returning the index rather than the value (it seems that indmax was deprecated in favor of argmax). Per the mathematical definition, the code should return the first element of findmax and not the second (value not the index). It seems to be a classic case of renaming the function to indicate a different concept, but keeping the implementation with the original concept. I would consider this a bug. Even the docstring for argmax uses the definition of indmax instead of one describing the mathematical function.

@nosferican, I think you're talking about the mathematical definition of max, not argmax. The debate here is whether argmax should return an arg or an index, not whether it should return the max value!

Shouldn't be multi-taking. Sorry for the confusion (I suffered it the worst); indmax does return the value and position. The original point before getting lost somewhere, was that it shouldn't be the index of the solution, but the actual value in the domain.

itr = [1, 4, 3]
argmax(abs2, itr) == 4 # not 2

rather than the current behavior which is indmax: argmax(itr) == 2. That way you can support iterate which do not have pairs and it follows the actual mathematical definition: which argument in the domain gives the maximum value in the range.

There we go. Yes, that is consistent with my original proposal above.

I'd be interested to hear @stevengj's thoughts on this.

And perhaps @JeffBezanson and @StefanKarpinski as well? Stefan encouraged me to look into this on Discourse and I know Jeff has seen my other issue at least. Thanks, and sorry to rock the boat! I mean well.

I think the bug was an oversight when renaming the function from indmax to argmax (two different mathematical concepts). @drewrobson, thanks for bringing that up. Wouldn't have wanted to have this wrong in a release.

I don't see a problem with the single-argument argmax(a) returning the key of the maximum; it seems pretty natural to view an index container a as a map from keys k to values a[k].

I like that. That is consistent with what I was trying to say here:

"As a special case, we would preserve the existing 1-arg flavor for things that can be indexed, with the understanding that getindex is the function to be maximized."

But the remaining question is, what should be the result of the 2-arg form:

itr = [1, 4, 3]
argmax(abs2, itr)

I think @Nosferican and I are arguing that it should be 4, the arg / domain value in itr that maximizes abs2, whereas if we want argmax(abs2, itr) to be the same as argmax(map(abs2, itr)) then it should be 2, the index in itr that maximizes abs2.(itr).

Edit: If we go down the latter road then I think we'd want separate verbs indmax and argmax to avoid the confusion that @ararslan warned about. indmax(f, itr) would do what @ararslan proposed and argmax(f, itr) would do the mathematical operation proposed in this issue.

indmax should have not been deprecated for argmax as these are two different concepts. argmax as a mathematical concept is defined for sets. The actual definition for argmax doesn't deal with indices at all because it is a subset function. It subsets a set to include only elements with mappings to the maximum of the range.

obj = Set(-3:3)
argmax(abs2, obj) == Set([-3, 3])

Long-hand form,

function argmax(f::Function, domain::Any)
    output = [ (x, f(x)) for x ∈ domain ] # Generates the domain and mapping
    frontier = maximum(last.(output)) # Obtains the frontier
    output = filter(x -> isequal(last(x), frontier), output) # subsets domain
    output = Set(first.(output)) # return the elements which meet the criteria
    return output
end

Refresher: Wikipedia

@Nosferican is right to bring up the issue of ties. I have to sheepishly admit I was trying to avoid that because for efficiency reasons I wanted to do the same thing as indmax and just return an answer for _one_ maximum. Honestly, if this constructs a Set I'd probably go back to using indmax and then indexing into my domain / collection. Hmm.

A compromise I would be willing to accept is for ordered collections to return a Vector

function argmax(f::Function, domain::AbstractVector)
    range_of_domain = f.(domain)
    frontier = maximum(range_of_domain)
    return domain[range_of_domain .== frontier]
end

For Dict, the proper way would be call argmax(f, keys(dict)) or argmax(f, values(dict)) to avoid ambiguity.

I really like the current behavior of argmax(d::Dict), which returns one key that maximizes d. I suspect in most real life use cases, the efficiency of doing a single pass and keeping the best result is what a programmer wants. Admittedly, this is a slight departure from the rigorous mathematical definition, but code has to live in the real world! And even in math, in casual usage I'm pretty sure I've seen "let x = argmax_{x} f(x)" when technically one should have written "let x ∈ argmax_{x} f(x)", because sometimes one isn't picky about which x as long as it maximizes f(x) and so the fact that there may be more than one to pick from is just a nuisance.

When in doubt, drop a keyword argument... If one only cares about a single element you could implement the easy/efficient way and the faithful mathematical function when that is what you want.
You can probably still get the short and fast behavior by calling itr[last(findmax(f, itr))] (assuming findmax is getting the two-args flavor.

I guess, but together with dims you end up with things like (N-1) dimensional Sets. We're well past the use case that I had in mind when I created this issue. I dunno, hopefully others will weigh in tomorrow!

I think we should really avoid splitting this into distinct functions indmax and argmax, as raised above (e.g. here). For one thing, what the heck would findmax do? Surely we don't want to have indfindmax and argfindmax.

I think @stevengj and I have both argued here that we can keep all the functionality under one roof without contradiction. The 1-arg version argmax(A) returns an index, i.e. an element of the implied domain. The 2-arg version argmax(f, itr) returns an x in itr, i.e. an element of the explicit domain. With this rule, it's clear that 1-arg and 2-arg findmax should also return an index and an explicit domain element too.

The question of whether to return one or all in case of ties is orthogonal. In the case of ties, one could just as easily want the set of all indices (the 1-arg case) as the set of all explicit domain elements (the 2-arg case). I still think the option to return all ties is only worth adding if people think there is a real-world use case that merits the extra complexity.

IIUC, this is what you propose:

julia> a = [-9, 1, 3, -6, -7];

julia> argmax(abs, a)
9

julia> argmax(i -> abs(a[i]), eachindex(a))
1

While this does make some sense, I'm pretty sure many people will do argmax(abs, a) when what they meant was argmax(i -> abs(a[i]), eachindex(a)). Also, doesn't maximum(abs, a) already do what you propose argmax(abs, a) should do?

No, I'm proposing that argmax(abs, a) should be -9, i.e. the element in a that maximizes abs.

Oh, yes, of course. So then it doesn't duplicate functionality of maximum.

I still think this would be nice trap for people to fall in (expecting the index instead of the element), but that doesn't necessarily mean we shouldn't do this.

Au contraire. The mathematical definition of argmax is the subset of the domain for which the mapping values are at the frontier (nothing about indices). The "proper" implementation would be

function argmax(f::Function, obj::Any)
    domain = Set(obj) # we consider the unique values
    mapping = [ (x, f(x)) for x ∈ domain ] # mapping of domain to range
    frontier = maximum(last, mapping) # maximum in the range
    output = Set(first(x) for x ∈ mapping if isequal(last(x), frontier)) # subset domain to solution
    return output
end

@drewrobson I disagree to having different concepts under the same function. Either we have keymax or something that returns the indices / keys or an argmax that returns the subdomain solution. If we want to have both, then I think these should be different functions. Having the 1-arg and 2-arg give different seems is contrary to the API used throughout the whole language where the 1-arg uses a default function (identity) and also would be extremely confusing.

As to whether people use the argmax function (the mathematical definition), at least in my field of economics a lot! Most optimization, agent-based modeling, and of course economic models use it all the time. If we gonna have only one of the implementations, I prefer to have the non-trivial one.

Yes, of course people use the argmax function. So do I, that's why I opened this issue! In my use case, I either assume the result is unique or I don't care which one I get, which is precisely the way people currently use indmax/argmax now. I really think "how many to return" is orthogonal to the original proposal, since it can be applied to the set of indices or any other domain. I also think to support the dims case, having a unique answer along the dimensions being reduced seems important, at least given how reducedim.jl works now (and for good reason).

Edit: Ok, I guess that "unique" answer can be a Set, but still I'm not sold on the practicality of this.

From an efficiency perspective, dealing with Set which are unique is more efficient than the current index approach which has to map every element. On the memory size, it might be less efficient since you need to allocate a key-value of length number of unique values. As for the dims, the Set approach could give a pretty substantial efficiency boost by doing the mapping for the whole domain first (cache) the result for the various dimensions.

But in general you still have to construct that Set:

domain = Set(obj) # the first step of your function above

Certainly, if obj is already a Set then this is a no-op, but then the usual argmax would also benefit from the uniqueness too.

In my experience, it is best to consider one change at a time! The original behavior of argmax(A::AbstractArray) and argmax(d::Dict) was to return just one maximizer. I think we should decide what a 2-arg form of argmax should do first (which is already controversial!), under the existing behavior of returning just one maximizer. If we can reach a concensus on that, then we can open a further can of worms by debating an additional change in the behavior of indmax, argmax, or whatever functions we end up with after the first round!

Regarding @martinholters and @ararslan's very reasonable points above, perhaps a 2-arg form of argmax(f, itr) really should do the usual Julian thing to avoid confusion / inconsistency.

As an alternative way to achieve the original proposal, perhaps we could say that a Generator is _the_ definitive way to represent a function on a domain, and hence argmax(g::Generator) becomes a reasonable extension of argmax(d::Dict) and we don't need any drastic changes.

Then one would just write argmax(f(x) for x in itr) to specify a custom domain.

As a summary, I think a sensible solution would be a PR for

function argmax(f::Function, domain::Any)
    T = eltype(domain)
    tested = Set{T}()
    output = Set{T}()
    frontier = nothing
    for elem ∈ domain
        if !(elem ∈ tested)
            push!(tested, elem)
            value = f(elem)
            if (frontier === nothing) || (value > frontier)
                frontier = value
                output = Set(elem)
            elseif value == frontier
                push!(output, elem)
            end
        end
    end
    return output
end
argmax(domain::Any) = argmax(identity, domain)

Might need to think a bit about the dimension aspect...

And as I said above, I think we should consider one change at a time.

My proposal for the original issue is simply this. This adds only one line to the language and appears to satisfy the original intent of this issue. The key idea is to find a way to express both a function and a domain in a single argument, to keep available the 2-arg form favored by @ararslan, @martinholters, and now by me as well.

It would be great if people could weigh in on this one change first. I'm wondering if @Nosferican's valid but orthogonal proposal to return a Set should be a separate issue, so it gets due consideration without distracting from the change being proposed here.

Edit: Admittedly, I would have to change the title of this issue for consistency since @Nosferican's proposal is a 2-arg argmax and mine no longer is! But I'd really like some feedback on whether people think my proposed 1-arg solution is a good way to keep the 2-arg version open for @ararslan's more Julian 2-arg function.

Sounds good. Do you want to open the other issue or should I?

  • Change argmax / argmin to take a domain::Generator (this issue) +1
  • Change argmax / argmin to return the element / elements of the domain rather than index (issue to open)

The first of those two issues already exists. It proposes that argmax(g) should return an element of the domain, if one can say that g.iter represents the domain of the function g.f.

That issue and this one are both arguing that argmax should in general return an element of the domain rather than an index, but I'm starting to think this issue is unnecessary since maybe having a 2-arg argmax for this is a bad idea (as initially pointed out by @ararslan).

I thought the specific change that you were looking to add on top of this is that argmax should return multiple elements, i.e. a Set. I thought maybe that specific aspect should be debated in a separate issue, but I'm not sure.

After thinking it over, my proposed change is so simple now that I might as well make a PR, and its relative merits can just be debated there. In that case, I think the current title of this issue supports @Nosferican's current proposal and I would encourage him to continue here if he likes! (and sorry for before, I'm new at this!)

Good luck on the PR for Generator support, I think that design if definitely superior to the current one. Don't forget about argmin.
I will go ahead and open a separate issue for the Set return so that can be discussed there. Will probably wait your PR to get merged and then make a PR for the efficient implementation of my proposed change.

Sketch definition I proposed over at https://github.com/JuliaLang/julia/issues/27639#issuecomment-399518134:

function argmax(f, itr)
    r = iterate(itr)
    r === nothing && error("empty collection")
    m, state = r
    f_m = f(m)
    while true
        r = iterate(itr, state)
        r === nothing && break
        x, state = r
        f_x = f(x)
        isless(f_m, f_x) || continue
        m, f_m = x, f_x
    end
    return m
end

I'm putting together a PR. As a preview, here are some thoughts:

1) I'm planning to follow the 1-arg design: The new argmax(f, itr) will be defined as findmax(f, itr)[2] and the new 2-arg implementation will actually go in findmax(f, itr).

2) @StefanKarpinski's sketch is nearly a duplicate of the existing _findmax in array.jl except one calls pairs(itr) and the other calls f on the iterates (the existing _findmax also handles NaN behavior, which we should keep of course). Maybe there's a design that avoids this code duplication (like #27612, but without violating abstractions?), but for now I'll just do the simple thing and define the new 2-arg function independently, as in the sketch.

3) Should the new 2-arg function support the dims kwarg? Looking at #22907 for hints, I think the answer is yes. This would be argmax(f, itr::AbstractArray; dims=:) to restrict it to collections with dims.

4) Ever since #22907, I'm not sure argmax and friends belong in array.jl anymore. Maybe they should be moved to reduce.jl? The versions with a dims kwarg are in reducedim.jl, so I think putting the dims = : versions in reduce.jl might make sense.

This might be easier to debate once there's a PR, but I just wanted to give an update.

Moving it to reduce seems like a good idea. Would be good to check that basing it on the findmax machinery is as efficient as we want, but otherwise that also seems good.

Is the PR still in progress or did this functionality end up somewhere else?

Very sorry, such a long delay for a simple PR but I ran out of time to work on this. Soonest I can take another look is probably a month from now, but if someone wants it sooner, feel free to jump in.

Bump. I ran into this

I've started on a PR for this. Should be ready pretty soon.

I'd like to make the definitions articulated by @tkf official and documented (these don't require changes to any of the existing methods, as I understand it):

  • argmax(f, [domain]) returns a point in the domain
  • maximum(f, [domain]) returns a point in the codomain
  • findmax(f, [domain]) returns a pair of a point in the codomain and a point in the domain

Where domain is provided it is permitted to be any iterable.

In cases where domain is omitted, f must be an object with an implicit domain. Indexable collections will be interpreted as a mapping from domain (keys) to codomain (values) and implemented with the pairs iterator (no implementation change).

If iterators gain a definition for pairs in the future, this still allows argmax, maximum and findmax to operate lazily.

This makes the 2-arg version of findmax and argmax different to all(?) other 2-arg functions (where f is implied to be identity), but I think that's okay if the definitions are given as above.

Questions

  1. The existing definitions of findmin and findmax state that "If any data element is NaN, this element is returned.", but they don't test for NaN with isnan, they just test if the element is equal to itself. They also don't return the first NaN but the last.

    Those are both strange results, imo. I feel like I should also propagate NaN here, but I wonder if it would be best to test with isnan(f_x) (and perhaps break on the first NaN instead of searching the rest (this could cause confusing behaviour if someone is relying on side effects from f, but... you probably shouldn't do that and the NaN should be a clue that something weird is happening)).

    • Should we use isnan?
    • Should we terminate early on NaN?
  2. The existing definitions use isless, which is only defined for types with total order. I feel like we should be able to use argmax with types that only have a partial order too, so I propose just using <.

    • Should we compare with < or isless?

WIP Implementation

"""
    findmax(f, domain) -> (f(x), x)

Returns a pair of a point in the codomain and a point in the domain such that `f(x)` is maximised. If there are multiple maximal points, then the first one will be returned.

Where `domain` is omitted, `f` must be an indexable collection and is interpreted as a mapping from domain (keys) to codomain (values) and implemented with the `pairs` iterator.

Where `domain` is provided it is permitted to be any iterable.

"""
function findmax(f, domain)
    y = iterate(domain)
    if y === nothing
        throw(ArgumentError("collection must be non-empty"))
    end
    m, s = y
    f_m = f(m)
    while true
        y = iterate(domain, s)
        y === nothing && break
        x, s = y
        f_x = f(x)
        if isnan(f_x) || isless(f_m, f_x) 
            f_m, m = f_x, x
        end
    end
    return (f_m, m)
end

argmax(f, domain) = findmax(f, domain)[2]

I would not use isnan, because it's less generic. Only a few numeric types implement it.

The other questions are interesting but we should consider them separately since they would involve changing existing behavior.

findmax currently has:

    while true
        y = iterate(p, s)
        y === nothing && break
        m != m && break
        (i, ai), s = y
        if ai != ai || isless(m, ai)
            m = ai
            mi = i
        end
    end

which looks like it stops on the first nan due to m != m && break ?

I think all the behaviors we want for NaN and partial orders would come out of returning for argmax, maximum, findmax, etc. the first value that is not less than any other element. I.e. the core check for maxima would be

if max_value < value
    max_value = value
end

Similarly, for minima it would be

if min_value > value
    min_value = value
end

This has the appropriate poisoning behavior for NaNs:

julia> value = 0.0
0.0

julia> max_value = NaN
NaN

julia> if max_value < value
           max_value = value
       end

julia> max_value
NaN

julia> min_value = NaN
NaN

julia> if value < min_value
           min_value = value
       end

julia> min_value
NaN

No special logic for NaN handling is required—the property that NaN is neither < nor > than other values is sufficient to get the right behavior. This also works correctly for collections of partially ordered values like sets. For example, using this definition the maximal and minimal elements of

[Set([1,2,3]), Set([2]), Set([1,3]), Set([2,3,4]), Set([1])]
  • maxima: Set([1,2,3])
  • minima: Set([2])

Note that NaN handling is just a special case of non-total ordering.

findmax uses isless instead of <. So I suppose it doesn't matter if < is non-total? I thought NaN handling is rather about optimization (early termination).

BTW, it would be nice if explicit NaN handling can be removed from the _specification_ of findmax etc. We can still have early termination as an optimization by using eltype (for findmax(iter)). Given how isless is specified, it is strange that findmax([NaN, missing]) returns (NaN, 1).

Right, that definition doesn't require totality. Bailing early on a known extreme value is doable but I'm not sure if it's worth it, personally. I don't get NaNs often enough to need to bail early on them.

Thanks Stefan, as discussed on triage I'm going to try that out and submit a PR. It's a neat solution, I think :)

@tkf

The docs for < says that it can be partial. Nobody on the triage call seemed to care about early termination, so I think it's fine.

Here's my current best-stab at a docstring for findmax, slightly revised from last time to give people not familiar with the language a better chance of understanding it. I've tried to remove NaN from the definition, too.

````
"""
findmax(f, domain) -> (f(x), x)
findmax(f)

Returns a pair of a value in the codomain (outputs of f) and the corresponding
value in the domain (inputs to f) such that f(x) is maximised. If there
are multiple maximal points, then the first one will be returned.

When domain is provided it may be any iterable and must not be empty.

When domain is omitted, f must have an implicit domain. In particular, if
f is an indexable collection, it is interpreted as a function mapping keys
(domain) to values (codomain), i.e. findmax(itr) returns the maximum element
of the collection itr and its index.

If the ordering of the codomain is partial, then maximal means "is not less
than any other value in the codomain".

Examples

julia> findmax(identity, 5:9)
(9, 4)

julia> findmax(-, 1:10)
(-1, 1)

julia> findmax(cos, 0:Ï€/2:2Ï€)
(1.0, 0.0)

julia> findmax([8,0.1,-9,pi])
(8.0, 1)

julia> findmax([1,7,7,6])
(7, 2)

julia> findmax([1,7,7,NaN])
(NaN, 4)

"""
function findmax(f, domain)
y = iterate(domain)
if y === nothing
throw(ArgumentError("domain must not be empty"))
end
m, s = y
f_m = f(m)
while true
y = iterate(domain, s)
y === nothing && break
x, s = y
f_x = f(x)
if f_m < f_x
f_m, m = f_x, x
end
end
return (f_m, m)
end
````

The docs for < says that it can be partial.

Yes, that's why I'm against using <. Note that max and hence maximum uses isless:

https://github.com/JuliaLang/julia/blob/d33c5a5eaedb2b014125cc006ddd6aca7f207cd4/base/operators.jl#L408

https://github.com/JuliaLang/julia/blob/d33c5a5eaedb2b014125cc006ddd6aca7f207cd4/base/reduce.jl#L615

The documentation of findmax seems to emphasize the consistency with max:

https://github.com/JuliaLang/julia/blob/d33c5a5eaedb2b014125cc006ddd6aca7f207cd4/base/array.jl#L2094-L2099

It would be very confusing if only findmax(f, domain) uses <. Furthermore, it throws if f returns missing.

If you are not going to implement early termination, I suggest using mapfoldl:

_rf_findmax((x, fx), (y, fy)) = isless(fx, fy) ? (y, fy) : (x, fx)
findmax′(f, domain) = mapfoldl(x -> (x, f(x)), _rf_findmax, domain)

This uses transducers to fuse iterator comprehensions with the reduction for cases like findmax′(row -> row.y, (row for row in table if row.x > 0)).

Thanks for your comments!

I'm not confident I understand your objection to partial ordering, is it just for consistency with max and maximum?

< returning missing is something I forgot could happen, that's pretty annoying...

Ah, are you against < because it is a partial function or because it is for partial orders?

And, yup, the existing definition of findmax breaks on missing, too. I guess the least wrong thing to do is to treat missing the same way NaN is treated, but it's a bit rubbish to be putting these special cases in (because I can't just rely on the isless behaviour because I need missing to poison both min and max and isless sorts missing as greater than everything else).

Hm... I don't understand why partial ordering or not is relevant at all. Existing functions like findmax(itr), maximum([f], itr), max etc. all use isless. Using isless for findmax(f, itr) seems to be very natural even just in terms of consistency.

My bad, the problem is findmin. It also has to return NaN...

So, making findmin compatible with min and minimum is tricky, especially if you don't want to refer to NaN and missing explicitly. I suggest something like

function _rf_findmin((x, fx), (y, fy))
    xisnan = fx != fx
    xisnan isa Bool || return (x, fx)  # fx is missing
    yisnan = fy != fy
    yisnan isa Bool || return (y, fy)  # fy is missing
    xisnan && return (x, fx)
    yisnan && return (y, fy)
    return isless(fy, fx) ? (y, fy) : (x, fx)
end
findmin′(f, domain) = mapfoldl(x -> (x, f(x)), _rf_findmin, domain)
findmin′(f) = findmin′(last, pairs(f))[1]

I think it's probably best to test explicitly for missing, I just don't
want to deal with it at all ;)

I'll try to come up with something that works and makes me reasonably happy.

Cool, so the issue with < is just consistency with the other functions. I
kinda think the other functions should also use <, but I don't care all
that much.

All my genuine use cases are solved by isless, the < version is just more
elegant (without missing, anyway) and feels more correct, but I won't get
hung up on that.

On Fri, 20 Mar 2020, 02:42 Takafumi Arakaki, notifications@github.com
wrote:

So, making findmin compatible with min and minimum is tricky, especially
if you don't want to refer to NaN and missing explicitly. I suggest
something like

function _rf_findmin((x, fx), (y, fy))

xisnan = fx != fx

xisnan isa Bool || return (x, fx)  # fx is missing

yisnan = fy != fy

yisnan isa Bool || return (y, fy)  # fy is missing

xisnan && return (x, fx)

yisnan && return (y, fy)

return isless(fy, fx) ? (y, fy) : (x, fx)

end
findmin′(f, domain) = mapfoldl(x -> (x, f(x)), _rf_findmin, domain)
findmin′(domain) = findmin′(last, pairs(domain))[1]

—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/JuliaLang/julia/issues/27613#issuecomment-601502228,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/ABNZA6M2VGLKCFA3PTZSMSTRILJZTANCNFSM4FFKFUNQ
.

The problem with the explicit approach is that it does not let people define missing-like objects. Base handles it explicitly everywhere, though. See: https://github.com/JuliaLang/julia/pull/34809#issuecomment-588311347

Yes, these functions do currently use isless but the behavior of maximium and minimum on NaN suggests that they shouldn't be and that using < would actually be more consistent. missing probably has to be handled explicitly—hard to say what to do with missing values.

Perhaps a case where people should just explicitly use skipmissing if they need to deal with missing values?

Using isless simplifies a lot for supporting NaN and missing for maximum-like reductions. As we need to handle missing anyway, I don't think < simplifies minimum-like reductions.

Ultimately, I think it'd be better to create a function (say) isgreater and move the complicated logic of min to it. If we use isgreater from min and findmin, the consistency is automatic.

missing probably has to be handled explicitly

Shouldn't we aim for "implicit" handling? It's consistent with the policy to be reluctant to add missing overload for Base and stdlib functions.

Perhaps a case where people should just explicitly use skipmissing if they need to deal with missing values?

We already have minimum([-Inf, NaN, missing]) === missing. So I don't think we can change how it's handled?

@StefanKarpinski Thinking about this some more, I agree with @tkf that < doesn't give what we want on its own.

We want NaN to poison, but it just doesn't dominate all other values (NaN is not less than any other value, but other values aren't less than NaN either), so they will not be consistently chosen by any rule of the type "not less than any other value".

I also belatedly realised that floats are not partially ordered because ≤ isn't reflexive for NaN. Woops.

Here's a quick worked example of why < on its own isn't sufficient:

# In findmax somewhere:

if fm < fx
    fm, m = fx, x
end
# true if fm is less than fx
#       good
# false if fm >= fx
#       good
# false if fm is NaN
#       good
# false if fx is NaN
#       not good: we won't switch from a non NaN to a NaN

The existing min and max functions all define a preference order for poison values. I think it's a shame that they mostly use isless for that because it only works nicely in one direction and you don't need a total ordering to have poisoning happen, you only need to define an ordering of the poisons relative to normal values and, if you care about which poison you receive, an ordering for the poisons.

I think rather than introduce an isgreater function, it might be better to define poisoning lessthan and greaterthan operations (or a poisoning order HOF).

is_unorderable(x) = !isa(x ≤ x, Bool) || !(x ≤ x)

poison_lt(a, b) = is_unorderable(a) || is_unorderable(b) ? isless(a, b) : a < b
poison_gt(a, b) = is_unorderable(a) || is_unorderable(b) ? isless(a, b) : a > b

# or a higher order function, whatever
poisoning_order(op) = (a, b) -> is_unorderable(a) || is_unorderable(b) ? isless(a, b) : op(a, b)

That way, partial ordering is permitted and we don't introduce a new canonical total ordering into Julia. We could then use these in all of the min and max functions.

If that's too big a change or otherwise undesirable, then I think we should define isgreater. If we allow ourselves to order normally unorderable values in an arbitrary but consistent fashion (as with isless) and put missing last, then I think this definition is fine and satisfies our need for all the weird values to end up at the end.

isgreater(a, b) = is_unorderable(a) || is_unorderable(b) ? isless(a, b) : isless(b, a)

Anyway, depending on what people are willing to accept, the implementation is either this:

findmin(f, domain) = mapfoldl(x -> (f(x), x), _rf_findmin, domain)
_rf_findmin((fm, m), (fx, x)) = !poison_gt(fm, fx) ? (fm, m) : (fx, x)

findmax(f, domain) = mapfoldl(x -> (f(x), x), _rf_findmax, domain)
_rf_findmax((fm, m), (fx, x)) = !poison_lt(fm, fx) ? (fm, m) : (fx, x)

or this:

findmin(f, domain) = mapfoldl(x -> (f(x), x), _rf_findmin, domain)
_rf_findmin((fm, m), (fx, x)) = isless(fm, fx) ? (fm, m) : (fx, x)

findmax(f, domain) = mapfoldl(x -> (f(x), x), _rf_findmax, domain)
_rf_findmax((fm, m), (fx, x)) = isgreater(fm, fx) ? (fm, m) : (fx, x)

Edit: these definitions of findmax return identical results except for types like Sets, for those the first definitions find the first value that is not smaller than all other results. The definition with isgreater just gives an error.

OK, I agree that there is a way to use < (so the discussion on partial/total ordering matters). But why prefer partial ordering? I think important questions are:

  1. Do we want maximum, argmax, and findmax to be consistent?
  2. Do we want to return the first maximal element from maximum? (It does not seem to be consistent with the definition in Greatest and least elements - Wikipedia.)

I see that findmax docstring is mentioning that it returns the first maximal element. However, there is no multiple maximal element support because it uses isless in the implementation. (OK, this is technically true if you run findmax on floats and check the identity with ===. But other than this case, I don't think partial ordering is supported.)

  1. Do we want maximum, argmax, and findmax to be consistent?

I vote yes. If we support partial order in one, we should support it in all of the others. So, perhaps that is beyond the scope of this issue and I should just submit a PR that adds the 2-arg versions of arg* and find* and another that changes all of the min/max functions to support types with partial orders.

  1. Do we want to return the first maximal element from maximum? (It does not seem to be consistent with the definition in Greatest and least elements - Wikipedia.)

From that wikipedia article:

The greatest element of a partially ordered subset must not be confused with maximal elements of the set, which are elements that are not smaller than any other elements. A set can have several maximal elements without having a greatest element. However, if it has a greatest element, it can't have any other maximal element.

I intended to follow that definition of maximal and I think I did?

Why prefer partial ordering?

Supporting partial orders makes no difference to the existing supported uses (types with total orderings) and provides new functionality that seems reasonable for types that only have partial orderings.

I don't have a use case for this, though. Maybe it's something that nobody actually needs.

I think the partial vs total order thing comes down to this, do you think this should throw an error? If not, what do you think it should return?

min(Set(1), Set(2), Set(1:3))

If it should return something, then so too should all the other functions we're talking about.

I intended to follow that definition of maximal and I think I did?

Yes, but I think _that's_ the problem; i.e., maximum returns a maximal.

In other words, a maximal is not (necessary) the maximum in the definition mentioned in Wikipedia (which I think is pretty standard). So, if you take the definition literally, returning a maximal from a function named _maximum_ is confusing.

Sorry for the misunderstanding! I wondered if we should define a maximal function, but I thought it might become confusing.

In any case, if there is a maximum or greatest element, then it is also the only maximal. If there are multiple maximals, then we can (and already have, for findmax) just say that we'll give the first one.

I think a line like _"If there are multiple maximal elements, then the first one is returned"_ is pretty clear and better than these functions just erroring on partially ordered types, but I can see your perspective too: we are no longer guaranteeing that the element is the maximum element (though it sort of wasn't before because of NaN and missing anyway).

My intention is to submit a PR that adds the 2-arg variants of find* and arg* and ask for triage on whether these functions should error or return the first maximal for types that can only be partially ordered. Does that sound good to you?

I think a line like _"If there are multiple maximal elements, then the first one is returned"_ is pretty clear and better than these functions just erroring on partially ordered types

Yes, I can see that a function like this may be useful sometimes. At the same time, I think having a value guaranteed to be the maximum is also useful, too (e.g., using it as a sentinel value).

it sort of wasn't before because of NaN and missing anyway

Current behavior of findmax for NaN is the correct maximum w.r.t isless. This is why it is so easy to implement findmax when using isless. This is not the case for findmin, though (which is why I mentioned isgreater).

By the way, another problem for <-based approach is:

julia> maximum([-0.0, 0.0])
0.0

julia> foldl([-0.0, 0.0]) do a, b
           a < b ? b : a
       end
-0.0

Current behavior of findmax for NaN is the correct maximum w.r.t isless. This is why it is so easy to implement findmax when using isless. This is not the case for findmin, though (which is why I mentioned isgreater).

isless doesn't guarantee in it's documentation that NaN will be the maximum, or that unorderable elements are bigger than orderable elements, those are extra guarantees that are specified for some of the min/max functions. We could just rewrite the docs to guarantee that, though.

Maybe the original sin here is that I believed the docstring for <. That says < implements a partial order, but that's just not true (1. It's not reflexive even for integers; 2. <= is a partial order for lots of stuff, but not for Floats because of NaN).

I think that that probably merits a documentation change, and, if we wanted to take partial orders seriously then there should probably be a new function for the default partial order relation on a type (and we'd need to decide what it should do for unorderables and missings: error or put them at the end).

[float zeroes are a problem]

Thanks for pointing that out.-0.0 == 0.0, so that's fine in a partial order, but it's plausible that someone is relying on the sign of the result and that would make this a breaking change, contrary to my statement above :( It would be fairly easy to fix this case, but I'm not sure that a partial-ish order is a very good idea.

I basically just need to finalise the docstrings for this and check that isgreater isn't hilariously wrong and I'll submit the PR for review, probably tomorrow. I'll be editing the docstrings of isless, isgreater, findmax/min, argmax/min, <, and _maybe_, but probably not minimum, min, maximum, max.

We want NaN to poison, but it just doesn't dominate all other values (NaN is not less than any other value, but other values aren't less than NaN either), so they will not be consistently chosen by any rule of the type "not less than any other value".

I haven't read all the rest here, but that's exactly why the condition to update the max is that the max is _less_ than the new value: because NaN < 0.0 and NaN > 0.0 are both false. I demonstrated the correct NaN poisoning behavior already with that logic. I also demonstrated the correct behavior with partially ordered types like sets.

The issue is that if the current max is say 3.0, then we won't switch to NaN because 3.0 < NaN is false.

If floats were partially ordered it would be fine, but they're not, NaNs are unordered and will only be picked as the max if they're the first item in the iterable.

isless doesn't guarantee in it's documentation that NaN will be the maximum, or that unorderable elements are bigger than orderable elements, those are extra guarantees that are specified for some of the min/max functions.

That's a fair point. isless on concrete floats are defined in Base so I'd say it's OK to exploit the implementation details inside Base. But users can define their own AbstractFloat subtypes. So I agree it's better to document ordering of NaN for floats w.r.t isless. It's documented for missing so I think it's reasonable.

[float zeroes are a problem]

Thanks for pointing that out.-0.0 == 0.0, so that's fine in a partial order, but it's plausible that someone is relying on the sign of the result and that would make this a breaking change

Yeah, it's true that this behavior is not in the docstring but it doesn't mean changing it is OK. (In a way "true" SemVer is a fantasy...)

It is already documented that isequal(-0.0, 0.0) is false and isequal and isless are compatible. So, documenting isless(-0.0, 0.0) === !isless(0.0, -0.0) === true seems to not hurt anything. Once it is documented that maximum-like reductions are using isless as-is, I suppose there is no ambiguity in the specification any more for Union{T,Missing} where T is one of concrete AbstractFloat subtypes defined in Base?

I think it's defined somewhere or other that negative zero isless positive zero. If I can't find where then I'll add it to the docstring in implementer notes or something.

I finally submitted that PR. It's here: #35316. :tada:

Was this page helpful?
0 / 5 - 0 ratings

Related issues

Keno picture Keno  Â·  3Comments

helgee picture helgee  Â·  3Comments

yurivish picture yurivish  Â·  3Comments

tkoolen picture tkoolen  Â·  3Comments

iamed2 picture iamed2  Â·  3Comments