The definition of argmax
from Wikipedia,
Given a collection of elements (domain), which map to an ordered set (range), what is the subset containing those elements in the domain whose image is the frontier.
Currently, argmax
is actually indmax
which is defined for indexed collections and returns the index of the first encountered element whose image is equal to the frontier. The generalization of keymax
(discussed, but not implemented) extends the supported collections from indexed to pairs
-implemented collections. This is the result of renaming indmax
to argmax
. A current proposal is for argmax
to be defined for Generator
(#27613),
argmax(f(x) for x ∈ domain)
with that change, we could fix the implementation with something like
import Base: Generator, argmax
function argmax(f::Function, domain::Generator)
T = eltype(domain)
tested = Set{T}()
output = Set{T}()
frontier = nothing
for elem ∈ domain
if !(elem ∈ tested)
push!(tested, elem)
value = f(elem)
if (frontier === nothing) || (value > frontier)
frontier = value
output = Set(elem)
elseif value == frontier
push!(output, elem)
end
end
end
return output
end
and call it with the following syntax
argmax(abs2, elem for elem ∈ -3:3) == Set([-3, 3])
That definition and implementation don't depend on Generator
; any iterable would work.
I agree this is useful but I don't think argmax
should return a Set by default; it's quite inefficient in the common case where the answer is e.g. a single integer. This could be a new function like argmaxall
or argmaxes
.
The answer would have form of some collection of eltype(domain)
. Maybe if the length of the container is one, it could return the value. I used Set
in the implementation to guarantee uniqueness (no need to compute the value for an element which we already considered, but overkill since I am checking for it in the loop). Other ways could call unique
on the domain as first step (but why do an extra pass?). There shouldn't be any efficiency loses from returning the full set rather than the first since the implementation only passes through each once (no ability to break early since it doesn't do two passes). output
could be set to Vector{eltype(domain)}
if that is more convenient (bit extra information of order which shouldn't matter, but why not). Since output
is guarantee to be non-empty, people could call first
on it if it is a <:AbstractVector
. Any iterable should work.
We should probably resurrect indmax
and implement
argmax(f::Function, itr::Any, unique::Bool=false)
separately.
What is the unique
flag meant to do?
The mathematical definition defines the solution as a set, but for computational reasons it might be best to provide a Vector{eltype(domain)}
. In certain applications, the solution is guaranteed to be a single element which would return a Vector{eltype(domain)}
of length one. For convenience we could offer the unique
flag to perform a first(output)
to return a eltype(domain)
. I am not married to it, as users could just call first
on it directly (also an advantage of using <:AbstractVector
over <:Dict
).
@nalimilan, thanks for linking those issues (I had hunting for those in my task list for today). I would like to know your thoughts on the issue. I concur indmax
was not the best name and changing it would be beneficial, but I believe argmax
was not the right name as it represents a different concept/operation. My ideal solution would be to offer the current behavior of argmax
renamed with something better and use argmax
for the mathematical concept described here (that would also be a great to expedite and simplify the work on a PR to implement it).
Well, if we wanted to be fully consistent argmax
would have to be renamed to findfirstmax
/findmaxfirst
, and then the function returning the indices of all values giving the maximum would be findallmax
/findmaxall
. In practice these names aren't too appealing and argmax
is a well-known concept, so I'd rather keep the current behavior of argmax
, but (as @JeffBezanson said) provide another function or an argument to get a vector of indices rather than just the first index.
Something I would really have liked to change is renaming findmax
to something else for consistency with other find*
functions (https://github.com/JuliaLang/julia/issues/24865).
On the decision of returning the indices or subset of domain,
R uses which.min
and which.max
with a keyword for first or every index.
Python uses values.index(max(values))
similar to findall(isequal(maximum(obj)), obj)
Matlab uses max
with the 2-nargout option (similar to last(findmax(obj))
The only implementation of this functionality using argmax
as the name is numpy
.
My number one concern is to avoid having a misnamed concept similarly to the numpy
case. I believe it would be nice to have an efficient and convenient function for the argmax
concept and one for the indices (especially one which doesn't require materializing the whole image), but not to have the one returning the indices named argmax
.
The mathematical meaning of argmax(f, itr)
is clear: find an element of itr
which maximizes f
. Having argmax(v)
return indices seems wrong if you think of f = identity
and v = itr
. However, it makes perfect sense if you think of f
as v
—or more precisely f = i -> v[i]
and itr = keys(v)
. As long as we explain that, it does not seem like there's anything wrong with using the name argmax
for that quite useful operation. Moreover, the alternative possible meaning of argmax(v)
as argmax(identity, v)
is silly because maximum(v)
already does that.
The only question in my mind is whether it would be better to leave argmax(v)
undefined and have a separate indmax(f, v)
function which returns an index in v
at which the function f
is maximized and then have indmax(v) = indmax(identity, v)
which is what our argmax
currently does (not coincidentally, it was previously called indmax
). But I can't see much need for indmax(f, v)
since we can just write argmax(i->f(v[i]), keys(v))
or if some version of https://github.com/JuliaLang/julia/pull/24990 gets merged, argmax(f∘v[_], keys(v))
which doesn't seem so bad in cases that you want indices instead of values.
Bottom line: if you're interested in indices, you can always use the indices as your collection and make indexing part of the function you're maximizing. You can kind of go in the other direction since you can always use an index to extract an element back out of an indexed collection, but if we're choosing between indmax
and argmax
, argmax
wins since it makes sense on any iterable, not just indexable collections. So having argmax
without indmax
is fine but having indmax
without argmax
is not—you still have no way of doing what argmax(f, itr)
does in general. Of course, we don't currently have that method, but we should.
Good points. The edge cases seem to suggest the best design would be a version of
argmax(f::Function, itr::Any; first::Bool = false)
as it seems the most versatile and useful.
Is that first
keyword intended to control whether to get only the first maximizing value or all of the? If so, I'd make a few comments: I do not think that returning all maximizing elements is more standard or more useful so it should not be the default. Sure, the formal definition of argmax is that it returns a set but if you look at typical usages in math writing this is usually glossed over and it is assumed that argmax gives you a single maximizing value.
I am perfectly happy with having that option default to a single value (i.e., first::Bool = true
), especially if argmax
will be used for querying indices/keys. Another option would be to default to return a Vector{eltype(domain)}
and then people can call first(argmax(f, domain))
as it will always be a non-empty <:AbstractVector
. Either solution seems good.
Either solution seems good.
Doing first(argmax(f, domain))
is wildly inefficient.
Using the implementation at the start of this of this post it wouldn't be costly at all. However, I now think having the keyword argument would be best since the implementation could be adapted to only update upon finding a new frontier as opposed to expanding the solution set when new images are at the frontier (i.e., the efficiency gains would come from knowing the desired output at the start). With a bit of guidance I could prep a PR for these changes after deciding on the specifics.
Using the implementation at the start of this of this post it wouldn't be costly at all.
I think we have different notions of what "costly" means. Your implementation allocates two Set
objects (why two? only one seems necessary). The scalar version of argmax(f, itr)
shouldn't allocate anything at all.
Here's a sketch implementation of the two-arg argmax(f, itr)
:
function argmax(f, itr)
r = iterate(itr)
r === nothing && error("empty collection")
m, state = r
f_m = f(m)
while true
r = iterate(itr, state)
r === nothing && break
x, state = r
f_x = f(x)
isless(f_m, f_x) || continue
m, f_m = x, f_x
end
return m
end
The iterable itr
is iterated only once and f
is applied to the yielded values exactly once per item. It allocates nothing unless f
or the iteration of itr
does.
The reason for the two collections are different: (1) keeps track of whether the image is known (it skips the computation of f
on elements it has tested which is best suited for costly mapping functions or a high number of duplicated elements in the domain) and (2) it would only allocate it when doing the full subset, for the unique solution one it wouldn't allocate a collection and just keep track of the frontier and best guess.
I originally used Set
, but since I keep track of duplicated it can just be a Vector
.
Set
versus Vector
doesn't much matter, they're both big, expensive data structures.
We could just ask the users to pass argmax(f, unique(domain))
, if it is a costly mapping or many duplicates, no?
unique
is an expensive allocating operation itself, so that doesn't help. Perhaps you're missing something about what I'm saying: I'm not worried about the number of elements in in the collection that's returned, my problem is with returning a collection at all. Especially given that the scalar version of this that people generally actually want can be implemented (as above), in a completely efficient generic way that does no allocation whatsoever. For example, if I have a definition like this:
g(a, b) = argmax(x -> -abs(x-3), min(a,b):max(a,b))
then the native code for a call like g(-10, 10)
is really efficient:
cmpq %rdi, %rsi
movq %rdi, %r8
cmovleq %rsi, %r8
cmovlq %rdi, %rsi
cmpq %rsi, %r8
jne L23
movq %r8, %rax
retq
L23:
leaq -3(%r8), %rax
movl $3, %ecx
subq %r8, %rcx
testq %rax, %rax
cmovnsq %rax, %rcx
negq %rcx
movq %r8, %rax
L48:
movl $2, %edi
subq %r8, %rdi
leaq -2(%r8), %r9
leaq 1(%r8), %rdx
testq %r9, %r9
cmovnsq %r9, %rdi
negq %rdi
cmpq %rdi, %rcx
cmovlq %rdx, %rax
cmovlq %rdi, %rcx
movq %rdx, %r8
cmpq %rsi, %rdx
jne L48
retq
It's all just simple integer instructions, comparisons and jumps. If there was any allocation whatsoever in the definition of argmax
then this would be totally different and not so incredibly simple and fast.
I think we can close this after discussion at https://github.com/JuliaLang/julia/issues/27612#issuecomment-399215782 which leaves us in the situation that the only actionable change is adding the two-argument argmax
method, for which there's already a specific issue open: https://github.com/JuliaLang/julia/issues/27613.
The mathematical meaning of argmax(f, itr) is clear: find an element of itr which maximizes f. Having argmax(v) return indices seems wrong if you think of f = identity and v = itr. However, it makes perfect sense if you think of f as v—or more precisely f = i -> v[i] and itr = keys(v). As long as we explain that, it does not seem like there's anything wrong with using the name argmax for that quite useful operation. Moreover, the alternative possible meaning of argmax(v) as argmax(identity, v) is silly because maximum(v) already does that.
Technically there's no ambiguity, but am I the only one to find it weird that argmax(identity, x)
would be completely different from argmax(x)
? None of our reduction functions does that AFAICT.
I'm starting to regret having renamed indmax
to argmax
now that we want to add that two-argument method with an inconsistent meaning...
It's a little unexpected but having argmax(v)
mean argmax(i -> v[i], keys(v))
seems to be intuitive to many people as well. As I argued above, indmax
isn't really necessary as a function at all.
I, too, find it weird. Also, there is no f
such that argmax(f, x) == argmax(x)
, i.e. f
cannot be seen as an optional argument in non-final position here. Doesn't mean it's no-go, but it's unusual and probably would give me a pause if I came across it without being aware of the background in this issue and related PRs.
I concur with @nalimilan here. I find argmax(v)
as a shortcut for argmax(i -> v[i], keys(v))
really weird, it seems a pattern that I haven't seen anywhere else in base. On the flipside, the pattern foo(v)==foo(identity, v)
seems really common, so that is certainly what I would have expected.
I think fundamentally things would be easier if argmax
always returned an element out of the original source, and never an index. Getting and index seems kind of surprising to me, and as @nalimilan pointed out also not something often seen in other languages.
A separate issue seems to be whether it should return a single value, a list or a set. On that I concur with @StefanKarpinski: a single element seems the most common and useful case.
Most helpful comment
Here's a sketch implementation of the two-arg
argmax(f, itr)
:The iterable
itr
is iterated only once andf
is applied to the yielded values exactly once per item. It allocates nothing unlessf
or the iteration ofitr
does.