Julia: maximum on dictionary returns max key instead of max val

Created on 14 Jan 2016  Â·  10Comments  Â·  Source: JuliaLang/julia

I don't really know how this works in Matlab, but I find the current situation extremely confusing and error prone:

d = Dict{Int,Int}(1=>2,3=>6,5=>10,7=>8)
maximum(d)    # gives pair 7,8 when I was expecting 5,10

finding the maximum of the keys is easy with maximum(keys(d)). Finding the key with the maximum value on the other hand currently requires writing a small function. Unless I'm missing something of course :)

So, is this behaviour intentional?

collections

Most helpful comment

For the specific case of finding the key with the largest value, it seems findmax and indmax should work (but currently don't, nor do they work for e.g. OffsetArrays).

All 10 comments

here's a possible function that does what I would like the default maximum to do on a dict:

function mymax{K,V}(d::Dict{K,V})
    @assert length(d) > 0
    maxfound = false
    maxkey = zero(K)
    maxval = zero(V)
    for (k,v) in d 
        if !maxfound
            maxfound = true
            maxkey = k
            maxval = d[k]
        elseif d[k] > maxval
            maxkey = k
            maxval = d[k]
        end
    end
    Pair(maxkey,maxval)
end

That won't give the right behavior if the dict only contains negative values. nevermind sorry

Pairs are ordered lexicographically, but reductions on Associative could maybe be special-cased. For now you can use something like mapfoldl(Base.IdFun(), (x,y) -> x.second > y.second ? x : y, d)

This is a decent argument for making pairs compare their RHS first and LHS second, since the primary application is to represent the data in an associative mapping and getting the maximum key is, as you point out, simple, whereas there's no obvious way to get the key associated with the maximum value.

Trying this out seems pretty decent:

julia> Base.isless(p::Pair, q::Pair) =
           ifelse(!isequal(p.second,q.second),
               isless(p.second,q.second),
               isless(p.first,q.first)
           )
isless (generic function with 44 methods)

julia> d = Dict{Int,Int}(1=>2,3=>6,5=>10,7=>8)
Dict(7=>8,3=>6,5=>10,1=>2)

julia> maximum(values(d))
10

julia> maximum(keys(d))
7

julia> maximum(d)
5=>10

julia> maximum(d)[1]
5

It's kind of an unintuitive ordering though, and getting the key for the maximum value this way does have the side effect that it compares keys as well as values, which you don't necessarily care about – you just want some key that maps to the maximum value, and may not even care about the keys being comparable. Of course, in that case, you can just find all the keys equal to the maximum value.

It's kind of an unintuitive ordering though, and getting the key for the maximum value this way does have the side effect that it compares keys as well as values, which you don't necessarily care about – you just want some key that maps to the maximum value, and may not even care about the keys being comparable. Of course, in that case, you can just find all the keys equal to the maximum value.

Indeed, it doesn't look like this ordering for pairs is really worth it. It would make more sense to special-case Associative as @tkelman suggested.

I just ran into this as well, from the findmax direction.

julia> d = Dict{Int,Int}(1=>2,3=>6,5=>10,7=>8)
Dict{Int64,Int64} with 4 entries:
  7 => 8
  3 => 6
  5 => 10
  1 => 2

julia> findmax(d)
(7=>8,1)

Not only does it find the largest key, which is of questionable value, but it also returns the index with respect to an arbitrary internal order, which seems completely useless.

I'm in favor of specialcasing reductions on associatives, focusing on values rather than keys.

bump. i agree it's a little weird that findmax checks keys instead of values

I've been thinking about Dict iteration a bunch lately, and how dealing with Pairs is generally difficult. See:

5794

17886

https://github.com/JuliaLang/julia/issues/20402#issuecomment-313161433
https://github.com/JuliaLang/julia/pull/22194#issuecomment-315630766

For the specific case of finding the key with the largest value, it seems findmax and indmax should work (but currently don't, nor do they work for e.g. OffsetArrays).

Part of #20402

findmax and indmax now do what the OP requests. If you want the maximum value, maximum(values(d)) works. From there, I think it's simplest for maximum just to operate on iterators like it does now, so the behavior with pairs is strange but it's what falls out. I think the only possible further change from there is to change how Dicts iterate.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

omus picture omus  Â·  3Comments

tkoolen picture tkoolen  Â·  3Comments

Keno picture Keno  Â·  3Comments

yurivish picture yurivish  Â·  3Comments

musm picture musm  Â·  3Comments