Julia: map for Dict

Created on 13 Feb 2014  Â·  35Comments  Â·  Source: JuliaLang/julia

I noticed there wasn't any map methods that generated Dicts. Which function/name - comments please:

map{K,V}(f::Callable, d::Associative{K,V}) = [ f(k,v) for (k,v)=d ]
map{K,V}(f::Callable, d::Associative{K,V}) = [ (k, f(k,v)) for (k,v)=d ] # or this?
map{K,V}(f::Callable, d::Associative{K,V}) = [ (k, f(v)) for (k,v)=d ] # ..or this?

To distinguish the above from the current map (which you still may want to use), they will need new names - suggestions?

I noticed that map for Arrays handles multiple arrays - presumably this is more of a join-by-key semantic for Dicts and therefore not needed in map for Dicts?

collections

Most helpful comment

If we wanted map to work on values then we do map(f, values(dict))

If we wanted a map to work on pairs then we should be able to do map(f, pairs(dict)) — oh but wait, that just returns a Dict (because dicts _do_ iterate pairs) and still doesn't work!

IMO, we'd ideally just iterate/map/etc over values, and require pairs to do the for (k, v) in thing, but there's no way we're changing iteration in 1.x. Much of this discussion predates pairs.

So now we're stuck in this weird never-never land where map is available to do what I'd want it to do, but iteration isn't doing what I want. So do we just wait for 2.0 to unify everything?

All 35 comments

Currently, map for dictionary arguments falls back to the general iterable case. I'm not sure I see a difference between the status quo and what you are looking for. Does this not work the way you'd like it to?

julia> f(x) = x;
julia> @which map(f,[1=>"1",2=>"2"])
map(f::Union(Function,DataType),iters...) at abstractarray.jl:1265

julia> map(f,[1=>"1",2=>"2"])
2-element Array{Any,1}:
 (2,"2")
 (1,"1")

This is very nearly your first suggestion, effectively like … = [ f(x) for x in d ]

It would be more convenient if the key and value were provided as separate arguments instead of as a pair. Of course, then mapping with the identity function doesn't produce an equal Dict, but I think that may well be ok. Another issue is what to do about key collisions.

Yeah, I see. And this was a little surprising to me:

julia> f(x1,x2) = (x1,x2);
julia> map(f,1:3,1:2)
ERROR: dimensions must match # as expected
julia> map(f,Dict(1:3,map(string,1:3)),Dict(11:12,map(string,11:12)))
2-element Array{Any,1}:
 ((2,"2"),(11,"11"))
 ((3,"3"),(12,"12"))

Perhaps the iters… map method should check to see if they're all done at the end?

Oh! I see this came from the mailing list: https://groups.google.com/forum/?fromgroups=#!topic/julia-users/naGp9DUdSq8

How should we distinguish between a map-I-want-a-Dict-returned and map-just-a-tuple-array-is-fine?

Should map stay returning an Array, and amap return an Association? It seems in some cases you want either.

The more general problem is: what about functions where you want to specify the return type - but the parameter types are identical? Is there a Julian solution for this problem - one being the convert convention of specifying the target type. But in this case you don't necessarily want to verbosely specify the exact Dict{K,V} that it is to return - just distinguish between "array" and "association please".

I believe there is a dictionary constructor from an iterable of tuples now, so while the whole thing could be done more efficiently, constructing a Dict is pretty straightforward if that's what you want.

Might need better docs (I haven't checked).

So we definitely need a specific associative method, since currently the type information gets dropped:

function f() # to avoid global-induced Any types
       local a = ["a"=>1, "b"=>2] # Dict{ASCIIString,Int64}
       which(map, kv->kv, a) # map(f::Union(DataType,Function),iters...) at abstractarray.jl:1265
       local m = map(kv->kv, a) # Array{Any,1}
end

I was actually expecting map to return an Array{(ASCIIString,Int64},1}. Why not? Answer: Associative is not an AbstractArray. Why is there not an Iterable abstract type that is a parent of both Associative and AbstractArray Should we have a typed iterator to handle both cases?

(edit: it seems that anonymous functions are another reason the type information gets dropped)

Would Associative support be required in map anyway? We could do:

Dict(map(kv->(k[1], v[2]+1), ["a"=>1, "b"=>2]))
Dict(map(k->k, v->v+1, ["a"=>1, "b"=>2]))
Dict(map(v->v+1, ["a"=>1, "b"=>2]))

Is it possible to constrain Function parameters? Looking at doing a map for AssociativeArray I want to be able to constrain the mapping function signature:

Base.map{K,V,KR,VR}(f::((K,V)->(KR,VR)), d::Associative{K,V}) # produces an Associative{KR,VR}

Note: not saying this specific example is a good idea, just asking generally about constraining functions.

This function would at least get typed correctly (as opposed to the iter-based function)

map{K,V}(f::Callable, d::Associative{K,V}) = [f(k,v) for (k,v) in d]

Then you wrap the return in a Dict if that's what you want.

Of course anonymous functions have issues with preserving type information in comprehensions, which seems why map is not currently implemented with one.

My leaning is to implement the above with a comprehension, and use inline functions (not anonymous syntax). Then I have a hope of getting correctly types out even when an empty array is fed in. The current map can't do this.

I just ran into this problem. Given the functional nature of Julia and its native collections, this should be implemented ASAP:

mapValues(dict::Dict{T, U}, fun::Function{U}) # should return a new Dict{T, V}.
# also implement mapValues!(...)

map(dict::Dict{T, U}, fun::Function{(T,U)}) # should return a new Array{V}

I don't think the syntax is perfect, but you get the idea. You should look to Scala, as those guys have gotten it down, Map (equivalent to Julia's Dict) in particular. The other available functions that are missing in Julia should also be added, like flatMap which excludes all null value Nullable objects (Scala's equivalent is Option).

http://www.scala-lang.org/api/current/scala/collection/Map.html

http://www.scala-lang.org/api/current/index.html#scala.Option

Thanks for the input, @BryanARivera. Any chance you could start a pull request? That would be the best way for this to get implemented ASAP. Most people working on Julia right now aren't focused on this area, but most are willing to review pull requests.

Also, consider opening an issue (and/or pull request) at DataStructures.jl. This is rather less organized (currently) than Scala's Collections, but it would initially be a more appropriate home for some of the other dictionary types.

Cheers!

@kmsquire Hopefully I will have the chance to delve into it soon. I will try to implement everything I can as Scala's collections are quite central to the language. I envision it would accelerate Julia dev as well.

Should be relatively straightforward. Scala has got the naming and function conventions down, we just need to rebuild it in Julia. Would DataStructures.jl be the equivalent of Collections (meaning this is where PRs should be targeted) ?

@BryanARivera, that sounds fantastic. I'm also a huge fan of Scala's Collections framework.

DataStructures is probably the right place, but if what you want to implement is very different than what is already there, it might turn out that a new package would be better. But let's start in DataStructures.jl and see where that goes. Cheers!

@BryanARivera I'm very happy to have read this since I too have been looking for a mapvalues function. The current implementation of map seems quite bizarre after you see the example with filter! which can return an intuitively filtered Dict, but I suppose it may be undesirable to many if we changed map! to behave more like filter!, as in:

x = ["a" =>1, "b"=>2, "c"=>3]
map!((k,v)->v+1, x)  # x = ["a" =>2, "b"=>3, "c"=>4]

since it could break existing code (unless someone like @StefanKarpinski thinks this is a good move?). In any case, functions which return Arrays and functions which return Dicts may both be needed.

We'll probably need both mapvalues and mapvalues! (note no capital 'V') (or possibly mapdict?? probably less generic...). Did you make a start on an implementation?

I would be happy to write a PR. Does this look OK:

extractValue(fun) = (pair) -> fun(pair.second)

mapvalues(fun, dict) = map(extractValue(fun), dict)

It won't have the best performance until Jeff is done with his anon functions improvements.

We don't usually camel-case function names in base.

Also, you don't really need that second function. Just do:

mapvalues(f, d::Associative) = map(p->f(p.second), d)

Sounds like a nice addition to me (as well as mapvalues!). I guess you can rely on anonymous functions being fast as this will only be added in 0.5 anyways.

But that wouldn't return a Dict unless you also add a special map method for Associative that does so: you'd have to include both in your PR.

@nalimilan I see - it's on my todo list.

This works, but doesn't look optimized:

mapvalues(f, d::Associative) = Dict( zip( keys(d), map(p->f(p.second), d)) )

Is there any way to specify a function return type so that this will work?

map(p->f(p.second), d)::Array{AbstractString}

As I said, I think you should first write map, and then implement mapvalues on top on that. Something like this seems to work. It's inspired by map, in particular as regards type inference: it computes the element type from the first element.

Base.similar{K,V}(d::Dict, ::Type{Pair{K,V}}) = Dict{K,V}()

function Base.map(f, a::Associative)
    if isempty(a)
        return similar(a)
    end
    st = start(a)
    a1, st = next(a, st)
    first = f(a1)
    dest = similar(a, typeof(first))
    sizehint!(dest, length(a))
    push!(dest, first)
    while !done(a, st)
        el, st = next(a, st)
        push!(dest, f(el))
    end
    dest
end

mapvalues(f, a::Associative) = map(p->p.first=>f(p.second), a)

I'm not sure about the case where the dict is empty: the existing map for arrays has a special-case for when f is a type, but that doesn't work here.

My code would have to be adapted a bit if we want to write map((k,v)->.., d) instead of map(p->..., d).

FWIW, Haskell defines its map (or rather fmap) as what was named mapvalues above, which is more sound if you share @timholy's view on "what an array means":

it's an associative container for which the space of "keys" can be determined from the array type and its dimensions.

In both cases (array or associative), the function takes only the value (in the key-value pairs) and returns a new value for the _same_ key.

I was surprised recently by the current behavior of map on Dict. When you iterate over a Dict you get (key, value) pairs, so I was expecting my function f to receive a single such pair each time, not one argument for the key and another for the value.

So while I agree that the view on arrays as associative containers is nice, I'm not sure if this should take precedence over the differences between how they are iterated. (Or would we want to bring that in line between dicts and arrays? It's a major breaking change)

I agree that consistency with what iteration over a Dict yields may matter more.
For reference, in the context of Haskell, having map working with (key, value) pairs would not satisfy the following functor rule: map(f, map(g, d)) == map(x->f(g(x)), d) (where d::Dict).

Part of #20402

Deprecation is in; a new map method can be added any time post-0.7.

@JeffBezanson as you are the assignee of this, I have reviewed it an I think that it is covered by current functionality so it should likely be closed.
I think that Dict with a generator covers all the use cases of this issue

We should go back to the old behavour.
I am going to make the claim that the definition of map
is that it takes in an iterator, and returns a Vector.

Dicts iterate as Pairs.
map on dict should be defined and should iterate on pairs,
returning a vector.

If we wanted map to work on values
then we do map(f, values(dict))

if we want a map that returns a dict
then we do Dict(map(f, dict))

I think it is good to be able to thing of map
as the do-blockable form of a Vector comprehansion (assuming no filter)
and this is a case that [f(x) for x in xs] differs from map(f, xs)
And I often want to write:

map(xs) do (key, val)
 ...
end

when [... for (key, val) in xs] would be too long

What if it was lazy and produced a pair generator? My reasoning is that you really never want a vector of pairs, you always want to do something with that vector of pairs: put it in a Dict, etc.

What if it was lazy and produced a pair generator? My reasoning is that you really never want a vector of pairs, you always want to do something with that vector of pairs: put it in a Dict, etc.

It is an interesting point,
but I don't know that it works out.
the eltype of the return type of map(f, xs)
is determineed not by the eltype of xs
but of the return type of f(::eltype(xs))

For example

map(dict) do (kk, vv)
   val = round(vv, sigdigits=2)
   return " - $(kk): $val  (2 sig figs)"
end

I don't think that should return anything with the eltype of Pair

If we wanted map to work on values then we do map(f, values(dict))

If we wanted a map to work on pairs then we should be able to do map(f, pairs(dict)) — oh but wait, that just returns a Dict (because dicts _do_ iterate pairs) and still doesn't work!

IMO, we'd ideally just iterate/map/etc over values, and require pairs to do the for (k, v) in thing, but there's no way we're changing iteration in 1.x. Much of this discussion predates pairs.

So now we're stuck in this weird never-never land where map is available to do what I'd want it to do, but iteration isn't doing what I want. So do we just wait for 2.0 to unify everything?

Options include:

1) make Pairs not subtype AbstractDict but that is breaking

2) We could make the Pairs type an exception to the rule that AbstractDict can't be used in map. Maybe confusing, but nonbreaking.

3) Or we can say that AbstractDict iterate as pairs and thus should map as a pairs

4) we say that you can map AbstractDict but not Dict. i.e. the generalization of 2. This avoids any weird cases with Dict(map(f, dict)) that can maybe happen with hash collisions and it becoming order sensitive. (I can't remember the full logic but I know it results in some weirdness for Sets)


Any of these options also open us up to having map(f, ::NamedTuple) work on the values (as per the fact that NamedTuples iterate values already.
And then to having map(f, pairs(nt)) for iterating namedtuples as pairs

I kind of think we should clean this up in Julia 2.0. It appears to me that a unified solution is likely to involve at least some breaking changes, and adding new functionality to Julia 1.x only increases the impact of any potential changes to the API.

(Also - I haven't been following closely for a while, but was there any thoughts on how many more 1.x releases there will be before we work on 2.0?)

What if it was lazy and produced a pair generator?

@StefanKarpinski Can we start this by defining

map(f, itrs...) = Base.Generator(f, itrs...)

in Base.Iterators? This way, people can use

using Base.Iterators: map, filter, flatten

as kind of using Future.

If we wanted map to work on values
then we do map(f, values(dict))

@oxinabox Does it return a Vector (or some plain iterable)? Maybe it should produce a ValueIterator so that we can get back a dictionary via Dict(zip(keys(dict), map(f, values(dict)))). Maybe it's nice to have parent(::AbstractDict) s.t. parent(values(dict)) === parent(keys(dict)) === dict. This way, we can do parent(map(f, values(dict))) to iterate over values of a dictionary and get another dictionary sharing keys with the original dictionary.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

dpsanders picture dpsanders  Â·  3Comments

ararslan picture ararslan  Â·  3Comments

musm picture musm  Â·  3Comments

i-apellaniz picture i-apellaniz  Â·  3Comments

omus picture omus  Â·  3Comments