I noticed there wasn't any map
methods that generated Dict
s. Which function/name - comments please:
map{K,V}(f::Callable, d::Associative{K,V}) = [ f(k,v) for (k,v)=d ]
map{K,V}(f::Callable, d::Associative{K,V}) = [ (k, f(k,v)) for (k,v)=d ] # or this?
map{K,V}(f::Callable, d::Associative{K,V}) = [ (k, f(v)) for (k,v)=d ] # ..or this?
To distinguish the above from the current map
(which you still may want to use), they will need new names - suggestions?
I noticed that map
for Array
s handles multiple arrays - presumably this is more of a join-by-key semantic for Dicts and therefore not needed in map
for Dict
s?
Currently, map for dictionary arguments falls back to the general iterable case. I'm not sure I see a difference between the status quo and what you are looking for. Does this not work the way you'd like it to?
julia> f(x) = x;
julia> @which map(f,[1=>"1",2=>"2"])
map(f::Union(Function,DataType),iters...) at abstractarray.jl:1265
julia> map(f,[1=>"1",2=>"2"])
2-element Array{Any,1}:
(2,"2")
(1,"1")
This is very nearly your first suggestion, effectively like … = [ f(x) for x in d ]
It would be more convenient if the key and value were provided as separate arguments instead of as a pair. Of course, then mapping with the identity function doesn't produce an equal Dict, but I think that may well be ok. Another issue is what to do about key collisions.
Yeah, I see. And this was a little surprising to me:
julia> f(x1,x2) = (x1,x2);
julia> map(f,1:3,1:2)
ERROR: dimensions must match # as expected
julia> map(f,Dict(1:3,map(string,1:3)),Dict(11:12,map(string,11:12)))
2-element Array{Any,1}:
((2,"2"),(11,"11"))
((3,"3"),(12,"12"))
Perhaps the iters… map method should check to see if they're all done at the end?
Oh! I see this came from the mailing list: https://groups.google.com/forum/?fromgroups=#!topic/julia-users/naGp9DUdSq8
How should we distinguish between a map-I-want-a-Dict-returned
and map-just-a-tuple-array-is-fine
?
Should map
stay returning an Array
, and amap
return an Association
? It seems in some cases you want either.
The more general problem is: what about functions where you want to specify the return type - but the parameter types are identical? Is there a Julian solution for this problem - one being the convert
convention of specifying the target type. But in this case you don't necessarily want to verbosely specify the exact Dict{K,V} that it is to return - just distinguish between "array" and "association please".
I believe there is a dictionary constructor from an iterable of tuples now, so while the whole thing could be done more efficiently, constructing a Dict is pretty straightforward if that's what you want.
Might need better docs (I haven't checked).
So we definitely need a specific associative method, since currently the type information gets dropped:
function f() # to avoid global-induced Any types
local a = ["a"=>1, "b"=>2] # Dict{ASCIIString,Int64}
which(map, kv->kv, a) # map(f::Union(DataType,Function),iters...) at abstractarray.jl:1265
local m = map(kv->kv, a) # Array{Any,1}
end
I was actually expecting map
to return an Array{(ASCIIString,Int64},1}
. Why not? Answer: Associative
is not an AbstractArray
. Why is there not an Iterable
abstract type that is a parent of both Associative
and AbstractArray
Should we have a typed iterator to handle both cases?
(edit: it seems that anonymous functions are another reason the type information gets dropped)
Would Associative
support be required in map
anyway? We could do:
Dict(map(kv->(k[1], v[2]+1), ["a"=>1, "b"=>2]))
Dict(map(k->k, v->v+1, ["a"=>1, "b"=>2]))
Dict(map(v->v+1, ["a"=>1, "b"=>2]))
Is it possible to constrain Function parameters? Looking at doing a map
for AssociativeArray
I want to be able to constrain the mapping function signature:
Base.map{K,V,KR,VR}(f::((K,V)->(KR,VR)), d::Associative{K,V}) # produces an Associative{KR,VR}
Note: not saying this specific example is a good idea, just asking generally about constraining functions.
This function would at least get typed correctly (as opposed to the iter-based function)
map{K,V}(f::Callable, d::Associative{K,V}) = [f(k,v) for (k,v) in d]
Then you wrap the return in a Dict
if that's what you want.
Of course anonymous functions have issues with preserving type information in comprehensions, which seems why map
is not currently implemented with one.
My leaning is to implement the above with a comprehension, and use inline functions (not anonymous syntax). Then I have a hope of getting correctly types out even when an empty array is fed in. The current map
can't do this.
I just ran into this problem. Given the functional nature of Julia and its native collections, this should be implemented ASAP:
mapValues(dict::Dict{T, U}, fun::Function{U}) # should return a new Dict{T, V}.
# also implement mapValues!(...)
map(dict::Dict{T, U}, fun::Function{(T,U)}) # should return a new Array{V}
I don't think the syntax is perfect, but you get the idea. You should look to Scala, as those guys have gotten it down, Map (equivalent to Julia's Dict) in particular. The other available functions that are missing in Julia should also be added, like flatMap which excludes all null value Nullable objects (Scala's equivalent is Option).
http://www.scala-lang.org/api/current/scala/collection/Map.html
http://www.scala-lang.org/api/current/index.html#scala.Option
Thanks for the input, @BryanARivera. Any chance you could start a pull request? That would be the best way for this to get implemented ASAP. Most people working on Julia right now aren't focused on this area, but most are willing to review pull requests.
Also, consider opening an issue (and/or pull request) at DataStructures.jl
. This is rather less organized (currently) than Scala's Collections
, but it would initially be a more appropriate home for some of the other dictionary types.
Cheers!
@kmsquire Hopefully I will have the chance to delve into it soon. I will try to implement everything I can as Scala's collections are quite central to the language. I envision it would accelerate Julia dev as well.
Should be relatively straightforward. Scala has got the naming and function conventions down, we just need to rebuild it in Julia. Would DataStructures.jl
be the equivalent of Collections
(meaning this is where PRs should be targeted) ?
@BryanARivera, that sounds fantastic. I'm also a huge fan of Scala's Collections framework.
DataStructures is probably the right place, but if what you want to implement is very different than what is already there, it might turn out that a new package would be better. But let's start in DataStructures.jl and see where that goes. Cheers!
@BryanARivera I'm very happy to have read this since I too have been looking for a mapvalues
function. The current implementation of map
seems quite bizarre after you see the example with filter!
which can return an intuitively filtered Dict
, but I suppose it may be undesirable to many if we changed map!
to behave more like filter!
, as in:
x = ["a" =>1, "b"=>2, "c"=>3]
map!((k,v)->v+1, x) # x = ["a" =>2, "b"=>3, "c"=>4]
since it could break existing code (unless someone like @StefanKarpinski thinks this is a good move?). In any case, functions which return Array
s and functions which return Dict
s may both be needed.
We'll probably need both mapvalues
and mapvalues!
(note no capital 'V') (or possibly mapdict
?? probably less generic...). Did you make a start on an implementation?
I would be happy to write a PR. Does this look OK:
extractValue(fun) = (pair) -> fun(pair.second)
mapvalues(fun, dict) = map(extractValue(fun), dict)
It won't have the best performance until Jeff is done with his anon functions improvements.
We don't usually camel-case function names in base.
Also, you don't really need that second function. Just do:
mapvalues(f, d::Associative) = map(p->f(p.second), d)
Sounds like a nice addition to me (as well as mapvalues!
). I guess you can rely on anonymous functions being fast as this will only be added in 0.5 anyways.
But that wouldn't return a Dict
unless you also add a special map
method for Associative
that does so: you'd have to include both in your PR.
@nalimilan I see - it's on my todo list.
This works, but doesn't look optimized:
mapvalues(f, d::Associative) = Dict( zip( keys(d), map(p->f(p.second), d)) )
Is there any way to specify a function return type so that this will work?
map(p->f(p.second), d)::Array{AbstractString}
As I said, I think you should first write map
, and then implement mapvalues
on top on that. Something like this seems to work. It's inspired by map
, in particular as regards type inference: it computes the element type from the first element.
Base.similar{K,V}(d::Dict, ::Type{Pair{K,V}}) = Dict{K,V}()
function Base.map(f, a::Associative)
if isempty(a)
return similar(a)
end
st = start(a)
a1, st = next(a, st)
first = f(a1)
dest = similar(a, typeof(first))
sizehint!(dest, length(a))
push!(dest, first)
while !done(a, st)
el, st = next(a, st)
push!(dest, f(el))
end
dest
end
mapvalues(f, a::Associative) = map(p->p.first=>f(p.second), a)
I'm not sure about the case where the dict is empty: the existing map
for arrays has a special-case for when f
is a type, but that doesn't work here.
My code would have to be adapted a bit if we want to write map((k,v)->.., d)
instead of map(p->..., d)
.
FWIW, Haskell defines its map
(or rather fmap
) as what was named mapvalues
above, which is more sound if you share @timholy's view on "what an array means":
it's an associative container for which the space of "keys" can be determined from the array type and its dimensions.
In both cases (array or associative), the function takes only the value (in the key-value pairs) and returns a new value for the _same_ key.
I was surprised recently by the current behavior of map
on Dict
. When you iterate over a Dict
you get (key, value) pairs, so I was expecting my function f
to receive a single such pair each time, not one argument for the key and another for the value.
So while I agree that the view on arrays as associative containers is nice, I'm not sure if this should take precedence over the differences between how they are iterated. (Or would we want to bring that in line between dicts and arrays? It's a major breaking change)
I agree that consistency with what iteration over a Dict
yields may matter more.
For reference, in the context of Haskell, having map
working with (key, value) pairs would not satisfy the following functor rule: map(f, map(g, d)) == map(x->f(g(x)), d)
(where d::Dict
).
This is now fixed by https://github.com/JuliaLang/julia/commit/fc3cbc9e15c73c937afbee35e08b8809ef282f58 and #17968 right?
Part of #20402
Deprecation is in; a new map
method can be added any time post-0.7.
@JeffBezanson as you are the assignee of this, I have reviewed it an I think that it is covered by current functionality so it should likely be closed.
I think that Dict with a generator covers all the use cases of this issue
We should go back to the old behavour.
I am going to make the claim that the definition of map
is that it takes in an iterator, and returns a Vector
.
Dict
s iterate as Pair
s.
map
on dict should be defined and should iterate on pairs,
returning a vector.
If we wanted map to work on values
then we do map(f, values(dict))
if we want a map that returns a dict
then we do Dict(map(f, dict))
I think it is good to be able to thing of map
as the do-blockable form of a Vector comprehansion (assuming no filter)
and this is a case that [f(x) for x in xs]
differs from map(f, xs)
And I often want to write:
map(xs) do (key, val)
...
end
when [... for (key, val) in xs]
would be too long
What if it was lazy and produced a pair generator? My reasoning is that you really never want a vector of pairs, you always want to do something with that vector of pairs: put it in a Dict
, etc.
What if it was lazy and produced a pair generator? My reasoning is that you really never want a vector of pairs, you always want to do something with that vector of pairs: put it in a Dict, etc.
It is an interesting point,
but I don't know that it works out.
the eltype
of the return type of map(f, xs)
is determineed not by the eltype
of xs
but of the return type of f(::eltype(xs))
For example
map(dict) do (kk, vv)
val = round(vv, sigdigits=2)
return " - $(kk): $val (2 sig figs)"
end
I don't think that should return anything with the eltype
of Pair
If we wanted map to work on values then we do
map(f, values(dict))
If we wanted a map to work on pairs then we should be able to do map(f, pairs(dict))
— oh but wait, that just returns a Dict
(because dicts _do_ iterate pairs) and still doesn't work!
IMO, we'd ideally just iterate/map/etc over values, and require pairs
to do the for (k, v) in
thing, but there's no way we're changing iteration in 1.x. Much of this discussion predates pairs
.
So now we're stuck in this weird never-never land where map
is available to do what I'd want it to do, but iteration isn't doing what I want. So do we just wait for 2.0 to unify everything?
Options include:
1) make Pairs
not subtype AbstractDict
but that is breaking
2) We could make the Pairs
type an exception to the rule that AbstractDict
can't be used in map
. Maybe confusing, but nonbreaking.
3) Or we can say that AbstractDict
iterate as pairs and thus should map as a pairs
4) we say that you can map
AbstractDict
but not Dict
. i.e. the generalization of 2. This avoids any weird cases with Dict(map(f, dict))
that can maybe happen with hash collisions and it becoming order sensitive. (I can't remember the full logic but I know it results in some weirdness for Sets
)
Any of these options also open us up to having map(f, ::NamedTuple)
work on the values (as per the fact that NamedTuple
s iterate values already.
And then to having map(f, pairs(nt))
for iterating namedtuples as pairs
I kind of think we should clean this up in Julia 2.0. It appears to me that a unified solution is likely to involve at least some breaking changes, and adding new functionality to Julia 1.x only increases the impact of any potential changes to the API.
(Also - I haven't been following closely for a while, but was there any thoughts on how many more 1.x releases there will be before we work on 2.0?)
What if it was lazy and produced a pair generator?
@StefanKarpinski Can we start this by defining
map(f, itrs...) = Base.Generator(f, itrs...)
in Base.Iterators
? This way, people can use
using Base.Iterators: map, filter, flatten
as kind of using Future
.
If we wanted map to work on values
then we domap(f, values(dict))
@oxinabox Does it return a Vector
(or some plain iterable)? Maybe it should produce a ValueIterator
so that we can get back a dictionary via Dict(zip(keys(dict), map(f, values(dict))))
. Maybe it's nice to have parent(::AbstractDict)
s.t. parent(values(dict)) === parent(keys(dict)) === dict
. This way, we can do parent(map(f, values(dict)))
to iterate over values of a dictionary and get another dictionary sharing keys with the original dictionary.
Most helpful comment
If we wanted a map to work on pairs then we should be able to do
map(f, pairs(dict))
— oh but wait, that just returns aDict
(because dicts _do_ iterate pairs) and still doesn't work!IMO, we'd ideally just iterate/map/etc over values, and require
pairs
to do thefor (k, v) in
thing, but there's no way we're changing iteration in 1.x. Much of this discussion predatespairs
.So now we're stuck in this weird never-never land where
map
is available to do what I'd want it to do, but iteration isn't doing what I want. So do we just wait for 2.0 to unify everything?