As discussed in https://groups.google.com/forum/#!topic/julia-users/1bwx3fjSO5A, many are unhappy with how verbose Dict literal construction has become in 0.4. I'm aware there were real problems with the old syntax, but maybe we can still think of a way to allow a more parsimonious syntax going forward.
See #6739 for discussion on the original change.
What's so verbose about Dict(3=>4, 5=>6)? It is only four (not three) more characters than [3=>4, 5=>6]. (I can't count.)
To add a little more context, this is especially true when dealing with Dicts of Dicts (e.g., when printing Julia representation of JSON objects):
julia> using JSON
julia> a="{\"menu\": {
\"id\": \"file\",
\"value\": \"File\",
\"popup\": {
\"menuitem\": [
{\"value\": \"New\", \"onclick\": \"CreateNewDoc()\"},
{\"value\": \"Open\", \"onclick\": \"OpenDoc()\"},
{\"value\": \"Close\", \"onclick\": \"CloseDoc()\"}
]
}
}}
"
"{\"menu\": {\n \"id\": \"file\",\n \"value\": \"File\",\n \"popup\": {\n \"menuitem\": [\n {\"value\": \"New\", \"onclick\": \"CreateNewDoc()\"},\n {\"value\": \"Open\", \"onclick\": \"OpenDoc()\"},\n {\"value\": \"Close\", \"onclick\": \"CloseDoc()\"}\n ]\n }\n }}\n "
julia> println(JSON.parse(a))
Dict{AbstractString,Any}("menu"=>Dict{AbstractString,Any}("id"=>"file","value"=>"File","popup"=>Dict{AbstractString,Any}("menuitem"=>Any[Dict{AbstractString,Any}("onclick"=>"CreateNewDoc()","value"=>"New"),Dict{AbstractString,Any}("onclick"=>"OpenDoc()","value"=>"Open"),Dict{AbstractString,Any}("onclick"=>"CloseDoc()","value"=>"Close")])))
@kmsquire, even if you need to specify the type, the old syntax was still nearly as verbose (three fewer characters): (AbstractString=>Any)[ ... ] vs. Dict{AbstractString,Any}( ... ).
Just to concisely summarize the motivation for the original change: it seems like the decision to make => first-class (which I totally agree with) is what disallows [a=>b, c=>d] (since it would be ambiguous with a Vector of Pairs). What was the problem with curly-brace syntax?
@malmaud, curly braces used to be Any[...] (à la Matlab cell arrays), and this needs to be deprecated for at least one major release before it can be repurposed.
Also, punctuation is precious. Even when curly braces are available to be repurposed, is it really worth using them to save typing 3-4 characters?
Ah right. So maybe this is as simple as repurposing curlies in 0.5.
I think the leading contender is tuple types: https://github.com/JuliaLang/julia/issues/8470
With https://github.com/JuliaLang/julia/commit/85f45974a581ab9af955bac600b90d9ab00f093b, curly braces could maybe be used for both Tuples (with types) and something else (with values, or more specifically just Pairs). Sure, it's two meanings for the same syntax, but they're used in very different contexts with very different content between the braces.
My current problem is more with the inconsistencies in type inference between [ ] and Dict( ) (see https://groups.google.com/forum/#!topic/julia-users/1bwx3fjSO5A).
Some more issues:
Dict("a"=>1,"b"=>2) gives Dict{ASCIIString,Int64}, but Dict("á"=>1,"b"=>2) gives Dict{Any,Int64}.
That could have come back as Dict{UTF8String,Int64}, or at least Dict{AbstractString,Int64}.
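For readers on a current release, the inference behaviour being discussed can be sketched roughly as follows. This assumes Julia 1.x, where ASCIIString and UTF8String were merged into String, so the exact case above no longer arises; mixing key types still widens the key type to Any unless the type parameters are spelled out.

```julia
# Sketch assuming Julia 1.x (String replaced ASCIIString/UTF8String).

# Homogeneous keys: the key type is inferred from the pairs.
d1 = Dict("a" => 1, "b" => 2)

# Mixed key types (String and Symbol): the inferred key type widens to Any.
d2 = Dict("a" => 1, :b => 2)

# Spelling out the parameters pins the types regardless of the arguments.
d3 = Dict{AbstractString,Int}("á" => 1, "b" => 2)

println(keytype(d1))
println(keytype(d2))
println(keytype(d3))
```

The explicit Dict{K,V}(...) form is the only one of the three whose type does not depend on what the arguments happen to be.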
I'm against going back on this.
The comparison in the OP of the julia-users thread is not even remotely fair, because they are specifying the type in one case and not the other.
@stevengj, yes, dicts are a mighty tool (see Python); spending syntax on them might be a good investment.
@IainNZ - slightly unfair except for the following
julia> { :a => 1 }
WARNING: deprecated syntax "{a=>b, ...}".
Use "Dict{Any,Any}(a=>b, ...)" instead.
Dict{Any,Any} with 1 entry:
:a => 1
If you follow the deprecation it suggests that one should use types when replacing {}.
Subtle point I guess, because { } means Dict{Any,Any}, but it's not clear {Any,Any} was wanted - in fact the example used {Symbol,Any} - which is more like [] in 0.3.
If we were being consistent we would be removing [1,2,3 ] as well and making people type Vector( 1,2,3 ) etc. I see no reason why Vectors are more special than associative collections.
I don't see why we are debating this now, 0.4 is close to being finally branched. Discussion about this change is almost a year old at this point.
@jakebolewski because right now we are spending a large amount of time updating packages and code to use 0.4 - this is the first time for many where they are seeing the impact of this change.
@mdcfrancis I don't think that necessarily follows re [] and Vector, but if you'd like to submit a PR implementing a special syntax for Dict I'm sure it'll be assessed on its merits for Julia 0.5.
There isn't enough syntax for every data structure, and I would argue that distinguishing data structures by bracket type is not terribly clear anyway. I also think vectors really are more fundamental than dictionaries. How are dictionaries implemented after all?
I might add that { } is well-established notation for sets, so maybe { } should only construct sets. But I don't want to debate whether sets or dicts are more important.
Just my 2 cents, but I greatly prefer the new syntax. Dict{K,V}(...) reads clearer to me than (K=>V)[...]. It is also more explicit; it's obvious that you're constructing a Dict rather than an array of Pairs.
If you follow the deprecation it suggests that one should use types when replacing {}.
To back up @IainNZ's point, if you use the old syntax for a type-inferred Dict, the deprecation warning actually shows you the correct new syntax for making the same Dict:
julia> ["a"=>2, "b"=>3]
WARNING: deprecated syntax "[a=>b, ...]".
Use "Dict(a=>b, ...)" instead.
Dict{ASCIIString,Int64} with 2 entries:
"b" => 3
"a" => 2
The fact that this was a source of confusion in the first place is an argument in favor of the new syntax, IMO.
I would argue that distinguishing data structures by bracket type is not terribly clear anyway.
I very much agree with this.
I might add that { } is well-established notation for sets, so maybe { } should only construct sets. But I don't want to debate whether sets or dicts are more important.
My vote is strongly in favor of using {} for #8470, instead of using them to construct a new value (I suppose a type is itself a value of type DataType, but you know what I mean).
To clarify on a few points
The feeling around associative collections is simply that they are very prevalent in coding, especially when you are interfacing with other systems. If you look at Escher (for example) you'll see them all over the place.
For my common use cases [ :a => 1, :b => [ :x => 2.3] ] is more than sufficient, i.e. a list of pairs which may be coerced when required into an associative. Perhaps the challenge here is more that this syntax is deprecated where it should really be encouraged?
What's so verbose about Dict(3=>4, 5=>6)? It is only four more characters
My worry is that + -> .+ was only one more character, and look how well that turned out (#7226).
The feeling around associative collections is simply that they are very prevalent in coding, especially when you are interfacing with other systems.
No dispute there!
For my common use cases [ :a => 1, :b => [ :x => 2.3] ] is more than sufficient, i.e. a list of pairs which may be coerced when required into an associative. Perhaps the challenge here is more that this syntax is deprecated where it should really be encouraged?
The reason why running [:a => 1, :b => [:x => 2.3]] currently throws a deprecation warning, but still actually follows through with old behavior, is to give folks time to adjust before removing the old behavior entirely. I'm not sure when the changeover will actually occur (maybe once v0.4 actually releases), but once it does, this will indeed be the right syntax for constructing an array of Pairs.
If the new behavior is sufficient for your case and you want to make the switch now, you can explicitly "opt in" by using the Pair constructor instead of the => operator:
julia> [Pair(:a, 1), Pair(:b, [Pair(:x, 2.3)])]
2-element Array{Pair{Symbol,B},1}:
:a=>1
:b=>[:x=>2.3]
Definitely not as pretty as using the => operator, but it will work until the new behavior fully comes into play.
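As a point of comparison, here is a minimal sketch of the post-changeover behaviour, assuming Julia 1.x, where => always builds a Pair and square brackets always build a Vector. The variable names are only for illustration.

```julia
# Sketch assuming Julia 1.x: no Pair(...) workaround is needed any more.
x = [:a => 1, :b => [:x => 2.3]]

# Positional indexing returns the whole Pair, not a value looked up by key.
second_entry = x[2]

println(second_entry.first)    # key of the second entry
println(second_entry.second)   # its value, itself a vector of pairs
```

The element type of the outer vector depends on how Julia joins the two Pair types, so the sketch deliberately does not rely on it.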
I'm not convinced that it is obvious that {T1,T2} is the type of a tuple, though I guess I can get my head around it.
The idea does require some getting used to if you're used to the current Julia syntax. It's more intuitive when you think about it in relation to the role value tuples play in function application:
f applied to arguments (1,2,3) → f(1,2,3)
T applied to parameters {A,B,C} → T{A,B,C}
It could also really clean up syntax that currently uses Val types (or some similar wrapper type), which can come into play when writing generated functions for type-stable transformations over heterogeneous tuples (but I digress, discussion regarding the tuple type change should probably stay in #8470).
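The value/type analogy can be tried out in a small sketch. It assumes Julia 1.x, where tuple types are spelled Tuple{A,B,C} rather than the bare {A,B,C} discussed here; f, args and T are illustrative names.

```julia
# Sketch assuming Julia 1.x; tuple types are written Tuple{...}.
f(a, b, c) = a + b + c

args = (1, 2, 3)
result = f(args...)           # f applied to the arguments (1,2,3)

T = Tuple{Int,Int,Int}        # Tuple applied to the parameters Int,Int,Int

println(result)
println(args isa T)
```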
I also think vectors really are more fundamental than dictionaries. How are dictionaries implemented after all?
I think that is the wrong question. We are talking about the syntax and concepts here, not implementation details.
If you think of it from a _conceptual_ viewpoint, associative arrays (aka dictionaries) are more fundamental than integer-subscripted vectors or arrays (Lua is very nice that way, as are Caché ObjectScript and M/Mumps).
What is a vector, but an associative array with the keys restricted to integers?
Also, why do dictionaries even have to be implemented with vectors? (unless you really want to get down to the nitty gritty, where the entire memory of the computer is a vector of bytes).
In COS, globals (persistent, distributed, atomic associative arrays) were implemented with B+ trees, and local associative arrays with a variety of structures (p-tries, vectors that stored the base index and span and allowed for missing values [for arrays that so far had only integer subscripts], hash tables), whatever was most efficient, but all invisible to the programmer.
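The "vector as an associative array with integer keys" view can be illustrated with a short sketch, assuming Julia 1.x (where pairs is defined for arrays). It only shows the conceptual correspondence and says nothing about how either structure is implemented.

```julia
# Sketch assuming Julia 1.x.
v = [10, 20, 30]

# pairs(v) iterates index => value, the same shape a Dict iterates.
as_pairs = collect(pairs(v))

# The same data stored as a Dict keyed by position.
d = Dict(as_pairs)

println(as_pairs)
println(d[2] == v[2])
```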
If the blessing that is generic programming allows lists of pairs to realistically work in most contexts that dictionaries currently work, that seems like it would be a pretty good solution.
@malmaud and from the experiment I'm doing at the moment, you can trivially implement the associative methods for Vector{Pair{K,V}}. So perhaps the real issue is that the change should not have been a plain deprecation, but a switch to the Pair vector syntax with a thin shim that supports associative-like behavior, which is often cheaper / smaller for small collections.
@JeffBezanson - how would you feel about a PR for that, i.e. going directly to the pair syntax?
If the blessing that is generic programming allows lists of pairs to realistically work in most contexts that dictionaries currently work, that seems like it would be a pretty good solution.
That would be cool for it to work as a linear-search "dictionary", but unfortunately there's a clash of meanings with numeric keys.
d = [Pair(2, 3), Pair(3, 4)]
d[2] == (3=>4) # Is it the second element?
d[2] == 3 # Or is it the key lookup?
@mbauman it would be the key lookup, but I agree that is an odd case
Then it's no longer a Vector of Pairs. Here's the trouble:
d = Any[Pair(2, 3), Pair(3, 4)]
d[2] == (3=>4)
d = [Pair(2, 3), Pair(3, 4)]
d[2] == 3
This would be bizarre. If inference at some point fails to concretely type an array comprehension, your data structure now behaves extremely differently.
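One way to sidestep the clash, sketched here under the assumption of Julia 1.x syntax, is to keep positional indexing and key lookup as separate operations on a dedicated wrapper instead of overloading indexing on Vector itself. PairList and lookup are hypothetical names invented for this illustration; nothing like them is proposed in the thread or exists in Base.

```julia
# Hypothetical wrapper type, for illustration only (not part of Base).
struct PairList{K,V}
    items::Vector{Pair{K,V}}
end

# Key lookup by linear search; throws KeyError like a Dict would.
function lookup(pl::PairList, key)
    for (k, v) in pl.items
        isequal(k, key) && return v
    end
    throw(KeyError(key))
end

d = PairList([2 => 3, 3 => 4])

println(d.items[2])    # positional: the second element, a Pair
println(lookup(d, 2))  # by key: the value stored under key 2
```

Because the two operations have different names, the result no longer depends on whether the element type was inferred concretely.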
In 0.5, will x = [ :a => 1, :c => [ 2, 3] ] give me a Vector{Pair{Symbol,Any}}? And will typeof(x[2]) give me Pair{Symbol,Vector{Int}}? (Really, it prints Array{Int64,1} instead of Vector{Int}, but they are ===.)
That is what I'd expect.
@mdcfrancis I don't agree with it doing a key lookup instead of returning the Pair.
Instead, I'd have a Dict constructor that converts a (possibly nested) vector of pairs into a Dict of the right type, and returns it.
I.e. x = Dict([:a => 1, :b => [2,3]]) returns something of type Dict{Symbol, Any}, where x[:a] returns 1, and x[:b] returns [2,3].
How about that? Syntax is easy, doesn't change any proposed 0.5 syntax, and gives easy-to-read associative array literals.
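A rough sketch of this idea, assuming Julia 1.x: the flat case already works because Dict accepts any iterable of pairs, while the nested case needs a small recursive helper. todict is a hypothetical name chosen for this illustration and is not part of Base.

```julia
# Flat case: Dict accepts a vector of pairs directly.
x = Dict([:a => 1, :b => [2, 3]])

# Hypothetical helper (not in Base): recursively turn vectors of pairs
# into Dicts and leave every other value untouched.
todict(v) = v
todict(v::AbstractVector{<:Pair}) = Dict(k => todict(val) for (k, val) in v)

nested = todict([:a => 1, :b => [:x => 2.3]])

println(x[:a])
println(x[:b])
println(nested[:b][:x])
```

Note that the helper converts every vector of pairs it meets, so it would not suit data where a vector of pairs is meant to stay a vector.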
@mbauman Ya, all I was thinking of really is that functions that expect a dict, f(d)=something(d), would instead look like f(d)=something(asdict(d)). Define asdict(d::Associative)=d and asdict{T<:Pair}(x::Vector{T})=Dict(x) (or some light-weight dict alternative that has that key-value semantics).
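In current syntax the same idea might look like the sketch below. It assumes Julia 1.x, where Associative has been renamed AbstractDict and method type parameters are written with where; total is a made-up function standing in for "something that expects a dict".

```julia
# asdict as described above, in Julia 1.x syntax.
asdict(d::AbstractDict) = d
asdict(x::Vector{T}) where {T<:Pair} = Dict(x)

# Hypothetical consumer that needs dictionary semantics.
total(d) = sum(values(asdict(d)))

println(total(Dict(:a => 1, :b => 2)))
println(total([:a => 1, :b => 2]))
```

Callers can then pass either form, at the cost of building a Dict on each call when a vector is supplied.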
@kmsquire, why did you bother with all of those " to quote the JSON string? Just use """ instead, and you don't have to change them (except watch out for $'s in the text!)
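A short sketch of the difference, assuming Julia 1.x string literal rules; the JSON snippets are shortened stand-ins for the example above.

```julia
# With ordinary quotes, every inner " must be escaped.
escaped = "{\"menu\": {\"id\": \"file\"}}"

# Triple quotes allow the inner quotes as-is.
triple = """{"menu": {"id": "file"}}"""

# $ still triggers interpolation, so escape it to keep it literal.
price = """{"cost": "\$5"}"""

println(escaped == triple)
println(price)
```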
@malmaud Sounds like we are thinking on exactly the same lines.
@nolta, I don't think the .+ vs. + transition is comparable. Requiring .+ for array + scalar was problematic because + is extremely well established syntax for this operation in scientific computing. Also, using + required no special support in the Julia parser, only ordinary method overloading, whereas Julia's old Dict syntax is neither universal nor implementable without special parser support.
The proposal is no worse than what exists in 0.3 today which a lot of people were happy with ( pairs convert to dictionary ). As @mbauman points out the ambiguity is for integer keys, for the rest of the universe of types the behavior would be consistent (key based lookups with linear performance). We could (if required) special case integer so that it does not perform the key lookup (probably a good idea).
This would not solve the case where a function is expecting an associative, which seems like the main reason not to do this. We would have to go through the code and change API points to accept Vector{Pair} or Associative - I suspect this is less work than changing all the usage of { pair } and [ pair ], though, and would be in line with the future direction.
I don't think changing the indexing behavior of a Vector based simply on its eltype is a good idea...something with type Vector should _behave_ like a vector, not a dictionary. If you want it to behave like a dictionary...well, that's what Dict is for.
You are probably correct, though this does not change the indexing, it just extends it. A Vector{Pair{String,Any}} would still behave like any other vector: you can push elements onto it, you can reference by integer index, etc. It's just that when you index it by a String, the lookup would be on the contents of the elements.
@kmsquire, why did you bother with all of those " to quote the JSON string? Just use """ instead, and you don't have to change them (except watch out for $'s in the text!)
True. That example was copy-pasted from the JSON.jl tests, and whoever wrote that originally probably wasn't aware of """ at the time.
@one-more-minute suggested the following macro for supporting direct JSON syntax in Julia.
https://groups.google.com/d/msg/julia-users/1bwx3fjSO5A/V_inIa7eCAAJ
I'm also looking forward to having vectors of pairs as soon as is practical in the next version.
These two items would remove my objections to the removal of the terse syntax (as it would still exist for my purposes :) ) . At what point do we think we will be able to remove the backward compatibility from the [] syntax?
I do love how this discussion led to a reasonable solution for @mdcfrancis (and certainly others, including myself), within less than 24 hours. It might have seemed like pointless complaining at first to some people, but look at the results.
Backwards compatibility will be removed in 0.5, after one release cycle of deprecation warnings.
@ScottPJones - agreed. @mauro3 should we close this issue and open a concise description with a 0.5 tag so that it is remembered? We can place the link back to here.
That really is a nice macro. Good example of a situation where a macro is a good solution.