Julia: Support dict.field syntax for Dict{Symbol, <:Any}

Created on 25 Jan 2018  Â·  53Comments  Â·  Source: JuliaLang/julia

It would be very convenient to be able to use d.foo with a d = Dict(:foo => ...).

Obviously, this would require defining methods of Base.getproperty, Base.setproperty! and Base.propertynames for Dict{Symbol, <:Any} in Base, which I guess would count as a breaking change. But now that #25311 is merged, it sure would be fun to have in. :-)

Most helpful comment

Refusing to let people do config.params.training.init.nepochs with built-in Dicts is just setting ourselves up for Dict being considered "that shitty dict type that Base provides... oh yeah, don't use that one, use the one provided by the CoolDicts package". Seriously – I really don't think that being purists about this is going to go the way you hope it will. This is one of _the_ usages that property overloading is perfect for; now that we have that feature, it is going to happen.

We could easily define a very generic getproperty(::AbstractDict, s::Symbol) method and the only real burden would be that you have to write getfield(d, :keys) to access the internals of their dictionary types. I don't really think that's such a big deal. A couple of people implement and maintain some fairly hairy code for that; thousands of people use the types and benefit from improved ergonomics.

All 53 comments

Question is, would we want this only for Dict{Symbol, <:Any}, or for Dict in general, to support dictionaries with mixed key types (I guess only the keys of type Symbol would be used in that case)?

I think just defining it for all dicts would make sense with d.foo being equivalent to d[:foo]. Of course, it would only work for dictionaries where d[:foo] works. This would force access to the internals of dictionaries via getfield but that's ok since poking around at dictionary internals is not the best idea anyway.

d."foo" would work.

True, and that would be nice syntax as well.

:-1: Seems like a good step to be closer to javascript.

Also this will cause a deviation between API of Dict and other similar types or make it extremely hard for users to implement them if they want to provide a similar API as the base one.

I'm also mostly against this; I think it just starts to be too many ways to do the same thing.

@JeffBezanson> I'm also mostly against this; I think it just starts to be too many ways to do the same thing.

Maybe. But on the other hand, it might be very powerful when writing generic code. I see Dict{Symbol} as the natural dynamic equivalent of struct and NamedTuple, and with this syntax the same code would work for all of them. Of course that doesn't mean people should start to use Dict instead of struct/NamedTuple.

@yuyichao> Seems like a good step to be closer to javascript.

True - but in certain uses cases, that may be a good thing. Even a high-speed Julia application may have to deal with nested dicts in a a low-speed corner (configuration data, JSON, ...), and the code will look much nicer without a flood of square brackets. Besides, we already have getproperty & friends, using them for Dict seems natural to me.

A while ago there was a discussion about using a character like $ to resemble . syntax, but with meaning a$b = a[:b], i.e., a getindex(), which would be handy for objects of type Dict and DataArray, and perhaps also drilling down Python objects in PyCall.

In fact, I worked hard to get the then-current meaning of $ as \xor operator deprecated and replaced by ⊻ in #18977, so that this character could be used for such a purpose in the future. The PR was merged about a year ago.

Now the discussions in #1974 c.s. have finally led to the implementation of getproperty. I've played around a bit with it, and indeed you can say

Base.getproperty(d::Dict, s::Symbol) = s ∈ fieldnames(Dict) ? getfield(d, s) : getindex(d, s)
d = Dict(:wallet =>1, :keys => 2)
@show d.wallet
@show d.keys

with perhaps somewhat unexpected result in the last line.

For that reason I would still consider it an option to use syntax a$b as a shortcut for getindex(a, :b)or similar---in the end, we've played $ free for such use, it resembles R syntax, and it allows proper separation between struct fields and index access.

This would first require full deprecation of $ as \xor in julia 0.7 (for which it is about time, I suppose, anyway) and different parsing.

@oschulz FYI, you've completely missed the main objection in my comment. And,

it might be very powerful when writing generic code. I see Dict{Symbol} as the natural dynamic equivalent of struct and NamedTuple, and with this syntax the same code would work for all of them.

A little bit for NamedTuple and not at all for struct and in anycase Dict are much more closer to AbstractDict and I'll claim it is completely impractical to require all of those to provide the same API so doing this will hurt generic code. Unless you are going to throw in random types to replace the Dict, defining a getindex on the type you want to replace the Dict with is trivial Base.getindex(a::AbstractConfig, key::Symbol) = getfield(a, key) if for whatever reason you don't want to just use NamedTuple.

the code will look much nicer without a flood of square brackets

Disagree completely. For one, loading unknown configure dict into Symbol is already a bad thing, String should have been used instead. a[:b] and a["b"] are also only two characters longer than a.b and a."b" so for any key/variable name of non-trivial length they don't shorten your code by much. Finally, assuming playing with Dict is not the only thing you do in your code, indexing with string or symbol keys makes it very consistent with other operations on the Dict without a known constant key and distinct from normal field access used in other part of the code. With getproperty on Dict you are just hiding the fact that you are indeed doing a dict lookup.

Besides, we already have getproperty & friends, using them for Dict seems natural to me.

No, getproperty is a feature that can be easily abused. Using it for everything does not make sense.

@yuyichao > FYI, you've completely missed the main objection in my comment.

No, I didn't - I just didn't have a good response to your argument regarding the API of Dict and other similar types. ;-) I thought about doing this for Dict only, not for similar types as well, but you can judge those implications better than I can.

@yuyichao > For one, loading unknown configure dict into Symbol is already a bad thing

I won't argue here, but I'd like to learn why. Wouldn't Symbol be a natural choice if terms repeat frequently? Or is it bad to risk "flooding" the Symbols table?

missed the main objection

I just mean that you didn't reply to them at all, together with most of the objections in https://github.com/JuliaLang/julia/issues/25750#issuecomment-360777148

Wouldn't Symbol be a natural choice if terms repeat frequently? Or is it bad to risk "flooding" the Symbols table?

If by whatever mean, you are confident that the keys are always known to be within a fixed set it's fine but that's not generally a property you can rely on when loading configure file. Allocating a symbol table entry for "a"^1000_000 that cannot be freed without a restart is pretty bad.

Honestly, this is one of the features of languages like JavaScript and K that people who use those languages really love, and one that I've personally wanted to be able to emulate for a long time. So what if it's two ways to write the same thing? When has that ever been such a huge problem? If we don't add support for this to Dict people are just going to create a SymDicts package that defines a SymDict type that wraps a Dict with this feature. Sure, you can say, "great, let them create that package" but why relegate it to package?

If strings are better as keys for config.json style objects than symbols, because they don't flood the symbol table, then that should be their representation. But still, being able to access an object in code using syntactic sugar like config.params.training.init.nepochs rather than config["params"]["training"]["init"]["nepochs"] is arguably easier to read, and could be supported by a (local) definition

Base.getproperty(d::Dict{<:AbstractString,<:Any}, s::Symbol) = s ∈ fieldnames(Dict) ? getfield(d, s) : getindex(d, string(s))

In order not to confuse people with the getfield interpretation of ., this could alternatively be written config$params$training$init$nepochs if $ receives that meaning...

So what if it's two ways to write the same thing

It's great for the consumer, but expands the API for the implementer, so the wrapper approach may actually be more reliable and easier to code (SymDicts can wrap any AbstractDict with String keys, and since it only has one field and a small API, it's easier to write getfield(d, :dict) everywhere in its implementation. This might be much easier if we re-design many of our uses of the internal top lowering to be module-specific globals as suggested in #22147, or alternatively had just merged that PR (as either would make it easy to make a module-internal getproperty function that called getfield for the Dict implementation code, but exported a different definition for Base.getproperty)

Allocating a symbol table entry for "a"^1000_000 that cannot be freed without a restart is pretty bad

As davidavdav, this isn't a fundamental issue, the code just needs to distinguish between the keys (as Strings) and the syntactic getproperty calls (as Symbols).

Refusing to let people do config.params.training.init.nepochs with built-in Dicts is just setting ourselves up for Dict being considered "that shitty dict type that Base provides... oh yeah, don't use that one, use the one provided by the CoolDicts package". Seriously – I really don't think that being purists about this is going to go the way you hope it will. This is one of _the_ usages that property overloading is perfect for; now that we have that feature, it is going to happen.

We could easily define a very generic getproperty(::AbstractDict, s::Symbol) method and the only real burden would be that you have to write getfield(d, :keys) to access the internals of their dictionary types. I don't really think that's such a big deal. A couple of people implement and maintain some fairly hairy code for that; thousands of people use the types and benefit from improved ergonomics.

Would this be only for Dict, or for all AbstractDict objects? Maybe only the former, since if you do it for the latter than it makes it harder to make a custom dictionary-like object that also has fields. Of course, you could always define getproperty(...) = getfield(...) for such types to opt-out.

(e.g. I'm trying to think how this would affect PyDict, which is an AbstractDict wrapper around a Python object, but Python objects will also need getproperty to access their methods/attributes. Of course, PyDict will define its own getproperty, so that will override any general method for AbstractDict.)

If this is only a sugar that affects Dict, I guess that's fine, too, although the argument against that would be the slight inconsistency with other Dict types, though other Dict types can of course opt-in by defining their own getproperty method. Or I guess you could make Dict <: PropertyDict <: AbstractDict so that other dict classes can subtype PropertyDict to inherit the getproperty method.

If some dictionary types have this, and others don't, then a bit more care is required in generic code, which will have to use the explicit d[:foo] style.

This would first require full deprecation of $ as xor

@davidavdav We already did that in 0.6.

We can get away with it since it only affects implementers , but getfield(d, :keys) really is annoying. We should seriously consider d$keys for this, which would make using getproperty more palatable.

@JeffBezanson , do you mean having a$b as a shorthand for getfield(a, :b)? I'd like that, it would make implementations of types that define getproperty more readable. But then we'd also need something for setfield - would it be possible to use a$b = c for that?

I think it's okay (maybe even a good thing) that overloading getproperty is a little annoying for implementors, since this is a feature that should probably be used sparingly (since we don't want to encourage OOP-like spelling in general). In Base, we could always define _keys(d::Dict) = getfield(d, :keys) for usage in base/dict.jl.

I tried grepping registered packages for .keys. Unfortunately, a lot of packages define their own types with a "keys" field, so there are a lot of false positives. It looks like at least some packages access the keys field of a Dict, though, so one would have to think carefully about the deprecation process here.

The deprecation process could be having this in 0.7:

function getproperty(d::Dict, s::Symbol)
    # print deprecation warning
    getfield(d, s)
end

That warns people to change how they're accessing the real fields; then we change it to do lookup in 1.0 or 1.1 (it might be a good idea to have throw an error in 1.0 and add lookup in 1.1).

@davidavdav We already did that in 0.6.

Sorry, my bad, I used the wrong version of Julia to test this...

We can get away with it since it only affects implementers , but getfield(d, :keys) really is annoying. We should seriously consider d$keys for this, which would make using getproperty more palatable

Using d$keys as synonym for getfield(d, :keys) has the advantage of moving the less readable $ as identifier separator to the library side of the code base, freeing the . for getindex for Dict{Union{Symbol,AbstractString}, Any} as suggested in this Issue. I like that idea, as it gives access to the more readable . to the user.

In how far would it confuse the user, that . mostly boils down to getfield for most types, except for (certain) dicts types where it would mean getindex?

2¢: to my eye foo.bar is fast direct field access and foo["bar"] is a function call and a hash lookup and a loop with some string comparison. That's an important distinction when reviewing code for performance. If I see foo.length in repeated a few lines apart, I'll assume that the compiler will optimise that to only look up the field once (if it thinks that is faster). If I see foo["length"] repeated, I'll manually cache the value.

I think it is very useful when reading code to be able to distinguish between direct struct field access and syntactic sugar for a function call.

Dynamic scripting languages can afford to come down on the side of convenience here. But it doesn't feel right to me for Julia.

However, I can see the attraction of using . for a dynamic property interface. Perhaps if . is going to be sometimes an expensive operation, we need a new operator that only means: access the field of a struct.

a$b is available and has been proposed for that.

Perhaps if . is going to be sometimes an expensive operation, we need a new operator that only means: access the field of a struct.

We have getfield(x, :foo). I'm extremely skeptical that the need for this is widespread enough to justify using an ASCII character like x$foo, though, since x.foo is fast unless you overload getproperty (and even then it could be fast with constant propagation).

Adding an ASCII character for direct field access won't really solve the code-transparency problem, since most people will continue to use ..

If half of Julia users use . and half use $ because they think it will be "faster", then I would argue that this makes the readability situation worse in practice. Forcing people to use getfield for direct field access ensures they they will do this only if they really need it (i.e. if they have overloaded getproperty and need to circumvent it).

The main motivation for x$f is not that x.f might be slow, it's that once you overload x.f then any code in the definition of the type needs to use getfield(x, :f) everywhere which gets kind of annoying. x$f is at least a bit better in those situations. Doesn't need to be added now. The use of x$f should be considered de facto private and discouraged for code that isn't part of the type's implementation. I'm not sure I buy the whole x.f is fast and x["f"] is slow distinction. Sure, that's how it is now, but that's hardly an argument for never changing it. It's still perfectly fast if x is a plain old struct, which one generally knows.

it's that once you overload x.f then any code in the definition of the type needs to use getfield(x, :f) everywhere which gets kind of annoying.

You can always write an _f(x) = getfield(x, :f) accessor function to use internally in the type's methods when accessing a private field f.

I agree that x$f would be convenient here. The question is, is the need for this convenience great enough to justify devoting a precious ASCII character to it? My impression is that only a tiny minority of Julia code will ever need to explicitly call getfield. In the Julia base code, for example, getfield(x, :foo) is currently used only ~five times out of tens of thousands of function calls. This number will go up as we use getproperty more, but enough to be worth using $?

The inconvenience of getfield(x, :foo) is also a kind of feature: we don't want people to use getproperty profligately, defining lots of OOP-style interfaces for Julia just because they can. A little inconvenience for implementors will help motivate people to overload getproperty only when it is really a clear win over the usual Julia method(object, ...) style.

The main motivation for x$f is not that x.f might be slow, it's that once you overload x.f then any code in the definition of the type needs to use getfield(x, :f)

That is one concern, but could that be handled forbidding overloading a real field names?
That way, if I write a getproperty method for field x, I'll be forced to rename the field to _x and replace all my foo.xs with foo._x.

Another concern is readability and reviewability of code that uses an API with exposed fields.
In some APIs you'll have an abstract type Foo and a handful of documented methods, but Julia does not force every API to use opaque data structures. That makes sense for Julia, given the intention to me more than yet another dynamic scripting language. So we have APIs where a simple struct is exposed as part of the public interface, and users can immediately see that it is simple and has nothing to hide:

struct FooPoint
    x::Int
    y::Int
end
convert(::Type{AbstractPoint}, p::FooPoint) = ...

To me it seems desirable that to have a compact notation for clients to write v = a.x * a.x + a.y * a.y, and as a code reviewer, to know that I don't have to worry: "is getfield being called four times here? should I put x and y into local variables first? do I need to add a performance test to ensure that getfield for this type doesn't change into an expensive method in some later version of this library?

The inconvenience of getfield(x, :foo) is also a kind of feature

Yes, I think this is under-appreciated (until you're trying to implement or use something with this feature). If there's a second convenient syntax for getfield, it potentially becomes more confusing to decide which one to call (and harder to make the "right" choice). While right now, it's pretty easy to reach for the right one (1:nfields uses getfield, while everything else uses .) – except for the implementer who wants to expose this for some general class of types (which probably isn't something we want to encourage).

Here is a wrapper that adds getproperty support to any AbstractDict and avoids this problem of breaking dict internals field access. It supports dicts with AbstractString or Symbol keys, converting symbol keys to strings as needed.

> julia d = PropertyDict(Dict("foo" => 1, :bar => 2))
PropertyDict with 2 entries:
  :bar  => 2
  "foo" => 1
julia> d.foo, d.bar, d."foo"
(1, 2, 1)
julia> d."bar"
ERROR: KeyError: key "bar" not found
struct PropertyDict{K, V, T <: AbstractDict{K, V}} <: AbstractDict{K, V}
    d::T
    PropertyDict(d::T) where {T <: AbstractDict} =
        new{keytype(d), valtype(d), T}(d)
end

unwrap(d::PropertyDict) = getfield(d, :d)

Base.getproperty(d::PropertyDict, n) = getindex(d, n)
Base.getproperty(d::PropertyDict{AbstractString}, n::Symbol) = getindex(d, String(n))

function Base.getproperty(d::PropertyDict, n::Symbol)
    v = get(d, n, Base.secret_table_token)
    if v != Base.secret_table_token
        return v
    end
    return getindex(d, String(n))
end

Base.IteratorSize(::Type{PropertyDict{T}}) where T = IteratorSize(T)
Base.IteratorEltype(::Type{PropertyDict{T}}) where T = IteratorEltype(T)
Base.getindex(d::PropertyDict, i) = getindex(unwrap(d), i)
Base.get(d::PropertyDict, k, default) = get(unwrap(d), k, default)
Base.length(d::PropertyDict) = length(unwrap(d))
Base.start(d::PropertyDict) = start(unwrap(d))
Base.done(d::PropertyDict, i) = done(unwrap(d), i)
Base.next(d::PropertyDict, i) = next(unwrap(d), i)

I'm using this in LazyJSON to provide a more 1st-class feel to JSON data access:
https://github.com/samoconnor/LazyJSON.jl/blob/master/src/PropertyDicts.jl

Somewhat related to this, I wrote a small package to make it easier in Julia to deal with javascript-like objects, so that you could use @js config.params.training.init.nepochs as in the example above in order to dive deep into some object.

(I started this as an exercise for me to better understand macros in Julia)

I still rather wish we’d done this. Accessing the fields of a Dict is something you absolutely shouldn’t be doing, I’m not sure why we needed to preserve convenient syntax for it.

I still rather wish we’d done this.

Maybe in 1.2? :-)

Access to internals of Dict wouldn't count as a behavior that has to be preserved in the 1.x line, right?

I'm afraid it's probably too breaking to be considered. It could be experimented with, however—if it doesn't break any packages when PkgEval is done, it be considered a "minor change" (i.e. technically breaking but unlikely to break real world code).

I've done a little grepping of registered packages for usage of internal Dict fields. Looks like it would break JLD.jl, but I haven't found any other examples so far.

For what it's worth, I have native Julia property access to javascript-like objects now in https://github.com/davidavdav/JSObjectLiteral.jl#the-jsobject-struct, the implementation turns out to be quite similar to @samoconnor's PropertyDicts (but less complete so far...)

I've done a little grepping of registered packages for usage of internal Dict fields. Looks like it would break JLD.jl, but I haven't found any other examples so far.

So there's hope, then? :-)

This could potentially be done for a 1.x release, if not, it would be nice in 2.0.

Maybe in Julia v1.3?

Sure, a PR would help.

StatsBase.jl accesses dictionary internals here and here.

Based on some informal testing that I've done, the access to dictionary internals in StatsBase.jl allows their countmap function to be ~3x faster than the naive implementation that I wrote up.

@CameronBieganek: you can still do this via getfield. (Thus StatsBase would need to be updated to work with this proposal).

@CameronBieganek: you can still do this via getfield. (Thus StatsBase would need to be updated to work with this proposal).

Yeah, I understand that. I'm just providing more evidence that this would be an actual breaking change.

I think an easy solution to this problem is to just define

Base.NamedTuple(d::Dict{Symbol, Any}) = NamedTuple{Tuple(keys(d))}(values(d))

How does the core team feel about this - would this be acceptable for 1.6, or should it (if done at all) wait for 2.0?

I'm not that much of a fan. Did I miss a compelling justification for this feature? Otherwise it seems like we're introducing an inconsistent interface in order to avoid typing 3 characters.

I guess an interesting use case would be nested dicts (e.g. for config data, etc.) With nests dicts,

config.some.property.of.interest

would be more readable and typeable than

config["some"]["property"]["of"]["interest"]

For me it's not about avoiding typing, it's about readability. Readability is paramount. I think the example from @oschulz pretty clearly demonstrates the superior readability of dict.field syntax.

A good first step here would be to create a package, say, SymbolicDicts.jl, which provided this functionality. I could imagine SymbolicDict being a wrapper around other Dict types, similar to the OrderedDict implementation in OrderedCollections.jl.

That seems like a pretty small win (a small gain in readability) in a specific circumstance.

If all Dicts were symbol dicts, I might be more supportive. I do not look forward to the day when we start seeing people try to write:

obj = MyObj(7, "hello", rand(8))
d.$obj = nothing

and then complaining that it doesn't work.

EDIT: seeing @mauro3's "confused" emoji, the point is that d[x] works for any object, whereas with this proposal d.x works only if x::Symbol. You could imagine people wanting it to work for other kinds of keys, and then you get into how-to-distinguish-the-variable-from-its-name problems, and hence you want interpolation.

A good first step here would be to create a package, say, SymbolicDicts.jl

I do have a package like that: https://github.com/oschulz/PropDicts.jl

(for details, see https://oschulz.github.io/PropDicts.jl/stable/api/#PropDicts.PropDict)

With PropDicts.jl:

x = PropDict(:a => PropDict(:b => 7, :c => 5, :d => 2), :e => "foo")
x.a.b == 7

FWIW, I'm trying to implement this in a generic way at https://github.com/joshday/PropertyUtils.jl#indexes, with the idea being that you can map get/setproperty to get/set[index/field], e.g.

d = Dict(:x => 1, :y => 2)

id = indexes(d)

id.x == 1

id.z = 3

d[:z] == 3
Was this page helpful?
0 / 5 - 0 ratings