When you write:
x[idx] .= z
it is processed as:
Base.materialize!(Base.dotview(x, idx), Base.broadcasted(Base.identity, z))
which allows packages to give a special meaning to Base.dotview(x, idx). This matters if they want to define broadcasting-aware types, as it lets them specialize the behavior of dotview.
On the other hand:
x.field .= z
is processed as:
Base.materialize!(Base.getproperty(x, :field), Base.broadcasted(Base.identity, z))
which leaves no room for special broadcasting behavior in this case, as we just call getproperty (this is as if we called getindex instead of dotview in the first example).
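To check the two lowerings described above on a given Julia installation, one option is to ask the front end directly. This is only an inspection sketch: the names x, idx, z and field are placeholders that never need to be defined, and the exact printed form (and the accessor function emitted for the property case) may differ between Julia versions.

```julia
# Lower both assignment forms without evaluating them.
# x, idx, z and field are placeholder names; they are never looked up here.
indexed  = Meta.lower(Main, :(x[idx] .= z))
property = Meta.lower(Main, :(x.field .= z))

# Print the lowered code so the calls to dotview / the property accessor
# and to materialize! can be read off directly.
println(indexed)
println(property)
```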
The solution would be to use some intermediate function (the name is tentative) with a default definition:
maybegetproperty(x, y) = getproperty(x, y)
that could get a special implementation in packages if needed.
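A minimal sketch of how a package might use such a hook, assuming the tentative maybegetproperty name from above. Since the lowering of x.field .= z cannot be changed from user code, the last line writes out by hand what the proposed lowering would call. PropTable and PendingColumn are hypothetical types invented for this illustration, not part of any package.

```julia
# Hypothetical toy table: a fixed row count plus named Float64 columns.
struct PropTable
    nrows::Int
    cols::Dict{Symbol,Vector{Float64}}
end

# Plain property access looks up a column and throws KeyError if it is missing.
Base.getproperty(t::PropTable, name::Symbol) = getfield(t, :cols)[name]

# Hypothetical lazy placeholder for a column that may not exist yet.
struct PendingColumn
    table::PropTable
    name::Symbol
end

# Default definition from the proposal (tentative name).
maybegetproperty(x, y) = getproperty(x, y)

# Package-specific override: defer the lookup instead of erroring.
maybegetproperty(t::PropTable, name::Symbol) = PendingColumn(t, name)

# Create the column on demand, then broadcast into it in place.
function Base.Broadcast.materialize!(dest::PendingColumn, bc::Base.Broadcast.Broadcasted)
    t = dest.table
    col = get!(() -> zeros(getfield(t, :nrows)), getfield(t, :cols), dest.name)
    Base.Broadcast.materialize!(col, bc)
    return dest
end

t = PropTable(3, Dict{Symbol,Vector{Float64}}())

# Hand-written equivalent of `t.newcol .= 1` under the proposed lowering.
Base.materialize!(maybegetproperty(t, :newcol), Base.broadcasted(identity, 1))
```

After this runs, t.newcol returns the freshly created column, while reading a column that was never assigned still throws.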
A concrete use case is in DataFrames.jl, which allows:
df[:, :newcol] .= 1
to add a new column to the DataFrame df, while:
df.newcol .= 1
errors if :newcol does not exist in df.
@tkf you had a more general idea of what could be done here. Can you please comment on this?
Interesting. The reason-for-being for dotview wasn't package extension, but rather the sometimes-scalar, sometimes-nonscalar aspects of indexing. That's why getproperty wasn't similarly special-cased. That said, I fully appreciate the motivation here.
You can already specialize Base.getproperty to return some custom object upon which you can dispatch Base.materialize!. Why wouldn't that work in your example?
@stev47 - as @mbauman commented: broadcasting allows x[...] to have a context-dependent meaning on the LHS (because this was required even in Base; otherwise dotview would not be needed). On the other hand, Base did not need x.y to have a context-dependent meaning, but packages may need it (and DataFrames.jl does).
In particular, have a look at the example above. df.x should error if there is no :x column in the data frame df. On the other hand, we want df.x .= 1 to work and create a new column :x filled with size(df, 1) copies of 1.
I see, so the proposal is to allow a.x .= b to be different from c = a.x; c .= b.
One doesn't usually associate field access with context-dependent broadcasting semantics, so the user might be confused by this (in the array case the syntax is about indexing and is well known for mutable access). Your example makes sense, though, if you absolutely want to avoid something more explicit, e.g. a macro @new df.x .= 1.
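For reference, the equivalence that the proposal would relax can be seen with an ordinary struct, where both spellings mutate the same underlying array. This is a small sketch; Holder is a hypothetical type used only for illustration.

```julia
# Hypothetical container with a single array-valued field.
struct Holder
    x::Vector{Int}
end

# Form 1: broadcast-assign through the property access directly.
a = Holder([0, 0, 0])
a.x .= 5

# Form 2: fetch the field first, then broadcast-assign into the result.
b = Holder([0, 0, 0])
c = b.x
c .= 5

# Both forms wrote into the array stored in the struct.
same_result = a.x == b.x
```

With a context-dependent a.x .= b, the first form could succeed in cases where the second cannot even fetch a.x.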
the user might be confused by this
This is what we thought as developers of DataFrames.jl, but in practice users report being confused that df.x .= 1 does not work when :x is missing from a data frame (this is actually one of the most frequently asked questions). It is especially confusing since df.x = [1,2,3] works and is a normal way to add a column to a data frame using setindex!.
This seems simple enough to me and should be entirely nonbreaking. I think it's worth doing.
I can't say I've mastered the ins and outs of indexing in DataFrames.jl, but this seems too magical to me. How can you do in-place assignment to a column that doesn't exist yet? If anything, I would argue that the better way to resolve the mentioned inconsistency is to make df[:, :newcol] .= 1 throw an error.
See https://github.com/JuliaData/DataFrames.jl/pull/1961 for a long discussion on this. Initially the design disallowed it. The main reason for allowing it is that people find it natural in generic code to write df[:, col_name_as_variable] .= 1 and expect it to work without having to check whether the column that col_name_as_variable points to exists.
In general, the main reason for requesting df.newcol .= 1 is not to "resolve inconsistency", but rather that many people find the pattern df.newcol .= 1 the most natural to use (even without thinking about df[:, :newcol] .= 1).
From triage: we might want this to return a lazy object, where the materialize! method for that object creates the field value if it doesn't exist, does the broadcast, and then finally assigns the field. This lazy object would essentially be a "lens", and perhaps @tkf has an opinion.
Triage is :+1: on the general idea.
Thank you!
we might want this to return a lazy object, where the materialize! method for that object creates the field value if it doesn't exist, does the broadcast, and then finally assigns the field.
This is exactly how DataFrames.jl works now.
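The lazy-object mechanism can be sketched for the indexing form, which is already hookable through Base.dotview. This is a toy illustration under stated assumptions, not DataFrames.jl's actual code: ToyTable and LazyColumn are hypothetical types, and columns are restricted to Float64 vectors to keep it short.

```julia
# Hypothetical toy table: a fixed row count plus named Float64 columns.
struct ToyTable
    nrows::Int
    cols::Dict{Symbol,Vector{Float64}}
end

# Hypothetical lazy object standing in for "column `name` of `parent`".
struct LazyColumn
    parent::ToyTable
    name::Symbol
end

# `tbl[:, name] .= rhs` lowers to a dotview call; return the lazy object
# instead of looking the column up eagerly.
Base.dotview(t::ToyTable, ::Colon, name::Symbol) = LazyColumn(t, name)

# Create the column if needed, broadcast into it, then store it back.
function Base.Broadcast.materialize!(dest::LazyColumn, bc::Base.Broadcast.Broadcasted)
    t = dest.parent
    col = haskey(t.cols, dest.name) ? t.cols[dest.name] :
          Vector{Float64}(undef, t.nrows)
    Base.Broadcast.materialize!(col, bc)   # ordinary in-place broadcast
    t.cols[dest.name] = col                # final assignment step
    return dest
end

tbl = ToyTable(3, Dict{Symbol,Vector{Float64}}())
tbl[:, :a] .= 1    # column :a does not exist yet and gets created
tbl[:, :b] .= 1    # same for :b
tbl[:, :b] .= 2    # :b exists now and is overwritten in place
```

The same three steps would apply to a property-based lazy object if the property form gained an equivalent hook.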
The open question is whether only the last call in a chain should be lazy, or whether we want some more general mechanism, which is what @tkf proposed.