Julia: Allow x.y .= z not to use getproperty(x, y)

Created on 21 Jul 2020 · 11 Comments · Source: JuliaLang/julia

When you write:

x[idx] .= z

it is processed as:

Base.materialize!(Base.dotview(x, idx), Base.broadcasted(Base.identity, z))

which allows packages to give a special meaning to Base.dotview(x, idx). This matters for packages that define broadcasting-aware types, because it lets them specialize the behavior of dotview.
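As a sketch of that specialization point, here is a toy container whose `dotview` method creates a missing column before the broadcast writes into it. The `Cols` type and its fields are hypothetical and not part of the original discussion. This is an illustration under that assumption, not how DataFrames.jl actually implements `df[:, :newcol] .= 1`.

```julia
# Hypothetical toy container: named integer columns, all with `n` rows.
struct Cols
    d::Dict{Symbol,Vector{Int}}
    n::Int
end

# Plain reads go through getindex and throw a KeyError for a missing column.
Base.getindex(c::Cols, s::Symbol) = c.d[s]

# `c[s] .= v` lowers to materialize!(dotview(c, s), broadcasted(identity, v)),
# so specializing dotview lets the broadcast path create the column first.
Base.dotview(c::Cols, s::Symbol) = get!(() -> zeros(Int, c.n), c.d, s)

c = Cols(Dict(:a => [1, 2, 3]), 3)
c[:b] .= 7                     # creates :b, then fills it in place
@assert c[:b] == [7, 7, 7]
```

Reading `c[:b]` before the assignment would still throw, because only the broadcast-assignment path goes through `dotview`.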

On the other hand:

x.field .= z

is processed as:

Base.materialize!(Base.getproperty(x, :field), Base.broadcasted(Base.identity, z))

which leaves no way to introduce special broadcasting behavior in this case, because only getproperty is called (this is as if we called getindex instead of dotview in the first example).

The solution would be to use some intermediate function (the name is tentative) with a default definition:

maybegetproperty(x, y) = getproperty(x, y)

that packages could specialize if needed.
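A minimal sketch of how a package could use such a hook, assuming the tentative name `maybegetproperty` and a hypothetical `PropTable` type. Current lowering does not call this hook, so the line that `t.b .= 1` would lower to under the proposal is written out by hand.

```julia
# Tentative hook from the proposal; the default keeps today's behavior.
maybegetproperty(x, s::Symbol) = getproperty(x, s)

# Hypothetical table type: reading a missing column through `t.col` errors.
struct PropTable
    cols::Dict{Symbol,Vector{Int}}
    nrows::Int
end
function Base.getproperty(t::PropTable, s::Symbol)
    cols = getfield(t, :cols)
    haskey(cols, s) || throw(ArgumentError("column :$s not found"))
    return cols[s]
end

# Package-side specialization, used only on the broadcast path:
# create the column (one slot per row) if it does not exist yet.
maybegetproperty(t::PropTable, s::Symbol) =
    get!(() -> zeros(Int, getfield(t, :nrows)), getfield(t, :cols), s)

t = PropTable(Dict(:a => [1, 2, 3]), 3)
# What `t.b .= 1` would lower to under the proposal (written by hand here):
Base.materialize!(maybegetproperty(t, :b), Base.broadcasted(identity, 1))
@assert t.b == [1, 1, 1]
```

A plain `t.c` still goes through `getproperty` and errors, which is the split in behavior the proposal is after.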


A concrete use case is in DataFrames.jl, which allows:

df[:, :newcol] .= 1

to add a new column to the df DataFrame, while:

df.newcol .= 1

errors if :newcol does not exist in df.


@tkf, you had a more general idea of what could be done here. Can you please comment on this?

Label: broadcast


All 11 comments

Interesting. dotview wasn't introduced for package extension, but rather to handle the sometimes-scalar, sometimes-nonscalar aspects of indexing. That's why getproperty wasn't similarly special-cased. That said, I fully appreciate the motivation here.
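To illustrate the sometimes-scalar, sometimes-nonscalar point with example values of our own choosing: with a scalar index, `dotview` returns the element itself, while with a range it returns a view. In both cases `.=` writes into existing storage rather than into a copy.

```julia
# Scalar index: dotview returns the element itself (here an inner vector),
# so `.=` mutates that inner vector.
x = [[1, 2], [3, 4]]
@assert Base.dotview(x, 1) === x[1]
x[1] .= 0
@assert x == [[0, 0], [3, 4]]

# Range index: dotview returns a view instead of a copy,
# so `.=` writes through to the parent array.
y = [1, 2, 3, 4]
@assert Base.dotview(y, 2:3) isa SubArray
y[2:3] .= 0
@assert y == [1, 0, 0, 4]
```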

You can already specialize Base.getproperty to return some custom object on which you can dispatch Base.materialize!. Why wouldn't that work in your example?

@stev47, as @mbauman commented, broadcasting allows x[...] to have a context-dependent meaning on the LHS because Base itself required it (otherwise dotview would not be needed). On the other hand, Base never needed x.y to have a context-dependent meaning, but packages can need it in general (and DataFrames.jl does).

In particular, have a look at the example above: df.x should error if data frame df has no :x column. On the other hand, we want df.x .= 1 to work and create a new column :x filled with size(df, 1) ones.
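The objection can be made concrete with a sketch in which the types `LazyTable` and `MaybeCol` are hypothetical. If `getproperty` always returns a lazy wrapper that `materialize!` knows how to fill, then `t.b .= 1` does create the column. However, a plain read such as `t.nosuchcol` can no longer error, because `getproperty` cannot tell whether it is being called on the left-hand side of `.=`.

```julia
# Hypothetical table type.
struct LazyTable
    cols::Dict{Symbol,Vector{Int}}
    nrows::Int
end

# Hypothetical wrapper that getproperty returns for every name.
struct MaybeCol
    t::LazyTable
    name::Symbol
end
Base.getproperty(t::LazyTable, s::Symbol) = MaybeCol(t, s)

# Broadcasting into the wrapper creates the column if needed, then fills it.
function Base.materialize!(dest::MaybeCol, bc::Base.Broadcast.Broadcasted)
    t = getfield(dest, :t)
    col = get!(() -> zeros(Int, getfield(t, :nrows)),
               getfield(t, :cols), getfield(dest, :name))
    return Base.materialize!(col, bc)
end

t = LazyTable(Dict(:a => [1, 2, 3]), 3)
t.b .= 1                                  # works: the column is created
@assert getfield(t, :cols)[:b] == [1, 1, 1]

# ...but a plain read of a missing column now silently "succeeds":
@assert t.nosuchcol isa MaybeCol
```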

I see, so the proposal is to allow a.x .= b to be different from c = a.x; c .= b.
One doesn't usually associate field access with context-dependent broadcasting semantics, so the user might be confused by this (in the array case the syntax is about indexing and is well known for mutable access). Your example makes sense, though, if you absolutely want to avoid something more explicit, such as a macro like @new df.x .= 1.
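For comparison, arrays already behave this way. In this minimal illustration (values of our choosing), `a[1:2] .= 0` writes into `a` through `dotview`, while `c = b[1:2]; c .= 0` mutates only the copy returned by `getindex`.

```julia
# Indexing on the left-hand side of `.=` goes through dotview (a view).
a = [1, 2, 3]
a[1:2] .= 0
@assert a == [0, 0, 3]

# Splitting the same statement in two goes through getindex (a copy).
b = [1, 2, 3]
c = b[1:2]
c .= 0
@assert b == [1, 2, 3]   # unchanged
@assert c == [0, 0]
```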

the user might be confused by this

This is what we thought as developers of DataFrames.jl, but in practice users report being confused that df.x .= 1 does not work when :x is missing from a data frame (in fact, this is one of the most frequently asked questions). This is especially confusing since df.x = [1,2,3] works and is a normal way to add a column to a data frame using setindex!.

This seems simple enough to me and should be entirely nonbreaking. I think it's worth doing.

I can't say I've mastered the ins and outs of indexing in DataFrames.jl, but this seems too magical to me. How can you do in-place assignment to a column that doesn't exist yet? If anything, I would argue that the better way to resolve the mentioned inconsistency is to make df[:, :newcol] .= 1 throw an error.

See https://github.com/JuliaData/DataFrames.jl/pull/1961 for a long discussion of this. Initially the design disallowed it. The main reason for allowing it is that people find it natural in generic code to write df[:, col_name_as_variable] .= 1 and expect it to work without error, without having to check whether the column that col_name_as_variable points to exists.

In general, the main reason for requesting df.newcol .= 1 is not to "resolve an inconsistency", but rather that many people find the pattern df.newcol .= 1 the most natural one to use (even without thinking about df[:, :newcol] .= 1).

From triage: we might want this to return a lazy object, where the materialize! method for that object creates the field value if it doesn't exist, does the broadcast, and then finally assigns the field. This lazy object would essentially be a "lens", and perhaps @tkf has an opinion.
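A minimal sketch of the lazy "lens" idea. It assumes a hypothetical table type `LensTable` and a hypothetical `PropertyLens` type that a hook such as the tentative `maybegetproperty` could return. Its `materialize!` method follows the three steps described by triage: create the field value if it does not exist, broadcast into it, then assign the field. The hook call itself is written out by hand.

```julia
# Hypothetical table with getproperty/setproperty! over a Dict of columns.
struct LensTable
    cols::Dict{Symbol,Vector{Float64}}
    nrows::Int
end
Base.getproperty(t::LensTable, s::Symbol) = getfield(t, :cols)[s]  # KeyError if missing
Base.setproperty!(t::LensTable, s::Symbol, v::AbstractVector) = (getfield(t, :cols)[s] = v)

# Hypothetical lazy "lens": remembers only the object and the property name.
struct PropertyLens{T}
    obj::T
    name::Symbol
end

function Base.materialize!(lens::PropertyLens{LensTable}, bc::Base.Broadcast.Broadcasted)
    t, s = lens.obj, lens.name
    cols = getfield(t, :cols)
    # 1. create the field value if it does not exist yet
    dest = haskey(cols, s) ? cols[s] : Vector{Float64}(undef, getfield(t, :nrows))
    # 2. do the broadcast into it
    Base.materialize!(dest, bc)
    # 3. finally assign the field
    setproperty!(t, s, dest)
    return dest
end

t = LensTable(Dict(:a => [1.0, 2.0]), 2)
# Stand-in for what `t.b .= t.a .+ 10` might lower to with such a hook:
Base.materialize!(PropertyLens(t, :b), Base.broadcasted(+, t.a, 10))
@assert t.b == [11.0, 12.0]
```

Deferring the assignment to the last step keeps `getproperty` itself strict while still letting the broadcast path add the field.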

Triage is :+1: on the general idea.

Thank you!

we might want this to return a lazy object, where the materialize! method for that object creates the field value if it doesn't exist, does the broadcast, and then finally assigns the field.

This is exactly how DataFrames.jl works now.

The open question is whether this should be the last call in a lazy chain or part of some more general mechanism, which is what @tkf proposed.
