Data.table: [Documentation] list/.() can be omitted in j on single column/expression

Created on 15 Nov 2018 · 4 Comments · Source: Rdatatable/data.table

From the vignette Introduction to data.table:

As long as j-expression returns a list, each element of the list will be converted to a column in the resulting data.table.

When there’s only one column or expression to refer to in j and by, we can drop the .() notation. This is purely for convenience.

In ?data.table, the first sentence appears in the Arguments section on j. However, the second piece of information does not appear there. Furthermore, in Details, dropping .() is explicitly described only for by: ".() can be omitted in by on single expression for convenience"

I suggest that the part "When there’s only one column or expression to refer to in j [...], we can drop the .() notation. This is purely for convenience." also be added to the help text in the Arguments section on j.
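
For reference, the two vignette statements quoted above can be illustrated with a minimal sketch. The table DT and its columns g and val are made up for illustration and do not come from the vignette or from ?data.table.

```r
library(data.table)

# Hypothetical example data (not from the vignette)
DT = data.table(g = c("x", "x", "y"), val = c(1, 2, 10))

# j returns a list with two elements, so the result gets two columns
# (total and n) next to the grouping column g
res = DT[, .(total = sum(val), n = .N), by = g]
print(res)

# With a single grouping column, by can be written with or without .()
print(DT[, .N, by = g])
print(DT[, .N, by = .(g)])
```

Each named element of the list in j becomes a column of the result, and the two by forms should give the same table.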

All 4 comments

Note, however, that without .() or list() the result is returned as a vector rather than a 1-column data.table.
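
A minimal sketch of that difference when by is not used, assuming a small made-up table X (the data are hypothetical; only the column names a and c follow the examples in ?data.table):

```r
library(data.table)

# Hypothetical data; column names mirror the X[, sum(a), by=c] help example
X = data.table(a = 1:6, c = rep(c("u", "v"), each = 3))

bare    = X[, sum(a)]     # j not wrapped, no by: a plain vector
wrapped = X[, .(sum(a))]  # j wrapped in .(): a 1-row, 1-column data.table

print(class(bare))
print(class(wrapped))
```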

Indeed! I should have been clearer on the background of my thoughts. Consider the two examples in the Details section of ?data.table:

X[, .(sum(a)), by=c]        # get sum(a) grouped by 'c'.
X[, sum(a), by=c]           # same as above, .() can be omitted in by on single expression for convenience

I interpreted ".() can be omitted in by on single expression" as being about whether to wrap by in .() or not. However, the difference between the two lines of code is in fact that .() has been omitted in j, not in by. A typo? Anyway, this made me start looking for a statement in ?data.table similar to the one in the vignette.

In the help text, the description of how variables and expressions can be specified in j, by and on is very nice and thorough. Furthermore, it is mentioned explicitly in several places that when j is _not_ wrapped in list, the result is a _vector_ (as you also noted). This contrasts with the examples where j is not wrapped in list and, together with by, returns a data.table; these examples come without further explanation. This behavior is perhaps too obvious (how else should the result be returned, we may think?). Still, I believe it's better to be explicit in the help text.

Given the thorough treatment of all other possible ways to specify j (and on and by), I think the "j-without-list-together-with-by" deserves a few more words, at least for consistency.
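
To make the "j-without-list-together-with-by" case concrete, here is a small sketch. The table X is made up for illustration; only the two calls themselves are taken from the Details examples.

```r
library(data.table)

# Hypothetical data; column names mirror the Details examples
X = data.table(a = 1:6, c = rep(c("u", "v"), each = 3))

wrapped   = X[, .(sum(a)), by = c]  # j wrapped in .()
unwrapped = X[, sum(a), by = c]     # j not wrapped

# With by, the unwrapped form still returns a data.table, one row per group
print(is.data.table(unwrapped))
print(names(unwrapped))
print(all.equal(wrapped, unwrapped))
```

In this sketch both calls return a data.table with the grouping column c and an automatically named column V1, so the vector result only arises when by is absent.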

And apologies from _me_ for not being more explicit in my first post.

Cheers

There are also cases where the list-wrapping is obligatory, when you want to create a list column:

library(data.table)
DT = data.table(id = 1:3)
x = 3 # to be repeated on every row
v = list(4) # to be repeated on every row
DT[, x := ..x] # works fine
DT[, v := ..v] # nope, not a list column
DT[, v2 := list(..v)] # yep
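
One way to check which of these assignments produces a list column is to inspect the column classes afterwards. The sketch below is a variation on the snippet above, not the same code: it uses a hypothetical variable name val that does not clash with any column, so the .. prefix is not needed.

```r
library(data.table)

DT  = data.table(id = 1:3)
val = list(4)            # hypothetical name, chosen so it matches no column

DT[, v  := val]          # the list is read as "list of columns": v is a numeric column
DT[, v2 := list(val)]    # outer list = columns, inner list = values: v2 is a list column

print(sapply(DT, class))
```

Here v should come out as a plain numeric column and v2 as a list column with one element per row.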

Seems to work fine with by= though:

DT = data.table(id = 1:3)
x = 3 # to be repeated in every group
v = list(4) # to be repeated in every group
DT[, x, by=id] # ok
DT[, v, by=id] # ok

though it does not like the .. prefix here (since it thinks I'm selecting column 3 or 4, apparently).

@henrik-p do let us know if the change looks good. Thanks for filing!
