From the vignette Introduction to data.table:
As long as j-expression returns a list, each element of the list will be converted to a column in the resulting data.table.
When there's only one column or expression to refer to in j and by, we can drop the .() notation. This is purely for convenience.
In ?data.table, the first sentence appears in the Arguments section on j. However, the second piece of information does not appear there. Furthermore, in Details, dropping .() is only explicitly described for by: ".() can be omitted in by on single expression for convenience"
I suggest that the part "When there's only one column or expression to refer to in j [...], we can drop the .() notation. This is purely for convenience." also be added to the help text in the Arguments section on j.
Note, however, that without .() or list() the result is returned as a vector rather than a 1-column data.table.
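To make the vector-versus-data.table point concrete, here is a minimal sketch. The table X and its contents are made up for illustration (only the column names a and c are borrowed from the help-page examples), and it assumes a reasonably recent data.table:

```r
library(data.table)

# Hypothetical example table; the values are illustrative only.
X = data.table(a = 1:4, c = c("u", "u", "w", "w"))

res_vec = X[, sum(a)]     # j not wrapped: a plain vector comes back
res_dt  = X[, .(sum(a))]  # j wrapped in .(): a 1-column data.table comes back

is.data.table(res_vec)    # FALSE
is.data.table(res_dt)     # TRUE
names(res_dt)             # the unnamed column gets the default name "V1"
```

So without by, the wrapping decides the return type, which is what the help text already states in several places.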
Indeed! I should have been clearer on the background of my thoughts. Consider the two examples in the Details section of ?data.table:
X[, .(sum(a)), by=c] # get sum(a) grouped by 'c'.
X[, sum(a), by=c] # same as above, .() can be omitted in by on single expression for convenience
I interpreted the ".() can be omitted in by on single expression" as whether to wrap by in .() or not. However, we see that the difference between the two lines of code is in fact that .() has been omitted in j (not by...). A typo? Anyway, this made me start looking for a reference in ?data.table, similar to the one in the vignette.
In the help text, the description of how variables and expressions can be specified in j, by and on is very nice and thorough. Furthermore, it is mentioned explicitly in several places that when _not_ wrapping j in list, the result is a _vector_ (as also noted by you). This contrasts with the examples where j is not wrapped in list and yet returns a data.table, namely when used together with by; these examples come without further explanation. This behavior is perhaps too obvious (how else should the result be returned, we may think?). Still, I believe it's better to be explicit in the help text.
Given the thorough treatment of all other possible ways to specify j (and on and by), I think the "j-without-list-together-with-by" deserves a few more words, at least for consistency.
And I apologize for not being more explicit in my first post.
Cheers
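For reference, the "j-without-list-together-with-by" case can be checked with a small sketch like the one below. The table X is again a made-up example (not taken from ?data.table beyond the column names a and c), so treat it as an illustration rather than the documented behavior spelled out:

```r
library(data.table)

# Hypothetical example table; the values are illustrative only.
X = data.table(a = 1:4, c = c("u", "u", "w", "w"))

by_wrapped = X[, .(sum(a)), by = c]  # j wrapped in .()
by_plain   = X[, sum(a), by = c]     # .() dropped in j, by unchanged

is.data.table(by_plain)              # a data.table, not a vector
all.equal(by_plain, by_wrapped)      # both forms give the same result
```

With by, both forms return a data.table holding the grouping column plus one result column, so the omitted .() in the second help-page example is on the j side.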
There are also cases where the list-wrapping is obligatory, when you want to create a list column:
library(data.table)
DT = data.table(id = 1:3)
x = 3 # to be repeated on every row
v = list(4) # to be repeated on every row
DT[, x := ..x] # works fine
DT[, v := ..v] # nope, not a list column
DT[, v2 := list(..v)] # yep
Seems to work fine with by= though:
DT = data.table(id = 1:3)
x = 3 # to be repeated in every group
v = list(4) # to be repeated in every group
DT[, x, by=id] # ok
DT[, v, by=id] # ok
though it does not like the .. prefix here (since it thinks I'm selecting column 3 or 4, apparently).
@henrik-p do let us know if the change looks good. Thanks for filing!