Data.table: fill for DT[i, :=]

Created on 15 Mar 2018 · 5 Comments · Source: Rdatatable/data.table

> DT = data.table(A=LETTERS[1:5])
> DT[ A %in% c("C","E"), val:=42][]
        A   val
   <char> <num>
1:      A    NA
2:      B    NA
3:      C    42
4:      D    NA
5:      E    42

> DT[ A %in% c("C","E"), val:=42, fill=0][]       # current behaviour
Error in `[.data.table`(DT, A %in% c("C", "E"), `:=`(val, 42), fill = 0) : 
  unused argument (fill = 0)

> DT[ A %in% c("C","E"), val:=42, fill=0][]       # desired behaviour
        A   val
   <char> <num>
1:      A     0
2:      B     0
3:      C    42
4:      D     0
5:      E    42
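
For comparison, the desired result can already be produced without a fill argument, either in two steps or with a vectorised conditional. This is only a minimal sketch of a workaround, not the proposed feature; it assumes data.table 1.12.4 or later for fifelse(), and DT2 is a name introduced here for illustration.

```r
library(data.table)

DT = data.table(A = LETTERS[1:5])

# Option 1: set the default on every row, then overwrite the matching rows.
DT[, val := 0][A %in% c("C", "E"), val := 42]

# Option 2: build the column in one pass with fifelse().
DT2 = data.table(A = LETTERS[1:5])   # DT2 is a hypothetical second table
DT2[, val := fifelse(A %in% c("C", "E"), 42, 0)]
```

Both options touch every row, which is the extra work a fill argument on := would be expected to hide.
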
duplicate feature request

All 5 comments

I guess you will want to generalize to accepting a named list, similar to what dplyr/tidyr has in its many functions' fill= args (none of their names comes to mind...).

Example from SO with i and multiple columns in :=: https://stackoverflow.com/a/49283432/

I'm thinking...

library(data.table)
DT = data.table(id = 1:3)
mDT = data.table(id = 1L, v = 2, x = 3)
defaults = list(v = 0, x = 0)

# current syntax
DT[, names(defaults) := defaults ]
DT[mDT, on=.(id), `:=`(v = i.v, x = i.x)]

# desired
DT[mDT, on=.(id), `:=`(v = i.v, x = i.x), fill = defaults]

If you do go that way, there's the similar case of shift as well: shift(data.table(a = 1:2, b = 3:4), fill = list(a = 11, b = 12)).
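
Until shift accepts a list for fill, a per-column fill could be emulated by shifting each column separately. A rough sketch, assuming one fill value per column; the names fills, shifted and res are made up for this example.

```r
library(data.table)

DT = data.table(a = 1:2, b = 3:4)
fills = list(a = 11L, b = 12L)   # hypothetical per-column fill values

# Lag each column by one row, padding with that column's own fill value.
shifted = Map(function(col, f) shift(col, n = 1L, fill = f),
              DT, fills[names(DT)])
res = as.data.table(shifted)
```

Indexing fills by names(DT) keeps the fill values aligned with the columns even if the list is given in a different order.
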


New example from SO: https://stackoverflow.com/q/51673607/

library(data.table)
a <- data.table(Test=1:4, TestA=5:6)
b <- data.table(TEST=1:10, TestB=11:20)

# current syntax
defaults = list(Test = 0L, TestA = 0L)
new_cols = names(defaults)
b[, (new_cols) := defaults]
b[a, on=.(TEST = Test), (new_cols) := mget(sprintf("i.%s", new_cols))]

# desired syntax
defaults = list(Test = 0L, TestA = 0L)
new_cols = names(defaults)
b[a, on=.(TEST = Test), (new_cols) := mget(sprintf("i.%s", new_cols)), fill = defaults]

Couldn't we use the already existing nomatch argument here? That would also make nomatch no longer specific to joins. Just a thought ...

exactly, #857

Concerning @arunsrinivasan's proposal: this might interfere with the new optimized subsetting implementation, where subsets in i are redirected to joins and nomatch is set to 0L, assuming that nomatch has no significance outside joins and can be altered automatically. While this can be changed, it should be considered when implementing nomatch for non-join operations.
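
For context, this is how nomatch behaves in a join today. It is a small illustration of the existing argument only, not of the proposed extension; the table and the lookup value "Z" are made up.

```r
library(data.table)

DT = data.table(A = LETTERS[1:5], val = 1:5)

# nomatch = NA (the default): the unmatched row "Z" is kept, with NA in val.
kept = DT[.(A = c("C", "Z")), on = "A"]

# nomatch = 0L: the unmatched row "Z" is dropped from the result.
dropped = DT[.(A = c("C", "Z")), on = "A", nomatch = 0L]
```

Because a subset in i that is redirected to a join relies on nomatch = 0L to drop non-matching rows, any new meaning given to nomatch would have to stay compatible with that.
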

Closing as duplicate. nomatch arg is exactly about that.
