Data.table: fill for DT[i, :=]

Created on 15 Mar 2018 · 5 Comments · Source: Rdatatable/data.table

> DT = data.table(A=LETTERS[1:5])
> DT[ A %in% c("C","E"), val:=42][]
        A   val
   <char> <num>
1:      A    NA
2:      B    NA
3:      C    42
4:      D    NA
5:      E    42

> DT[ A %in% c("C","E"), val:=42, fill=0][]       # current behaviour
Error in `[.data.table`(DT, A %in% c("C", "E"), `:=`(val, 42), fill = 0) : 
  unused argument (fill = 0)

> DT[ A %in% c("C","E"), val:=42, fill=0][]       # desired behaviour
        A   val
   <char> <num>
1:      A     0
2:      B     0
3:      C    42
4:      D     0
5:      E    42
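
Until something like fill= exists, the desired result above can be reached with existing calls. The following is a minimal sketch, not a proposed implementation: it either initializes the column with the default before the subset assignment, or builds it in one pass with fifelse() (available in data.table 1.12.4 and later). The column name val2 is only there to show both variants side by side.

```r
library(data.table)

DT = data.table(A = LETTERS[1:5])

# Variant 1: set the default for every row, then overwrite the matching rows
DT[, val := 0]
DT[A %in% c("C", "E"), val := 42]

# Variant 2: one pass, choosing between the value and the default per row
DT[, val2 := fifelse(A %in% c("C", "E"), 42, 0)]

print(DT)
```

Both variants scan or assign the full column once more than a native fill= presumably would, which is part of the motivation for the request.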

duplicate · feature request

mattdowle

👍3

Most helpful comment

I guess you will want to generalize to accepting a named list, similar to what dplyr/tidyr have in the fill= arguments of many of their functions (none of their names comes to mind...).

Example from SO with i and multiple columns in :=: https://stackoverflow.com/a/49283432/

I'm thinking...

library(data.table)
DT = data.table(id = 1:3)
mDT = data.table(id = 1L, v = 2, x = 3)
defaults = list(v = 0, x = 0)

# current syntax
DT[, names(defaults) := defaults ]
DT[mDT, on=.(id), `:=`(v = i.v, x = i.x)]

# desired
DT[mDT, on=.(id), `:=`(v = i.v, x = i.x), fill = defaults]
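
As a rough illustration of what the desired call would abstract away, the two "current syntax" steps can be wrapped in a small helper. This is a sketch only: update_join_fill and its parameter names (target, lookup, on, defaults) are hypothetical and not part of data.table, and it assumes every name in defaults is also a column of lookup.

```r
library(data.table)

# Hypothetical helper, not a data.table function: add the columns named in
# `defaults` to `target` by reference, taking values from `lookup` where the
# join matches and the default value everywhere else.
update_join_fill = function(target, lookup, on, defaults) {
  cols = names(defaults)
  # Step 1: create the columns, filled with their defaults
  target[, (cols) := defaults]
  # Step 2: update join, overwriting matched rows with the i.-prefixed columns
  target[lookup, on = on, (cols) := mget(paste0("i.", cols))]
  invisible(target)
}

DT  = data.table(id = 1:3)
mDT = data.table(id = 1L, v = 2, x = 3)

res = update_join_fill(DT, mDT, on = "id", defaults = list(v = 0, x = 0))
print(res)
```

The helper still writes every default before overwriting the matched rows, so it saves typing rather than work.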

If you do go that way, there's the similar case of shift as well: shift(data.table(a = 1:2, b = 3:4), fill = list(a = 11, b = 12)).
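
For the shift case, a per-column fill can be approximated today by shifting each column separately. The sketch below assumes the list names in fills match the column names; it uses integer fill values so that they agree with the integer columns.

```r
library(data.table)

dt    = data.table(a = 1:2, b = 3:4)
fills = list(a = 11L, b = 12L)

# Shift each column by one row (the default lag), pairing columns with their
# fill values by name rather than by position
res = as.data.table(
  Map(function(col, f) shift(col, fill = f), as.list(dt), fills[names(dt)])
)

print(res)
```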

New example from SO: https://stackoverflow.com/q/51673607/

library(data.table)
a <- data.table(Test=1:4, TestA=5:6)
b <- data.table(TEST=1:10, TestB=11:20)

# current syntax
defaults = list(Test = 0L, TestA = 0L)
new_cols = names(defaults)
b[, (new_cols) := defaults]
b[a, on=.(TEST = Test), (new_cols) := mget(sprintf("i.%s", new_cols))]

# desired syntax
defaults = list(Test = 0L, TestA = 0L)
new_cols = names(defaults)
b[a, on=.(TEST = Test), (new_cols) := mget(sprintf("i.%s", new_cols)), fill = defaults]

franknarf1 on 15 Mar 2018

👍3

All 5 comments

Couldn't we use the already existing nomatch argument here? That would also make nomatch no longer specific to joins. Just a thought ...

arunsrinivasan on 19 Mar 2018

👍2

exactly, #857

jangorecki on 20 Mar 2018

Concerning @arunsrinivasan's proposal: this might interfere with the new optimized subsetting implementation, where subsets in i are redirected to joins and nomatch is set to 0L, assuming that nomatch has no significance outside joins and can be altered automatically. While this can be changed, it should be considered when implementing nomatch for non-join operations.
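
For readers less familiar with nomatch, here is a small sketch of what it does in a join today, which is the behaviour the comments above propose to extend. The table and variable names are illustrative only.

```r
library(data.table)

X    = data.table(id = 1:3, v = c(10, 20, 30))
keys = data.table(id = c(2L, 4L))

# Default (nomatch = NA): every row of `keys` is kept; id 4 has no match in X,
# so its v is NA
all_rows = X[keys, on = .(id)]

# nomatch = 0L: rows of `keys` without a match in X are dropped
matched = X[keys, on = .(id), nomatch = 0L]

print(all_rows)
print(matched)
```

Under the proposal, a value such as nomatch = 0 in an assignment would have to mean "fill with 0" instead of "drop unmatched rows", which is why the interaction with the internal use of nomatch = 0L matters.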