Data.table: assign from list by group (2+) requires curly brackets

Created on 3 May 2020 · 6 Comments · Source: Rdatatable/data.table

Spotted while running db-benchmark's benchplot function.
The error only occurs when grouping/subsetting by at least 2 columns; I could not reproduce it when grouping by a single column.

library(data.table) ## 1.12.9
d = data.table(f1=c("a","b"), f2=c("x","y"))
l = list(a = c(x = "ax", y = "ay"), b = c(x = "bx", y = "by"))
copy(d)[, "out" := as.list(l[[f1]])[[f2]], by=c("f1","f2")][]
#Error in l[[f1]] : subscript out of bounds
copy(d)[, "out" := {as.list(l[[f1]])[[f2]]}, by=c("f1","f2")][]
#   f1 f2 out
#1:  a  x  ax
#2:  b  y  by

library(data.table) ## 1.12.8
d = data.table(f1=c("a","b"), f2=c("x","y"))
l = list(a = c(x = "ax", y = "ay"), b = c(x = "bx", y = "by"))
copy(d)[, "out" := as.list(l[[f1]])[[f2]], by=c("f1","f2")][]
#   f1 f2 out
#1:  a  x  ax
#2:  b  y  by
dev regression

All 6 comments

git bisect shows it's my fault :)

5833fd9a57696bee8fe6548924810c57b1695f05

You will have to teach me bisect :) Thanks for such a prompt fix.

Make a small script:

# run_test.R
library(data.table) ## 1.12.9
d = data.table(f1=c("a","b"), f2=c("x","y"))
l = list(a = c(x = "ax", y = "ay"), b = c(x = "bx", y = "by"))
copy(d)[, "out" := as.list(l[[f1]])[[f2]], by=c("f1","f2")][]

Then, starting from master:

# begin
git bisect start
# declare master is "bad"
git bisect bad 
# declare 1.12.8 tag is "good"
git bisect good 1.12.8
# now it picks the commit halfway in between "bad" and "good"

Now repeat these steps until it finishes:

# install this branch & run the test:
R CMD INSTALL . && Rscript run_test.R
# inspect output then decide if this commit is "good" or "bad" (the $OUTCOME)
git bisect $OUTCOME

That's about it. A more advanced variant is git bisect start --term-bad=fixed --term-good=broken, for the case where we inadvertently fixed an issue at some point and want to find the commit that fixed it. git bisect requires bad to come after good, so you have to rename bad and good if it's the opposite (or keep track mentally of the switch).
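To make the renamed-terms case concrete, here is a toy sketch that bisects a throwaway repository for the commit that fixed something. Everything in it is made up for illustration (the function name demo_bisect_fixed, the temp repo, state.txt, the five commits); it uses --term-old/--term-new, which the git-bisect documentation lists as synonyms of --term-good/--term-bad.

```shell
#!/bin/sh
# Toy demo: find the commit that FIXED something, using custom bisect terms.
# The repo, file names and commit contents are hypothetical.
demo_bisect_fixed() (
  set -e
  repo=$(mktemp -d)
  cd "$repo"
  git init -q .
  git config user.name demo
  git config user.email demo@example.com
  # five commits; the "fix" lands in commit 4
  for i in 1 2 3 4 5; do
    if [ "$i" -ge 4 ]; then echo fixed > state.txt; else echo broken > state.txt; fi
    echo "$i" > n.txt
    git add -A
    git commit -q -m "commit $i"
  done
  first=$(git rev-list --max-parents=0 HEAD)
  # old state is "broken", new state is "fixed"; HEAD is new, the root commit is old
  git bisect start --term-old=broken --term-new=fixed HEAD "$first" >/dev/null
  while :; do
    term=$(cat state.txt)      # "broken" or "fixed" at the checked-out commit
    out=$(git bisect "$term")  # mark the commit using the custom term
    case $out in
      *"is the first fixed commit"*) break ;;
    esac
  done
  # refs/bisect/fixed now points at the commit bisect identified
  git log -1 --format=%s refs/bisect/fixed
  rm -rf "$repo"
)
```

Calling demo_bisect_fixed prints the subject line of the first commit in which state.txt reads "fixed". In a real repository the loop body would be replaced by installing the package and running the test script.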

There's also a way to automate it so that it will run a script and determine bad/good automatically based on the outcome. Haven't quite gotten there yet.

Oh cool, just learned something else that would help:

https://git-scm.com/docs/git-bisect

git bisect start -- R/data.table.R

would skip any commits that didn't touch R/data.table.R

OK I got the automated version to run in a few minutes:

# test.R
library(data.table) ## 1.12.9
d = data.table(f1=c("a","b"), f2=c("x","y"))
l = list(a = c(x = "ax", y = "ay"), b = c(x = "bx", y = "by"))
copy(d)[, "out" := as.list(l[[f1]])[[f2]], by=c("f1","f2")][]

# run_test.sh
R CMD INSTALL .
Rscript test.R

Then we just do:

git bisect start master 1.12.8 -- R/data.table.R
git bisect run sh run_test.sh

eventually I get:

5833fd9a57696bee8fe6548924810c57b1695f05 is the first bad commit
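
One possible refinement, not something done in this thread: git bisect run treats exit code 125 as "this commit cannot be tested, skip it", so a commit that fails to install can be kept separate from one that fails the test. A minimal sketch follows; the function name bisect_step and the BUILD_CMD/TEST_CMD variables are hypothetical, and their defaults are the two commands from run_test.sh above.

```shell
#!/bin/sh
# Sketch of a run_test.sh variant. BUILD_CMD and TEST_CMD are hypothetical
# overridable variables; the defaults are the commands used in this thread.
BUILD_CMD=${BUILD_CMD:-"R CMD INSTALL ."}
TEST_CMD=${TEST_CMD:-"Rscript test.R"}

bisect_step() {
  # 125 is the exit code git bisect run treats as "cannot test, skip this commit"
  sh -c "$BUILD_CMD" >/dev/null 2>&1 || return 125
  # a failing test maps to 1, which git bisect run treats as "bad"
  sh -c "$TEST_CMD" >/dev/null 2>&1 || return 1
  # both commands succeeded: status 0 marks the commit "good"
  return 0
}
```

To use this with git bisect run, the script would end with a call to bisect_step as its last line, so that the function's status becomes the script's exit status. Note that the sketch silences the output of both commands, which hides the R error message; dropping the redirections would keep it visible.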

This is very useful!
It should be much faster if R CMD INSTALL is replaced with cc() inside the script, since cc() only recompiles files that have changed. I already added a quiet argument to cc in the cbindlist PR.
