Spotted while running db-benchmark's benchplot function.
It requires grouping/subsetting by at least 2 columns; I could not reproduce it with single-column grouping.
library(data.table) ## 1.12.9
d = data.table(f1=c("a","b"), f2=c("x","y"))
l = list(a = c(x = "ax", y = "ay"), b = c(x = "bx", y = "by"))
copy(d)[, "out" := as.list(l[[f1]])[[f2]], by=c("f1","f2")][]
#Error in l[[f1]] : subscript out of bounds
copy(d)[, "out" := {as.list(l[[f1]])[[f2]]}, by=c("f1","f2")][]
# f1 f2 out
#1: a x ax
#2: b y by
library(data.table) ## 1.12.8
d = data.table(f1=c("a","b"), f2=c("x","y"))
l = list(a = c(x = "ax", y = "ay"), b = c(x = "bx", y = "by"))
copy(d)[, "out" := as.list(l[[f1]])[[f2]], by=c("f1","f2")][]
# f1 f2 out
#1: a x ax
#2: b y by
git bisect shows it's my fault :)
5833fd9a57696bee8fe6548924810c57b1695f05
You will have to teach me bisect :) Thanks for such a prompt fix.
Make a small script:
# run_test.R
library(data.table) ## 1.12.9
d = data.table(f1=c("a","b"), f2=c("x","y"))
l = list(a = c(x = "ax", y = "ay"), b = c(x = "bx", y = "by"))
copy(d)[, "out" := as.list(l[[f1]])[[f2]], by=c("f1","f2")][]
Then, from master:
# begin
git bisect start
# declare master is "bad"
git bisect bad
# declare 1.12.8 tag is "good"
git bisect good 1.12.8
# now it picks the commit halfway in between "bad" and "good"
Now repeat these steps until it finishes:
# install this branch & run the test:
R CMD INSTALL . && Rscript run_test.R
# inspect output then decide if this commit is "good" or "bad" (the $OUTCOME)
git bisect $OUTCOME
That's about it. A more advanced variant is git bisect start --term-bad=fixed --term-good=broken, for the case when we inadvertently fixed an issue at some point and want to find the commit that fixed it. git bisect requires bad to come after good, so you have to rename bad and good if it's the opposite (or keep track mentally of the switch).
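To make the renamed-terms case concrete, here is a minimal sketch on a throwaway repository rather than on data.table itself. Everything in it (the temp repo, state.txt, counter.txt, the commit messages) is made up for illustration. It uses --term-new/--term-old, which git documents as synonyms of --term-bad/--term-good, and it follows the same manual "mark, then repeat" loop described above.

```shell
# Throwaway repo: commits 1-2 are "broken", commits 3-5 are "fixed".
# All file names and messages here are hypothetical.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
for i in 1 2 3 4 5; do
  if [ "$i" -ge 3 ]; then echo fixed > state.txt; else echo broken > state.txt; fi
  echo "$i" > counter.txt
  git add state.txt counter.txt
  git -c user.name=demo -c user.email=demo@example.com commit -q -m "commit $i"
done
first=$(git rev-list --max-parents=0 HEAD)   # oldest commit, known broken
expected=$(git rev-parse HEAD~2)             # commit 3, where the fix landed

# Rename the terms: the newer state is "fixed", the older one "broken".
git bisect start --term-new=fixed --term-old=broken >/dev/null
git bisect fixed HEAD >/dev/null
git bisect broken "$first" >/dev/null

# Manual loop: inspect the checked-out commit, mark it, repeat until git
# reports the first "fixed" commit (capped at 10 rounds as a safety net).
found=
n=0
while [ -z "$found" ] && [ "$n" -lt 10 ]; do
  if grep -q fixed state.txt; then term=fixed; else term=broken; fi
  found=$(git bisect "$term" | sed -n 's/ is the first fixed commit$//p')
  n=$((n + 1))
done
git bisect reset >/dev/null 2>&1
echo "fix landed in: $found"
```

In a real session the grep on state.txt would be replaced by installing the package and running the test script.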
There's also a way to automate it so that it will run a script and determine bad/good automatically based on the outcome. Haven't quite gotten there yet.
Oh cool, just learned something else that would help:
https://git-scm.com/docs/git-bisect
git bisect start -- R/data.table.R
would skip any commits that didn't touch R/data.table.R
OK I got the automated version to run in a few minutes:
# test.R
library(data.table) ## 1.12.9
d = data.table(f1=c("a","b"), f2=c("x","y"))
l = list(a = c(x = "ax", y = "ay"), b = c(x = "bx", y = "by"))
copy(d)[, "out" := as.list(l[[f1]])[[f2]], by=c("f1","f2")][]
# run_test.sh
R CMD INSTALL .
Rscript test.R
Then we just do:
git bisect start master 1.12.8 -- R/data.table.R
git bisect run sh run_test.sh
Eventually I get:
5833fd9a57696bee8fe6548924810c57b1695f05 is the first bad commit
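A slightly more defensive variant of run_test.sh might look like the sketch below. It assumes that a commit which fails to install should be skipped rather than judged. git bisect run treats exit code 0 as good, 125 as "skip this commit", and any other code from 1 to 127 as bad. The function name bisect_step and its two arguments are made up for illustration.

```shell
# Hypothetical helper for a `git bisect run` script.
# $1: command that builds/installs the current checkout
# $2: command that runs the reproduction test
bisect_step() {
  build_cmd=$1
  test_cmd=$2
  # A commit that cannot be installed says nothing about the bug: skip it.
  sh -c "$build_cmd" >/dev/null 2>&1 || return 125
  # Collapse any test failure to 1 so it is always read as "bad".
  sh -c "$test_cmd" >/dev/null 2>&1 || return 1
  # Install and test both succeeded: this commit is "good".
  return 0
}
```

To use it for this issue, the last line of the script would be something like bisect_step "R CMD INSTALL ." "Rscript test.R"; exit $?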
This is very useful!
This should be much faster if R CMD INSTALL is replaced with cc() inside the script, since cc() only recompiles files that have changed. I already added a quiet argument to cc in the cbindlist PR.