allow.cartesian is ignored
library(data.table)
d1 = data.table(a=c(1L,1L), b=2:3)
d2 = data.table(a=c(1L,1L), d=3:2)
merge(d1, d2, by="a", allow.cartesian=FALSE)
#    a b d
# 1: 1 2 3
# 2: 1 2 2
# 3: 1 3 3
# 4: 1 3 2
This is consistent with the documentation, right? The result has 4 rows, which does not exceed nrow(x) + nrow(i) = 2 + 2 = 4, so allow.cartesian=FALSE does not raise an error here.
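A quick way to check that reading is to compare the row counts directly. This is only a sketch reusing d1 and d2 from the report; the names res and limit are introduced here for illustration.

```r
library(data.table)

d1 = data.table(a = c(1L, 1L), b = 2:3)
d2 = data.table(a = c(1L, 1L), d = 3:2)

res = merge(d1, d2, by = "a", allow.cartesian = FALSE)

# 'limit' is a helper name for this sketch: the threshold described in ?data.table
limit = nrow(d1) + nrow(d2)

nrow(res)           # 4 rows in the join result
limit               # 4, i.e. nrow(x) + nrow(i)
nrow(res) <= limit  # TRUE, so the check does not trigger
```

The result is a full 2 x 2 cross of the matching rows, yet it sits exactly at the threshold rather than above it, which is why no error is raised.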
From ?data.table:
FALSE prevents joins that would result in more than nrow(x)+nrow(i) rows. This is usually caused by duplicate values in i's join columns, each of which join to the same group in 'x' over and over again: a misspecified join. Usually this was not intended and the join needs to be changed. The word 'cartesian' is used loosely in this context. The traditional cartesian join is (deliberately) difficult to achieve in data.table: where every row in i joins to every row in x (a nrow(x)*nrow(i) row result). 'cartesian' is just meant in a 'large multiplicative' sense.
A possibly clearer wording (the only change is the clause added at the end):
FALSE prevents joins that would result in more than nrow(x)+nrow(i) rows. This is usually caused by duplicate values in i's join columns, each of which join to the same group in 'x' over and over again: a misspecified join. Usually this was not intended and the join needs to be changed. The word 'cartesian' is used loosely in this context. The traditional cartesian join is (deliberately) difficult to achieve in data.table: where every row in i joins to every row in x (a nrow(x)*nrow(i) row result). 'cartesian' is just meant in a 'large multiplicative' sense, so FALSE does not always prevent a traditional cartesian join.
Related: https://github.com/Rdatatable/data.table/issues/2837, https://github.com/Rdatatable/data.table/issues/2879#issuecomment-389162440
correct, thanks @franknarf1
library(data.table)
d1 = data.table(a=c(1L,1L,1L), b=2:4)
d2 = data.table(a=c(1L,1L,1L), d=3:1)
merge(d1, d2, by="a", allow.cartesian=FALSE)
#Error in vecseq(f__, len__, if (allow.cartesian || notjoin || !anyDuplicated(f__, :
# Join results in 9 rows; more than 6 = nrow(x)+nrow(i).
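To see in advance whether a merge would cross the threshold, one option is to count rows per key in each table and multiply the matching counts. The following is a rough sketch, not something from data.table itself: cnt1, cnt2, n_x, n_y, expected_rows and limit are names made up for this example.

```r
library(data.table)

d1 = data.table(a = c(1L, 1L, 1L), b = 2:4)
d2 = data.table(a = c(1L, 1L, 1L), d = 3:1)

# Rows per join key in each table (column names n_x / n_y are illustrative)
cnt1 = d1[, .(n_x = .N), by = a]
cnt2 = d2[, .(n_y = .N), by = a]

# An inner join produces n_x * n_y rows for every key present in both tables
expected_rows = merge(cnt1, cnt2, by = "a")[, sum(n_x * n_y)]
limit = nrow(d1) + nrow(d2)

expected_rows          # 9
limit                  # 6
expected_rows > limit  # TRUE, so allow.cartesian=FALSE errors

# If the multiplicative result is intended, opt in explicitly
res = merge(d1, d2, by = "a", allow.cartesian = TRUE)
nrow(res)              # 9
```

With three duplicated keys on each side, the 3 x 3 = 9 rows exceed 3 + 3 = 6, which matches the error message above; with two on each side, 2 x 2 = 4 equals 2 + 2 = 4 and passes.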