Data.table: joins with type coercion: 1.5 == 1 is TRUE

Created on 27 Jan 2018  路  10Comments  路  Source: Rdatatable/data.table

Maybe this is a known issue, but to me it came as a very bad surprise.
During a join, if i is coerced to integer to match x's column type, 1.5 is joined to 1:

library(data.table)
dt1 <- data.table(x = 1L, y = 1)
dt2 <- data.table(x = 1.5, z = 2)
dt1[dt2, on = "x", verbose = TRUE]
# Calculated ad hoc index in 0 secs
# Coercing double column i.'x' to integer to match type of x.'x'. Please avoid coercion for efficiency.
# Starting bmerge ...done in 0 secs
#     x y z
# 1: 1 1 2
# 1: 1 1 2

merge(dt2, dt1, by = "x")
#    x y z
# 1: 1 1 2

The reason is the blind coercion to integer of dt2$x during bmerge.
We should adopt the base R loic of merge.data.frame where the join columns get coerced to the 'highest' involved type (https://github.com/wch/r-source/blob/e690b0d6998dfbc360f0fa14492eb8648df20949/src/main/unique.c) Lines 902ff:

    /* Coerce to a common type; type == NILSXP is ok here.
     * Note that above we coerce factors and "POSIXlt", only to character.
     * Hence, coerce to character or to `higher' type
     * (given that we have "Vector" or NULL) */
    if(TYPEOF(x) >= STRSXP || TYPEOF(table) >= STRSXP) type = STRSXP;
    else type = TYPEOF(x) < TYPEOF(table) ? TYPEOF(table) : TYPEOF(x);
    PROTECT(x     = coerceVector(x, type)); nprot++;
PROTECT(table = coerceVector(table, type)); nprot++;
High bug

Most helpful comment

I have done a bit more extensive testing on joins with non-matching types. The table below lists the findings and my proposed change of behaviour. Code for examples can be found at the end of this post.

|type of x column|type of i column|behaviour|recommended behaviour| comment|
|---|---|---|---|---|
|logical|integer|throws error| don't change, unify error messages||
|logical|numeric|throws error|don't change, unify error messages||
|logical|character|throws error|don't change, unify error messages||
|logical|factor|throws error|don't change, unify error messages||
|integer|logical|coercion to integer so that 1 joins to TRUE, 0 to FALSE, but 2 to nothing.|definitely add a warning|no idea if this is "correct"|
|integer|numeric|bug: coercion to integer so that 1.5 joins to 1| If the numeric column in i isReallyReal, coercion to numeric with warning. If the column in i is not reallyReal, coercion to integer with warning| see example in code below|
|integer|character|throws error|don't change, unify style of error messages||
|integer|factor|bug: joins so that integer 1 joins to the first level of the factor.| Throw error| see
|numeric|logical|similar to integer - logical|add warning|no idea if this is "correct"|
|numeric|integer|coercion to numeric| don't change, add warning||
|numeric|character|throws error|don't change, unify error messages||
|numeric|factor|throws error|don't change, unify error messages||
|character|logical|throws error|don't change, unify error messages||
|character|integer|throws error (different style)|don't change, unify error messages||
|character|numeric|throws error|don't change, unify error messages||
|character|factor|coercion to character|don't change, add warning||
|factor|logical|throws error|don't change, unify error messages||
|factor|integer|throws error|don't change, unify error messages||
|factor|numeric|throws error|don't change, unify error messages||
|factor|character|coercion to factor| don't change, add warning||
|int64|integer|coercion to integer64|don't change, add warning||
|integer|int64|coercion to integer (with overflow leading to NA if int64 is too large)|don't change, add warning||
|int64|logical|coercion to integer64|don't change, add warning||
|logical|int64|throws error|don't change, unify error message||
|int64|numeric|same bug as with integer|If the numeric column in i isReallyReal, coercion to numeric with warning. If the column in i is not reallyReal, coercion to integer64 with warning||
|numeric|int64|coercion to numeric|don't change, add warning||
|int64|character|throws error|don't change, unify error message||
|character|int64|throws error|don't change, unify error message||
|int64|factor|throws error|don't change, unify error message||
|factor|int64|throws error|don't change, unify error message||

Here is the code with examples for all cases:

library(data.table)
dt <- data.table(int = 1L:10L, 
                 num = seq(0.5,5, 0.5),
                 char = as.character(1:10),
                 bool = as.logical(0:9),
                 fact = factor(letters[1:10]))

cols <- c("int", "num", "char", "bool", "fact")
cols <- c(paste0("x.", cols), paste0("i.", cols))


## integer - logical
test <- dt[dt, cols, on = "int==bool", nomatch = 0L, with = FALSE]
class(test$i.bool)
# "integer"
test
#    x.int x.num x.char x.bool x.fact i.int i.num i.char i.bool i.fact
# 1:     1   0.5      1  FALSE      a     2   1.0      2      1      b
# 2:     1   0.5      1  FALSE      a     3   1.5      3      1      c
# 3:     1   0.5      1  FALSE      a     4   2.0      4      1      d
# 4:     1   0.5      1  FALSE      a     5   2.5      5      1      e
# 5:     1   0.5      1  FALSE      a     6   3.0      6      1      f
# 6:     1   0.5      1  FALSE      a     7   3.5      7      1      g
# 7:     1   0.5      1  FALSE      a     8   4.0      8      1      h
# 8:     1   0.5      1  FALSE      a     9   4.5      9      1      i
# 9:     1   0.5      1  FALSE      a    10   5.0     10      1      j


## logical - integer
test <- dt[dt, cols, on = "bool==int", nomatch = 0L, with = FALSE]
# Error in bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch,  : 
# typeof x.bool (logical) != typeof i.int (integer)


## integer - numeric
test <- dt[dt, cols, on = "int==num", nomatch = 0L, with = FALSE] 
class(test$i.num)
# "integer"
test
#    x.int x.num x.char x.bool x.fact i.int i.num i.char i.bool i.fact
# 1:     1   0.5      1  FALSE      a     2     1      2   TRUE      b
# 2:     1   0.5      1  FALSE      a     3     1      3   TRUE      c
# 3:     2   1.0      2   TRUE      b     4     2      4   TRUE      d
# 4:     2   1.0      2   TRUE      b     5     2      5   TRUE      e
# 5:     3   1.5      3   TRUE      c     6     3      6   TRUE      f
# 6:     3   1.5      3   TRUE      c     7     3      7   TRUE      g
# 7:     4   2.0      4   TRUE      d     8     4      8   TRUE      h
# 8:     4   2.0      4   TRUE      d     9     4      9   TRUE      i
# 9:     5   2.5      5   TRUE      e    10     5     10   TRUE      j
## we see that i.num 1 and 1.5 both join to x.int = 1 as they are coerced to integer. This is a bug


## numeric - integer
test <- dt[dt, cols, on = "num==int", nomatch = 0L, with = FALSE] 
class(test$i.int)
# "integer"
test
#    x.int x.num x.char x.bool x.fact i.int i.num i.char i.bool i.fact
# 1:     2     1      2   TRUE      b     1   0.5      1  FALSE      a
# 2:     4     2      4   TRUE      d     2   1.0      2   TRUE      b
# 3:     6     3      6   TRUE      f     3   1.5      3   TRUE      c
# 4:     8     4      8   TRUE      h     4   2.0      4   TRUE      d
# 5:    10     5     10   TRUE      j     5   2.5      5   TRUE      e


## integer - character
test <- dt[dt, cols, on = "int==char", nomatch = 0L, with = FALSE] 
# Error in bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch,  : 
# typeof x.int (integer) != typeof i.char (character)

## character - integer
test <- dt[dt, cols, on = "char==int", nomatch = 0L, with = FALSE] 
# Error in bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch,  : 
# x.'char' is a character column being joined to i.'int' which is type 'integer'. Character columns must join to factor or character columns.)

## integer - factor 
test <- dt[dt, cols, on = "int==fact", nomatch = 0L, with = FALSE] 
class(test$i.fact)
# "factor"
typeof(test$i.fact)
# "integer"
test
#    x.int x.num x.char x.bool x.fact i.int i.num i.char i.bool i.fact
# 1:     1   0.5      1  FALSE      a     1   0.5      1  FALSE      a
# 2:     2   1.0      2   TRUE      b     2   1.0      2   TRUE      b
# 3:     3   1.5      3   TRUE      c     3   1.5      3   TRUE      c
# 4:     4   2.0      4   TRUE      d     4   2.0      4   TRUE      d
# 5:     5   2.5      5   TRUE      e     5   2.5      5   TRUE      e
# 6:     6   3.0      6   TRUE      f     6   3.0      6   TRUE      f
# 7:     7   3.5      7   TRUE      g     7   3.5      7   TRUE      g
# 8:     8   4.0      8   TRUE      h     8   4.0      8   TRUE      h
# 9:     9   4.5      9   TRUE      i     9   4.5      9   TRUE      i
# 10:    10  5.0     10   TRUE      j    10   5.0     10   TRUE      j
## no type coercion, but internally factors are just integers, 
## so that in this case i.fact = a is joined to i.int = 1, which clearly is a bug.


## factor - integer 
test <- dt[dt, cols, on = "fact==int", nomatch = 0L, with = FALSE] 
# Error in bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch,  : 
# x.'fact' is a factor column being joined to i.'int' which is type 'integer'. Factor columns must join to factor or character columns.


## numeric - logical
test <- dt[dt, cols, on = "num==bool", nomatch = 0L, with = FALSE]
class(test$i.bool)
# "numeric"
test
#    x.int x.num x.char x.bool x.fact i.int i.num i.char i.bool i.fact
# 1:     2     1      2   TRUE      b     2   1.0      2      1      b
# 2:     2     1      2   TRUE      b     3   1.5      3      1      c
# 3:     2     1      2   TRUE      b     4   2.0      4      1      d
# 4:     2     1      2   TRUE      b     5   2.5      5      1      e
# 5:     2     1      2   TRUE      b     6   3.0      6      1      f
# 6:     2     1      2   TRUE      b     7   3.5      7      1      g
# 7:     2     1      2   TRUE      b     8   4.0      8      1      h
# 8:     2     1      2   TRUE      b     9   4.5      9      1      i
# 9:     2     1      2   TRUE      b    10   5.0     10      1      j


## logical - numeric
test <- dt[dt, cols, on = "bool==num", nomatch = 0L, with = FALSE]
# Error in bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch,  : 
# typeof x.bool (logical) != typeof i.num (double)


## numeric - character
test <- dt[dt, cols, on = "num==char", nomatch = 0L, with = FALSE]
# Error in bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch,  : 
# typeof x.num (double) != typeof i.char (character)


## character - numeric
test <- dt[dt, cols, on = "char==num", nomatch = 0L, with = FALSE]
# Error in bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch,  : 
# x.'char' is a character column being joined to i.'num' which is type 'double'. Character columns must join to factor or character columns.


## numeric - factor
test <- dt[dt, cols, on = "num==fact", nomatch = 0L, with = FALSE]
# Error in bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch,  : 
# typeof x.num (double) != typeof i.fact (integer)


## factor - numeric
test <- dt[dt, cols, on = "fact==num", nomatch = 0L, with = FALSE]
# Error in bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch,  : 
# x.'fact' is a factor column being joined to i.'num' which is type 'double'. Factor columns must join to factor or character columns.


## logical - character
test <- dt[dt, cols, on = "bool==char", nomatch = 0L, with = FALSE]
# Error in bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch,  : 
# typeof x.bool (logical) != typeof i.char (character)


## character - logical
test <- dt[dt, cols, on = "char==bool", nomatch = 0L, with = FALSE]
# Error in bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch,  : 
# x.'char' is a character column being joined to i.'bool' which is type 'logical'. Character columns must join to factor or character columns.


## logical - factor
test <- dt[dt, cols, on = "bool==fact", nomatch = 0L, with = FALSE]
# Error in bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch,  : 
# typeof x.bool (logical) != typeof i.fact (integer)r)


## factor - logical
test <- dt[dt, cols, on = "fact==bool", nomatch = 0L, with = FALSE]
# Error in bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch,  : 
# x.'fact' is a factor column being joined to i.'bool' which is type 'logical'. Factor columns must join to factor or character columns.


## character - factor
test <- dt[dt, cols, on = "char==fact", nomatch = 0L, with = FALSE]
class(test$i.fact)
# "character"
test
# Empty data.table (0 rows) of 10 cols: x.int,x.num,x.char,x.bool,x.fact,i.int...


## factor - character
test <- dt[dt, cols, on = "fact==char", nomatch = 0L, with = FALSE]
class(test$i.char)
# "factor"
test
# Empty data.table (0 rows) of 10 cols: x.int,x.num,x.char,x.bool,x.fact,i.int...


library(bit64)
dt[, int64 := as.integer64(c(1:9, 2.5e10))]
cols <- c("int", "num", "char", "bool", "fact", "int64")
cols <- c(paste0("x.", cols), paste0("i.", cols))

## int64 - integer
test <- dt[dt, cols, on = "int64==int", nomatch = 0L, with = FALSE]
class(test$i.int)
# "integer64"
test
#    x.int x.num x.char x.bool x.fact x.int64 i.int i.num i.char i.bool i.fact i.int64
# 1:     1   0.5      1  FALSE      a       1     1   0.5      1  FALSE      a       1
# 2:     2   1.0      2   TRUE      b       2     2   1.0      2   TRUE      b       2
# 3:     3   1.5      3   TRUE      c       3     3   1.5      3   TRUE      c       3
# 4:     4   2.0      4   TRUE      d       4     4   2.0      4   TRUE      d       4
# 5:     5   2.5      5   TRUE      e       5     5   2.5      5   TRUE      e       5
# 6:     6   3.0      6   TRUE      f       6     6   3.0      6   TRUE      f       6
# 7:     7   3.5      7   TRUE      g       7     7   3.5      7   TRUE      g       7
# 8:     8   4.0      8   TRUE      h       8     8   4.0      8   TRUE      h       8
# 9:     9   4.5      9   TRUE      i       9     9   4.5      9   TRUE      i       9


## integer - int64
test <- dt[dt, cols, on = "int==int64", nomatch = 0L, with = FALSE]
class(test$i.int64)
# "integer"
test
#    x.int x.num x.char x.bool x.fact x.int64 i.int i.num i.char i.bool i.fact i.int64
# 1:     1   0.5      1  FALSE      a       1     1   0.5      1  FALSE      a       1
# 2:     2   1.0      2   TRUE      b       2     2   1.0      2   TRUE      b       2
# 3:     3   1.5      3   TRUE      c       3     3   1.5      3   TRUE      c       3
# 4:     4   2.0      4   TRUE      d       4     4   2.0      4   TRUE      d       4
# 5:     5   2.5      5   TRUE      e       5     5   2.5      5   TRUE      e       5
# 6:     6   3.0      6   TRUE      f       6     6   3.0      6   TRUE      f       6
# 7:     7   3.5      7   TRUE      g       7     7   3.5      7   TRUE      g       7
# 8:     8   4.0      8   TRUE      h       8     8   4.0      8   TRUE      h       8
# 9:     9   4.5      9   TRUE      i       9     9   4.5      9   TRUE      i       9    


## int64 - logical
test <- dt[dt, cols, on = "int64==bool", nomatch = 0L, with = FALSE]
class(test$i.bool)
# "integer64"
test
#    x.int x.num x.char x.bool x.fact x.int64 i.int i.num i.char i.bool i.fact     i.int64
# 1:     1   0.5      1  FALSE      a       1     2   1.0      2      1      b           2
# 2:     1   0.5      1  FALSE      a       1     3   1.5      3      1      c           3
# 3:     1   0.5      1  FALSE      a       1     4   2.0      4      1      d           4
# 4:     1   0.5      1  FALSE      a       1     5   2.5      5      1      e           5
# 5:     1   0.5      1  FALSE      a       1     6   3.0      6      1      f           6
# 6:     1   0.5      1  FALSE      a       1     7   3.5      7      1      g           7
# 7:     1   0.5      1  FALSE      a       1     8   4.0      8      1      h           8
# 8:     1   0.5      1  FALSE      a       1     9   4.5      9      1      i           9
# 9:     1   0.5      1  FALSE      a       1    10   5.0     10      1      j 25000000000


## logical - int64
test <- dt[dt, cols, on = "bool==int64", nomatch = 0L, with = FALSE]
# Error in bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch,  : 
# typeof x.bool (logical) != typeof i.int64 (double)


## int64 - numeric
test <- dt[dt, cols, on = "int64==num", nomatch = 0L, with = FALSE]
class(test$i.num)
# "integer64"
test
#    x.int x.num x.char x.bool x.fact x.int64 i.int i.num i.char i.bool i.fact     i.int64
# 1:     1   0.5      1  FALSE      a       1     2     1      2   TRUE      b           2
# 2:     1   0.5      1  FALSE      a       1     3     1      3   TRUE      c           3
# 3:     2   1.0      2   TRUE      b       2     4     2      4   TRUE      d           4
# 4:     2   1.0      2   TRUE      b       2     5     2      5   TRUE      e           5
# 5:     3   1.5      3   TRUE      c       3     6     3      6   TRUE      f           6
# 6:     3   1.5      3   TRUE      c       3     7     3      7   TRUE      g           7
# 7:     4   2.0      4   TRUE      d       4     8     4      8   TRUE      h           8
# 8:     4   2.0      4   TRUE      d       4     9     4      9   TRUE      i           9
# 9:     5   2.5      5   TRUE      e       5    10     5     10   TRUE      j 25000000000
## same bug as for integer - numeric: conversion to integer means that 1.5 is joined to 1, etc.


## numeric - int64
test <- dt[dt, cols, on = "num==int64", nomatch = 0L, with = FALSE]
class(test$i.int64)
# "numeric"
test
#    x.int x.num x.char x.bool x.fact     x.int64 i.int i.num i.char i.bool i.fact i.int64
# 1:     2     1      2   TRUE      b           2     1   0.5      1  FALSE      a       1
# 2:     4     2      4   TRUE      d           4     2   1.0      2   TRUE      b       2
# 3:     6     3      6   TRUE      f           6     3   1.5      3   TRUE      c       3
# 4:     8     4      8   TRUE      h           8     4   2.0      4   TRUE      d       4
# 5:    10     5     10   TRUE      j 25000000000     5   2.5      5   TRUE      e       5


## int64 - character
test <- dt[dt, cols, on = "int64==char", nomatch = 0L, with = FALSE]
# Error in bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch,  : 
# typeof x.int64 (double) != typeof i.char (character)


## character - int64
test <- dt[dt, cols, on = "char==int64", nomatch = 0L, with = FALSE]
# Error in bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch,  : 
# x.'char' is a character column being joined to i.'int64' which is type 'double'. Character columns must join to factor or character columns.


## int64 - factor
test <- dt[dt, cols, on = "int64==fact", nomatch = 0L, with = FALSE]
# Error in bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch,  : 
# typeof x.int64 (double) != typeof i.fact (integer)


## factor - int64
test <- dt[dt, cols, on = "fact==int64", nomatch = 0L, with = FALSE]
# Error in bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch,  : 
# x.'fact' is a factor column being joined to i.'int64' which is type 'double'. Factor columns must join to factor or character columns.

All 10 comments

Good spot and agree. With the caveat that it should only coerce to double if isReallyReal(). Otherwise, coerce the not-really-real double to int (which is the more common case in mind). Both with warning rather than just verbose.

I have done a bit more extensive testing on joins with non-matching types. The table below lists the findings and my proposed change of behaviour. Code for examples can be found at the end of this post.

|type of x column|type of i column|behaviour|recommended behaviour| comment|
|---|---|---|---|---|
|logical|integer|throws error| don't change, unify error messages||
|logical|numeric|throws error|don't change, unify error messages||
|logical|character|throws error|don't change, unify error messages||
|logical|factor|throws error|don't change, unify error messages||
|integer|logical|coercion to integer so that 1 joins to TRUE, 0 to FALSE, but 2 to nothing.|definitely add a warning|no idea if this is "correct"|
|integer|numeric|bug: coercion to integer so that 1.5 joins to 1| If the numeric column in i isReallyReal, coercion to numeric with warning. If the column in i is not reallyReal, coercion to integer with warning| see example in code below|
|integer|character|throws error|don't change, unify style of error messages||
|integer|factor|bug: joins so that integer 1 joins to the first level of the factor.| Throw error| see
|numeric|logical|similar to integer - logical|add warning|no idea if this is "correct"|
|numeric|integer|coercion to numeric| don't change, add warning||
|numeric|character|throws error|don't change, unify error messages||
|numeric|factor|throws error|don't change, unify error messages||
|character|logical|throws error|don't change, unify error messages||
|character|integer|throws error (different style)|don't change, unify error messages||
|character|numeric|throws error|don't change, unify error messages||
|character|factor|coercion to character|don't change, add warning||
|factor|logical|throws error|don't change, unify error messages||
|factor|integer|throws error|don't change, unify error messages||
|factor|numeric|throws error|don't change, unify error messages||
|factor|character|coercion to factor| don't change, add warning||
|int64|integer|coercion to integer64|don't change, add warning||
|integer|int64|coercion to integer (with overflow leading to NA if int64 is too large)|don't change, add warning||
|int64|logical|coercion to integer64|don't change, add warning||
|logical|int64|throws error|don't change, unify error message||
|int64|numeric|same bug as with integer|If the numeric column in i isReallyReal, coercion to numeric with warning. If the column in i is not reallyReal, coercion to integer64 with warning||
|numeric|int64|coercion to numeric|don't change, add warning||
|int64|character|throws error|don't change, unify error message||
|character|int64|throws error|don't change, unify error message||
|int64|factor|throws error|don't change, unify error message||
|factor|int64|throws error|don't change, unify error message||

Here is the code with examples for all cases:

library(data.table)
dt <- data.table(int = 1L:10L, 
                 num = seq(0.5,5, 0.5),
                 char = as.character(1:10),
                 bool = as.logical(0:9),
                 fact = factor(letters[1:10]))

cols <- c("int", "num", "char", "bool", "fact")
cols <- c(paste0("x.", cols), paste0("i.", cols))


## integer - logical
test <- dt[dt, cols, on = "int==bool", nomatch = 0L, with = FALSE]
class(test$i.bool)
# "integer"
test
#    x.int x.num x.char x.bool x.fact i.int i.num i.char i.bool i.fact
# 1:     1   0.5      1  FALSE      a     2   1.0      2      1      b
# 2:     1   0.5      1  FALSE      a     3   1.5      3      1      c
# 3:     1   0.5      1  FALSE      a     4   2.0      4      1      d
# 4:     1   0.5      1  FALSE      a     5   2.5      5      1      e
# 5:     1   0.5      1  FALSE      a     6   3.0      6      1      f
# 6:     1   0.5      1  FALSE      a     7   3.5      7      1      g
# 7:     1   0.5      1  FALSE      a     8   4.0      8      1      h
# 8:     1   0.5      1  FALSE      a     9   4.5      9      1      i
# 9:     1   0.5      1  FALSE      a    10   5.0     10      1      j


## logical - integer
test <- dt[dt, cols, on = "bool==int", nomatch = 0L, with = FALSE]
# Error in bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch,  : 
# typeof x.bool (logical) != typeof i.int (integer)


## integer - numeric
test <- dt[dt, cols, on = "int==num", nomatch = 0L, with = FALSE] 
class(test$i.num)
# "integer"
test
#    x.int x.num x.char x.bool x.fact i.int i.num i.char i.bool i.fact
# 1:     1   0.5      1  FALSE      a     2     1      2   TRUE      b
# 2:     1   0.5      1  FALSE      a     3     1      3   TRUE      c
# 3:     2   1.0      2   TRUE      b     4     2      4   TRUE      d
# 4:     2   1.0      2   TRUE      b     5     2      5   TRUE      e
# 5:     3   1.5      3   TRUE      c     6     3      6   TRUE      f
# 6:     3   1.5      3   TRUE      c     7     3      7   TRUE      g
# 7:     4   2.0      4   TRUE      d     8     4      8   TRUE      h
# 8:     4   2.0      4   TRUE      d     9     4      9   TRUE      i
# 9:     5   2.5      5   TRUE      e    10     5     10   TRUE      j
## we see that i.num 1 and 1.5 both join to x.int = 1 as they are coerced to integer. This is a bug


## numeric - integer
test <- dt[dt, cols, on = "num==int", nomatch = 0L, with = FALSE] 
class(test$i.int)
# "integer"
test
#    x.int x.num x.char x.bool x.fact i.int i.num i.char i.bool i.fact
# 1:     2     1      2   TRUE      b     1   0.5      1  FALSE      a
# 2:     4     2      4   TRUE      d     2   1.0      2   TRUE      b
# 3:     6     3      6   TRUE      f     3   1.5      3   TRUE      c
# 4:     8     4      8   TRUE      h     4   2.0      4   TRUE      d
# 5:    10     5     10   TRUE      j     5   2.5      5   TRUE      e


## integer - character
test <- dt[dt, cols, on = "int==char", nomatch = 0L, with = FALSE] 
# Error in bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch,  : 
# typeof x.int (integer) != typeof i.char (character)

## character - integer
test <- dt[dt, cols, on = "char==int", nomatch = 0L, with = FALSE] 
# Error in bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch,  : 
# x.'char' is a character column being joined to i.'int' which is type 'integer'. Character columns must join to factor or character columns.)

## integer - factor 
test <- dt[dt, cols, on = "int==fact", nomatch = 0L, with = FALSE] 
class(test$i.fact)
# "factor"
typeof(test$i.fact)
# "integer"
test
#    x.int x.num x.char x.bool x.fact i.int i.num i.char i.bool i.fact
# 1:     1   0.5      1  FALSE      a     1   0.5      1  FALSE      a
# 2:     2   1.0      2   TRUE      b     2   1.0      2   TRUE      b
# 3:     3   1.5      3   TRUE      c     3   1.5      3   TRUE      c
# 4:     4   2.0      4   TRUE      d     4   2.0      4   TRUE      d
# 5:     5   2.5      5   TRUE      e     5   2.5      5   TRUE      e
# 6:     6   3.0      6   TRUE      f     6   3.0      6   TRUE      f
# 7:     7   3.5      7   TRUE      g     7   3.5      7   TRUE      g
# 8:     8   4.0      8   TRUE      h     8   4.0      8   TRUE      h
# 9:     9   4.5      9   TRUE      i     9   4.5      9   TRUE      i
# 10:    10  5.0     10   TRUE      j    10   5.0     10   TRUE      j
## no type coercion, but internally factors are just integers, 
## so that in this case i.fact = a is joined to i.int = 1, which clearly is a bug.


## factor - integer 
test <- dt[dt, cols, on = "fact==int", nomatch = 0L, with = FALSE] 
# Error in bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch,  : 
# x.'fact' is a factor column being joined to i.'int' which is type 'integer'. Factor columns must join to factor or character columns.


## numeric - logical
test <- dt[dt, cols, on = "num==bool", nomatch = 0L, with = FALSE]
class(test$i.bool)
# "numeric"
test
#    x.int x.num x.char x.bool x.fact i.int i.num i.char i.bool i.fact
# 1:     2     1      2   TRUE      b     2   1.0      2      1      b
# 2:     2     1      2   TRUE      b     3   1.5      3      1      c
# 3:     2     1      2   TRUE      b     4   2.0      4      1      d
# 4:     2     1      2   TRUE      b     5   2.5      5      1      e
# 5:     2     1      2   TRUE      b     6   3.0      6      1      f
# 6:     2     1      2   TRUE      b     7   3.5      7      1      g
# 7:     2     1      2   TRUE      b     8   4.0      8      1      h
# 8:     2     1      2   TRUE      b     9   4.5      9      1      i
# 9:     2     1      2   TRUE      b    10   5.0     10      1      j


## logical - numeric
test <- dt[dt, cols, on = "bool==num", nomatch = 0L, with = FALSE]
# Error in bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch,  : 
# typeof x.bool (logical) != typeof i.num (double)


## numeric - character
test <- dt[dt, cols, on = "num==char", nomatch = 0L, with = FALSE]
# Error in bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch,  : 
# typeof x.num (double) != typeof i.char (character)


## character - numeric
test <- dt[dt, cols, on = "char==num", nomatch = 0L, with = FALSE]
# Error in bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch,  : 
# x.'char' is a character column being joined to i.'num' which is type 'double'. Character columns must join to factor or character columns.


## numeric - factor
test <- dt[dt, cols, on = "num==fact", nomatch = 0L, with = FALSE]
# Error in bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch,  : 
# typeof x.num (double) != typeof i.fact (integer)


## factor - numeric
test <- dt[dt, cols, on = "fact==num", nomatch = 0L, with = FALSE]
# Error in bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch,  : 
# x.'fact' is a factor column being joined to i.'num' which is type 'double'. Factor columns must join to factor or character columns.


## logical - character
test <- dt[dt, cols, on = "bool==char", nomatch = 0L, with = FALSE]
# Error in bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch,  : 
# typeof x.bool (logical) != typeof i.char (character)


## character - logical
test <- dt[dt, cols, on = "char==bool", nomatch = 0L, with = FALSE]
# Error in bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch,  : 
# x.'char' is a character column being joined to i.'bool' which is type 'logical'. Character columns must join to factor or character columns.


## logical - factor
test <- dt[dt, cols, on = "bool==fact", nomatch = 0L, with = FALSE]
# Error in bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch,  : 
# typeof x.bool (logical) != typeof i.fact (integer)r)


## factor - logical
test <- dt[dt, cols, on = "fact==bool", nomatch = 0L, with = FALSE]
# Error in bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch,  : 
# x.'fact' is a factor column being joined to i.'bool' which is type 'logical'. Factor columns must join to factor or character columns.


## character - factor
test <- dt[dt, cols, on = "char==fact", nomatch = 0L, with = FALSE]
class(test$i.fact)
# "character"
test
# Empty data.table (0 rows) of 10 cols: x.int,x.num,x.char,x.bool,x.fact,i.int...


## factor - character
test <- dt[dt, cols, on = "fact==char", nomatch = 0L, with = FALSE]
class(test$i.char)
# "factor"
test
# Empty data.table (0 rows) of 10 cols: x.int,x.num,x.char,x.bool,x.fact,i.int...


library(bit64)
dt[, int64 := as.integer64(c(1:9, 2.5e10))]
cols <- c("int", "num", "char", "bool", "fact", "int64")
cols <- c(paste0("x.", cols), paste0("i.", cols))

## int64 - integer
test <- dt[dt, cols, on = "int64==int", nomatch = 0L, with = FALSE]
class(test$i.int)
# "integer64"
test
#    x.int x.num x.char x.bool x.fact x.int64 i.int i.num i.char i.bool i.fact i.int64
# 1:     1   0.5      1  FALSE      a       1     1   0.5      1  FALSE      a       1
# 2:     2   1.0      2   TRUE      b       2     2   1.0      2   TRUE      b       2
# 3:     3   1.5      3   TRUE      c       3     3   1.5      3   TRUE      c       3
# 4:     4   2.0      4   TRUE      d       4     4   2.0      4   TRUE      d       4
# 5:     5   2.5      5   TRUE      e       5     5   2.5      5   TRUE      e       5
# 6:     6   3.0      6   TRUE      f       6     6   3.0      6   TRUE      f       6
# 7:     7   3.5      7   TRUE      g       7     7   3.5      7   TRUE      g       7
# 8:     8   4.0      8   TRUE      h       8     8   4.0      8   TRUE      h       8
# 9:     9   4.5      9   TRUE      i       9     9   4.5      9   TRUE      i       9


## integer - int64
test <- dt[dt, cols, on = "int==int64", nomatch = 0L, with = FALSE]
class(test$i.int64)
# "integer"
test
#    x.int x.num x.char x.bool x.fact x.int64 i.int i.num i.char i.bool i.fact i.int64
# 1:     1   0.5      1  FALSE      a       1     1   0.5      1  FALSE      a       1
# 2:     2   1.0      2   TRUE      b       2     2   1.0      2   TRUE      b       2
# 3:     3   1.5      3   TRUE      c       3     3   1.5      3   TRUE      c       3
# 4:     4   2.0      4   TRUE      d       4     4   2.0      4   TRUE      d       4
# 5:     5   2.5      5   TRUE      e       5     5   2.5      5   TRUE      e       5
# 6:     6   3.0      6   TRUE      f       6     6   3.0      6   TRUE      f       6
# 7:     7   3.5      7   TRUE      g       7     7   3.5      7   TRUE      g       7
# 8:     8   4.0      8   TRUE      h       8     8   4.0      8   TRUE      h       8
# 9:     9   4.5      9   TRUE      i       9     9   4.5      9   TRUE      i       9    


## int64 - logical
test <- dt[dt, cols, on = "int64==bool", nomatch = 0L, with = FALSE]
class(test$i.bool)
# "integer64"
test
#    x.int x.num x.char x.bool x.fact x.int64 i.int i.num i.char i.bool i.fact     i.int64
# 1:     1   0.5      1  FALSE      a       1     2   1.0      2      1      b           2
# 2:     1   0.5      1  FALSE      a       1     3   1.5      3      1      c           3
# 3:     1   0.5      1  FALSE      a       1     4   2.0      4      1      d           4
# 4:     1   0.5      1  FALSE      a       1     5   2.5      5      1      e           5
# 5:     1   0.5      1  FALSE      a       1     6   3.0      6      1      f           6
# 6:     1   0.5      1  FALSE      a       1     7   3.5      7      1      g           7
# 7:     1   0.5      1  FALSE      a       1     8   4.0      8      1      h           8
# 8:     1   0.5      1  FALSE      a       1     9   4.5      9      1      i           9
# 9:     1   0.5      1  FALSE      a       1    10   5.0     10      1      j 25000000000


## logical - int64
test <- dt[dt, cols, on = "bool==int64", nomatch = 0L, with = FALSE]
# Error in bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch,  : 
# typeof x.bool (logical) != typeof i.int64 (double)


## int64 - numeric
test <- dt[dt, cols, on = "int64==num", nomatch = 0L, with = FALSE]
class(test$i.num)
# "integer64"
test
#    x.int x.num x.char x.bool x.fact x.int64 i.int i.num i.char i.bool i.fact     i.int64
# 1:     1   0.5      1  FALSE      a       1     2     1      2   TRUE      b           2
# 2:     1   0.5      1  FALSE      a       1     3     1      3   TRUE      c           3
# 3:     2   1.0      2   TRUE      b       2     4     2      4   TRUE      d           4
# 4:     2   1.0      2   TRUE      b       2     5     2      5   TRUE      e           5
# 5:     3   1.5      3   TRUE      c       3     6     3      6   TRUE      f           6
# 6:     3   1.5      3   TRUE      c       3     7     3      7   TRUE      g           7
# 7:     4   2.0      4   TRUE      d       4     8     4      8   TRUE      h           8
# 8:     4   2.0      4   TRUE      d       4     9     4      9   TRUE      i           9
# 9:     5   2.5      5   TRUE      e       5    10     5     10   TRUE      j 25000000000
## same bug as for integer - numeric: conversion to integer means that 1.5 is joined to 1, etc.


## numeric - int64
test <- dt[dt, cols, on = "num==int64", nomatch = 0L, with = FALSE]
class(test$i.int64)
# "numeric"
test
#    x.int x.num x.char x.bool x.fact     x.int64 i.int i.num i.char i.bool i.fact i.int64
# 1:     2     1      2   TRUE      b           2     1   0.5      1  FALSE      a       1
# 2:     4     2      4   TRUE      d           4     2   1.0      2   TRUE      b       2
# 3:     6     3      6   TRUE      f           6     3   1.5      3   TRUE      c       3
# 4:     8     4      8   TRUE      h           8     4   2.0      4   TRUE      d       4
# 5:    10     5     10   TRUE      j 25000000000     5   2.5      5   TRUE      e       5


## int64 - character
test <- dt[dt, cols, on = "int64==char", nomatch = 0L, with = FALSE]
# Error in bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch,  : 
# typeof x.int64 (double) != typeof i.char (character)


## character - int64
test <- dt[dt, cols, on = "char==int64", nomatch = 0L, with = FALSE]
# Error in bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch,  : 
# x.'char' is a character column being joined to i.'int64' which is type 'double'. Character columns must join to factor or character columns.


## int64 - factor
test <- dt[dt, cols, on = "int64==fact", nomatch = 0L, with = FALSE]
# Error in bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch,  : 
# typeof x.int64 (double) != typeof i.fact (integer)


## factor - int64
test <- dt[dt, cols, on = "fact==int64", nomatch = 0L, with = FALSE]
# Error in bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch,  : 
# x.'fact' is a factor column being joined to i.'int64' which is type 'double'. Factor columns must join to factor or character columns.

I have updated the table with type conversions in the last post to reflect all combinations.

If type is NA in dt2 and let say character in dt1 (or any other type really for dt1), should the merge not failed and still work , assuming NA in dt2 is of type of dt1 ?

Sometimes do not know you are getting NA, and there is no reason that column with type NA is of type logical if not NA ?

Is that supposed to fail ?

merge(dt1, dt2, by = intersect(names(dt1), names(dt2)), all = TRUE, sort = FALSE)
dt1 <- data.table("a" = 1, b = "test")
 dt2 <- data.table("a" = 2, b = NA)
Error in bmerge(i, x, leftcols, rightcols, xo, roll, rollends, nomatch,  : 
  typeof x.b (logical) != typeof i.b (character)

@Fablepongiste Thanks for reporting.
Definitely NA should be coerced to NA_character_ transparently to user. I am not sure but #2734 might already cover that.
Lets add to that also special case of both fields being NAs of different type

dt1 <- data.table("a" = 1, b = NA_character_)
dt2 <- data.table("a" = 2, b = NA_integer_)
merge(dt1, dt2, by = intersect(names(dt1), names(dt2)), all = TRUE, sort = FALSE)

@jangorecki Do you expect this fix to be released soon at least in develop branch ?
Have seen issue #2734 but seems not updated from long time ?

@Fablepongiste cannot promise but we are trying to clear out PR queue, so it will be definitely on our list

Interesting case @Fablepongiste. We will have to check, whether PR #2734 covers that and amend if not. Unfortunately, the PR is very old, so merging will be painful and I have very limited time at the moment. I will nonetheless try to see after it in the next weeks.

I managed to take a look. #2734 now covers all-NA coercion. Thanks @Fablepongiste for raising.

Was this page helpful?
0 / 5 - 0 ratings