Data.table: Aggregation fails for list columns with closures that don't have class "function"

Created on 18 Nov 2020  路  8Comments  路  Source: Rdatatable/data.table

f <- function(x) x
f2 <- f
class(f2) <- "x"
dt <- data.table(f, f2)
f <- function(x) x
f2 <- f
class(f2) <- "x"
dt <- data.table(id = 1, f, f2)
dt[, .(f), by = id]
#>    id             f
#> 1:  1 <function[1]>
dt[, .(f2), by = id]
#> Error in as.vector(x, "list") : 
#>   cannot coerce type 'closure' to vector of type 'list'

I would expect the result of .(f2) aggregation to be similar to the .(f) aggregation.

(Maybe this is related to the fact that as.list(f) gives a result and as.list(f2) fails? But I don't think it should fail since the content of the f column is not the result of as.list(f).)

> sessionInfo()
R Under development (unstable) (2020-11-14 r79432)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Fedora 32 (Thirty Two)

Matrix products: default
BLAS:   /home/user/R-devel/lib64/R/lib/libRblas.so
LAPACK: /home/user/R-devel/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C             
 [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8    
 [5] LC_MONETARY=en_US.utf8    LC_MESSAGES=en_US.utf8   
 [7] LC_PAPER=en_US.utf8       LC_NAME=C                
 [9] LC_ADDRESS=C              LC_TELEPHONE=C           
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C      

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.13.2

loaded via a namespace (and not attached):
[1] compiler_4.1.0
non-atomic column

All 8 comments

> traceback()
10: as.list.default(X)
9: as.list(X)
8: lapply(x, runlock, current_depth = current_depth + 1L)
7: FUN(X[[i]], ...)
6: lapply(x, runlock, current_depth = current_depth + 1L)
5: FUN(X[[i]], ...)
4: lapply(x, runlock, current_depth = current_depth + 1L)
3: runlock(ans)
2: `[.data.table`(dt, , .(f2), by = id)
1: dt[, .(f2), by = id]

The runlock function:

  MAX_DEPTH = 5L
  runlock = function(x, current_depth = 1L) {
    if (is.recursive(x) && current_depth <= MAX_DEPTH) {
      if (inherits(x, 'data.table')) .Call(C_unlock, x)
      else return(lapply(x, runlock, current_depth = current_depth + 1L))
    }
    return(invisible())
  }
  runlock(ans)

Now this illustrates the root cause:

lapply(f, print)
#x
#$x
#
#
#[[2]]
#x
lapply(f2, print)
#Error in as.vector(x, "list") : 
 # cannot coerce type 'closure' to vector of type 'list'

Closures are recursive. I think you need a !is.function(x) in the if condition.

Or unclass(x)?

MRE:

as.list(f)
# $x
#  [[2]]
# x
as.list(f2)
# Error in as.vector(x, "list") : 
#   cannot coerce type 'closure' to vector of type 'list'

This issue is that you've erased function from the classes of your object so S3 dispatch has failed:

class(f2) = c("x", "function")
dt <- data.table(id = 1, f, f2)
dt[, .(f2), by = id]
#       id     f2
#    <num> <list>
# 1:     1 <x[1]>

as.list.function was being applied to f while as.list.default was being applied to f2. as.list is being applied via lapply which is how we descend recursive objects in the j return looking for data.tables that need to be unlocked. Chalk this up as another case where re-writing runlock in C might make sense :thinking:

Still I think it is low overhead for us to add !is.function(x) to runlock so I'll do that.

e = environment()
class(e) = "foo"

dt = data.table(id = 1, funs = list(e))
dt[, list(funs), by = "id"]

throws a similar error:

Error in as.vector(x, "list") : 
  cannot coerce type 'environment' to vector of type 'list'

I think these issues can all be avoided by using a for loop instead of lapply.

@Coorsaa could you please check if related PR fixed your use case?

hi @jangorecki ,
I installed from the PR's underlying branch runlock-function which didn't work, so I don't think so.

Yes agree a loop instead of lapply would be better, especially at C level. For another day.
I added @Coorsaa's environment example, thanks, to the test, and fixed it.

Was this page helpful?
0 / 5 - 0 ratings