Data.table: Fails when data.table is passed to a function in a package that does not use it (and thus does not import it)

Created on 18 Aug 2014 · 9 Comments · Source: Rdatatable/data.table

Suppose the following function is defined in a package named, for instance, testPkg:

#' @export
test1 <- function(data, fun, ...) {
  fun(data,...)
}

Nothing from data.table is used in the test package.
Suppose the user attaches this test package together with dplyr and data.table:

library(testPkg)
library(dplyr)
library(data.table)

When a data.table is passed to test1(), it fails with dplyr functions such as filter:

> test1(as.data.table(mtcars), filter, mpg <= mean(mpg))
Error in `[.data.frame`(x, i, j) : object 'mpg' not found

I read http://stackoverflow.com/questions/10527072/using-data-table-package-inside-my-own-package/10529888#10529888. In that question, though, the authors actually want to use data.table inside their package. Here the package does not use data.table internally at all; it only provides a function to which a user might pass a data.table object and expect it to work. If that possibility alone requires the package to declare a dependency on data.table, it seems odd and unnecessary.

Is it true that all packages that want to be compatible with data.table (even though they don't use it anywhere inside but the users might pass in some data.table) must declare dependency on it?

Is there a better workaround, particularly for this case where the package does not use data.table at all?

All 9 comments

Is it true that all packages that want to be compatible with data.table (even though they don't use it anywhere inside but the users might pass in some data.table) must declare dependency on it?

No, that's not the case.

I think the issue is that dplyr neither _imports_ nor _depends_ on data.table; it only suggests it. My guess is that it doesn't have .datatable.aware=TRUE set either.

Many thanks for your quick reply, @arunsrinivasan! Some references may be useful:

See also:
Original problem: https://github.com/renkun-ken/pipeR/issues/28
Discussion in dplyr: https://github.com/hadley/dplyr/issues/548

However, when testPkg declares Depends: data.table in DESCRIPTION, the problem no longer exists. It just looks weird when testPkg uses nothing from data.table.

Hm, I didn't look at this in much detail, but Pipe() seems to return an environment, and all further evaluations seem to take place in that environment? If so, check out Matt's recent comment here.

Basically, try:

assignInNamespace("cedta.override", c(data.table:::cedta.override,"pipeR"), "data.table")

If things work smoothly after doing this, then either you'll need to add .datatable.aware=TRUE to your namespace, or we'll need to whitelist pipeR so that data.table:::cedta() overlooks it (as Matt explained in that post).

... Pipe() seems to return an environment, and all further evaluations seem to take place in that environment ...

In fact, Pipe() is designed to return a Pipe object that stores a value and implements $, which looks up a function in the parent environment. This allows chaining commands by piping the stored value to each function's first argument, like

Pipe(rnorm(100))$
  density(kernel = "gaussian")$
  plot(col = "red")

It does indeed evaluate the dynamically produced closures (e.g. $density(...) and $plot(...)) in a non-global environment.

The good news is that I tried assignInNamespace and .datatable.aware = TRUE, and the package builds and works quite well. However, R CMD check does not pass:

* checking whether package 'pipeR' can be installed ... ERROR
Installation failed.
See 'C:/Users/Kun/Documents/Workspaces/pipeR.Rcheck/00install.out' for details.
Warning: running command '"C:/PROGRA~1/R/R-31~1.1/bin/x64/Rcmd.exe" INSTALL -l "C:/Users/Kun/Documents/Workspaces/pipeR.Rcheck" --no-html "C:\Users\Kun\DOCUME~1\WORKSP~1\PIPER~1.RCH\00_PKG~1\pipeR"' had status 1
Error: Command failed (1)
Execution halted

Exited with status 1.

I also tried declaring only .datatable.aware <- TRUE in my namespace, without assignInNamespace and without importing data.table, and it also works fine. Does that mean the problem is solved, or is assignInNamespace necessary for some reason?

assignInNamespace() is only for users, for example if they wish to add a package to data.table's whitelist themselves. We then add that package name to the whitelist in the next release, and they can remove that call.
You didn't include the 00install.out output, but I guess it contained an error about the use of assignInNamespace, which would make sense.
Is it all OK and passing R CMD check now?
So it's Pipe() and $ now ... looks good!

I see that the issue has been posted on three different project pages. This makes it quite frustrating to keep following all threads. Why not create the issue on _one_ project and just tag people?

My previous comments weren't completely correct, sorry about that. I had another look at pipeR and dplyr. Firstly, dplyr doesn't need to be data.table aware: it evaluates data.table expressions in its parent.frame(). Secondly, and this is your case, that parent.frame() is the pipeR namespace, which _is not_ data.table aware. Matt's comment in the link above is precisely the solution for that.
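
As a rough illustration of the check described above, here is a simplified model in base R. It is a sketch only, not data.table's actual cedta() code: the function name is_aware and its whitelist argument are hypothetical, and the real check has further conditions (for instance, whether the calling package imports or depends on data.table) that are left out here.

```r
# Simplified model of a "namespace awareness" check.
# NOT data.table's real implementation; all names here are hypothetical.
is_aware <- function(env, whitelist = character()) {
  ns <- topenv(env)                      # enclosing namespace, or the global env
  if (!isNamespace(ns)) return(TRUE)     # e.g. code typed at the prompt
  if (getNamespaceName(ns) %in% whitelist) return(TRUE)
  # Has the package opted in by defining the flag in its own namespace?
  isTRUE(get0(".datatable.aware", envir = ns, inherits = FALSE))
}

is_aware(globalenv())                                # not a namespace
is_aware(asNamespace("stats"))                       # no flag, not whitelisted
is_aware(asNamespace("stats"), whitelist = "stats")  # whitelisted by name
```

In this model the first and third calls return TRUE and the second returns FALSE, which mirrors the two options in this thread: whitelisting the package by name, or having the package set the flag itself.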

Briefly, either we need to whitelist your package or you need to tell us that your namespace is aware (by setting .datatable.aware = TRUE in your namespace). I hope this clarifies things a bit.
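
For a package in the position of testPkg or pipeR, the opt-in could look like the following sketch. The file name R/aaa.R is an assumption made for illustration; any file in the package's R/ directory should do, since the variable only needs to end up in the package namespace.

```r
# R/aaa.R (hypothetical file name) in the package source.
# Declare that code evaluated in this namespace is data.table aware,
# without importing or depending on data.table.
.datatable.aware <- TRUE
```

The variable does not need to be exported, because the check looks inside the namespace rather than at the package's exports.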

@mattdowle seems so.

Thanks for your detailed explanations!
