Drake: Incorrectly outdated targets after loadd()

Created on 6 Mar 2018  路  13Comments  路  Source: ropensci/drake

Calling loadd() changes outdated(drake_config(plan)), but not outdated(read_drake_config()). This doesn't happen if x is created from a constant such as 42.

writeLines("test", "test.txt")

pkgload::load_all("testpkg")
#> Loading testpkg
library(drake)
plan <- drake_plan(
  x = readLines(file_in("test.txt")),
  strings_in_dots = "literals"
)
make(plan)
#> cache /tmp/RtmpArc3UM/.drake
#> connect 1 import: plan
#> connect 1 target: x
#> check 2 items: file "test.txt", readLines
#> check 1 item: x
#> target x
outdated(read_drake_config())
#> cache /tmp/RtmpArc3UM/.drake
#> check 2 items: file "test.txt", readLines
#> unload 1 item: x
#> check 1 item: x
#> character(0)
outdated(drake_config(plan))
#> cache /tmp/RtmpArc3UM/.drake
#> connect 1 import: plan
#> connect 1 target: x
#> check 2 items: file "test.txt", readLines
#> check 1 item: x
#> character(0)
loadd()
#> cache /tmp/RtmpArc3UM/.drake
outdated(read_drake_config())
#> cache /tmp/RtmpArc3UM/.drake
#> check 2 items: file "test.txt", readLines
#> unload 1 item: x
#> check 1 item: x
#> character(0)
outdated(drake_config(plan))
#> cache /tmp/RtmpArc3UM/.drake
#> connect 10 imports: close, file "testpkg/DESCRIPTION", UseMethod, is.cha...
#> connect 1 target: x
#> check 5 items: .Internal, file "test.txt", is.character, on.exit, UseMethod
#> check 2 items: close, file
#> check 1 item: readLines
#> check 1 item: x
#> [1] "x"
rm(x)
#> Warning in rm(x): object 'x' not found
outdated(read_drake_config())
#> cache /tmp/RtmpArc3UM/.drake
#> check 2 items: file "test.txt", readLines
#> check 1 item: x
#> character(0)
outdated(drake_config(plan))
#> cache /tmp/RtmpArc3UM/.drake
#> connect 10 imports: close, file "testpkg/DESCRIPTION", UseMethod, is.cha...
#> connect 1 target: x
#> check 5 items: .Internal, file "test.txt", is.character, on.exit, UseMethod
#> check 2 items: close, file
#> check 1 item: readLines
#> check 1 item: x
#> [1] "x"
make(plan)
#> cache /tmp/RtmpArc3UM/.drake
#> connect 10 imports: close, file "testpkg/DESCRIPTION", UseMethod, is.cha...
#> connect 1 target: x
#> check 5 items: .Internal, file "test.txt", is.character, on.exit, UseMethod
#> check 2 items: close, file
#> check 1 item: readLines
#> check 1 item: x
#> target x
outdated(read_drake_config())
#> cache /tmp/RtmpArc3UM/.drake
#> check 5 items: .Internal, file "test.txt", is.character, on.exit, UseMethod
#> unload 1 item: x
#> check 2 items: close, file
#> check 1 item: readLines
#> check 1 item: x
#> character(0)

Created on 2018-03-06 by the reprex package (v0.2.0).

bug

All 13 comments

It seems that loadd() is loading too much in this example:

writeLines("test", "test.txt")

pkgload::load_all("testpkg")
#> Loading testpkg
library(drake)
plan <- drake_plan(
  x = readLines(file_in("test.txt")),
  strings_in_dots = "literals"
)
make(plan)
#> cache /tmp/RtmpArc3UM/.drake
#> connect 1 import: plan
#> connect 1 target: x
#> check 2 items: file "test.txt", readLines
#> check 1 item: x
#> All targets are already up to date.
read_drake_config()$graph
#> cache /tmp/RtmpArc3UM/.drake
#> IGRAPH 906ff10 DN-- 3 2 -- 
#> + attr: name (v/c)
#> + edges from 906ff10 (vertex names):
#> [1] "test.txt"->x readLines ->x
drake_config(plan)$graph
#> cache /tmp/RtmpArc3UM/.drake
#> connect 1 import: plan
#> connect 1 target: x
#> IGRAPH 59c7ca4 DN-- 3 2 -- 
#> + attr: name (v/c)
#> + edges from 59c7ca4 (vertex names):
#> [1] "test.txt"->x readLines ->x
ls(all.names = TRUE)
#> [1] ".Random.seed" "plan"
loadd()
#> cache /tmp/RtmpArc3UM/.drake
ls(all.names = TRUE)
#>  [1] ".Internal"               ".Random.seed"           
#>  [3] "\"test.txt\""            "\"testpkg/DESCRIPTION\""
#>  [5] "close"                   "file"                   
#>  [7] "fun"                     "is.character"           
#>  [9] "on.exit"                 "plan"                   
#> [11] "readLines"               "UseMethod"              
#> [13] "x"
read_drake_config()$graph
#> cache /tmp/RtmpArc3UM/.drake
#> IGRAPH 906ff10 DN-- 3 2 -- 
#> + attr: name (v/c)
#> + edges from 906ff10 (vertex names):
#> [1] "test.txt"->x readLines ->x
drake_config(plan)$graph
#> cache /tmp/RtmpArc3UM/.drake
#> Unloading targets from environment:
#>   x
#> connect 10 imports: close, file "testpkg/DESCRIPTION", UseMethod, is.cha...
#> connect 1 target: x
#> IGRAPH eb2ffc6 DN-- 9 9 -- 
#> + attr: name (v/c)
#> + edges from eb2ffc6 (vertex names):
#> [1] UseMethod   ->close     is.character->readLines .Internal   ->file     
#> [4] .Internal   ->readLines file        ->readLines on.exit     ->readLines
#> [7] "test.txt"  ->x         close       ->readLines readLines   ->x

Created on 2018-03-07 by the reprex package (v0.2.0).

I can work around for now using drake::possible_targets(). (Why is this function not mentioned in the pkgdown reference?)

writeLines("test", "test.txt")

library(drake)
plan <- drake_plan(
  x = readLines(file_in("test.txt")),
  strings_in_dots = "literals"
)
make(plan)
#> cache /tmp/Rtmp8RItSU/.drake
#> connect 1 import: plan
#> connect 1 target: x
#> check 2 items: file "test.txt", readLines
#> check 1 item: x
#> All targets are already up to date.
read_drake_config()$graph
#> cache /tmp/Rtmp8RItSU/.drake
#> IGRAPH 9911508 DN-- 3 2 -- 
#> + attr: name (v/c)
#> + edges from 9911508 (vertex names):
#> [1] "test.txt"->x readLines ->x
drake_config(plan)$graph
#> cache /tmp/Rtmp8RItSU/.drake
#> connect 1 import: plan
#> connect 1 target: x
#> IGRAPH b798a3a DN-- 3 2 -- 
#> + attr: name (v/c)
#> + edges from b798a3a (vertex names):
#> [1] "test.txt"->x readLines ->x
ls(all.names = TRUE)
#> [1] ".Random.seed" "plan"
loadd(list = drake::possible_targets())
#> cache /tmp/Rtmp8RItSU/.drake
#> cache /tmp/Rtmp8RItSU/.drake
ls(all.names = TRUE)
#> [1] ".Random.seed" "plan"         "x"
read_drake_config()$graph
#> cache /tmp/Rtmp8RItSU/.drake
#> IGRAPH 9911508 DN-- 3 2 -- 
#> + attr: name (v/c)
#> + edges from 9911508 (vertex names):
#> [1] "test.txt"->x readLines ->x
drake_config(plan)$graph
#> cache /tmp/Rtmp8RItSU/.drake
#> Unloading targets from environment:
#>   x
#> connect 1 import: plan
#> connect 1 target: x
#> IGRAPH ad34ca2 DN-- 3 2 -- 
#> + attr: name (v/c)
#> + edges from ad34ca2 (vertex names):
#> [1] "test.txt"->x readLines ->x

Created on 2018-03-07 by the reprex package (v0.2.0).

I will get to this properly after some things on my end calm down. For now, I will just say that I recommend in the docs that users recompute the config object on a regular basis. The cached one sometimes lags behind.

This seems to be triggered when imports are loaded from the cache (which loadd() does). This sort of thing should not happen because changes to functions are detected using deparse(). Weird. Will need to do more digging, maybe with the help of dependency_profile().

writeLines("test", "test.txt")
library(drake)
plan <- drake_plan(x = readLines(file_in("test.txt")), strings_in_dots = "literals")
make(plan, verbose = FALSE)
outdated()
#> cache /tmp/RtmpONwMiv/.drake
#> character(0)
outdated(drake_config(plan, verbose = FALSE))
#> character(0)
loadd(readLines)
#> cache /tmp/RtmpONwMiv/.drake
outdated()
#> cache /tmp/RtmpONwMiv/.drake
#> character(0)
outdated(drake_config(plan, verbose = FALSE))
#> [1] "x"
rm(readLines)
outdated()
#> cache /tmp/RtmpONwMiv/.drake
#> character(0)
outdated(drake_config(plan, verbose = FALSE))
#> character(0)

Also, I am starting to think loadd() with no arguments should exclude imports.

From 9870627e64a9fccafa5c4765aea6f8e92e46c9c4, loadd() with empty ... and list arguments now just loads the targets and ignores the imports. But loading an up-to-date import should not put anything out of date, so there is still a bug.

A wall of reprexes might not be that helpful after all ;-) loadd() seems to load imports into the global environment, I'm not sure why.

Haven't tried, I'm referring to the reprex wall. Maybe the fix is sufficient for the most common case? Although it appears that we might need to put the filter even before drake_select() to support loading everything().?

I think I found the problem. In the first make(), readLines() is in the base environment but not config$envir. That means drake does not dive into readLines() in the code analysis. In other words, drake thinks readLines() has no dependencies.

library(drake)
clean(destroy = TRUE, cache = FALSE)
writeLines("test", "test.txt")
plan <- drake_plan(x = readLines(file_in("test.txt")), strings_in_dots = "literals")
make(plan, verbose = FALSE)
config <- drake_config(plan, verbose = FALSE)
outdated(config)
#> character(0)
dependency_profile("readLines", config)$hashes_of_dependencies
#> named list()

But when readLines() is loaded into config$envir, drake dives into the body of readLines() to find dependencies.

loadd(readLines, envir = config$envir, verbose = FALSE)
config <- drake_config(plan, verbose = FALSE)  # Rebuild the graph.
outdated(config)
#> [1] "x"
dependency_profile("readLines", config)$hashes_of_dependencies
#>          .Internal              close               file 
#> "d1fc219b74330132" "c3d4f6a20262b2df" "4023e7e2dc540c1c" 
#>       is.character            on.exit 
#> "72e8a7b072741188" "7a6478afbc362d4e"

I think the easiest solution is to disallow imports from being loadded. Users should still be able to readd() them, but loadd() should probably just be for targets. Thoughts?

On second thought, refusing to loadd() imports may cause problems in R Markdown reports, where you may actually want to do that sort of thing for a good reason. I think it is enough to generate a warning when an imported function is loaded with loadd().

We should probably also change how drake searches for imports. The change would put some workflows out of date, but I don't think it will otherwise break anything. Current behavior:

  1. First try to get the import from config$envir.
  2. If that fails, if the import is namespaced, try to get it from its package namespace.
  3. Still no luck? Try get(target) with no other arguments. (Unwise, I know, but I wrote that part of the code a long time ago.)

For (3), we should change that bit to get(target, envir = config$envir). Yes, strictly speaking, target could be found in globalenv() this way, but only if it is an ancestor of config$envir. This is the closest thing to a breaking change here.

As @krlmlr suggested, the last major change to fix this issue should be to resist loading functions from package namespaces. We could do that by checking isNamespace(environment(value)) in eager_load_target(), promise_load_target(), and bind_load_target().

A minor comment: loadd() is more complicated internally than I would like it to be. It's a bit messy. Maybe we just have to live with that, I don't know.

Looking at the internals, I think it is actually better to add/use extra metadata to indicate whether a given object was found in envir at the time make() was called. In other words, we should distinguish between native imports (in envir) versus foreign imports (found in an environment that envir inherits from). I will have a PR soon. Adds a little clutter, but given how drake handles environments that may inherit from globalenv(), this is probably the safest option.

Was this page helpful?
0 / 5 - 0 ratings