Drake: Symbols loaded after make()

Created on 14 Feb 2018 · 21Comments · Source: ropensci/drake

When running make() for a target, it is available in the global environment only if it is built during this run of make() (with the default config). Can we implement a more consistent way of handling built targets?

library(drake)

plan <- drake_plan(
  target1 = 5
)

make(plan)
#> cache /tmp/RtmpMxWAKq/.drake
#> connect 1 import: plan
#> connect 1 target: target1
#> check 1 item: target1
#> target target1
target1
#> [1] 5

make(plan)
#> cache /tmp/RtmpMxWAKq/.drake
#> Unloading targets from environment:
#>   target1
#> connect 1 import: plan
#> connect 1 target: target1
#> check 1 item: target1
#> All targets are already up to date.
target1
#> Error in eval(expr, envir, enclos): object 'target1' not found

Created on 2018-02-14 by the reprex package (v0.2.0).

advanced

Source

krlmlr

All 21 comments

For now, users can get around this by creating a custom envir for make(). Going forward, we can definitely address this. By default, drake operates in the user's workspace. Early in development, I tried creating a special isolated execution environment for drake, but I could not solve the lexical scoping problems (nested imported functions were not seeing each other). We might be able to try again with something like expose_imports(). A little extra overhead, but it could be worth it. In fact, this would mean we no longer have to test everything separately for the global and custom environments, so it would cut the scaled-up testing workflow in half. cc @dapperjapper.

wlandau on 14 Feb 2018

I think this specific issue could be solved by just loadding any targets in make(plan, targets) as a convenience to the user when all targets are already up to date and make was run in a way that would modify the global environment (parallelism != Makefile). Alternatively, drake could do the "Unloading targets from environment" thing to clean up after all make runs. I can see a downside to both options, because they require a little more overhead in pursuit of consistency. But I'm fine with that.

I'm not sure about the question of environments. I like way future deals with environments and scope, because it meshes with the normal use of environments and scope in R really naturally, by definition and design. The nature of drake (targets are stored on a flat key-value level, I can't see anyone using #228) precludes some of the complexity found in future, but I think the question of how drake deals with environments is very existentially central. (if that makes sense??)

dapperjapper on 14 Feb 2018

I think this specific issue could be solved by just loadding any targets in make(plan, targets) as a convenience to the user when all targets are already up to date and make was run in a way that would modify the global environment (parallelism != Makefile).

I'm not sure I agree. Some of these targets get really big, and automatically loading things could get really expensive. Sometimes people want to run make() just to check for outdated targets.

Alternatively, drake could do the "Unloading targets from environment" thing to clean up after all make runs.

Unfortunately, that needs to happen first because sometimes targets conflict with imports, especially when users load targets from the cache to interact with the results.

I'm not sure about the question of environments. I like way future deals with environments and scope, because it meshes with the normal use of environments and scope in R really naturally, by definition and design. The nature of drake (targets are stored on a flat key-value level, I can't see anyone using #228) precludes some of the complexity found in future, but I think the question of how drake deals with environments is very existentially central. (if that makes sense??)

Flatness is a good way to put it. It's a source source of a lot of struggle for drake, and thinking that way is keeping me from wrapping my head around #228 (both in picturing what users might want and trying to guess if it could work).

wlandau on 14 Feb 2018

Unfortunately, that needs to happen first because sometimes targets conflict with imports, especially when users load targets from the cache to interact with the results.

Sorry, I meant both before and after all make runs!

dapperjapper on 14 Feb 2018

I thought about this one more, and I think our best options are the following (ranked according to my preference):

Unload all the targets at the end of make(). Consistent, but possibly inconvenient if you want to loadd() things all over again. Then again, this approach would help get things ready for a possible next make().
Leave any in-memory targets alone at the end of make() (current behavior). Possibly convenient, but inconsistent. And the next make() has more prep work to do.
Load the final targets at the end of make(), including ones that were skipped. It would be difficult to figure out which targets to load, and loading for the sake of loading may not be worth the time for large targets.

Just say the word and I will implement the first one. I may need more convincing on the other two.

wlandau on 24 Feb 2018

Consistency is king, but it looks like the first option might break existing user code. Maybe a variant of option 3 combined with lazy loading (=active bindings)? It doesn't matter if we load more than necessary as long as we're consistent in what we load.

Anyway, this feels sufficiently low prio to me at this point, labeled accordingly.

krlmlr on 24 Feb 2018

Maybe active bindings for all the targets in the cache with no revdeps?

wlandau on 24 Feb 2018

I'm really not sure what's best at this point.

krlmlr on 28 Feb 2018

At the very least, this is documented in the caution.Rmd vignette.

What about a console message that lists the newly-loaded targets?

wlandau on 2 Mar 2018

I'm leaning towards loading always with active bindings. Do the active bindings work with caching, so that the second access is faster?

krlmlr on 2 Mar 2018

You mean in-memory caching, right? I just made that happen in 2fc5932503f00d4651e66009a8a3b03b6d0e5f36.

Creating active bindings for all non-file targets in the cache should cover it. But awkwardly, the bindings would need to be destroyed/unloaded before the next make() in the same session so we don't confuse targets with imports.

wlandau on 2 Mar 2018

But unloading bindings is cheap, and reloading them too if they are in the in-memory cache?

krlmlr on 2 Mar 2018

I think so.

wlandau on 2 Mar 2018

At this point, my only concern is the read-only nature of active bindings. Is there a way to relax that with bindr?

> make(drake_plan(x = 1))
> loadd(x, lazy = "bind")
> x <- 2
Error: Binding is read-only.

wlandau on 2 Mar 2018

Edit: just saw r-lib/R6#99. Not sure how users will feel about that. What about lazy loading with promises instead?

wlandau on 2 Mar 2018

Except that delayedAssign() fails if we clean the cache and make() again. The second make() tries to fulfill a promise based on a cache that may not even exist anymore.

> x <- drake_plan(a = 1)
> make(x)
> clean(destroy = TRUE)
> make(x)
Error: key 'a' ('objects') not found

So promises with delayedAssign() don't seem to be an option either.

wlandau on 2 Mar 2018

The theory of functional programming, and also much of drake's philosophy, suggests that these variables should be read-only. But this turns the problem into something we may wish to do for a major release. (Also, I don't see how to fully support assignment with bindr. The variable would remain an active binding, and we'd have to store the value the user gave us somewhere. Yikes.)

krlmlr on 2 Mar 2018

If we implement #296, no targets will be automatically loaded anymore, which will give us consistency here.

wlandau on 6 Mar 2018

👍1

Closing in favor of #296. I do not think it will be a major hit to back compatibility. Will reopen if we cannot implement #296.

wlandau on 8 Mar 2018

👍1

FYI: as a side effect of #369, the symbols loaded after make() are likely to go away for mclapply and parLapply parallelism when jobs > 1. In this case, I think it's worth the sudden change. I have a preference for also removing these symbols when jobs is 1.

wlandau on 2 May 2018

Good news. Agree it's good to be consistent regardless of the jobs value.

krlmlr on 3 May 2018

👍1

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Errors store large amounts of data

billdenney · 9Comments

Prohibit dynamic branching over unbranched dynamic files.

wlandau · 4Comments

AppVeyor CI disconnected

wlandau · 9Comments

Incorporate unit testing with drake pipelines

rsangole · 3Comments

Combining `workflowr` and `drake`: A perfect match

pat-s · 5Comments