When running make() for a target, it is available in the global environment only if it is built during this run of make() (with the default config). Can we implement a more consistent way of handling built targets?
library(drake)
plan <- drake_plan(
target1 = 5
)
make(plan)
#> cache /tmp/RtmpMxWAKq/.drake
#> connect 1 import: plan
#> connect 1 target: target1
#> check 1 item: target1
#> target target1
target1
#> [1] 5
make(plan)
#> cache /tmp/RtmpMxWAKq/.drake
#> Unloading targets from environment:
#> target1
#> connect 1 import: plan
#> connect 1 target: target1
#> check 1 item: target1
#> All targets are already up to date.
target1
#> Error in eval(expr, envir, enclos): object 'target1' not found
Created on 2018-02-14 by the reprex package (v0.2.0).
For now, users can get around this by creating a custom envir for make(). Going forward, we can definitely address this. By default, drake operates in the user's workspace. Early in development, I tried creating a special isolated execution environment for drake, but I could not solve the lexical scoping problems (nested imported functions were not seeing each other). We might be able to try again with something like expose_imports(). A little extra overhead, but it could be worth it. In fact, this would mean we no longer have to test everything separately for the global and custom environments, so it would cut the scaled-up testing workflow in half. cc @dapperjapper.
I think this specific issue could be solved by just loadding any targets in make(plan, targets) as a convenience to the user when all targets are already up to date and make was run in a way that would modify the global environment (parallelism != Makefile). Alternatively, drake could do the "Unloading targets from environment" thing to clean up after all make runs. I can see a downside to both options, because they require a little more overhead in pursuit of consistency. But I'm fine with that.
I'm not sure about the question of environments. I like way future deals with environments and scope, because it meshes with the normal use of environments and scope in R really naturally, by definition and design. The nature of drake (targets are stored on a flat key-value level, I can't see anyone using #228) precludes some of the complexity found in future, but I think the question of how drake deals with environments is very existentially central. (if that makes sense??)
I think this specific issue could be solved by just loadding any targets in make(plan, targets) as a convenience to the user when all targets are already up to date and make was run in a way that would modify the global environment (parallelism != Makefile).
I'm not sure I agree. Some of these targets get really big, and automatically loading things could get really expensive. Sometimes people want to run make() just to check for outdated targets.
Alternatively, drake could do the "Unloading targets from environment" thing to clean up after all make runs.
Unfortunately, that needs to happen first because sometimes targets conflict with imports, especially when users load targets from the cache to interact with the results.
I'm not sure about the question of environments. I like way future deals with environments and scope, because it meshes with the normal use of environments and scope in R really naturally, by definition and design. The nature of drake (targets are stored on a flat key-value level, I can't see anyone using #228) precludes some of the complexity found in future, but I think the question of how drake deals with environments is very existentially central. (if that makes sense??)
Flatness is a good way to put it. It's a source source of a lot of struggle for drake, and thinking that way is keeping me from wrapping my head around #228 (both in picturing what users might want and trying to guess if it could work).
Unfortunately, that needs to happen first because sometimes targets conflict with imports, especially when users load targets from the cache to interact with the results.
Sorry, I meant both before and after all make runs!
I thought about this one more, and I think our best options are the following (ranked according to my preference):
make(). Consistent, but possibly inconvenient if you want to loadd() things all over again. Then again, this approach would help get things ready for a possible next make().make() (current behavior). Possibly convenient, but inconsistent. And the next make() has more prep work to do.make(), including ones that were skipped. It would be difficult to figure out which targets to load, and loading for the sake of loading may not be worth the time for large targets.Just say the word and I will implement the first one. I may need more convincing on the other two.
Consistency is king, but it looks like the first option might break existing user code. Maybe a variant of option 3 combined with lazy loading (=active bindings)? It doesn't matter if we load more than necessary as long as we're consistent in what we load.
Anyway, this feels sufficiently low prio to me at this point, labeled accordingly.
Maybe active bindings for all the targets in the cache with no revdeps?
I'm really not sure what's best at this point.
At the very least, this is documented in the caution.Rmd vignette.
What about a console message that lists the newly-loaded targets?
I'm leaning towards loading always with active bindings. Do the active bindings work with caching, so that the second access is faster?
You mean in-memory caching, right? I just made that happen in 2fc5932503f00d4651e66009a8a3b03b6d0e5f36.
Creating active bindings for all non-file targets in the cache should cover it. But awkwardly, the bindings would need to be destroyed/unloaded before the next make() in the same session so we don't confuse targets with imports.
But unloading bindings is cheap, and reloading them too if they are in the in-memory cache?
I think so.
At this point, my only concern is the read-only nature of active bindings. Is there a way to relax that with bindr?
> make(drake_plan(x = 1))
> loadd(x, lazy = "bind")
> x <- 2
Error: Binding is read-only.
Edit: just saw r-lib/R6#99. Not sure how users will feel about that. What about lazy loading with promises instead?
Except that delayedAssign() fails if we clean the cache and make() again. The second make() tries to fulfill a promise based on a cache that may not even exist anymore.
> x <- drake_plan(a = 1)
> make(x)
> clean(destroy = TRUE)
> make(x)
Error: key 'a' ('objects') not found
So promises with delayedAssign() don't seem to be an option either.
The theory of functional programming, and also much of drake's philosophy, suggests that these variables should be read-only. But this turns the problem into something we may wish to do for a major release. (Also, I don't see how to fully support assignment with bindr. The variable would remain an active binding, and we'd have to store the value the user gave us somewhere. Yikes.)
If we implement #296, no targets will be automatically loaded anymore, which will give us consistency here.
Closing in favor of #296. I do not think it will be a major hit to back compatibility. Will reopen if we cannot implement #296.
FYI: as a side effect of #369, the symbols loaded after make() are likely to go away for mclapply and parLapply parallelism when jobs > 1. In this case, I think it's worth the sudden change. I have a preference for also removing these symbols when jobs is 1.
Good news. Agree it's good to be consistent regardless of the jobs value.