Drake: Fool-proof version of clean()

Created on 26 Feb 2019  ·  15 Comments  ·  Source: ropensci/drake

clean() is great but a (ahem) friend of mine once accidentally hit return before typing in the target name, wiping out his whole cache in a single errant stroke 🤦‍♂️

Is there a way to prevent fools like ~~me~~ my friend from making such a costly mistake?

intermediate api use case

All 15 comments

We could prompt the user if

  1. The session is interactive, and
  2. No targets are supplied.

Related: reprex:::yep()

That seems like an elegant solution

I have decided to appropriate this issue for the Chicago R Unconference. Sorry @tiernanmartin, but this will delay the implementation, and it means the feature may not make it into the next CRAN release.

Current behavior

In an interactive session:

library(drake)
load_mtcars_example()
make(my_plan)
clean()
cached()
#> character(0)
# Oh no!

Desired behavior

library(drake)
load_mtcars_example()
make(my_plan)
clean()
#> Really delete everything in the drake cache?
#>
#> 1: yes
#> 2: no
#> 
#> Selection:

Test

Please run in an interactive session. The test will fail in batch mode.

library(drake)
library(testthat)
drake:::test_with_dir("clean() in an interactive session pauses with a menu", {
  # Must run interactively.
  load_mtcars_example()
  make(my_plan)
  expect_equal(sort(cached()), sort(my_plan$target))
  clean() # Select 1.
  expect_equal(cached(), character(0))
  make(my_plan)
  expect_equal(sort(cached()), sort(my_plan$target))
  clean() # Select 2.
  expect_equal(sort(cached()), sort(my_plan$target))
  clean(prompt = FALSE)
  expect_equal(cached(), character(0))
})

Implementation

  • [ ] Pause with a menu if the user supplies no targets and the session is interactive. From there, the user should have the choice to either proceed (remove everything from the cache) or return early without cleaning anything. Functions interactive() and utils::menu() should be enough, and it should suffice to augment the following conditional.

https://github.com/ropensci/drake/blob/89904c845689d6f9894d2b47bff120582db6270a/R/api-clean.R#L121-L123

  • [ ] Add the suggested test inside the if (FALSE) block of tests/testthat/test-always-skipped.R. (We need to save the test for posterity, but because of the interactivity, we cannot fully automate it.)
  • [ ] In api-package.R, add menu() to the collection of functions imported from the utils package.

https://github.com/ropensci/drake/blob/89904c845689d6f9894d2b47bff120582db6270a/R/api-package.R#L52-L53
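
For illustration, the checklist above could be sketched roughly as follows. This is not drake's actual code: `confirm_clean_all()` and its `ask` argument are hypothetical names, and only `interactive()` and `utils::menu()` come from the discussion. The `ask` argument is there so the logic can be exercised without a live prompt.

```r
# Hypothetical helper, not part of drake.
# Returns TRUE if cleaning should proceed, FALSE if it should return early.
confirm_clean_all <- function(
  targets,
  prompt = interactive(),
  ask = function() {
    utils::menu(
      choices = c("yes", "no"),
      title = "Really delete everything in the drake cache?"
    )
  }
) {
  # Only interrupt when no targets were named and prompting is enabled.
  if (length(targets) > 0L || !prompt) {
    return(TRUE)
  }
  # utils::menu() returns the index of the chosen item, or 0 if the user exits.
  identical(as.integer(ask()), 1L)
}
```

Inside a function like clean(), the early return would then look something like `if (!confirm_clean_all(targets)) return(invisible())`, again assuming these hypothetical names.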

All good - thanks, @wlandau !

As of 55685c65311b9eca123d41321f62b6ad907a618d, clean() with no arguments pauses with a menu once per interactive session. Let's see how that goes. For now, I hesitate to interrupt users more frequently than that, but I could be persuaded to be more aggressive.
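
One way to get "once per interactive session" behavior is a flag stored in an environment that lives as long as the R session. The sketch below is only an assumption about how such gating might work; `session_state` and `should_prompt()` are hypothetical names, and the commit above may do it differently.

```r
# Hypothetical session-level state; drake's real bookkeeping may differ.
session_state <- new.env(parent = emptyenv())

should_prompt <- function(key = "clean_menu", state = session_state) {
  # Prompt only the first time this key is seen in the current R session.
  if (isTRUE(state[[key]])) {
    return(FALSE)
  }
  # Environments are modified in place, so the flag persists across calls.
  state[[key]] <- TRUE
  TRUE
}
```

The first call to `should_prompt()` returns TRUE and every later call in the same session returns FALSE, so the menu would interrupt the user at most once.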

By the way: clean() without garbage collection does not actually delete your data. Recovery may still be possible. See drake_history() and especially #952.

If I merge #952, do you think we still need the menu?

I just read through ?drake_history() and while it seems like a very useful tool I don't think it totally solves the problem that this issue originally addressed.

But I may be mistaken so perhaps you can help me understand the following scenario:

Let's say I have a 2-step plan:

drake_plan(
  five_hour_target = make_five_hour_target(), # this function takes ~5 hours to run
  fht_1 = five_hour_target + 1
)

... and then I accidentally run clean(), removing the references for both targets but leaving the cached data untouched.

How would I use drake_history() to recreate my plan _without remaking_ five_hour_target?

I was actually referring to recovery, not history.

library(drake)

make_five_hour_target <- function() {
  Sys.sleep(5) # Just take 5 seconds for a reprex!
}

plan <- drake_plan(
  five_hour_target = make_five_hour_target(),
  fht_1 = five_hour_target + 1
)

make(plan)
#> target five_hour_target
#> target fht_1

# Oops!
clean() # But no garbage collection...

# Recover everything for which we still have data.
make(plan, recover = TRUE) # Dependencies/triggers must match some run from the past.
#> recover five_hour_target
#> recover fht_1

make(plan)
#> All targets are already up to date.

Created on 2019-07-20 by the reprex package (v0.3.0)

Thank you for the explanation. Now that I've seen the recovery process in action I think that you're probably right that the menu is no longer necessary.

I would suggest adding a message to clean() that reminds users that they can recover targets using make(plan, recover = TRUE).

PS: I like that recovery operations are styled differently from freshly made targets – nice UI touch!


Thanks. I will plan on removing the menu and inserting that message after #957.

Hmm...let's keep the menu if garbage collection is activated.

> I would suggest adding a message to clean() that reminds users that they can recover targets using make(plan, recover = TRUE).

I like that. It's also a good way to advertise data recovery.

Implemented in https://github.com/ropensci/drake/commit/2acddb7337aeacc31ddd1469c0035ff14f1c7476. Message:

library(drake)
plan <- drake_plan(x = 1)
make(plan)
#> target x
clean()
#> Undo clean(garbage_collection = FALSE) with make(recovery = TRUE). Also builds unrecoverable targets. Message shown once per session if options(drake_clean_recovery_msg) is not FALSE.
clean()
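
According to the message text, setting the option `drake_clean_recovery_msg` to FALSE silences the reminder. A minimal base-R sketch of that kind of check follows; `recovery_msg_enabled()` is a hypothetical name, and only the option name is taken from the message above.

```r
# Hypothetical sketch; only the option name comes from the message above.
recovery_msg_enabled <- function() {
  # Anything other than an explicit FALSE (including unset) keeps the message on.
  !identical(getOption("drake_clean_recovery_msg"), FALSE)
}

# Opt out for the rest of the session, e.g. from .Rprofile.
options(drake_clean_recovery_msg = FALSE)
```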

Also related: #1014 and the new which_clean() function.
