Drake: Reproducibility with random numbers

Created on 10 Mar 2018 · 7 Comments · Source: ropensci/drake

I think this is more a documentation issue than a real problem, but I'm trying to figure out how drake deals with random numbers. I was implementing my own seed-setting system for simulations. Reading through the GitHub pages, this all seems to be solved, among others with #56, and quick testing suggests it also works in version 5.0.0. But searching the manual for both "seed" and "random" does not produce any results (besides the random tip). I think it would be great to have improved documentation (either a vignette or man pages) so people who do think about this can inform themselves. Right now I'm unsure to what extent I do or don't need to worry when I'm working with random numbers.

faq

Most helpful comment

Reproducible pseudo-randomness with drake

The global seed

On your first make(), you have the opportunity to set a global RNG seed for your project. The seed is 0 unless you provide a different one. To ensure reproducibility under pseudo-randomness, subsequent make()s use this same global seed unless you completely destroy the cache and pick another seed. Drake is too opinionated about reproducibility to let you pick another seed unless you destroy the cache and start from scratch. Use read_drake_seed() or read_drake_config()$seed to get the global seed of your project.
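To make the rule concrete, here is a toy model in base R. It is not drake's code: `new_toy_cache()` and `toy_global_seed()` are hypothetical helpers, and an environment stands in for the cache. The first call records a seed (0 by default), later calls keep returning it, and only a fresh ("destroyed") cache accepts a new seed.

```r
# Toy model of the global-seed rule. Hypothetical helpers, not drake's API.
new_toy_cache <- function() new.env(parent = emptyenv())

toy_global_seed <- function(cache, seed = NULL) {
  if (!exists("seed", envir = cache, inherits = FALSE)) {
    # First "make()": store the requested seed, or 0 if none was given.
    stored <- if (is.null(seed)) 0L else as.integer(seed)
    assign("seed", stored, envir = cache)
  }
  # Every later call returns the stored seed; in this toy, a different
  # `seed` argument is simply ignored.
  get("seed", envir = cache)
}

cache <- new_toy_cache()
toy_global_seed(cache)            # first run, no seed given: returns 0
toy_global_seed(cache, seed = 1)  # still returns 0
cache <- new_toy_cache()          # "destroy" the cache and start over
toy_global_seed(cache, seed = 1)  # now returns 1
```

In drake itself, the corresponding calls are make(plan, seed = 1) on a fresh cache and read_drake_seed() to check which seed the project is using.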

Target-level seeds

Drake uses the global seed to generate a separate seed for every target. Each target gets a different seed, and the seed is always the same given the same global seed and target name. Drake builds each target with its seed using withr::with_seed(). To retrieve the seed used to build a given target, call diagnose(your_target)$seed.
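The same idea can be sketched in base R, under clearly stated assumptions. `toy_target_seed()` (hypothetical) derives a per-target seed from the global seed and the target name with a simple polynomial string hash; drake uses its own derivation, which this does not reproduce. `with_local_seed()` (also hypothetical) is a stripped-down stand-in for withr::with_seed(): it sets the seed, evaluates the code, and restores the caller's RNG state. Unlike withr::with_seed(), it does not manage the RNG kind.

```r
# Hypothetical per-target seed: hash the target name, starting from the
# global seed. Arithmetic stays below 2^36, so doubles are exact here.
toy_target_seed <- function(global_seed, target) {
  p <- 2147483647  # 2^31 - 1, so the result fits in an R integer
  h <- as.numeric(global_seed) %% p
  for (code in utf8ToInt(target)) {
    h <- (h * 31 + code) %% p
  }
  as.integer(h)
}

# Minimal stand-in for withr::with_seed(): run `expr` under `seed`, then
# put the caller's .Random.seed back (or remove it if there was none).
with_local_seed <- function(seed, expr) {
  env <- globalenv()
  had_seed <- exists(".Random.seed", envir = env, inherits = FALSE)
  if (had_seed) old <- get(".Random.seed", envir = env)
  on.exit({
    if (had_seed) {
      assign(".Random.seed", old, envir = env)
    } else if (exists(".Random.seed", envir = env, inherits = FALSE)) {
      rm(".Random.seed", envir = env)
    }
  })
  set.seed(seed)
  expr  # lazy evaluation: the code runs only now, after set.seed()
}

s2 <- toy_target_seed(0, "random2")
x <- with_local_seed(s2, runif(1))
y <- with_local_seed(s2, runif(1))
identical(x, y)  # TRUE: same global seed and target name, same draw
```

Restoring the caller's RNG state keeps a target's seed from leaking into the rest of the session, and because the seed depends only on the global seed and the name, rebuilding one target in isolation gives the same value as building the whole plan.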

Example

Modified from @krlmlr's online example.

library(drake)
clean(destroy = TRUE)
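random <- function(...) {
  list(...)  # Evaluate the arguments (upstream targets); their values are unused.
  runif(1)   # The target's value is a single uniform draw.
}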
plan <- drake_plan(random1 = random(), random2 = random(random1), random3 = random(random2), 
  random4 = random(random2, random3))
make(plan)
#> target random1
#> target random2
#> target random3
#> target random4
readd(random2)
#> [1] 0.2083636
readd(random4)
#> [1] 0.08183787
# Remove random2 from the cache.
clean(random2)
# Now, we will need to make random2 all over again.  If the value of random2
# changes, we will also need to re-make random3 and random4.  But random2
# will not change because we preserved the seed.
make(plan)
#> Unloading targets from environment:
#>   random4
#>   random3
#>   random2
#> target random2
readd(random2)
#> [1] 0.2083636
readd(random4)
#> [1] 0.08183787
# Start over from scratch with a new seed.
clean(destroy = TRUE)
make(plan, seed = 1)
#> Unloading targets from environment:
#>   random2
#>   random1
#> target random1
#> target random2
#> target random3
#> target random4
readd(random2)
#> [1] 0.07983418
readd(random4)
#> [1] 0.6617457

Caveat

So far, everything I have said applies only to the development version of drake. The CRAN release is a little behind right now, and it will catch up when version 5.1.0 rolls out.
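Because the behavior depends on which drake you have installed, a quick version check can save some confusion. `has_seed_support()` is a hypothetical helper that assumes, per the caveat, that 5.1.0 is the first CRAN release with the seed handling described above; requireNamespace() and utils::packageVersion() are base R.

```r
# Hypothetical helper: TRUE if `pkg` is installed at version >= min_version.
has_seed_support <- function(pkg = "drake", min_version = "5.1.0") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(FALSE)  # package not installed at all
  }
  # package_version objects compare correctly against version strings.
  utils::packageVersion(pkg) >= min_version
}

has_seed_support()
```

A development build numbered below 5.1.0 would report FALSE here even if, as the caveat says, it already has the new behavior.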

Thanks

  • @AlexAxthelm originally brought up the issue (#56).
  • @aedobbyn discovered that the first implementation did not work correctly (#218).
  • @krlmlr wrote a reproducible example and worked with me in-depth on the current solution (#218).

All 7 comments

You bring up a good point. I'm not sure it would fit into the main vignettes, but an explanation should be added. I think I will elaborate here on this thread and then reference it from the FAQ.

This FAQ entry could become its own vignette at some point: the problem feels important enough, and if we cover it in full detail, it might become too large for an FAQ.

Thanks, and keep up the good work! Not having to deal with seeds greatly simplifies my code.

@krlmlr The FAQ is its own vignette already, but it is an automatically-generated stub that links to all the issues tagged "frequently asked question". I believe you mentioned that we might expand it at some point. To avoid redundant work, that might involve scraping specifically-marked comments from the thread.

@bart1 #218 is also relevant here because it helps explain how drake should handle pseudo-randomness. It describes an unexpected problem that is fixed in the development version and will be included in the next CRAN release.


FYI: I just updated the FAQ vignette and the accompanying page on the pkgdown site.
