I think this is more a documentation issue than a real problem, but I'm trying to figure out how drake deals with random numbers. I was implementing my own seed-setting system for simulations. Reading through the GitHub pages, this all seems to be solved, among others by #56, and quick testing seems to indicate it also works in version 5.0.0. But searching the manual for both "seed" and "random" does not produce any results (besides random tip). I think it would be great to have some improved documentation (either a vignette or man pages) so that people who do think about this can inform themselves. Right now I'm unsure to what extent I do or don't need to worry when I'm working with random numbers.
You bring up a good point. I'm not sure it would fit into the main vignettes, but an explanation should be added. I think I will elaborate here on this thread and then reference it from the FAQ.
This FAQ could become its own vignette at some point, because this problem feels important enough, and if we cover it in all detail, it might become too large for a FAQ.
Thanks, and keep up the good work. Not having to deal with seeds greatly simplifies my code.
@krlmlr The FAQ is its own vignette already, but it is an automatically-generated stub that links to all the issues tagged "frequently asked question". I believe you mentioned that we might expand it at some point. To avoid redundant work, that might involve scraping specifically-marked comments from the thread.
@bart1 #218 is also relevant here because it helps explain how drake should handle pseudo-randomness. It describes an unexpected problem that is fixed in the development version and will be included in the next CRAN release.
On your first make(), you have the opportunity to set a global RNG seed for your project. The seed is 0 unless you provide a different one. To ensure reproducibility under pseudo-randomness, subsequent make()s use this same global seed unless you completely destroy the cache and pick another seed. Drake is too opinionated about reproducibility to let you pick another seed unless you destroy the cache and start from scratch. Use read_drake_seed() or read_drake_config()$seed to get the global seed of your project.
Drake uses the global seed to generate a separate seed for every target. Each target gets a different seed, and the seed is always the same given the same global seed and target name. Drake builds each target with its seed using withr::with_seed(). To retrieve the seed used to build a given target, call diagnose(your_target)$seed.
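To make the idea concrete, here is a rough base-R sketch of how a per-target seed can be derived deterministically from a global seed and a target name. This is not drake's actual implementation: target_seed() and build_with_seed() are hypothetical helpers made up for illustration, the string hash is an arbitrary choice, and set.seed() stands in for withr::with_seed().

```r
# Hypothetical helper, not part of drake: derive an integer seed from a
# global seed and a target name using a simple polynomial string hash.
target_seed <- function(global_seed, target) {
  hash <- global_seed %% 1000003
  for (code in utf8ToInt(target)) {
    hash <- (hash * 31 + code) %% 1000003
  }
  as.integer(hash)
}

# Hypothetical helper: call fun() under a fixed seed, then restore the
# caller's RNG state, similar in spirit to withr::with_seed().
build_with_seed <- function(seed, fun) {
  has_seed <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  old_seed <- if (has_seed) get(".Random.seed", envir = globalenv()) else NULL
  on.exit({
    if (has_seed) {
      assign(".Random.seed", old_seed, envir = globalenv())
    } else if (exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
      rm(".Random.seed", envir = globalenv())
    }
  })
  set.seed(seed)
  fun()
}

seed_a <- target_seed(0, "random1")
seed_b <- target_seed(0, "random2")

# The same global seed and target name always give the same seed,
# so rebuilding a target reproduces its value.
first <- build_with_seed(seed_a, function() runif(1))
again <- build_with_seed(seed_a, function() runif(1))
identical(first, again)
#> [1] TRUE

# Different target names give different seeds.
seed_a == seed_b
#> [1] FALSE
```

The point of hashing the name rather than, say, numbering the targets is that a target's seed does not change when other targets are added to or removed from the plan.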
Modified from @krlmlr's online example.
library(drake)
clean(destroy = TRUE)
random <- function(...) {
  list(...)
  runif(1)
}
plan <- drake_plan(
  random1 = random(),
  random2 = random(random1),
  random3 = random(random2),
  random4 = random(random2, random3)
)
make(plan)
#> target random1
#> target random2
#> target random3
#> target random4
readd(random2)
#> [1] 0.2083636
readd(random4)
#> [1] 0.08183787
# Remove random2 from the cache.
clean(random2)
# Now, we will need to make random2 all over again. If the value of random2
# changes, we will also need to re-make random3 and random4. But random2
# will not change because we preserved the seed.
make(plan)
#> Unloading targets from environment:
#> random4
#> random3
#> random2
#> target random2
readd(random2)
#> [1] 0.2083636
readd(random4)
#> [1] 0.08183787
# Start over from scratch with a new seed.
clean(destroy = TRUE)
make(plan, seed = 1)
#> Unloading targets from environment:
#> random2
#> random1
#> target random1
#> target random2
#> target random3
#> target random4
readd(random2)
#> [1] 0.07983418
readd(random4)
#> [1] 0.6617457
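For completeness, a short sketch of how the seeds themselves could be inspected with the functions mentioned above. It assumes a drake version that has read_drake_seed() and stores the seed in the output of diagnose(), and it uses a smaller two-target plan than the example. I have left out the printed values because they depend on the drake version.

```r
library(drake)

# Start from an empty cache so that a new global seed can be chosen.
clean(destroy = TRUE)

random <- function(...) {
  list(...)  # Evaluate the arguments; the result is discarded.
  runif(1)
}
plan <- drake_plan(random1 = random(), random2 = random(random1))
make(plan, seed = 1)

# Global seed recorded in the cache.
global_seed <- read_drake_seed()

# Seeds that drake derived for the individual targets.
seed1 <- diagnose(random1)$seed
seed2 <- diagnose(random2)$seed
```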
So far, everything I have said only applies to the development version of drake. The CRAN release is a little behind right now, and it will catch up when version 5.1.0 rolls out.
FYI: I just updated the FAQ vignette and the accompanying page on the pkgdown site.