Drake: Does drake run autoclean in between dynamic subtargets?

Created on 22 Jun 2020 · 19 Comments · Source: ropensci/drake

Prework

  • [x] Read and abide by drake's code of conduct.
  • [x] Search for duplicates among the existing issues, both open and closed.
  • [x] If you think your question has a quick and definite answer, consider posting to Stack Overflow under the drake-r-package tag. (If you anticipate extended follow-up and discussion, you are already in the right place!)

Question

I had an HPC job that failed due to memory after 15ish subtargets that all had identical memory requirements. Does autoclean run between subtargets? If not, could we add an autoclean_subtarget option that does run between subtargets?


All 19 comments

As a refresher, I ran a little experiment: at the bottom of manage_memory(), I added some print statements to show which targets and sub-targets are still in memory after memory management.

manage_memory <- function(target, config, downstream = NULL, jobs = 1) {
  stopifnot(length(target) == 1L)
  memory_strategy <- config$spec[[target]]$memory_strategy
  if (is.null(memory_strategy) || is.na(memory_strategy)) {
    memory_strategy <- config$settings$memory_strategy
  }
  class(target) <- memory_strategy
  if (!is_subtarget(target, config)) {
    clear_envir_subtargets(target = target, config = config)
  }
  manage_deps(
    target = target,
    config = config,
    downstream = downstream,
    jobs = jobs
  )
  sync_envir_dynamic(target, config)
  if (config$settings$garbage_collection) {
    gc()
  }

  print(paste(c("target:", target), collapse = " "))
  print(paste(c("loaded targets:", names(config$envir_targets)), collapse = " "))
  print(paste(c("loaded subtargets:", names(config$envir_subtargets)), collapse = " "))
  print("")

  invisible()
}

It looks like autoclean is unloading sub-targets correctly.

library(drake)
plan <- drake_plan(
  x = 1:2,
  y = target(x, dynamic = map(x)),
  z = target(y, dynamic = map(y)),
  w = target(z, dynamic = map(z))
)

make(plan)
#> ▶ target x
#> [1] "target: x"
#> [1] "loaded targets:"
#> [1] "loaded subtargets:"
#> [1] ""
#> ▶ dynamic y
#> > subtarget y_0b3474bd
#> [1] "target: y_0b3474bd"
#> [1] "loaded targets: x"
#> [1] "loaded subtargets: x"
#> [1] ""
#> > subtarget y_b2a5c9b8
#> [1] "target: y_b2a5c9b8"
#> [1] "loaded targets: x"
#> [1] "loaded subtargets: x"
#> [1] ""
#> ■ finalize y
#> [1] "target: y"
#> [1] "loaded targets: x"
#> [1] "loaded subtargets:"
#> [1] ""
#> ▶ dynamic z
#> > subtarget z_0b3474bd
#> [1] "target: z_0b3474bd"
#> [1] "loaded targets: x y y_0b3474bd"
#> [1] "loaded subtargets: y"
#> [1] ""
#> > subtarget z_b2a5c9b8
#> [1] "target: z_b2a5c9b8"
#> [1] "loaded targets: x y y_b2a5c9b8 y_0b3474bd"
#> [1] "loaded subtargets: y"
#> [1] ""
#> ■ finalize z
#> [1] "target: z"
#> [1] "loaded targets: x y y_b2a5c9b8 y_0b3474bd"
#> [1] "loaded subtargets:"
#> [1] ""
#> ▶ dynamic w
#> > subtarget w_0b3474bd
#> [1] "target: w_0b3474bd"
#> [1] "loaded targets: z_0b3474bd x y z y_b2a5c9b8 y_0b3474bd"
#> [1] "loaded subtargets: z"
#> [1] ""
#> > subtarget w_b2a5c9b8
#> [1] "target: w_b2a5c9b8"
#> [1] "loaded targets: z_0b3474bd x y z z_b2a5c9b8 y_b2a5c9b8 y_0b3474bd"
#> [1] "loaded subtargets: z"
#> [1] ""
#> ■ finalize w
#> [1] "target: w"
#> [1] "loaded targets: z_0b3474bd x y z z_b2a5c9b8 y_b2a5c9b8 y_0b3474bd"
#> [1] "loaded subtargets:"
#> [1] ""

clean()

make(plan, memory_strategy = "autoclean")
#> ▶ target x
#> [1] "target: x"
#> [1] "loaded targets:"
#> [1] "loaded subtargets:"
#> [1] ""
#> ▶ dynamic y
#> > subtarget y_0b3474bd
#> [1] "target: y_0b3474bd"
#> [1] "loaded targets: x"
#> [1] "loaded subtargets: x"
#> [1] ""
#> > subtarget y_b2a5c9b8
#> [1] "target: y_b2a5c9b8"
#> [1] "loaded targets: x"
#> [1] "loaded subtargets: x"
#> [1] ""
#> ■ finalize y
#> [1] "target: y"
#> [1] "loaded targets: x"
#> [1] "loaded subtargets:"
#> [1] ""
#> ▶ dynamic z
#> > subtarget z_0b3474bd
#> [1] "target: z_0b3474bd"
#> [1] "loaded targets: y_0b3474bd"
#> [1] "loaded subtargets: y"
#> [1] ""
#> > subtarget z_b2a5c9b8
#> [1] "target: z_b2a5c9b8"
#> [1] "loaded targets: y_b2a5c9b8"
#> [1] "loaded subtargets: y"
#> [1] ""
#> ■ finalize z
#> [1] "target: z"
#> [1] "loaded targets: y"
#> [1] "loaded subtargets:"
#> [1] ""
#> ▶ dynamic w
#> > subtarget w_0b3474bd
#> [1] "target: w_0b3474bd"
#> [1] "loaded targets: z_0b3474bd"
#> [1] "loaded subtargets: z"
#> [1] ""
#> > subtarget w_b2a5c9b8
#> [1] "target: w_b2a5c9b8"
#> [1] "loaded targets: z_b2a5c9b8"
#> [1] "loaded subtargets: z"
#> [1] ""
#> ■ finalize w
#> [1] "target: w"
#> [1] "loaded targets: z"
#> [1] "loaded subtargets:"
#> [1] ""

Created on 2020-06-22 by the reprex package (v0.3.0)

I ran the code below and never saw the debugger's browser launch (sorry, we posted at the same time). What did I do wrong?

library(broom)
library(drake)
library(gapminder)
library(tidyverse)

# Split the Gapminder data by continent.
gapminder_continents <- function() {
  gapminder %>%
    mutate(gdpPercap = scale(gdpPercap)) %>%
    split(f = .$continent)
}

# Fit a model to a continent.
fit_model <- function(continent_data) {
  data <- continent_data[[1]]
  data %>%
    lm(formula = gdpPercap ~ year) %>%
    tidy() %>%
    mutate(continent = data$continent[1]) %>%
    dplyr::select(continent, term, statistic, p.value)
}

plan <- drake_plan(
  continents = gapminder_continents(),
  model = target(fit_model(continents), dynamic = map(continents))
)

clean(model)
debug(drake:::discard_dynamic)
make(plan, memory_strategy = "autoclean")
#> ▶ target continents
#> ▶ dynamic model
#> > subtarget model_c56e5407
#> > subtarget model_706a1529
#> > subtarget model_da843806
#> > subtarget model_862f8003
#> > subtarget model_ebb41f51
#> ■ finalize model

Created on 2020-06-23 by the reprex package (v0.3.0)

#> > subtarget z_0b3474bd
#> [1] "target: z_0b3474bd"
#> [1] "loaded targets: x y y_0b3474bd"
#> [1] "loaded subtargets: y"
#> [1] ""

Not sure I understand this output - shouldn't it only have y_0b3474bd loaded? Not x, y, or the second y?

Objects in the sub-target environment get the names of the parents. So loaded subtargets: x really just refers to the slice of x that y_0b3474bd needs. But x the static target needs to be in memory too under "loaded targets" because x is an irreducible static target.

For the purposes of dynamic branching internals, the "y" under "loaded targets" is a vector of hashes, which needs to be loaded to select sub-targets. When the user references whole aggregates of dynamic targets, all the values are aggregated and put in a different environment called config$envir_dynamic. But that only happens if absolutely necessary.

In https://github.com/ropensci/drake/issues/1284#issuecomment-647780508, every model sub-target always needs continents, so there is nothing superfluous to discard. If you want to launch into the debugger, you could add dynamic targets downstream of model.

I should also point out that I was finding memory accumulation in a single dynamic target with many subtargets.

The output in https://github.com/ropensci/drake/issues/1284#issuecomment-647784061 is using memory_strategy = "speed", which does not attempt to unload anything.

General reports of memory issues are difficult for me to reproduce, so a reprex would really help.

Also, I wonder if #1257 is related. cc @Plebejer

Sorry, I was reading your code too fast! I will try to see if I can reproduce it.

This looks pretty good:

library(drake)
plan <- drake_plan(
  x = 1:6,
  y = target({
    print(pryr::mem_used())
    x*1:1e+07
  }, dynamic = map(x)),
)
clean(y)
make(plan, memory_strategy = "autoclean")
#> Registered S3 method overwritten by 'pryr':
#>   method      from
#>   print.bytes Rcpp
#> ℹ Consider drake::r_make() to improve robustness.
#> ▶ target x
#> ▶ dynamic y
#> > subtarget y_0b3474bd
#> 54.1 MB
#> > subtarget y_b2a5c9b8
#> 54.3 MB
#> > subtarget y_71f311ad
#> 54.3 MB
#> > subtarget y_98cf3c11
#> 54.3 MB
#> > subtarget y_0a86c9cb
#> 54.3 MB
#> > subtarget y_cb15b01f
#> 54.3 MB
#> ■ finalize y

Created on 2020-06-23 by the reprex package (v0.3.0)

I'm going to try running pryr::mem_used() at the start of building each subtarget to see if I can see memory accumulating in the actual example. Lots of differences in the actual call:

make(myplan, 
     verbose = 1,
     jobs_preprocess = 4,
     packages = c("tidyverse",
                  "drake",
                  "sf",
                  "weatherdata",
                  "mapview",
                  "lubridate",
                  "terra",
                  "progress"),
     log_make = "make_all.log",
     jobs = 100,
     # jobs = 400,
     caching = "worker",
     lazy_load = "promise",
     targets = "vcsn_daily_data",
     memory_strategy = "autoclean",
     parallelism = "clustermq",
     template = list(
       memory = round(20*1024), # MBs
       walltime = 120, # minutes
       log_file = "make.log",
       partition = "large"),
     garbage_collection = TRUE,
     keep_going = TRUE)

So I did see some accumulation:

1.61 GB
Warning: target vcsn_daily_data_b419b3b1 warnings:
  The `x` argument of `as_tibble.matrix()` must have column names if `.name_repair` is omitted as of tibble 2.0.0.
Using compatibility `.name_repair`.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated.
ℹ target vcsn_daily_data_b419b3b1 messages:
  Registered S3 method overwritten by 'pryr':
  method      from
  print.bytes Rcpp
2020-06-23 11:08:43.906330 | eval'd: drake::cmq_buildtargetmetadepsspecconfig_tmpconfig
2020-06-23 11:08:44.086579 | > DO_CALL (0.001s wait)
2.78 GB
✖ fail vcsn_daily_data_e59aa2a3
2020-06-23 11:15:52.284250 | eval'd: drake::cmq_buildtargetmetadepsspecconfig_tmpconfig
2020-06-23 11:15:52.335304 | > DO_CALL (0.002s wait)
2.78 GB
slurmstepd: error: *** JOB 13179027 ON wbn222 CANCELLED AT 2020-06-22T23:16:47 ***

Gonna also try out a minimal example

Minimal example looks good - gonna try move it closer to the real one

Was the minimal one also running the same HPC settings? And is there a possibility that your data structures are carrying stowaway environments along with them, like with ggplot2 objects?
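To illustrate what a stowaway environment looks like, here is a minimal base R sketch. It is not taken from the real pipeline: make_formula() and big are made up for illustration. A formula created inside a function keeps a reference to that function's environment, so everything in it travels along when the object is serialized.

```r
# Hypothetical example: a formula created inside a function keeps a
# reference to that function's environment, including `big`.
make_formula <- function() {
  big <- rnorm(1e6)  # roughly 8 MB the caller never asked to keep
  y ~ x
}

f <- make_formula()

# The large vector is still reachable through the formula.
exists("big", envir = environment(f), inherits = FALSE)

# Serialized size approximates what is written to storage or sent to a worker.
size_with_stowaway <- length(serialize(f, NULL))

# Replacing the environment drops the stowaway data.
f_clean <- f
environment(f_clean) <- emptyenv()
size_clean <- length(serialize(f_clean, NULL))

c(with_stowaway = size_with_stowaway, clean = size_clean)
```

Objects such as ggplot2 plots and fitted models can carry environments in a similar way, so comparing serialized sizes like this is one quick way to check a suspicious return value.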

Same HPC settings and no ggplot or lm objects stowing away environments.

In https://github.com/ropensci/drake/issues/1284#issuecomment-647821699, it looks like the worker was using 1.61 GB, then 2.78 GB, and then 2.78 GB again for three targets. As long as memory is leveling off, autoclean and garbage collection are doing their jobs. I usually expect some accumulation because the dependencies and the return values have to exist simultaneously in memory some of the time. What really concerns me is memory that increases steadily without bound, as in #1257.
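As a rough way to tell the two patterns apart, the per-target readings printed by pryr::mem_used() could be checked with a small helper. This is only a sketch: is_leveling_off() and its window and tol arguments are hypothetical and not part of drake, and three readings are weak evidence either way.

```r
# Hypothetical helper (not part of drake): TRUE when the last `window`
# readings differ by at most `tol` relative to the largest of them.
is_leveling_off <- function(readings_gb, window = 2, tol = 0.05) {
  stopifnot(window >= 2, length(readings_gb) >= window)
  recent <- tail(readings_gb, window)
  (max(recent) - min(recent)) / max(recent) <= tol
}

# Readings from the worker log in this thread (GB).
worker_readings <- c(1.61, 2.78, 2.78)
plateau <- is_leveling_off(worker_readings)

# Made-up readings that keep climbing, for contrast.
climbing_readings <- c(1.6, 2.8, 4.0, 5.2)
unbounded <- !is_leveling_off(climbing_readings)

c(plateau = plateau, unbounded = unbounded)
```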

By the way, which custom formats are you using for the large targets? fst's multithreading takes a lot of memory. So if you are using mostly fst-based formats, maybe try fst::threads_fst(1) in prework or in the target commands themselves? Not sure if the same considerations are relevant to qs.
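A minimal sketch of the prework suggestion, assuming a toy plan with a single fst-formatted target (the plan and the target name big are made up; the real call in this thread has many more arguments). The prework expression runs before targets are built, so fst is limited to one thread in each session that builds targets.

```r
library(drake)

# Hypothetical plan: one data frame target stored with the fst format.
plan <- drake_plan(
  big = target(
    data.frame(value = runif(1e6)),
    format = "fst"
  )
)

make(
  plan,
  memory_strategy = "autoclean",
  garbage_collection = TRUE,
  # Limit fst to a single thread before any target is built.
  prework = quote(fst::threads_fst(1))
)
```

Whether this helps depends on how much of the memory spike actually comes from fst's threads, so it is worth comparing pryr::mem_used() readings with and without it.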

Just did a smaller one that could go further than 3 targets and I was seeing it level off as well - I was using fst_tbl so it might spike at that stage in a way that's a bit random. I will see how it goes with a bit more memory room but will close here for now. Thanks heaps for help troubleshooting!
