Drake: How to keep track of static inputs to transformation in dynamic targets?

Created on 1 May 2020 · 3 Comments · Source: ropensci/drake

Prework

[x] Read and abide by drake's code of conduct.
[x] Search for duplicates among the existing issues, both open and closed.
[x] If you think your question has a quick and definite answer, consider posting to Stack Overflow under the drake-r-package tag. (If you anticipate extended follow-up and discussion, you are already in the right place!)

Question

I have a dynamic target where some of the input variables are static, in the sense that I know their values before runtime. I want to keep track of these static inputs throughout the whole plan. I managed to do this with read_trace, but the solution looks very verbose to me.

So I would just like to know whether drake offers a simpler way to keep track of the inputs provided to the dynamic transformations (cross, map, group).

Here is an example. (I could refactor this specific example to use static branching, but I cannot do so in my use case.)

library(drake) # Attach drake so that read_trace() and readd() are found.

plan <- drake::drake_plan(
  base = 2:3, # Assume these values are "dynamic" (not known before runtime)
  exponent = 1:3, # Assume these values are "static" (known before runtime)

  exponential = target(
    base^exponent,
    dynamic = cross(
      base,
      exponent,
      .trace = c(base, exponent)
    )
  ),

  base_tr = read_trace('base', exponential),
  exp_tr = read_trace('exponent', exponential),

  exponential_plus_one = target(
    as.data.frame(
      list(
        base = base_tr,
        exponent = exp_tr,
        final_result = exponential + 1
      )
    ),
    dynamic = map(exponential, base_tr, exp_tr)
  )
)

drake::make(plan)
readd(exponential_plus_one)

djbirke

All 3 comments

Yeah, I regret implementing traces in dynamic branching because they turned out to be inelegant. I recommend choosing data frames or tibbles to store base and exponent as indicator columns so you can keep track of inputs. If you have more complicated objects than numerics, you can store those results in list columns instead of numeric columns. If you do not need list columns, you can take advantage of format = "fst_tbl" to write and read large data frames faster.

library(drake)
library(tidyverse)

plus_one <- function(exponential) {
  mutate(exponential, result = result + 1)
}

plan <- drake_plan(
  base = 2:3,
  exponent = 1:3,
  exponential = target(
    tibble(result = base ^ exponent, base = base, exponent = exponent),
    dynamic = cross(base, exponent),
    format = "fst_tbl" # Writes and reads data frames faster if no columns are list columns.
  ),
  exponential_plus_one = target(
    plus_one(exponential),
    dynamic = map(exponential),
    format = "fst_tbl" # Writes and reads data frames faster if no columns are list columns.
  )
)

make(plan)
#> ▶ target exponent
#> ▶ target base
#> ▶ dynamic exponential
#> > subtarget exponential_692231c3
#> > subtarget exponential_ba77a266
#> > subtarget exponential_9851881a
#> > subtarget exponential_e7bdf64f
#> > subtarget exponential_2b317cd8
#> > subtarget exponential_f0968d38
#> ■ finalize exponential
#> ▶ dynamic exponential_plus_one
#> > subtarget exponential_plus_one_f896675f
#> > subtarget exponential_plus_one_46de53a9
#> > subtarget exponential_plus_one_174981ce
#> > subtarget exponential_plus_one_1db0e1ae
#> > subtarget exponential_plus_one_c2d13182
#> > subtarget exponential_plus_one_b7d84cba
#> ■ finalize exponential_plus_one

readd(exponential)
#> # A tibble: 6 x 3
#>   result  base exponent
#>    <dbl> <int>    <int>
#> 1      2     2        1
#> 2      4     2        2
#> 3      8     2        3
#> 4      3     3        1
#> 5      9     3        2
#> 6     27     3        3

readd(exponential_plus_one)
#> # A tibble: 6 x 3
#>   result  base exponent
#>    <dbl> <int>    <int>
#> 1      3     2        1
#> 2      5     2        2
#> 3      9     2        3
#> 4      4     3        1
#> 5     10     3        2
#> 6     28     3        3

Created on 2020-05-01 by the reprex package (v0.3.0)
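
For results that are more complicated than numerics, here is a minimal sketch of the list-column idea mentioned above. It does not call drake; fit_row() is a hypothetical helper and the lm() fit is only a stand-in for whatever object your real command returns. Each call returns a one-row tibble that carries its inputs as indicator columns next to the result.

```r
library(tibble)

# Hypothetical helper (not part of drake): returns one row holding the inputs
# and a non-numeric result (here, a fitted linear model) in a list column.
fit_row <- function(base, exponent) {
  data <- data.frame(x = 1:10)
  data$y <- base * data$x^exponent
  tibble(
    base = base,
    exponent = exponent,
    model = list(lm(y ~ x, data = data)) # list column with one element
  )
}

row <- fit_row(base = 2L, exponent = 1L)
class(row$model[[1]])
```

In a plan, a call like this would presumably be the command of the dynamic target, and format = "fst_tbl" would be left out because the tibble now has a list column.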

Another reason data frames go really well with drake is that dynamic branching maps over rows, so map(exponential) creates one sub-target per row of exponential.
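
To make "maps over rows" concrete, here is a rough emulation outside of drake. It is only a sketch of the idea, not drake's internal implementation: the data frame is cut into one-row slices, plus_one() runs on each slice, and the pieces are row-bound again. The values are copied from the readd(exponential) output.

```r
library(tibble)
library(dplyr)

# Values copied from the readd(exponential) output.
exponential <- tibble(
  result = c(2, 4, 8, 3, 9, 27),
  base = c(2L, 2L, 2L, 3L, 3L, 3L),
  exponent = c(1L, 2L, 3L, 1L, 2L, 3L)
)

plus_one <- function(exponential) {
  mutate(exponential, result = result + 1)
}

# One one-row slice per row, roughly what each sub-target would receive.
slices <- lapply(seq_len(nrow(exponential)), function(i) exponential[i, ])

# Apply the command to each slice, then row-bind the pieces back together.
pieces <- lapply(slices, plus_one)
exponential_plus_one <- bind_rows(pieces)
```

Because every slice keeps its base and exponent columns, the inputs travel with the result and no trace is needed.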

wlandau on 1 May 2020

A disadvantage of the approach above: if exponent is really a Keras model that needs format = "keras", you cannot wrap a data frame around the value. In that case, you can create a metadata target to go alongside exponential and use cross on the same variables.

library(drake)
library(tidyverse)

plus_one <- function(exponential, meta) {
  bind_cols(meta, tibble(result = exponential + 1))
}

plan <- drake_plan(
  base = 2:3,
  exponent = 1:3,
  exponential = target(
    base ^ exponent,
    dynamic = cross(base, exponent)
  ),
  meta = target(
    tibble(base = base, exponent = exponent),
    dynamic = cross(base, exponent),
    format = "fst_tbl" # Writes and reads data frames faster if no columns are lists.
  ),
  exponential_plus_one = target(
    plus_one(exponential, meta),
    dynamic = map(exponential, meta), # drake maps over the rows of data frames.
    format = "fst_tbl" # Writes and reads data frames faster if no columns are lists.
  )
)

make(plan)
#> ▶ target exponent
#> ▶ target base
#> ▶ dynamic meta
#> > subtarget meta_692231c3
#> > subtarget meta_ba77a266
#> > subtarget meta_9851881a
#> > subtarget meta_e7bdf64f
#> > subtarget meta_2b317cd8
#> > subtarget meta_f0968d38
#> ■ finalize meta
#> ▶ dynamic exponential
#> > subtarget exponential_692231c3
#> > subtarget exponential_ba77a266
#> > subtarget exponential_9851881a
#> > subtarget exponential_e7bdf64f
#> > subtarget exponential_2b317cd8
#> > subtarget exponential_f0968d38
#> ■ finalize exponential
#> ▶ dynamic exponential_plus_one
#> > subtarget exponential_plus_one_a1a73847
#> > subtarget exponential_plus_one_3e509f9f
#> > subtarget exponential_plus_one_8721db51
#> > subtarget exponential_plus_one_f6bcdfc8
#> > subtarget exponential_plus_one_444eec91
#> > subtarget exponential_plus_one_32136d52
#> ■ finalize exponential_plus_one

readd(exponential)
#> [1]  2  4  8  3  9 27

readd(meta)
#> # A tibble: 6 x 2
#>    base exponent
#>   <int>    <int>
#> 1     2        1
#> 2     2        2
#> 3     2        3
#> 4     3        1
#> 5     3        2
#> 6     3        3

readd(exponential_plus_one)
#> # A tibble: 6 x 3
#>    base exponent result
#>   <int>    <int>  <dbl>
#> 1     2        1      3
#> 2     2        2      5
#> 3     2        3      9
#> 4     3        1      4
#> 5     3        2     10
#> 6     3        3     28

Created on 2020-05-01 by the reprex package (v0.3.0)
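
The metadata approach relies on exponential and meta lining up element by element, which is what the output above shows when both use cross on the same variables in the same order. Here is a base R emulation of that ordering. It does not call drake, and using expand.grid() to stand in for cross() is an assumption made only for illustration.

```r
# Emulate the ordering seen in the output above: the first variable given to
# cross() (base) varies slowest and the second (exponent) varies fastest.
# expand.grid() varies its first argument fastest, so exponent goes first.
grid <- expand.grid(exponent = 1:3, base = 2:3)

# Stand-ins for the meta and exponential targets, built from the same grid.
meta <- grid[, c("base", "exponent")]
exponential <- meta$base^meta$exponent

# Same length and same order, so the values can be paired with their inputs.
stopifnot(length(exponential) == nrow(meta))
exponential_plus_one <- cbind(meta, result = exponential + 1)
```

If the two targets were crossed over different variables, or over the same variables in a different order, this pairing would no longer be safe.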

wlandau on 1 May 2020

Thank you for your quick answers; they were very helpful. Based on your advice, I refactored my code: I replaced .trace with the respective columns in a tibble and arrived at a much cleaner solution.

djbirke on 3 May 2020
