Drake: Fully embrace vctrs for dynamic branching

Created on 10 Dec 2019  路  4Comments  路  Source: ropensci/drake

Prework

Description

After writing out https://github.com/ropensci/drake/issues/1087#issuecomment-563656956, I think we should fully embrace vctrs for dynamic branching. It will technically be a breaking change for workflows that use readd() on dynamic targets in plans, but dynamic branching is so new that I am willing to let it slide. In the end, we gain type more type stability as described at https://vctrs.r-lib.org/articles/stability.html.

We want readd() and loadd() to be fully type-stable. Current behavior:

library(drake)
plan <- drake_plan(
  x = seq_len(2),
  y = mtcars[x, ],
  z = target(y, dynamic = group(y, .by = x))
)
make(plan)
#> In drake, consider r_make() instead of make(). r_make() runs make() in a fresh R session for enhanced robustness and reproducibility.
#> target x
#> target y
#> dynamic z
#> subtarget z_25909f33
#> subtarget z_7fb9021c
#> aggregate z

readd(z)
#> [[1]]
#>           mpg cyl disp  hp drat   wt  qsec vs am gear carb
#> Mazda RX4  21   6  160 110  3.9 2.62 16.46  0  1    4    4
#> 
#> [[2]]
#>               mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4 Wag  21   6  160 110  3.9 2.875 17.02  0  1    4    4

Created on 2019-12-10 by the reprex package (v0.3.0)

Desired behavior:

library(drake)
plan <- drake_plan(
  x = seq_len(2),
  y = mtcars[x, ],
  z = target(y, dynamic = group(y, .by = x))
)
make(plan)
#> In drake, consider r_make() instead of make(). r_make() runs make() in a fresh R session for enhanced robustness and reproducibility.
#> target x
#> target y
#> dynamic z
#> subtarget z_25909f33
#> subtarget z_7fb9021c
#> aggregate z

readd(z)
#>               mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4      21   6  160 110  3.9 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag  21   6  160 110  3.9 2.875 17.02  0  1    4    4

We should be able to achieve similar type-stable aggregation across all types if we use vctrs::vec_c() in loadd() and readd(). Similarly, dynamic_subvalue() should just call vctrs::vec_slice(). And as long as we depend on vctrs, we should set meta$size to vctrs::vec_size() instead of NROW().

advanced priority

Most helpful comment

I like it, and I think it'll make dynamic branching simpler to use.

All 4 comments

cc @kendonB, @billdenney, @brendanf

I like it, and I think it'll make dynamic branching simpler to use.

Awesome!

Rethinking vec_size(). NROW() is faster, it agrees with vec_size() for the sensible cases, and does not error out when we have non-vectors.

Good news: with vec_slice() seems to be a lot faster than the existing dynamic_subvalue().

Was this page helpful?
0 / 5 - 0 ratings