Drake: How to create a jagged cross() transform

Created on 31 Jan 2019 · 4 Comments · Source: ropensci/drake

I would like to use a cross() transform, but I don't want the complete cross product. I want a jagged version instead. Here is the full cross() I would otherwise write:

plan <- drake_plan(
  s_load = target(
    load_csv(group, rep),
    transform = cross(
      group = c("G1", "G2"),
      rep = c("R1", "R2", "R3", "R4", "R5", "R6")
    )
  )
)

My group G1 has reps R1-R6, but G2 only has R1-R4 (R5 and R6 are missing).
My function load_csv looks for an input file named Gx_Ry.csv for each target. Since G2_R5.csv and G2_R6.csv do not exist, those two targets fail with "file not found" errors.
Any recommendations would be appreciated, thanks!

api faq

htlin

All 4 comments

Another nice one for the FAQ. Fortunately, this is straightforward if you create your own grid in advance and then use map().

library(drake)
library(tidyverse)

grid <- crossing(
  group = c("G1", "G2"),
  rep = c("R1", "R2", "R3", "R4", "R5", "R6")
) %>%
  filter(!(group == "G2" & rep %in% c("R5", "R6")))

drake_plan(
  s_load = target(
    load_csv(group, rep),
    transform = map(
      group = !!grid$group,
      rep = !!grid$rep
    )
  )
)
#> # A tibble: 10 x 2
#>    target           command                   
#>    <chr>            <chr>                     
#>  1 s_load_.G1._.R1. "load_csv(\"G1\", \"R1\")"
#>  2 s_load_.G1._.R2. "load_csv(\"G1\", \"R2\")"
#>  3 s_load_.G1._.R3. "load_csv(\"G1\", \"R3\")"
#>  4 s_load_.G1._.R4. "load_csv(\"G1\", \"R4\")"
#>  5 s_load_.G1._.R5. "load_csv(\"G1\", \"R5\")"
#>  6 s_load_.G1._.R6. "load_csv(\"G1\", \"R6\")"
#>  7 s_load_.G2._.R1. "load_csv(\"G2\", \"R1\")"
#>  8 s_load_.G2._.R2. "load_csv(\"G2\", \"R2\")"
#>  9 s_load_.G2._.R3. "load_csv(\"G2\", \"R3\")"
#> 10 s_load_.G2._.R4. "load_csv(\"G2\", \"R4\")"

Created on 2019-01-31 by the reprex package (v0.2.1.9000)

wlandau on 31 Jan 2019

Nice! Thanks for the solution.
A follow-up thought: can I make a target that finds all available files and then dynamically generates named targets accordingly (perhaps like yield in Python)?

htlin on 31 Jan 2019

Sounds like #685, which many people have requested. In drake the plan needs to be fully written out before you call make(), which may limit what I think you are describing.

But if the files you mention are all available before you write the plan, then yes, you can write a plan whose target names are automatically generated.

library(drake)
files <- list.files("dir")
plan <- drake_plan(s_load = target(load_csv(file), transform = map(file = !!files)))
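For the Gx_Ry.csv naming scheme from the question, the same idea can rebuild the jagged grid from the files themselves. This is only a minimal sketch: the temporary directory, the placeholder files, and the file name pattern are assumptions made so the example runs on its own, and load_csv() is the question's function, which drake_plan() does not evaluate.

```r
library(drake)

# Hypothetical setup: a temporary directory holding the jagged set of files
# described in the question (G1 has R1-R6, G2 has only R1-R4).
dir <- file.path(tempdir(), "jagged_csv")
dir.create(dir, showWarnings = FALSE)
present <- c(paste0("G1_R", 1:6), paste0("G2_R", 1:4))
invisible(file.create(file.path(dir, paste0(present, ".csv"))))

# Keep only names that match the assumed Gx_Ry.csv pattern.
files <- list.files(dir, pattern = "^G[0-9]+_R[0-9]+\\.csv$")

# Split names such as "G1_R1.csv" into their group and rep parts.
parts <- strsplit(tools::file_path_sans_ext(files), "_", fixed = TRUE)
grid <- data.frame(
  group = vapply(parts, `[`, character(1), 1),
  rep = vapply(parts, `[`, character(1), 2),
  stringsAsFactors = FALSE
)

# One target per file that actually exists.
plan <- drake_plan(
  s_load = target(
    load_csv(group, rep),
    transform = map(
      group = !!grid$group,
      rep = !!grid$rep
    )
  )
)
```

The file listing is taken when the plan is created, so files that appear later only become targets after the plan is rebuilt.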

wlandau on 1 Feb 2019

#720 will make custom grids easier. Check this out:

library(drake)
library(tidyverse)

grid <- crossing(
  group = c("G1", "G2"),
  rep = c("R1", "R2", "R3", "R4", "R5", "R6")
) %>%
  filter(!(group == "G2" & rep %in% c("R5", "R6")))

drake_plan(
  s_load = target(
    load_csv(group, rep),
    transform = map(.data = !!grid)
  )
)
#> # A tibble: 10 x 2
#>    target           command             
#>    <chr>            <expr>              
#>  1 s_load_.G1._.R1. load_csv("G1", "R1")
#>  2 s_load_.G1._.R2. load_csv("G1", "R2")
#>  3 s_load_.G1._.R3. load_csv("G1", "R3")
#>  4 s_load_.G1._.R4. load_csv("G1", "R4")
#>  5 s_load_.G1._.R5. load_csv("G1", "R5")
#>  6 s_load_.G1._.R6. load_csv("G1", "R6")
#>  7 s_load_.G2._.R1. load_csv("G2", "R1")
#>  8 s_load_.G2._.R2. load_csv("G2", "R2")
#>  9 s_load_.G2._.R3. load_csv("G2", "R3")
#> 10 s_load_.G2._.R4. load_csv("G2", "R4")

Created on 2019-02-07 by the reprex package (v0.2.1.9000)

wlandau on 7 Feb 2019
