I'm trying to use drake to write files to a directory other than the working directory. Unfortunately, there isn't a great way to do this right now, other than passing a literal string containing the fully realized path at the time I write the plan. I might be able to gimmick something together using evaluate_plan(), but that seems unpleasant, especially when the file_ family is already a set of useful exceptions.
```r
library("tidyverse")
library("drake")
setwd(tempdir())
dir.create("outfiles")
packageVersion("drake") # installed from commit 04c6dfc
path_var <- file.path("outfiles")
file_name_var <- file.path(path_var, "file_name_var.csv")
this_plan <- drake_plan(
  data = mtcars,
  write_csv(x = data, path = file_out("rootdir.csv")), # cool
  write_csv(x = data, path = file_out("outfiles/singlestring.csv")), # cool
  write_csv(x = data, path = file_out(file.path("outfiles", "filepath.csv"))), # bad, coerces filename
  write_csv(x = data, path = file_out(file.path(path_var, "path_var.csv"))), # bad, coerces filename
  pathname = file.path("outfiles", "drake_obj.csv"),
  write_csv(x = data, path = file_out(pathname)), # bad, thinks file_out is empty
  write_csv(x = data, path = file_out(file_name_var)) # bad, doesn't know what to do
)
make(this_plan)
```
Suggested behavior: drake_plan should try to evaluate the path argument of file_out (& friends) if it is not already a single character string, or, more accurately, if the argument is a function call.
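A minimal base-R sketch of why only the first two commands work, assuming (as the reply below implies) that drake reads file_out() paths from the unevaluated command rather than by running it. literal_file_out() is a hypothetical helper for illustration, not drake's internal code: it can recover a path only when the argument is a single literal string, so file.path(...) calls and variable names come back as NULL.

```r
# Hypothetical helper (not drake's internal code): walk an unevaluated
# command and return the argument of file_out() only if it is a single
# literal string; return NULL when it is a call or a variable name.
literal_file_out <- function(expr) {
  if (!is.call(expr)) {
    return(NULL)
  }
  if (identical(expr[[1]], as.name("file_out"))) {
    arg <- expr[[2]]
    if (is.character(arg) && length(arg) == 1) {
      return(arg)
    }
    return(NULL)
  }
  # Recurse into the arguments of any other call, e.g. write_csv(...)
  for (part in as.list(expr)[-1]) {
    found <- literal_file_out(part)
    if (!is.null(found)) {
      return(found)
    }
  }
  NULL
}

literal_file_out(quote(write_csv(x = data, path = file_out("outfiles/singlestring.csv"))))
#> [1] "outfiles/singlestring.csv"
literal_file_out(quote(write_csv(x = data, path = file_out(file.path("outfiles", "filepath.csv")))))
#> NULL
literal_file_out(quote(write_csv(x = data, path = file_out(file_name_var))))
#> NULL
```

Nothing is evaluated here, which is the point: without evaluating file.path() or looking up file_name_var, the path is simply not visible in the code.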
This is technically a duplicate of #347, but I like your explanation. I still need convincing, though. It would be difficult to evaluate calls to file.path() inside file_out() in-place, and it may mislead users into thinking they can define path_var and file_name_var as targets in the workflow plan. I do not think we will see this capability until we have the delayed evaluation we need for #233 and #304.
For now, I think wildcards have the intended result.
```r
library(drake)
library(magrittr)
drake_plan(
  data = mtcars,
  write_csv(data, file_out("DIR1/file1.csv")),
  write_csv(data, file_out("DIR2/file2.csv"))
) %>%
  evaluate_plan(
    rules = list(DIR1 = "out1", DIR2 = "out2"),
    expand = FALSE
  )
#> # A tibble: 3 x 2
#> target command
#> <chr> <chr>
#> 1 data mtcars
#> 2 "\"out1/file1.csv\"" "write_csv(data, file_out(\"out1/file1.csv\"))"
#> 3 "\"out2/file2.csv\"" "write_csv(data, file_out(\"out2/file2.csv\"))"
```
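Conceptually, the rules above act as a find-and-replace over the command strings. The sketch below assumes a plain fixed-string substitution; substitute_wildcards() is a hypothetical helper, not evaluate_plan()'s actual implementation.

```r
# Hypothetical helper: apply each rule as a literal (fixed = TRUE)
# find-and-replace over every command string, in the order given.
substitute_wildcards <- function(commands, rules) {
  for (wildcard in names(rules)) {
    commands <- gsub(wildcard, rules[[wildcard]], commands, fixed = TRUE)
  }
  commands
}

commands <- c(
  "mtcars",
  'write_csv(data, file_out("DIR1/file1.csv"))',
  'write_csv(data, file_out("DIR2/file2.csv"))'
)
substitute_wildcards(commands, list(DIR1 = "out1", DIR2 = "out2"))
```

The resulting strings match the command column of the evaluate_plan() output above.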
Just stumbled on that. To me, this was unexpected. An alternative might be to use a special class for files to distinguish them from literal strings, but I don't have enough understanding of the drake internals to have an educated opinion about that.
I did not know about evaluate_plan(), though. I was just wondering whether it is really a good idea for evaluate_plan() to replace all text that matches a rule with some other text.
Maybe it is worth establishing a convention, such as putting stars around all "variables", e.g. *KEY*. In a large plan, a simple text replacement may be dangerous because it can have unexpected consequences:
```r
library(drake)
library(magrittr)
drake_plan(
  data = mtcars,
  a = write_csv(data, file_out("DIR1/file1.csv")),
  b = write_csv(data, file_out("DIR2/file2.csv")),
  c = i_have_a_function_with_DIR1(x)
) %>%
  evaluate_plan(
    rules = list(DIR1 = "out1", DIR2 = "out2"),
    expand = FALSE
  )
#> # A tibble: 4 x 2
#> target command
#> <chr> <chr>
#> 1 data mtcars
#> 2 "\"*out1*/file1.csv\"" write_csv(data, file_out('out1/file1.csv'))
#> 3 "\"*out2*/file2.csv\"" write_csv(data, file_out('out2/file2.csv'))
#> 4 c i_have_a_function_with_out1(x)
```
This could be avoided with:
```r
drake_plan(
  data = mtcars,
  a = write_csv(data, file_out("*DIR1*/file1.csv")),
  b = write_csv(data, file_out("*DIR2*/file2.csv")),
  c = i_have_a_function_with_DIR1(x)
) %>%
  evaluate_plan(
    rules = list(`*DIR1*` = "out1", `*DIR2*` = "out2"),
    expand = FALSE
  )
#> # A tibble: 4 x 2
#> target command
#> <chr> <chr>
#> 1 data mtcars
#> 2 "\"out1/file1.csv\"" write_csv(data, file_out('out1/file1.csv'))
#> 3 "\"out2/file2.csv\"" write_csv(data, file_out('out2/file2.csv'))
#> 4 c i_have_a_function_with_DIR1(x)
```
So using such a convention may be safer. One could adapt the help file to encourage the use of this convention.
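The collision, and why delimiters help, can be reproduced with a plain fixed-string find-and-replace. substitute_wildcards() is a hypothetical stand-in, assuming evaluate_plan() behaves like a literal text substitution (which the outputs above suggest).

```r
# Hypothetical helper: literal find-and-replace of each rule in turn.
substitute_wildcards <- function(commands, rules) {
  for (wildcard in names(rules)) {
    # fixed = TRUE: match the wildcard text literally, not as a regex
    commands <- gsub(wildcard, rules[[wildcard]], commands, fixed = TRUE)
  }
  commands
}

# Undelimited wildcard: DIR1 also matches inside the function name.
bare <- c('write_csv(data, file_out("DIR1/file1.csv"))',
          "i_have_a_function_with_DIR1(x)")
substitute_wildcards(bare, list(DIR1 = "out1"))

# Delimited wildcard: only the *DIR1* token is replaced.
starred <- c('write_csv(data, file_out("*DIR1*/file1.csv"))',
             "i_have_a_function_with_DIR1(x)")
substitute_wildcards(starred, list(`*DIR1*` = "out1"))
```

fixed = TRUE matters for the starred version: without it, gsub() would treat * as a regular-expression metacharacter rather than a literal star.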
Regarding a special file name class, @krlmlr has suggested this too, and rightly so. But at this point, the current convention is so embedded in the internals that it is not likely to change anytime soon.
As for the convention around wildcards, that's a great point. I think we need explicit guidance in the docs, and probably an entire vignette on wildcard templating. I do, however, want users to be able to define any wildcards they want, since I think enforcing strict restrictions would add friction for newcomers and might not solve the problem.
In plan_analyses() and plan_summaries(), the wildcards are analysis__ and dataset__. Personally, I am a fan of the trailing double underscore. People coming from snakemake might prefer curly braces, though.
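For illustration, a rough sketch of the kind of templating a trailing-double-underscore wildcard enables, assuming one command is generated per value by fixed-string substitution. expand_template() and regression_model() are hypothetical names, not drake functions; the last call shows that snakemake-style braces work the same way when matched literally.

```r
# Hypothetical helper: expand one command template over several values of a
# single wildcard, producing one command per value (fixed-string substitution).
expand_template <- function(template, wildcard, values) {
  commands <- vapply(
    values,
    function(value) gsub(wildcard, value, template, fixed = TRUE),
    character(1)
  )
  names(commands) <- values
  commands
}

# Trailing double underscore, as in dataset__
expand_template("regression_model(dataset__)", "dataset__", c("small", "large"))

# Snakemake-style braces, matched literally
expand_template("regression_model({dataset})", "{dataset}", "small")
```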
> I do, however, want users to be able to define any wildcards they want, since I think enforcing strict restrictions would add friction for newcomers and might not solve the problem.

I think that's reasonable. Also, for consistency, suggesting analysis__ and dataset__ sounds good too.
I agree that, in the short term at least, adding a note to the best practices vignette might be a good call. Personally, I use TK_variable_TK, since I know that's pretty much _never_ going to come up in actual useful code (for me).
From: Lorenz Walthert
Sent: Sunday, May 20, 2018 5:36 PM
To: ropensci/drake drake@noreply.github.com
Cc: Axthelm, Alex (CHE) AAxthelm@che.in.gov; Author author@noreply.github.com
Subject: Re: [ropensci/drake] evaluate file.path and variables in file_out and friends (#353)
Is there a nice solution to this you can think of with the new DSL? A colleague came up against this today.
Tidy evaluation now handles this. !! inside file_out() etc. works the same as in the DSL.
```r
library(drake)
path_var <- file.path("outfiles")
file_name_var <- file.path(path_var, "file_name_var.csv")
drake_plan(
  data = mtcars,
  write_csv(data, file_out("rootdir.csv")),
  write_csv(data, file_out("outfiles/singlestring.csv")),
  write_csv(data, file_out(!!file.path("outfiles", "filepath.csv"))),
  write_csv(data, file_out(!!file.path(path_var, "path_var.csv"))),
  write_csv(data, file_out(!!file_name_var))
)
#> # A tibble: 6 x 2
#> target command
#> <chr> <expr>
#> 1 data mtcars
#> 2 drake_target_1 write_csv(data, file_out("rootdir.csv"))
#> 3 drake_target_2 write_csv(data, file_out("outfiles/singlestring.csv"))
#> 4 drake_target_3 write_csv(data, file_out("outfiles/filepath.csv"))
#> 5 drake_target_4 write_csv(data, file_out("outfiles/path_var.csv"))
#> 6 drake_target_5 write_csv(data, file_out("outfiles/file_name_var.csv"))
```
Created on 2019-05-07 by the reprex package (v0.2.1)
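The effect of !! here is to evaluate the path when the plan is built and splice the resulting string into the command, so file_out() ends up holding a literal. Base R's bquote() does the same kind of splicing with .(), so the sketch below shows the effect without drake; rlang::expr() with !! gives an equivalent result. This is an analogy, not necessarily how drake_plan() is implemented.

```r
path_var <- file.path("outfiles")
file_name_var <- file.path(path_var, "file_name_var.csv")

# .() evaluates its argument now and splices the value into the quoted call.
cmd <- bquote(write_csv(data, file_out(.(file.path(path_var, "path_var.csv")))))
cmd
#> write_csv(data, file_out("outfiles/path_var.csv"))

# The same works for a plain variable holding the path.
cmd2 <- bquote(write_csv(data, file_out(.(file_name_var))))
cmd2
#> write_csv(data, file_out("outfiles/file_name_var.csv"))
```

Because the spliced argument is an ordinary character string, anything that inspects the unevaluated command can now see the file path.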
Onlookers may be interested in #1178.