Drake: Control optional columns via `drake_plan()`

Created on 11 Mar 2018 · 13 Comments · Source: ropensci/drake

From @krlmlr via #299.

Original suggestion

The following could be equivalent plans.

plan1 <- drake_plan(x = {1; always()})
plan2 <- tibble(target = "x", command = "1", trigger = "always")

Do we want to optionally allow triggers to be set this way?

If so, we also need functions any(), command(), depends(), file(), and missing() (everything from triggers()).
Which trigger should be used for tibble(target = "x", command = "{1; always()}", trigger = "depends")? I would think "depends", but I want to make sure.

What about the other optional workflow plan columns?

Triggers are not the only optional pieces of the workflow plan. We may want to consider something like

drake_plan(x = {1; column("trigger", "always")})

A drake::column() would conflict with shiny::column(), so we would definitely need another name.

help or input api

wlandau

All 13 comments

I'd think the API would look like drake_plan(x = trigger_always(1)). Or perhaps:

drake_plan(
  x = drake_command(
    1,
    trigger = "always"
  )
)

Precedence is a matter of documenting it; I don't have a strong opinion.

krlmlr on 11 Mar 2018

At the very least, if using the first {1; any()} syntax, I would suggest something along the lines of drake_any() or trigger_any(), to avoid conflicts with base functions. Alternatively, could we use something like plan1 <- drake_plan(x = {1; triggers("always")})?

To throw an entirely different idea into the ring, what if it was something along the lines of

plan1 <- drake_plan(x_triggeralways = 1)

where a new argument to drake_plan would be something like

trigger_substitution = list(
  any = "_triggerany",
  command = "_triggercommand",
  depends = "_triggerdepends",
  file = "_triggerfile",
  missing = "_triggermissing"
)

Then make or plan could strip the target tags, and leave the real target name. I don't know if this is putting too much meta into the target name, but we could expand to arbitrary columns by using a column_substitution instead.
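A minimal base-R sketch of the tag-stripping step, to make the idea concrete. The helper name strip_trigger_tags(), the "always" entry in the substitution list, and the default trigger "any" are assumptions for illustration; none of this is existing drake API.

```r
# Hypothetical helper (not part of drake): move trigger tags out of the
# target names and into a `trigger` column.
strip_trigger_tags <- function(plan, trigger_substitution, default = "any") {
  plan$trigger <- rep(default, nrow(plan))
  for (trigger in names(trigger_substitution)) {
    tag <- trigger_substitution[[trigger]]
    tagged <- endsWith(plan$target, tag)
    if (any(tagged)) {
      plan$trigger[tagged] <- trigger
      # Drop the tag so only the real target name remains.
      plan$target[tagged] <- substr(
        plan$target[tagged], 1, nchar(plan$target[tagged]) - nchar(tag)
      )
    }
  }
  plan
}

# A plan stored as a plain data frame, with one tagged target.
plan <- data.frame(
  target = c("x_triggeralways", "y"),
  command = c("1", "x + 1"),
  stringsAsFactors = FALSE
)

# Assumed substitution list; "always" is added here to match the example above.
tags <- list(always = "_triggeralways", missing = "_triggermissing")

stripped <- strip_trigger_tags(plan, tags)
print(stripped)
```

One consequence visible in the sketch: any target whose real name happens to end in a tag would be silently renamed, which is the kind of surprise a suffix convention can cause.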

AlexAxthelm on 11 Mar 2018

I like the idea of trigger("always") (triggers(), plural, already conflicts with a drake function). I am less of a fan of keyword suffixes because of the surprises they may cause. Remake has a target_name keyword to avoid graph loops, and even that makes me nervous. (Drake just removes loops because there is no reason to keep them: #216 and #222).

wlandau on 12 Mar 2018

An implementation as follows might be future-proof and open up many additional uses of drake.

drake_plan(
  x = drake_target(
    target = x,
    command = 1 + 2,
    trigger = "always",
    user_column_1 = 1,
    user_column_2 = "some text"
  )
)

So the function drake_target() has two compulsory arguments (target and command), while the others are filled in using default values. In addition, one can specify additional columns, which could be accessible during make() and be passed on to a plugin interface in the make function. This could enable plugins so that the user can specify additional functionality, e.g. before or after each build step (uploading compiled targets to a server, making graphs, ...). It could also make the make process more transparent.

rkrug on 12 Mar 2018

👍1

An optional drake_target() is my favorite option so far. And we can make the following all equivalent.

drake_plan(
  x = drake_target(
    target = x,
    command = 1 + 2,
    trigger = "always",
    user_column_1 = 1,
    user_column_2 = "some text"
  )
)

drake_plan(
  x = drake_target(
    command = 1 + 2,
    trigger = "always",
    user_column_1 = 1,
    user_column_2 = "some text"
  )
)

drake_plan(
  drake_target(
    target = x,
    command = 1 + 2,
    trigger = "always",
    user_column_1 = 1,
    user_column_2 = "some text"
  )
)

wlandau on 12 Mar 2018

👍2

I see only one problem if all of these are allowed, namely the following case:

drake_plan(
  x = drake_target(
    target = differentName,
    command = 1 + 2,
    trigger = "always",
    user_column_1 = 1,
    user_column_2 = "some text"
  )
)

What will be the name of the target: x or differentName? Should one take precedence over the other, and should the conflict raise a warning or an error?

rkrug on 12 Mar 2018

@wlandau By the way - how do you get the syntax highlighting?

rkrug on 12 Mar 2018

My inclination is to go with differentName with a warning, unless command has file_out("filename"), in which case "filename" becomes the target name automatically. For R syntax highlighting, open the code fence with ```r rather than a bare ```.

wlandau on 12 Mar 2018

Thanks for the tip with the syntax highlighting.

Since you mention the case with the "filename", I would suggest that the name in drake_target() takes precedence over the name on the left of the =. Otherwise, the resulting target name would depend on what is in the command.

rkrug on 12 Mar 2018

Two thoughts:

First, the target argument to drake_target() could be powerful/dangerous when used in tandem with delayed expansion (#233). But that would allow me to build a reasonable sub-plan and expand it in a useful way.

Second, as an option which could be implemented quickly (without #233), we could add an argument trigger_rules or similar, which takes a named list:

drake_plan(
  x = 1,
  y = 2,
  z = x + y,
  a = z + z,
  trigger_rules = list(z = "always")
)

with the idea that the named list would immediately expand and be joined as a column, so that as the plan gets expanded and gathered, the trigger would follow the command (and derivatives). Not sure if this would lead to scope creep on drake_plan though.
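As a rough sketch of the "expand and join as a column" step, assuming the plan is already a data frame with target and command columns. The helper apply_trigger_rules() and the default trigger "any" are hypothetical, not drake functions.

```r
# Hypothetical helper (not part of drake): turn a named list of trigger
# rules into a `trigger` column on an existing plan data frame.
apply_trigger_rules <- function(plan, trigger_rules, default = "any") {
  unknown <- setdiff(names(trigger_rules), plan$target)
  if (length(unknown) > 0) {
    warning(
      "trigger_rules names not found in plan: ",
      paste(unknown, collapse = ", ")
    )
  }
  plan$trigger <- rep(default, nrow(plan))
  matched <- match(plan$target, names(trigger_rules))
  has_rule <- !is.na(matched)
  # Targets without a rule keep the default trigger.
  plan$trigger[has_rule] <- as.character(
    unlist(trigger_rules[matched[has_rule]], use.names = FALSE)
  )
  plan
}

plan <- data.frame(
  target = c("x", "y", "z", "a"),
  command = c("1", "2", "x + y", "z + z"),
  stringsAsFactors = FALSE
)

with_triggers <- apply_trigger_rules(plan, list(z = "always"))
print(with_triggers)
```

Because the rule is attached as an ordinary column, it would travel with its row through any later row-wise expansion of the plan.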

AlexAxthelm on 12 Mar 2018

Thinking more about drake_target(), I no longer think target should be an argument. Since we already have a convenient way to assign target names, it seems to add more ambiguity than value.

I have a similar hesitation about trigger_rules. I'm glad you brought it up, but it adds complications. With @rkrug's drake_target(), the trigger is right next to all the other information about the target, which makes things much easier to read. I think the spatial locality is more important than being able to see all your triggers in one place.

wlandau on 13 Mar 2018

I agree: we should support only the following syntax and not offer a target argument:

drake_plan(
  x = drake_target(
    command = 1 + 2,
    trigger = "always",
    user_column_1 = 1,
    user_column_2 = "some text"
  )
)
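For illustration only, here is a base-R sketch of how a wrapper with this shape could capture a command unevaluated and collect extra columns into plan rows. The names target_sketch() and plan_sketch() are made up for this example and say nothing about how drake would implement the feature; the sketch also assumes every target supplies the same extra columns and that "any" is the default trigger.

```r
# Hypothetical stand-in for the proposed wrapper: store the command as
# text (without evaluating it) alongside the trigger and user columns.
target_sketch <- function(command, trigger = "any", ...) {
  data.frame(
    command = paste(deparse(substitute(command)), collapse = "\n"),
    trigger = trigger,
    ...,
    stringsAsFactors = FALSE
  )
}

# Hypothetical stand-in for the plan constructor: target names come only
# from the left-hand side of `=`, so there is no naming ambiguity.
plan_sketch <- function(...) {
  specs <- list(...)
  stopifnot(
    length(specs) > 0,
    !is.null(names(specs)),
    all(names(specs) != "")
  )
  rows <- Map(
    function(name, spec) {
      data.frame(target = name, spec, stringsAsFactors = FALSE)
    },
    names(specs), specs
  )
  plan <- do.call(rbind, unname(rows))
  rownames(plan) <- NULL
  plan
}

plan <- plan_sketch(
  x = target_sketch(
    command = 1 + 2,
    trigger = "always",
    user_column_1 = 1,
    user_column_2 = "some text"
  ),
  y = target_sketch(
    command = x + 1,
    user_column_1 = 2,
    user_column_2 = "more text"
  )
)
print(plan)
```

Note that x + 1 is never evaluated here; the command is only deparsed to text, matching the way the tibble example at the top of the thread stores commands as strings.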

krlmlr on 15 Mar 2018

I think just target() is enough, rather than drake_target(). I think the lack of a drake_ prefix is consistent with other drake_plan() functions: knitr_in(), ignore(), etc.

By the way: I have a PR coming up soon. Really excited about this feature!

wlandau on 30 Mar 2018
