Drake: Command that writes a file always runs

Created on 8 Feb 2018 · 3 Comments · Source: ropensci/drake

One of the steps in my project creates a spatial data file in the GeoPackage format (.gpkg).

Every time I run make() on the project plan, this file is rewritten, which slows the project down considerably because the file is huge.

This reprex shows the behavior: the nc.gpkg target is never up to date.

library(drake)
library(sf)
## Linking to GEOS 3.6.1, GDAL 2.2.0, proj.4 4.9.3


nc <- read_sf(system.file("shape/nc.shp", package = "sf"))

plan <- drake_plan(
  nc.gpkg = overwrite_gpkg(nc, "nc.gpkg")
)

overwrite_gpkg <- function(obj, dsn) {
  st_write(obj, dsn, layer_options = "OVERWRITE=true")
}


make(plan)
## cache C:\Users\lexi\AppData\Local\Temp\RtmpeUCpo1\.drake
## connect 3 imports: nc, overwrite_gpkg, plan
## connect 1 target: nc.gpkg
## Warning: missing input files:
##   nc.gpkg
## check 3 items: 'nc.gpkg', nc, st_write
## Warning: File 'nc.gpkg' was built or processed,
## but the file itself does not exist.
## check 1 item: overwrite_gpkg
## check 1 item: nc.gpkg
## target nc.gpkg
## Writing layer `nc' to data source `nc.gpkg' using driver `GPKG'
## options:        OVERWRITE=true 
## features:       100
## fields:         14
## geometry type:  Multi Polygon

# make the plan again - `nc.gpkg` _should_ be up-to-date
make(plan)
## cache C:/Users/lexi/AppData/Local/Temp/RtmpeUCpo1/.drake
## Unloading targets from environment:
##   nc.gpkg
## connect 3 imports: nc, overwrite_gpkg, plan
## connect 1 target: nc.gpkg
## check 3 items: 'nc.gpkg', nc, st_write
## check 1 item: overwrite_gpkg
## check 1 item: nc.gpkg
## target nc.gpkg
## Updating layer `nc' to data source `C:\Users\lexi\AppData\Local\Temp\RtmpeUCpo1\nc.gpkg' using driver `GPKG'
## options:        OVERWRITE=true 
## features:       100
## fields:         14
## geometry type:  Multi Polygon

# make the plan again ...
make(plan)
## cache C:/Users/lexi/AppData/Local/Temp/RtmpeUCpo1/.drake
## Unloading targets from environment:
##   nc.gpkg
## connect 3 imports: nc, overwrite_gpkg, plan
## connect 1 target: nc.gpkg
## check 3 items: 'nc.gpkg', nc, st_write
## check 1 item: overwrite_gpkg
## check 1 item: nc.gpkg
## target nc.gpkg
## Updating layer `nc' to data source `C:\Users\lexi\AppData\Local\Temp\RtmpeUCpo1\nc.gpkg' using driver `GPKG'
## options:        OVERWRITE=true 
## features:       100
## fields:         14
## geometry type:  Multi Polygon

Session info

devtools::session_info()
## Session info -------------------------------------------------------------
##  setting  value                       
##  version  R version 3.4.2 (2017-09-28)
##  system   x86_64, mingw32             
##  ui       RTerm                       
##  language (EN)                        
##  collate  English_United States.1252  
##  tz       America/Los_Angeles         
##  date     2018-02-07
## Packages -----------------------------------------------------------------
##  package      * version    date       source                           
##  backports      1.1.1      2017-09-25 CRAN (R 3.4.1)                   
##  base         * 3.4.2      2017-09-28 local                            
##  class          7.3-14     2015-08-30 CRAN (R 3.4.2)                   
##  classInt       0.1-24     2017-04-16 CRAN (R 3.4.2)                   
##  codetools      0.2-15     2016-10-05 CRAN (R 3.4.2)                   
##  compiler       3.4.2      2017-09-28 local                            
##  crayon         1.3.4      2017-11-16 Github (r-lib/crayon@b5221ab)    
##  datasets     * 3.4.2      2017-09-28 local                            
##  DBI            0.7        2017-06-18 CRAN (R 3.4.1)                   
##  devtools       1.13.4     2017-11-09 CRAN (R 3.4.2)                   
##  digest         0.6.15     2018-01-28 CRAN (R 3.4.3)                   
##  drake        * 5.0.1.9001 2018-02-07 Github (ropensci/drake@bcca469)  
##  e1071          1.6-8      2017-02-02 CRAN (R 3.4.2)                   
##  evaluate       0.10.1     2017-06-24 CRAN (R 3.4.2)                   
##  formatR        1.5        2017-04-25 CRAN (R 3.4.3)                   
##  future         1.6.2      2017-10-16 CRAN (R 3.4.3)                   
##  future.apply   0.1.0      2018-01-15 CRAN (R 3.4.3)                   
##  globals        0.11.0     2018-01-10 CRAN (R 3.4.3)                   
##  graphics     * 3.4.2      2017-09-28 local                            
##  grDevices    * 3.4.2      2017-09-28 local                            
##  grid           3.4.2      2017-09-28 local                            
##  htmltools      0.3.6      2017-04-28 CRAN (R 3.4.1)                   
##  htmlwidgets    1.0        2018-01-20 CRAN (R 3.4.3)                   
##  igraph         1.1.2      2017-07-21 CRAN (R 3.4.3)                   
##  jsonlite       1.5        2017-06-01 CRAN (R 3.4.1)                   
##  knitr          1.19       2018-01-29 CRAN (R 3.4.3)                   
##  listenv        0.6.0      2015-12-28 CRAN (R 3.4.3)                   
##  lubridate      1.7.1      2017-11-03 CRAN (R 3.4.2)                   
##  magrittr       1.5        2014-11-22 CRAN (R 3.4.1)                   
##  memoise        1.1.0      2017-04-21 CRAN (R 3.4.2)                   
##  methods      * 3.4.2      2017-09-28 local                            
##  parallel       3.4.2      2017-09-28 local                            
##  pillar         1.1.0      2018-01-14 CRAN (R 3.4.3)                   
##  pkgconfig      2.0.1      2017-03-21 CRAN (R 3.4.1)                   
##  plyr           1.8.4      2016-06-08 CRAN (R 3.4.1)                   
##  R.methodsS3    1.7.1      2016-02-16 CRAN (R 3.4.1)                   
##  R.oo           1.21.0     2016-11-01 CRAN (R 3.4.1)                   
##  R.utils        2.6.0      2017-11-05 CRAN (R 3.4.2)                   
##  R6             2.2.2      2017-06-17 CRAN (R 3.4.1)                   
##  Rcpp           0.12.15    2018-01-20 CRAN (R 3.4.3)                   
##  rlang          0.1.6      2017-12-21 CRAN (R 3.4.3)                   
##  rmarkdown      1.8        2017-11-17 CRAN (R 3.4.2)                   
##  rprojroot      1.3-2      2018-01-03 CRAN (R 3.4.3)                   
##  sf           * 0.6-1      2018-01-23 Github (r-spatial/sf@349afa8)    
##  stats        * 3.4.2      2017-09-28 local                            
##  storr          1.1.3      2017-12-15 CRAN (R 3.4.3)                   
##  stringi        1.1.6      2017-11-17 CRAN (R 3.4.2)                   
##  stringr        1.2.0      2017-02-18 CRAN (R 3.4.1)                   
##  testthat       2.0.0      2017-12-13 CRAN (R 3.4.3)                   
##  tibble         1.4.1.9000 2018-01-17 Github (tidyverse/tibble@64fedbd)
##  tools          3.4.2      2017-09-28 local                            
##  udunits2       0.13       2016-11-17 CRAN (R 3.4.1)                   
##  units          0.5-1      2018-01-08 CRAN (R 3.4.3)                   
##  utils        * 3.4.2      2017-09-28 local                            
##  visNetwork     2.0.3      2018-01-09 CRAN (R 3.4.3)                   
##  withr          2.1.1.9000 2018-01-17 Github (jimhester/withr@df18523) 
##  yaml           2.1.14     2016-11-12 CRAN (R 3.4.1)

All 3 comments

I expected this to arise sooner or later. In drake's plan commands, single quotes denote reproducibly tracked files, and double quotes denote literal strings. drake_plan() does not have total control over parsing, so it errs on the side of converting double quotes into single quotes. In the plan you have, drake thinks nc.gpkg is an imported file, not an output file target.

plan <- drake_plan(
  nc.gpkg = overwrite_gpkg(nc, "nc.gpkg")
)

plan
##    target                       command
## 1 nc.gpkg overwrite_gpkg(nc, 'nc.gpkg')

vis_drake_graph(drake_config(plan)) # Squares are file targets/imports

[Screenshot: vis_drake_graph() output for the original plan]
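To make the quoting convention concrete, here is a small base-R sketch. It is not drake code: classify_strings() is a hypothetical helper that uses utils::getParseData() to find the string literals in a command and labels single-quoted ones as files and double-quoted ones as literals, which is how the convention described above reads them.

```r
# Hypothetical helper, not part of drake: label each string literal in a
# command according to the quoting convention (single = file, double = literal).
classify_strings <- function(command) {
  # keep.source = TRUE is needed so getParseData() has parse data to return
  parsed <- parse(text = command, keep.source = TRUE)
  tokens <- utils::getParseData(parsed)
  # STR_CONST tokens keep their original quote characters in the text column
  strings <- tokens$text[tokens$token == "STR_CONST"]
  data.frame(
    string = strings,
    role = ifelse(startsWith(strings, "'"), "file", "literal"),
    stringsAsFactors = FALSE
  )
}

# The command as stored by drake_plan() in the original plan
classify_strings("overwrite_gpkg(nc, 'nc.gpkg')")
# The same call written with a double-quoted (literal) path
classify_strings('overwrite_gpkg(nc, "nc.gpkg")')
```

Applied to the command that drake_plan() actually stored, 'nc.gpkg' comes back as a file, which matches drake treating it as an input rather than an output.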

I think what you want is this:

plan <- drake_plan(
  nc.gpkg = overwrite_gpkg(nc, "nc.gpkg"),
  file_targets = TRUE,
  strings_in_dots = "literals"
)

plan
##      target                       command
## 1 'nc.gpkg' overwrite_gpkg(nc, "nc.gpkg")

vis_drake_graph(drake_config(plan))

[Screenshot: vis_drake_graph() output for the corrected plan]
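The effect of the two extra arguments can be read off the two printed plans. As a rough illustration only, the hypothetical mimic_plan_row() below reproduces that before/after difference with plain string manipulation. It is not drake's implementation, and its `literals` argument is a made-up stand-in for strings_in_dots = "literals".

```r
# Hypothetical illustration, not drake's code: reproduce the difference
# between the two printed plans using plain string manipulation.
# `literals = TRUE` stands in for drake_plan(strings_in_dots = "literals").
mimic_plan_row <- function(target, command,
                           file_targets = FALSE, literals = FALSE) {
  if (file_targets) {
    # single-quoting the target name marks it as an output file
    target <- paste0("'", target, "'")
  }
  if (!literals) {
    # otherwise double-quoted strings are turned into single-quoted files
    command <- gsub('"', "'", command, fixed = TRUE)
  }
  data.frame(target = target, command = command, stringsAsFactors = FALSE)
}

cmd <- 'overwrite_gpkg(nc, "nc.gpkg")'
mimic_plan_row("nc.gpkg", cmd)
mimic_plan_row("nc.gpkg", cmd, file_targets = TRUE, literals = TRUE)
```

The first call matches the original plan (target nc.gpkg, command containing 'nc.gpkg'); the second matches the corrected one ('nc.gpkg' as the target, "nc.gpkg" as a literal argument), so the file becomes the output of the command instead of one of its inputs.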

Please let me know if that works for you.

I know it's a weird interface. I did my best to document it, but there is still confusion. Things will improve in #233 and especially #232.

Your suggested fix works!

Now that I know what's going on it will be easy to avoid making the same mistake, at least until those improvements come online and it is no longer a concern.


A brief word of encouragement: I think you're providing a solution to a _major_ problem in many users' workflows. It has been a treat to watch the package go through a rapid evolution over the past few days. Keep up the great work and I'm sure you'll have many more grateful users in the coming months 👍

I am glad to hear the solution worked.

Your support means a lot to me. drake has been my favorite project ever since its inception, and it is so wonderful to see the uptake.
