I recently found out that, when saving to GPKG using sf::st_write(), if the target file already exists, the current default is to append features to the existing ones, such as in:
nc <- sf::st_read(system.file("shape/nc.shp", package="sf"), quiet = TRUE)
dim(nc)
[1] 100 15
sf::st_write(nc, "/home/lb/tmp/nc.gpkg")
Updating layer `nc' to data source `/home/lb/tmp/nc.gpkg' using driver `GPKG'
Writing 100 features with 14 fields and geometry type Multi Polygon.
sf::st_write(nc, "/home/lb/tmp/nc.gpkg")
Updating layer `nc' to data source `/home/lb/tmp/nc.gpkg' using driver `GPKG'
Updating existing layer nc
Writing 100 features with 14 fields and geometry type Multi Polygon.
dim(sf::st_read("/home/lb/tmp/nc.gpkg", quiet = TRUE))
[1] 200 15
while usually, when saving to "vector files" in other formats (e.g., JSON, SHP, KML), the default is to fail on attempts to write to an existing file unless delete_layer/delete_dsn is specified.
Although this is clearly documented, it took me by surprise and led to problems in my current analysis (and I guess it could surprise other users as well).
I was thus wondering if it could make sense to change the st_write()/write_sf() defaults, and/or issue a warning when features are being appended to an existing layer.
I am against considering GPKG a non-database driver. I can imagine issuing a warning if updating will take place _without_ update = TRUE being specified, although this will hurt users counting on the current behaviour. Opinions?
I would suggest issuing a message rather than a warning if an existing layer is being updated. I've been bitten by this, and, like Lorenzo, see GPKG as a file format first and foremost, like SQLite files, because the files are portable and need no server. My preference would be for the split to go between database formats needing a server, and file formats that don't need a user-visible server.
But in terms of user experience, a message is no different from the current output, is it?
but it tells them that what they are doing probably has unintended consequences. I'd also suggest an option which, factory fresh, would be set to whatever you prefer, but which the 99% of users who never run database servers could set to block overwrites, forcing them to use delete_dsn = TRUE if they meet the problem. Think of knitting an Rmd repeatedly and writing to GPKG ... the file gets longer each time round, which is not what was intended.
I agree with Roger. The message on "append" should, in my opinion, be more verbose and informative. I think the current "Updating existing layer" is too succinct and may leave room for (mis)interpretation ("update" could even be interpreted as "overwrite" in this case).
I'd suggest a message such as:
"Adding new features to those already existing in layer nc - If you wish to overwrite it, use delete_layer = TRUE"
, or something similar.
On a side note, I was also wondering why it is considered "good practice" to allow an "immediate" append to an existing DB table (that is, without an "explicit" request by the user through manually setting update to TRUE). I can understand that the default on an existing table would be to append instead of overwrite, but it would still seem more "failproof" to me to always require explicit confirmation for write attempts on existing files/DB tables (in short, why not set the update argument to FALSE by default as well, instead of having it depend on the driver?).
HTH.
I don't like driver specific code, but if you agree on how this needs to be done then I'll accept a PR.
Writing to a database with odbc or RPostgres will raise an error if the table exists and overwrite = FALSE or append = FALSE. This is the default from the driver. Is the default for GPKG append = TRUE?
@etiennebr
current behaviour is that st_write will automatically get an "update = TRUE" in case the driver is one of those included in db_drivers:
st_write.sf = function(obj, dsn, layer = NULL, ...,
driver = guess_driver_can_write(dsn),
dataset_options = NULL, layer_options = NULL, quiet = FALSE, factorsAsCharacter = TRUE,
update = driver %in% db_drivers, delete_dsn = FALSE, delete_layer = FALSE,
fid_column_name = NULL)
, with db_drivers being:
db_drivers
couchdb db2odbc dods gft mssql mysql oci
"CouchDB" "DB2ODBC" "DODS" "GFT" "MSSQLSpatial" "MySQL" "OCI"
odbc pg sde
"ODBC" "PostgreSQL" "SDE" "GPKG" "SQLite"
, thus "automatically" appending to existing GPKG files (and maybe other formats?) unless the user manually specifies update = FALSE (or delete_layer = TRUE if they want to overwrite).
@edzer In my opinion, the "best" way to go here could be to change the default value for update, so that the user always has to specifically decide what to do in case a layer already exists. This would "solve" the original issue, because then no "accidental" modifications would occur, and the message issued when trying to write on an existing GPKG is already clear and informative:
sf::st_write(nc, "/home/lb/tmp/nc.gpkg", update = FALSE)
Dataset /home/lb/tmp/nc.gpkg already exists: remove first, use update=TRUE to append,
delete_layer=TRUE to delete layer, or delete_dsn=TRUE to remove the entire data source before writing.
Error in CPL_write_ogr(obj, dsn, layer, driver, as.character(dataset_options), :
Dataset already exists.
(though maybe the "remove first" part of the message could be removed).
I reckon this would be a breaking change, but maybe it could be worth implementing...
@rsbivand What do you think?
I would remove "GPKG" from db_drivers <- c(unlist(prefix_map), "GPKG", "SQLite"), line 650 in R/read.R. Getting people to see GPKG as the new shapefile will become much harder if it doesn't behave like a file. I'm unsure about "SQLite" in the same catenation, but would prefer only prefix_map as db_drivers.
@rsbivand That would be ok for me, too. Still, I wonder if and why update should be "forced" to TRUE for some formats/drivers (it is absolutely possible/probable that I am missing something here - I seldom work on DBs...).
IMO the update = TRUE default is inconsistent with most other write methods (and st_write itself when we pass a DBI connection for dsn) and is likely to surprise users. I agree with @lbusett that we should simply change these exceptions for database drivers and force the users to be explicit about what they want to do if the data already exists.
I'm with @etiennebr and @lbusett : require update to be specified when the layer already exists. Letting GPKG files behave like shapefiles gives the wrong message: we want people to see that this is a format with other capabilities, like update/append.
Do you mean specified in the user-specified command, or by default based on the driver? The current default is very dangerous, and nobody will guess what is going on. Think of re-running an Rmd file and having your GPKG just growing (my case). It needs to be set up so that the user has to make a specific choice to update, overwrite or face an error by default.
We have write_sf which has delete_layer = TRUE, so always overwrites, like all other write_xxx functions do. That makes sense IMO. Future st_write will give an error when the layer exists, indicating the user should set update to TRUE or FALSE.
Works for me!
@edzer: Would you like a PR, on this, or are you going to implement it yourself?
Did it myself; it's a tricky part. A review would be very welcome from you, @lbusett - note this is in branch update.
Demo script:
library(sf)
# Linking to GEOS 3.8.0, GDAL 3.0.2, PROJ 6.2.1
demo(nc, ask = FALSE, echo = FALSE)
# Reading layer `nc.gpkg' from data source `/home/edzer/R/x86_64-pc-linux-gnu-library/3.6/sf/gpkg/nc.gpkg' using driver `GPKG'
# Simple feature collection with 100 features and 14 fields
# Attribute-geometry relationship: 0 constant, 8 aggregate, 6 identity
# geometry type: MULTIPOLYGON
# dimension: XY
# bbox: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
# epsg (SRID): 4267
# proj4string: +proj=longlat +datum=NAD27 +no_defs
unlink("nc.gpkg")
st_write(nc, "nc.gpkg")
# Writing layer `nc' to data source `nc.gpkg' using driver `GPKG'
# Writing 100 features with 14 fields and geometry type Multi Polygon.
try(st_write(nc, "nc.gpkg")) # write twice: error
# Dataset nc.gpkg already exists: remove first, use update=TRUE to append, update=FALSE to replace,
# delete_layer=TRUE to delete layer, or delete_dsn=TRUE to remove the entire data source before writing.
# Error in CPL_write_ogr(obj, dsn, layer, driver, as.character(dataset_options), :
# Dataset already exists.
try(st_write(nc, "nc.gpkg", update = NA)) # write twice: error
# Dataset nc.gpkg already exists: remove first, use update=TRUE to append, update=FALSE to replace,
# delete_layer=TRUE to delete layer, or delete_dsn=TRUE to remove the entire data source before writing.
# Error in CPL_write_ogr(obj, dsn, layer, driver, as.character(dataset_options), :
# Dataset already exists.
st_write(nc, "nc.gpkg", update = TRUE) # now append
# Updating layer `nc' to data source `nc.gpkg' using driver `GPKG'
# Updating existing layer nc
# Writing 100 features with 14 fields and geometry type Multi Polygon.
read_sf("nc.gpkg") %>% nrow # 200
# [1] 200
st_write(nc, "nc.gpkg", update = FALSE) # now overwrite
# Writing layer `nc' to data source `nc.gpkg' using driver `GPKG'
# Writing 100 features with 14 fields and geometry type Multi Polygon.
read_sf("nc.gpkg") %>% nrow # 100
# [1] 100
Note that all checks pass clean for me, on
> library(sf)
Linking to GEOS 3.8.0, GDAL 3.0.2, PROJ 6.2.1
where on travis they fail with:
── testthat results ───────────────────────────────────────────────────────────
[ OK: 1071 | SKIPPED: 20 | WARNINGS: 6 | FAILED: 2 ]
1. Failure: delete and update work (#304) (@test_write.R#47)
2. Error: delete and update work (#304) (@test_write.R#49)
https://travis-ci.org/r-spatial/sf/jobs/654367754?utm_medium=notification&utm_source=email
@edzer
I just had a look. The examples you provided work ok. However, I would still be surprised by the behaviour of the modified code, where we get an overwrite of an existing layer with update = FALSE and no delete_layer/delete_dsn specified, and even with an explicit delete_layer = FALSE.
library(sf)
demo(nc, ask = FALSE, echo = FALSE)
unlink("nc.gpkg")
st_write(nc, "nc.gpkg")
#> Writing layer `nc' to data source `nc.gpkg' using driver `GPKG'
#> Writing 100 features with 14 fields and geometry type Multi Polygon.
st_write(nc, "nc.gpkg", update = NA)
#> Dataset nc.gpkg already exists: remove first, use update=TRUE to append, update=FALSE to replace,
#> delete_layer=TRUE to delete layer, or delete_dsn=TRUE to remove the entire data source before writing.
#> Error in CPL_write_ogr(obj, dsn, layer, driver, as.character(dataset_options), : Dataset already exists.
st_write(nc, "nc.gpkg", update = FALSE)
#> Writing layer `nc' to data source `nc.gpkg' using driver `GPKG'
#> Writing 100 features with 14 fields and geometry type Multi Polygon.
st_write(nc, "nc.gpkg", update = FALSE, delete_layer = FALSE)
#> Writing layer `nc' to data source `nc.gpkg' using driver `GPKG'
#> Writing 100 features with 14 fields and geometry type Multi Polygon.
In my opinion, overwrite should never happen unless explicitly setting delete_layer to TRUE, and for sure it should not happen with an explicit FALSE setting.
Created on 2020-02-24 by the reprex package (v0.3.0)
Please come with a _simple_ proposal then. AFAICT, you gave a thumbs up to this proposal, above.
@edzer, regarding the failing tests on travis, it reminds me of problems we've had with temporary files on travis before. Around that https://github.com/r-spatial/sf/commit/82c609f9befc1781d2848e007501439bdf21a0b2, probably. It isn't very helpful, but I'll try to investigate a bit.
@edzer Sorry, then I probably misunderstood your previous message. I did not mean to criticize: the current changes clearly already improve the behaviour a lot and avoid "unintended" appending.
I was just thinking that no overwriting should happen if delete_layer is FALSE. Thus, the delete_layer setting should somehow "supersede" the update one, so that if delete_layer is FALSE, then nothing should happen, irrespective of the update setting.
To be clearer:
# overwrite on delete_layer = TRUE
st_write(nc, "nc.gpkg", delete_layer = TRUE)
#> Writing layer `nc' to data source `nc.gpkg' using driver `GPKG'
#> Writing 100 features with 14 fields and geometry type Multi Polygon.
# Do not overwrite if delete_layer = FALSE, even if update = FALSE
st_write(nc, "nc.gpkg", update = FALSE, delete_layer = FALSE)
#> Dataset nc.gpkg already exists: remove first, use update=TRUE to append
#> delete_layer=TRUE to delete layer, or delete_dsn=TRUE to remove the entire data source before writing.
#> Error in CPL_write_ogr(obj, dsn, layer, driver, as.character(dataset_options), : Dataset already exists.
# Similarly, do not overwrite if update = FALSE and delete_layer is not set (since delete_layer defaults to FALSE)
st_write(nc, "nc.gpkg", update = FALSE)
#> Dataset nc.gpkg already exists: remove first, use update=TRUE to append
#> delete_layer=TRUE to delete layer, or delete_dsn=TRUE to remove the entire data source before writing.
#> Error in CPL_write_ogr(obj, dsn, layer, driver, as.character(dataset_options), : Dataset already exists.
# Overwrite if update = TRUE BUT delete_layer = TRUE (delete_layer supersedes update)
st_write(nc, "nc.gpkg", update = TRUE, delete_layer = TRUE)
#> Deleting layer `nc' using driver `GPKG'
#> Updating layer `nc' to data source `nc.gpkg' using driver `GPKG'
#> Writing 100 features with 14 fields and geometry type Multi Polygon.
# Append if update = TRUE and delete_layer = FALSE (or not set - thus defaulting to FALSE)
st_write(nc, "nc.gpkg", update = TRUE, delete_layer = FALSE)
#> Updating layer `nc' to data source `nc.gpkg' using driver `GPKG'
#> Updating existing layer nc
#> Writing 100 features with 14 fields and geometry type Multi Polygon.
Clearly, this is just an opinion based on my "expectation" for the behaviour. If you do not think this makes sense, no problem.
Otherwise, a possible solution maybe could be setting the default of update to FALSE and have something like this check at the beginning of st_write():
if (!delete_layer && !update) {
update = NA
}
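Under that guard, the four argument combinations would map to actions roughly as follows (a hypothetical sketch of the proposed logic, not sf's actual code; the helper name is made up for illustration):

```r
# Hypothetical sketch: how the proposed guard maps (update, delete_layer)
# to an action when the target layer already exists.
action_on_existing <- function(update = FALSE, delete_layer = FALSE) {
  if (!delete_layer && !update)
    update <- NA               # proposed guard: force an error downstream
  if (delete_layer)
    "overwrite"                # delete the layer first, then write
  else if (isTRUE(update))
    "append"                   # add features to the existing layer
  else
    "error"                    # refuse to touch the existing data
}
action_on_existing()                       # "error"
action_on_existing(update = TRUE)          # "append"
action_on_existing(delete_layer = TRUE)    # "overwrite"
```

Note that delete_layer = TRUE wins over update = TRUE here, matching the "delete_layer supersedes update" expectation above.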
I did a rapid check, and it seems to work (though apparently I was not able to run all tests - what should I do to run all tests? Currently I simply used the RStudio "test package" functionality):
── Results ──────────────────────────────────────────────────────────
Duration: 5.8 s
OK: 713
Failed: 0
Warnings: 5
Skipped: 64
HTH, and thanks (as always) for your work on the package!
Thanks, @lbusett; can we, as an alternative, do away with delete_layer entirely, dropping this block? It seems to duplicate update in this new form (update = FALSE should imply delete_layer: if we re-create a layer, it is also erased).
branch: update2
Please forget about branch update2 and my last message above; it will always delete the dataset when update=FALSE, removing all other layers ...
So, continuing with branch update:
- delete_layer is now !is.na(update) && !update, so that specifying update = FALSE has the layer deleted first by default (if it exists)
- write_sf now has update = FALSE rather than delete_layer = TRUE as default
- We should stop using delete_layer, and start using update.
library(sf)
# Linking to GEOS 3.8.0, GDAL 3.0.2, PROJ 6.2.1
demo(nc, ask = FALSE, echo = FALSE)
# Reading layer `nc.gpkg' from data source `/home/edzer/R/x86_64-pc-linux-gnu-library/3.6/sf/gpkg/nc.gpkg' using driver `GPKG'
# Simple feature collection with 100 features and 14 fields
# Attribute-geometry relationship: 0 constant, 8 aggregate, 6 identity
# geometry type: MULTIPOLYGON
# dimension: XY
# bbox: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
# epsg (SRID): 4267
# proj4string: +proj=longlat +datum=NAD27 +no_defs
unlink("nc.gpkg")
st_write(nc, "nc.gpkg")
# Writing layer `nc' to data source `nc.gpkg' using driver `GPKG'
# Writing 100 features with 14 fields and geometry type Multi Polygon.
try(st_write(nc, "nc.gpkg")) # write twice: error
# Dataset nc.gpkg already exists: remove first, use update=TRUE to append, update=FALSE to replace,
# delete_layer=TRUE to delete layer, or delete_dsn=TRUE to remove the entire data source before writing.
# Error in CPL_write_ogr(obj, dsn, layer, driver, as.character(dataset_options), :
# Dataset already exists.
try(st_write(nc, "nc.gpkg", update = NA)) # write twice: error
# Dataset nc.gpkg already exists: remove first, use update=TRUE to append, update=FALSE to replace,
# delete_layer=TRUE to delete layer, or delete_dsn=TRUE to remove the entire data source before writing.
# Error in CPL_write_ogr(obj, dsn, layer, driver, as.character(dataset_options), :
# Dataset already exists.
st_write(nc, "nc.gpkg", update = TRUE) # now append
# Updating layer `nc' to data source `nc.gpkg' using driver `GPKG'
# Updating existing layer nc
# Writing 100 features with 14 fields and geometry type Multi Polygon.
st_layers("nc.gpkg") # 200
# Driver: GPKG
# Available layers:
# layer_name geometry_type features fields
# 1 nc Multi Polygon 200 14
st_write(nc, "nc.gpkg", update = FALSE) # now overwrite
# Deleting layer `nc' using driver `GPKG'
# Writing layer `nc' to data source `nc.gpkg' using driver `GPKG'
# Writing 100 features with 14 fields and geometry type Multi Polygon.
st_layers("nc.gpkg")
# Driver: GPKG
# Available layers:
# layer_name geometry_type features fields
# 1 nc Multi Polygon 100 14
write_sf(nc, "nc.gpkg", layer = "bar") # default: update = FALSE
st_layers("nc.gpkg")
# Driver: GPKG
# Available layers:
# layer_name geometry_type features fields
# 1 nc Multi Polygon 100 14
# 2 bar Multi Polygon 100 14
write_sf(nc, "nc.gpkg", layer = "bar", update = TRUE)
st_layers("nc.gpkg") # added: 200
# Driver: GPKG
# Available layers:
# layer_name geometry_type features fields
# 1 nc Multi Polygon 100 14
# 2 bar Multi Polygon 200 14
write_sf(nc, "nc.gpkg", layer = "bar", update = TRUE, delete_layer = TRUE)
st_layers("nc.gpkg") # new: 100
# Driver: GPKG
# Available layers:
# layer_name geometry_type features fields
# 1 nc Multi Polygon 100 14
# 2 bar Multi Polygon 100 14
write_sf(nc, "nc.gpkg", layer = "bar", update = FALSE, delete_layer = FALSE) # must error:
# Error in st_write.sf(..., quiet = quiet, update = update) :
# cannot replace a layer if delete_layer is FALSE
# Calls: write_sf -> st_write -> st_write.sf
# Execution halted
I like the improvement. Maybe we could simplify it even more? I see two points of possible confusion:
Is update the right name for this option? Maybe we could use append to make it consistent with the DBI options for tables. And I might be wrong in my interpretation, but I think update probably refers to the ogr -update flag, which would also require the -append flag.
Here's my proposal:
try(st_write(nc, "nc.gpkg")) # write twice: error
# Dataset nc.gpkg already exists, either:
# - Append to the existing table with `append = TRUE`
# - Overwrite the table with `overwrite = TRUE` [this would change the option name]
# - Delete the entire data source before writing with `delete_dsn = TRUE`
# Error in CPL_write_ogr(obj, dsn, layer, driver, as.character(dataset_options), :
# Dataset already exists.
Also I find it dangerous to overwrite without the user explicitly asking for it:
st_write(nc, "nc.gpkg", update = FALSE) # now overwrite
I'd be happy to contribute code to clarify my proposal.
Although the current proposal in the update branch has the advantage of simplifying behaviour (we'd only really need the update argument), I personally like @etiennebr proposal more.
This is because I feel it would be more understandable to the user: it feels a bit counterintuitive that, to allow overwriting, I'd have to set the update argument to FALSE. If I understand it right, in @etiennebr's proposal we would however still need to determine which argument takes precedence in case both overwrite and append are set to TRUE (or maybe it would be reasonable to just throw an error in that case, asking the user to disambiguate?).
Thanks! I think update comes from CRUD, which is the pattern GDAL uses AFAICT, but we never actually update an _existing_ feature (= modify), so better to get rid of that term.
Please explain to me why we'd need both an overwrite and an append: can't we say that if overwrite is TRUE, we overwrite (and do not append), if it is FALSE we append instead?
This is the behavior of DBI (although it relies on the driver implementation) adapted from the DBI example.
library(DBI)
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "mtcars", mtcars[1:2, ])
dbReadTable(con, "mtcars")
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 21 6 160 110 3.9 2.620 16.46 0 1 4 4
#> 2 21 6 160 110 3.9 2.875 17.02 0 1 4 4
dbWriteTable(con, "mtcars", mtcars[1:2, ])
#> Error: Table mtcars exists in database, and both overwrite and append are FALSE
dbReadTable(con, "mtcars")
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 21 6 160 110 3.9 2.620 16.46 0 1 4 4
#> 2 21 6 160 110 3.9 2.875 17.02 0 1 4 4
dbWriteTable(con, "mtcars", mtcars[3:4, ], append = TRUE)
dbReadTable(con, "mtcars")
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> 2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> 3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> 4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
dbWriteTable(con, "mtcars", mtcars[10:12, ], overwrite = TRUE)
dbReadTable(con, "mtcars")
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 19.2 6 167.6 123 3.92 3.44 18.3 1 0 4 4
#> 2 17.8 6 167.6 123 3.92 3.44 18.9 1 0 4 4
#> 3 16.4 8 275.8 180 3.07 4.07 17.4 0 0 3 3
# No row names
dbWriteTable(con, "mtcars", mtcars[1:4, ], append = TRUE, overwrite = TRUE)
#> Error: overwrite and append cannot both be TRUE
dbReadTable(con, "mtcars")
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 19.2 6 167.6 123 3.92 3.44 18.3 1 0 4 4
#> 2 17.8 6 167.6 123 3.92 3.44 18.9 1 0 4 4
#> 3 16.4 8 275.8 180 3.07 4.07 17.4 0 0 3 3
dbWriteTable(con, "mtcars2", mtcars[1:4, ], overwrite = TRUE) # works
dbWriteTable(con, "mtcars3", mtcars[1:4, ], append = TRUE) # works
dbDisconnect(con)
Created on 2020-02-26 by the reprex package (v0.3.0)
What I dislike about the interaction between update and delete_layer (when update is FALSE) is that it can lead to accidental loss of data. Since writing is so important and data on disk is so valuable, I would prefer in that case to sacrifice a little convenience for additional safety. Using NA is also slightly unusual, so if a user wants to write a table only if it doesn't exist (and be explicit about it), they would need to do append = NA, delete_layer = FALSE, but I would intuitively do append = FALSE, delete_layer = FALSE, which would get the table deleted.
Then I suggest we go with overwrite, since overwrite = TRUE is more explicit about data loss to be expected (you told so!) than the equivalent append = FALSE. Instead of NA as default we can also leave it missing by default and check for that.
Would that mean using only the Overwrite argument and having this kind of behaviour?
| Overwrite | Behaviour on existing dataset |
| ------------- | ------------- |
| Missing (or NA) | Error |
| TRUE | Overwrite |
| FALSE | Append |
Maybe I am nitpicking, but it would still surprise me if setting overwrite to FALSE (which, as in @etiennebr's last message, I could naively do to ask to "write to a table only if it doesn't exist, and be explicit about it") actually meant asking to append to it. Although it would not lead to "data loss" (as feared by @etiennebr for the case of using append/update as a single argument), it could lead to the "opposite": unintentional data "alteration" through dataset "expansion".
The only way I see to get down to a single and "unambiguous" argument would be to go from a logical to a "semantic" character one, such as:
| what_to_do_on_existing | Behaviour on existing dataset |
| ------------- | ------------- |
| Missing (or NA, or NULL, or a default like "Quit") | Error |
| "overwrite" | Overwrite |
| "append" | Append |
Otherwise, I'd think the safest option would be to keep two arguments as in current sf (possibly changing the names) and align with the dbWriteTable behaviour reported by @etiennebr which, if I got it right, would be:
| Overwrite (a.k.a. delete_layer) | Append (a.k.a. update) | Behaviour on existing dataset |
| ----------- | ------------- | ------------- |
| FALSE | FALSE | Error |
| TRUE | FALSE | Overwrite |
| FALSE | TRUE | Append |
| TRUE | TRUE | Error |
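The two-argument, DBI-style dispatch in the table above could be sketched as follows (a hypothetical helper; the argument names follow the proposal, not sf's actual API):

```r
# Hypothetical sketch of the proposed DBI-style dispatch for writes
# to an existing layer: error unless exactly one of the flags is TRUE.
existing_layer_action <- function(overwrite = FALSE, append = FALSE) {
  if (overwrite && append)
    stop("overwrite and append cannot both be TRUE")   # ambiguous: error
  if (overwrite)
    "overwrite"                                        # delete, then write
  else if (append)
    "append"                                           # add to existing data
  else
    stop("layer already exists: set overwrite = TRUE or append = TRUE")
}
```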
However, really, I do not want to waste your time here. I will be ok with any implementation you decide that does not lead to overwrite/append on existing tables unless appropriate argument(s) are set explicitly.
I like your semantic proposal, and suggest we call this argument if_exists, with possible values missing, "overwrite", "append". Default for write_sf would be "overwrite", for st_write missing. Other proposals?
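A rough sketch of what that if_exists dispatch could look like (hypothetical; only the argument name and values proposed above are assumed, nothing here is implemented):

```r
# Hypothetical sketch of the suggested single `if_exists` argument:
# missing (or NA) errors; otherwise dispatch on the character value.
write_existing <- function(if_exists) {
  if (missing(if_exists) || is.na(if_exists))
    stop("layer already exists: set if_exists to \"overwrite\" or \"append\"")
  switch(if_exists,
    overwrite = "delete layer, then write",
    append    = "add features to the existing layer",
    stop("invalid if_exists value: ", if_exists)   # default branch
  )
}
```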
I still prefer DBI's API ¯\_(ツ)_/¯. Would having a default append = FALSE, overwrite = !append solve the issue? That way a user can change the behavior only by changing append, and sf overwrites by default (like many other write methods in R)... I'd get used to it :)
sf overwrites by default only if using write_sf. Using st_write that is not the case (at the moment, the default is to error on non-DB drivers and to append on DB ones - changing this default was the initial topic of this issue). Defaulting overwrite to TRUE on existing datasets is not a good idea, IMO (one reason I prefer using st_write and not write_sf).
Can we close this by merging branch update?
Maybe the GitHub version is outdated, but it doesn't seem to pass travis and it still uses update rather than append.
Ah, yes, sorry, some work to be done here. So we rename update to append, and delete_layer to overwrite, and have st_write not overwrite but error by default if the layer exists.
Pending successful travis tests (they're not identical to what my machine does), this commit renames update to append, deprecates update, and raises an error if the layer exists but append has not been specified as either TRUE (do append) or FALSE (overwrite layer).
I decided (for now) against a separate overwrite argument, as setting append=FALSE overwrites. In addition, we do have the delete_dsn and delete_layer logic of GDAL, which does different things to different drivers, which I think we want to keep.
Let me know if anything is missing, I'm close to integrating this in master.
@edzer Sorry for not following up. We are in a bit of a bind, in Italy, right now...
Only one clarification: with current new settings, what would happen if I set delete_layer to TRUE without specifying anything for append ?
That succeeds, always, and does what is documented.
Then I'd say it's ok! Thanks for working on this.