Sf: Boolean columns are dropped with st_write_db

Created on 12 Dec 2017  路  19Comments  路  Source: r-spatial/sf

st_write_db drops boolean/logical columns, see reprex below. I just learned that this makes sense with st_write, because GDAL does not support boolean field types. However, this is not the case with (R)PostgreSQL.

Currently I also don't see a workaround (e.g. manually defining PostgreSQL field types).

(Is clean_columns even necessary with st_write_db? AFAIK RPostgreSQL handles the conversion of factors and non-UTF-8 character strings, but not sure regarding POSIXlt.).

Reprex & Session info

library("RPostgreSQL")
#> Lade n枚tiges Paket: DBI
library("sf")
#> Linking to GEOS 3.6.1, GDAL 2.2.0, proj.4 4.9.3

pgcon <- 
  dbConnect(
    PostgreSQL(),
    host = "localhost",
    dbname = "postgres"
  )

nc_bool <- st_read(system.file("shape/nc.shp", package="sf"))
#> Reading layer `nc' from data source `C:\Users\daniel\Documents\R\win-library\3.4\sf\shape\nc.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 100 features and 14 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> epsg (SRID):    4267
#> proj4string:    +proj=longlat +datum=NAD27 +no_defs
nc_bool$large_area <- nc_bool$AREA >= 0.15

st_write_db(pgcon, nc_bool)
#> ignoring columns with unsupported type:
#> [1] "large_area"

dbDisconnect(pgcon)
#> [1] TRUE
devtools::session_info()
#> Session info -------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.4.2 (2017-09-28)
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language (EN)                        
#>  collate  German_Germany.1252         
#>  tz       Europe/Berlin               
#>  date     2017-12-12
#> Packages -----------------------------------------------------------------
#>  package     * version date       source                               
#>  backports     1.1.1   2017-09-25 CRAN (R 3.4.1)                       
#>  base        * 3.4.2   2017-09-28 local                                
#>  compiler      3.4.2   2017-09-28 local                                
#>  datasets    * 3.4.2   2017-09-28 local                                
#>  DBI         * 0.7     2017-06-18 CRAN (R 3.4.0)                       
#>  devtools      1.13.4  2017-11-09 CRAN (R 3.4.2)                       
#>  digest        0.6.12  2017-01-27 CRAN (R 3.4.2)                       
#>  evaluate      0.10.1  2017-06-24 CRAN (R 3.4.0)                       
#>  graphics    * 3.4.2   2017-09-28 local                                
#>  grDevices   * 3.4.2   2017-09-28 local                                
#>  grid          3.4.2   2017-09-28 local                                
#>  htmltools     0.3.6   2017-04-28 CRAN (R 3.4.0)                       
#>  knitr         1.17    2017-08-10 CRAN (R 3.4.1)                       
#>  magrittr      1.5     2014-11-22 CRAN (R 3.4.2)                       
#>  memoise       1.1.0   2017-10-16 Github (hadley/memoise@d63ae9c)      
#>  methods     * 3.4.2   2017-09-28 local                                
#>  Rcpp          0.12.14 2017-11-23 CRAN (R 3.4.2)                       
#>  rmarkdown     1.8     2017-11-17 CRAN (R 3.4.2)                       
#>  RPostgreSQL * 0.6-3   2017-11-16 Github (tomoakin/RPostgreSQL@bfdb284)
#>  rprojroot     1.2     2017-01-16 CRAN (R 3.4.2)                       
#>  sf          * 0.5-5   2017-10-06 Github (r-spatial/sf@5f7a415)        
#>  stats       * 3.4.2   2017-09-28 local                                
#>  stringi       1.1.6   2017-11-17 CRAN (R 3.4.2)                       
#>  stringr       1.2.0   2017-02-18 CRAN (R 3.4.2)                       
#>  tools         3.4.2   2017-09-28 local                                
#>  udunits2      0.13    2016-11-17 CRAN (R 3.4.1)                       
#>  units         0.4-6   2017-08-27 CRAN (R 3.4.1)                       
#>  utils       * 3.4.2   2017-09-28 local                                
#>  withr         2.1.0   2017-11-01 CRAN (R 3.4.2)                       
#>  yaml          2.1.15  2017-12-01 CRAN (R 3.4.3)

Most helpful comment

I'll provide more details, but the origin of the problem is that the odbc driver truncates the geometry to 255 characters, so obviously the WKB becomes invalid. This is not the case with the other drivers tested and there are no geometries large enough in the tests to raise the issue with odbc.

All 19 comments

This issue seems to be solved by #558.

In case someone needs a workaround until this is on CRAN: convert the logical column to character and define the field.type (for all columns) manually.

library("RPostgreSQL")
#> Lade n枚tiges Paket: DBI
library("sf")
#> Linking to GEOS 3.6.1, GDAL 2.2.0, proj.4 4.9.3

pgcon <- 
  dbConnect(
    PostgreSQL(),
    host = "localhost",
    dbname = "postgis"
  )

nc_bool <- st_read(system.file("shape/nc.shp", package="sf"))
#> Reading layer `nc' from data source `C:\Users\daniel\Documents\R\win-library\3.4\sf\shape\nc.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 100 features and 14 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> epsg (SRID):    4267
#> proj4string:    +proj=longlat +datum=NAD27 +no_defs
nc_bool$large_area <- nc_bool$AREA >= 0.15

# too lazy to define field.types for all columns, so picking only the boolean
# and geometry
nc_bool <- nc_bool[c("large_area", "geometry")]

# convert logical/boolean to character
nc_bool$large_area <- as.character(nc_bool$large_area)

# define field types
field_types_pg <-
  c(
    "large_area"   = "boolean",
    "geometry"     = "geometry(MultiPolygon,4267)" # modify geometry type and crs accordingly
  )


st_write_db(pgcon, nc_bool, row.names = FALSE, field.types = field_types_pg)

dbDisconnect(pgcon)
#> [1] TRUE

We're in the process of integrating st_write_db into st_write, see https://github.com/r-spatial/sf/pull/558; I guess we need to revisit clean_columns to have it accept cases (like boolean) that RPostgreSQL and/or DBI accept, but GDAL doesn't. Am I right? @etiennebr

Yes, I have seen that today and as far as I can tell, @etiennebr already took care of this. I.e. st_write does not use clean_columns when writing to db anymore.

Yes, I wasn't sure what to do about clean_column. I manually added the units (and sfc) conversion in dbDataType() to rely on DBI as much as possible, but there might be other corner cases that I missed. As @dpprdan mentioned, each database has it's own way of handling different types, so it should probably be left to each database driver as much as possible.

Can we close this here?

AFAIK it's not fixed in master until #558 is merged. Not sure, though, how you usually close issues, i.e. is the existence of a fix in a pull request enough, even though it's not merged yet?

Thanks; that still fails for me with

> test_check("sf")
1. Failure: can read from db (@test_postgis_ODBC.R#119) ------------------------
error$message does not match "no applicable method for 'st_read'".
Actual value: "dsn should specify a data source or filename"


2. Failure: new SRIDs are handled correctly (@test_postgis_ODBC.R#271) ---------
st_crs(x) == crs isn't true.


3. Failure: can read from db (@test_postgis_RPostgreSQL.R#119) -----------------
error$message does not match "no applicable method for 'st_read'".
Actual value: "dsn should specify a data source or filename"

@etiennebr I'm close to submitting 0.6-1 to CRAN; shall we leave this resting to the next release opportunity (once every two months)?

How close? I can finish it by the end of the week.

That would be great. 0.6-1 is needed for a CRAN release of stars; I bundled all GDAL stuff in sf.

I will submit 0.6-1 on Monday.

@edzer, I'm deeply sorry, but I can't finish it for Monday; I still have issues with multi-geom columns and I don't want to rush it. I think it will also be a good thing to test the new features on the dev version before the CRAN release.

sure, no worries!

This should be closed with #558. See tests. Logical types are supported for RPostgres and RPostgreSQL. ODBC has issues with certain drivers, apparently (feel free to test it and report back).

This works fine now for `RPostgreSQL`` connections:

suppressPackageStartupMessages(library(RPostgreSQL))
suppressPackageStartupMessages(library(sf))

nc_bool <- 
  st_read(
    system.file("shape/nc.shp", package="sf"), 
    stringsAsFactors = FALSE, 
    quiet = TRUE
  )[, c("AREA", "geometry")]

nc_bool$large_area <- nc_bool$AREA >= 0.15

pg_con <- dbConnect(PostgreSQL(),
                 host = "localhost", 
                 dbname = "postgis")

st_write(obj = nc_bool, dsn = pg_con, "nc_bool", overwrite = TRUE)
#> [1] TRUE

nc_bool2 <- st_read(pg_con, "nc_bool")

str(nc_bool2$large_area)
#>  logi [1:100] FALSE FALSE FALSE FALSE TRUE FALSE ...

dbDisconnect(pg_con)
#> [1] TRUE

I can also write booleans to DB using odbc:

odbc_con <- dbConnect(odbc::odbc(), 
                      driver = "PostgreSQL Unicode(x64)",
                      host = "localhost", 
                      database = "postgis")
st_write(obj = nc_bool, dsn = odbc_con, layer = "nc_bool", overwrite = TRUE)
#> Note: method with signature 'DBIObject#sf' chosen for function 'dbDataType',
#>  target signature 'PostgreSQL#sf'.
#>  "OdbcConnection#ANY" would also be valid
dbGetQuery(odbc_con, "SELECT pg_typeof(large_area) FROM nc_bool;") %>% head(2)                                     
#>   pg_typeof
#> 1   boolean
#> 2   boolean

The only problem I have here is that I cannot read the table with st_read and the odbc dsn:

st_read(dsn = odbc_con, layer = "nc_bool")
#> Error in CPL_read_wkb(x, EWKB, spatialite): range check error: WKB buffer too small. Input file corrupt?

dbDisconnect(odbc_con)

But that is probably something else. Shall I open a separate issue for this or did I miss something?

FWIW I cannot resproduce your problem in https://github.com/r-dbi/odbc/issues/168, i.e. dbWriteTable and dbReadTable with non-sf tables via odbc works fine for me. (Windows, PostgreSQL 9.6.5).

When trying to write the boolean column to postgres via GDAL, the column is dropped (as expected)

st_write(nc_bool, "PG:dbname=postgis", "nc_bool", delete_layer = TRUE)
#> Deleting layer `nc_bool' using driver `PostgreSQL'
#> Updating layer `nc_bool' to data source `PG:dbname=postgis' using driver `PostgreSQL'
#> features:       100
#> fields:         1
#> geometry type:  Multi Polygon
st_read("PG:dbname=postgis", "nc_bool", quiet = TRUE) %>% colnames()
#> [1] "area"         "wkb_geometry"

What is unexpected, however, is that there is no warning (contrary to e.g. a geopackage dsn as mentioned in #579)! This ought to be fixed IMO.

Should the use of the GDAL driver even be discouraged when there are good 'native' R DB drivers available (i.e. at least for PostgreSQL)?

Thanks for the feedback @dpprdan. I can reproduce the WKB buffer too small error with odbc.

I think you're right, there should be a warning when the driver drops a column. Could you open another issue for that as well?

Looks like something I caused. A traceback() right after the error message would be helpful!

I'll provide more details, but the origin of the problem is that the odbc driver truncates the geometry to 255 characters, so obviously the WKB becomes invalid. This is not the case with the other drivers tested and there are no geometries large enough in the tests to raise the issue with odbc.

Closing this since the original problem was solved by #558 and #721 and #722 were opened (with #722 already solved) for the related problems with odbc and GDAL.

Was this page helpful?
0 / 5 - 0 ratings