I have some data I'm storing in a PostGIS database. I noticed when using st_read(con, query = "select ...") I get a strange result when there are some data columns that contain NA. It appears when an NA is involved, the function thinks there are multiple geometry columns. I'm using sf_0.6-4 See reprex below:
library(tidyverse)
library(sf)
library(RPostgreSQL)
# Connect to PostGIS database
con <- dbConnect(
drv = dbDriver('PostgreSQL'),
user = 'postgres',
dbname = 'postgis',
host = 'localhost',
password = 'admin'
)
# Make up data, with rows 2 and 3 having an NA value
data <- tribble(
~id, ~code, ~category, ~status, ~col_that_is_often_na,
1, "AA", "Amphibian", "Y", "LT",
2, "AB", "Bird", NA, "SC",
3, "AF", "Fish", "N", NA
)
# Make up geometry
geom <- st_sfc(
st_point(c(-111, 33), dim = "XY"),
st_point(c(-112, 33), dim = "XY"),
st_point(c(-113, 33), dim = "XY")
)
# Write to PostGIS database
st_write(
st_as_sf(cbind(data, geom), crs = 4326),
dsn = con,
layer = "data",
overwrite = TRUE
)
#> [1] TRUE
# Reading entire table works as expected
st_read(con, layer = "data")
#> Simple feature collection with 3 features and 5 fields
#> geometry type: POINT
#> dimension: XY
#> bbox: xmin: -113 ymin: 33 xmax: -111 ymax: 33
#> epsg (SRID): 4326
#> proj4string: +proj=longlat +datum=WGS84 +no_defs
#> id code category status col_that_is_often_na geometry
#> 1 1 AA Amphibian Y LT POINT (-111 33)
#> 2 2 AB Bird <NA> SC POINT (-112 33)
#> 3 3 AF Fish N <NA> POINT (-113 33)
# So does using a query to select first row
st_read(con, query = "select * from data where id = 1")
#> Simple feature collection with 1 feature and 5 fields
#> geometry type: POINT
#> dimension: XY
#> bbox: xmin: -111 ymin: 33 xmax: -111 ymax: 33
#> epsg (SRID): 4326
#> proj4string: +proj=longlat +datum=WGS84 +no_defs
#> id code category status col_that_is_often_na geometry
#> 1 1 AA Amphibian Y LT POINT (-111 33)
# Something goes wrong when querying the second and third row
# NA value is taken to mean empty GEOMETRYCOLLECTION
st_read(con, query = "select * from data where id = 2")
#> Simple feature collection with 1 feature and 4 fields
#> Active geometry column: status (with 1 geometry empty)
#> geometry type: GEOMETRYCOLLECTION
#> dimension: XY
#> bbox: xmin: NA ymin: NA xmax: NA ymax: NA
#> epsg (SRID): NA
#> proj4string: NA
#> id code category col_that_is_often_na status
#> 1 2 AB Bird SC GEOMETRYCOLLECTION EMPTY
#> geometry
#> 1 POINT (-112 33)
st_read(con, query = "select * from data where id = 3")
#> Simple feature collection with 1 feature and 4 fields
#> Active geometry column: col_that_is_often_na (with 1 geometry empty)
#> geometry type: GEOMETRYCOLLECTION
#> dimension: XY
#> bbox: xmin: NA ymin: NA xmax: NA ymax: NA
#> epsg (SRID): NA
#> proj4string: NA
#> id code category status col_that_is_often_na geometry
#> 1 3 AF Fish N GEOMETRYCOLLECTION EMPTY POINT (-113 33)
Created on 2018-07-26 by the reprex package (v0.2.0).
sessionInfo()
sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] RPostgreSQL_0.6-2 DBI_1.0.0 sf_0.6-4 forcats_0.2.0 stringr_1.3.1 dplyr_0.7.4 purrr_0.2.4
[8] readr_1.1.1 tidyr_0.7.2 tibble_1.4.2 ggplot2_2.2.1 tidyverse_1.2.1
loaded via a namespace (and not attached):
[1] Rcpp_0.12.18 lubridate_1.7.4 lattice_0.20-35 class_7.3-14 assertthat_0.2.0 rprojroot_1.3-2 digest_0.6.15
[8] psych_1.7.8 R6_2.2.2 cellranger_1.1.0 plyr_1.8.4 backports_1.1.1 reprex_0.2.0 evaluate_0.10.1
[15] e1071_1.6-8 httr_1.3.1 pillar_1.2.2 rlang_0.2.0.9001 lazyeval_0.2.1 readxl_1.0.0 rstudioapi_0.7
[22] whisker_0.3-2 callr_2.0.4 rmarkdown_1.10 devtools_1.13.6 foreign_0.8-69 munsell_0.4.3 broom_0.4.2
[29] compiler_3.4.2 modelr_0.1.1 pkgconfig_2.0.1 mnormt_1.5-5 clipr_0.4.0 htmltools_0.3.6 crayon_1.3.4
[36] withr_2.1.2 grid_3.4.2 nlme_3.1-131 spData_0.2.8.3 jsonlite_1.5 gtable_0.2.0 magrittr_1.5
[43] units_0.6-1 scales_0.5.0 cli_1.0.0 stringi_1.1.7 reshape2_1.4.3 bindrcpp_0.2 xml2_1.2.0
[50] tools_3.4.2 glue_1.2.0 hms_0.4.2 processx_3.1.0 parallel_3.4.2 colorspace_1.3-2 classInt_0.2-3
[57] rvest_0.3.2 memoise_1.1.0 knitr_1.20 bindr_0.1 haven_1.1.0
Any thoughts? Perhaps I am missing a crucial argument, but I would expect rows 2 and 3 to return the correct geometry.
Fascinating! @etiennebr any idea?
Thanks for reporting this. I suggest you try the RPostgres driver rather than the RPostgresql. I don't have access to a computer right now, so I can't say for sure, but theRPostgresql driver has additional logic to resolve geometry columns that seems to fail in this situation.
Yes! Thank you @etiennebr, this works just as expected with RPostgres. I'm closing the issue since this appears to be a RPostgreSQL problem, not sf. I'm leaving a working reprex here for completeness. Thank you both for your quick response as always.
library(tidyverse)
library(sf)
library(RPostgres)
# Connect to PostGIS database with RPostgres NOT RPostreSQL
con <- dbConnect(
RPostgres::Postgres(),
user = 'postgres',
dbname = 'postgis',
host = 'localhost',
password = 'admin'
)
# Make up data, with rows 2 and 3 having an NA value
data <- tribble(
~id, ~code, ~category, ~status, ~col_that_is_often_na,
1, "AA", "Amphibian", "Y", "LT",
2, "AB", "Bird", NA, "SC",
3, "AF", "Fish", "N", NA
)
# Make up geometry
geom <- st_sfc(
st_point(c(-111, 33), dim = "XY"),
st_point(c(-112, 33), dim = "XY"),
st_point(c(-113, 33), dim = "XY")
)
# Write to PostGIS database
st_write(
st_as_sf(cbind(data, geom), crs = 4326),
dsn = con,
layer = "data",
overwrite = TRUE
)
#> Note: method with signature 'DBIObject#sf' chosen for function 'dbDataType',
#> target signature 'PqConnection#sf'.
#> "PqConnection#ANY" would also be valid
# Reading entire table and each individual row now works as expected with RPostgres package
st_read(con, layer = "data")
#> Simple feature collection with 3 features and 5 fields
#> geometry type: POINT
#> dimension: XY
#> bbox: xmin: -113 ymin: 33 xmax: -111 ymax: 33
#> epsg (SRID): 4326
#> proj4string: +proj=longlat +datum=WGS84 +no_defs
#> id code category status col_that_is_often_na geometry
#> 1 1 AA Amphibian Y LT POINT (-111 33)
#> 2 2 AB Bird <NA> SC POINT (-112 33)
#> 3 3 AF Fish N <NA> POINT (-113 33)
st_read(con, query = "select * from data where id = 1")
#> Simple feature collection with 1 feature and 5 fields
#> geometry type: POINT
#> dimension: XY
#> bbox: xmin: -111 ymin: 33 xmax: -111 ymax: 33
#> epsg (SRID): 4326
#> proj4string: +proj=longlat +datum=WGS84 +no_defs
#> id code category status col_that_is_often_na geometry
#> 1 1 AA Amphibian Y LT POINT (-111 33)
st_read(con, query = "select * from data where id = 2")
#> Simple feature collection with 1 feature and 5 fields
#> geometry type: POINT
#> dimension: XY
#> bbox: xmin: -112 ymin: 33 xmax: -112 ymax: 33
#> epsg (SRID): 4326
#> proj4string: +proj=longlat +datum=WGS84 +no_defs
#> id code category status col_that_is_often_na geometry
#> 1 2 AB Bird <NA> SC POINT (-112 33)
st_read(con, query = "select * from data where id = 3")
#> Simple feature collection with 1 feature and 5 fields
#> geometry type: POINT
#> dimension: XY
#> bbox: xmin: -113 ymin: 33 xmax: -113 ymax: 33
#> epsg (SRID): 4326
#> proj4string: +proj=longlat +datum=WGS84 +no_defs
#> id code category status col_that_is_often_na geometry
#> 1 3 AF Fish N <NA> POINT (-113 33)
Created on 2018-07-27 by the reprex package (v0.2.0).
Interestingly we were able to replicate this error with RPostgreSQL in this issue and the issue this was fixed by #850. Here is a reprex just because I stumbled on this issue:
packageVersion("sf")
#> [1] '0.7.0'
library(tidyverse)
library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.2.3, PROJ 4.9.3
library(RPostgreSQL)
#> Loading required package: DBI
con <- dbConnect(PostgreSQL(),
host = "localhost",
dbname = "postgis_in_action",
port = '5432',
user = "postgres")
#Make up data, with rows 2 and 3 having an NA value
data <- tribble(
~id, ~code, ~category, ~status, ~col_that_is_often_na,
1, "AA", "Amphibian", "Y", "LT",
2, "AB", "Bird", NA, "SC",
3, "AF", "Fish", "N", NA
)
# Make up geometry
geom <- st_sfc(
st_point(c(-111, 33), dim = "XY"),
st_point(c(-112, 33), dim = "XY"),
st_point(c(-113, 33), dim = "XY")
)
# Write to PostGIS database
st_write(
st_as_sf(cbind(data, geom), crs = 4326),
dsn = con,
layer = "data",
overwrite = TRUE
)
#> [1] TRUE
# Reading entire table works as expected
st_read(con, layer = "data")
#> Simple feature collection with 3 features and 5 fields
#> geometry type: POINT
#> dimension: XY
#> bbox: xmin: -113 ymin: 33 xmax: -111 ymax: 33
#> epsg (SRID): 4326
#> proj4string: +proj=longlat +datum=WGS84 +no_defs
#> id code category status col_that_is_often_na geometry
#> 1 1 AA Amphibian Y LT POINT (-111 33)
#> 2 2 AB Bird <NA> SC POINT (-112 33)
#> 3 3 AF Fish N <NA> POINT (-113 33)
# So does using a query to select first row
st_read(con, query = "select * from data where id = 1")
#> Simple feature collection with 1 feature and 5 fields
#> geometry type: POINT
#> dimension: XY
#> bbox: xmin: -111 ymin: 33 xmax: -111 ymax: 33
#> epsg (SRID): 4326
#> proj4string: +proj=longlat +datum=WGS84 +no_defs
#> id code category status col_that_is_often_na geometry
#> 1 1 AA Amphibian Y LT POINT (-111 33)
# This instance fixed
st_read(con, query = "select * from data where id = 2")
#> Simple feature collection with 1 feature and 5 fields
#> geometry type: POINT
#> dimension: XY
#> bbox: xmin: -112 ymin: 33 xmax: -112 ymax: 33
#> epsg (SRID): 4326
#> proj4string: +proj=longlat +datum=WGS84 +no_defs
#> id code category status col_that_is_often_na geometry
#> 1 2 AB Bird <NA> SC POINT (-112 33)
# As well as this one
st_read(con, query = "select * from data where id = 3")
#> Simple feature collection with 1 feature and 5 fields
#> geometry type: POINT
#> dimension: XY
#> bbox: xmin: -113 ymin: 33 xmax: -113 ymax: 33
#> epsg (SRID): 4326
#> proj4string: +proj=longlat +datum=WGS84 +no_defs
#> id code category status col_that_is_often_na geometry
#> 1 3 AF Fish N <NA> POINT (-113 33)
Created on 2018-09-27 by the reprex package (v0.2.1)
Most helpful comment
Interestingly we were able to replicate this error with RPostgreSQL in this issue and the issue this was fixed by #850. Here is a reprex just because I stumbled on this issue:
Created on 2018-09-27 by the reprex package (v0.2.1)