Sf: st_read with query not returning correct result when NA values present

Created on 27 Jul 2018  路  4Comments  路  Source: r-spatial/sf

I have some data I'm storing in a PostGIS database. I noticed when using st_read(con, query = "select ...") I get a strange result when there are some data columns that contain NA. It appears when an NA is involved, the function thinks there are multiple geometry columns. I'm using sf_0.6-4 See reprex below:

library(tidyverse)
library(sf)
library(RPostgreSQL)

# Connect to PostGIS database
con <- dbConnect(
  drv = dbDriver('PostgreSQL'),
  user = 'postgres',
  dbname = 'postgis',
  host = 'localhost',
  password = 'admin'
)

# Make up data, with rows 2 and 3 having an NA value
data <- tribble(
  ~id, ~code, ~category, ~status, ~col_that_is_often_na,
  1, "AA", "Amphibian", "Y", "LT",
  2, "AB", "Bird", NA, "SC",
  3, "AF", "Fish", "N", NA
)

# Make up geometry
geom <- st_sfc(
  st_point(c(-111, 33), dim = "XY"),
  st_point(c(-112, 33), dim = "XY"),
  st_point(c(-113, 33), dim = "XY")
)

# Write to PostGIS database
st_write(
  st_as_sf(cbind(data, geom), crs = 4326),
  dsn = con,
  layer = "data",
  overwrite = TRUE
)
#> [1] TRUE

# Reading entire table works as expected
st_read(con, layer = "data")
#> Simple feature collection with 3 features and 5 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: -113 ymin: 33 xmax: -111 ymax: 33
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
#>   id code  category status col_that_is_often_na        geometry
#> 1  1   AA Amphibian      Y                   LT POINT (-111 33)
#> 2  2   AB      Bird   <NA>                   SC POINT (-112 33)
#> 3  3   AF      Fish      N                 <NA> POINT (-113 33)

# So does using a query to select first row
st_read(con, query = "select * from data where id = 1")
#> Simple feature collection with 1 feature and 5 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: -111 ymin: 33 xmax: -111 ymax: 33
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
#>   id code  category status col_that_is_often_na        geometry
#> 1  1   AA Amphibian      Y                   LT POINT (-111 33)

# Something goes wrong when querying the second and third row
# NA value is taken to mean empty GEOMETRYCOLLECTION

st_read(con, query = "select * from data where id = 2")
#> Simple feature collection with 1 feature and 4 fields
#> Active geometry column: status (with 1 geometry empty)
#> geometry type:  GEOMETRYCOLLECTION
#> dimension:      XY
#> bbox:           xmin: NA ymin: NA xmax: NA ymax: NA
#> epsg (SRID):    NA
#> proj4string:    NA
#>   id code category col_that_is_often_na                   status
#> 1  2   AB     Bird                   SC GEOMETRYCOLLECTION EMPTY
#>          geometry
#> 1 POINT (-112 33)

st_read(con, query = "select * from data where id = 3")
#> Simple feature collection with 1 feature and 4 fields
#> Active geometry column: col_that_is_often_na (with 1 geometry empty)
#> geometry type:  GEOMETRYCOLLECTION
#> dimension:      XY
#> bbox:           xmin: NA ymin: NA xmax: NA ymax: NA
#> epsg (SRID):    NA
#> proj4string:    NA
#>   id code category status     col_that_is_often_na        geometry
#> 1  3   AF     Fish      N GEOMETRYCOLLECTION EMPTY POINT (-113 33)

Created on 2018-07-26 by the reprex package (v0.2.0).

sessionInfo()

sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] RPostgreSQL_0.6-2 DBI_1.0.0         sf_0.6-4          forcats_0.2.0     stringr_1.3.1     dplyr_0.7.4       purrr_0.2.4      
 [8] readr_1.1.1       tidyr_0.7.2       tibble_1.4.2      ggplot2_2.2.1     tidyverse_1.2.1  

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.18     lubridate_1.7.4  lattice_0.20-35  class_7.3-14     assertthat_0.2.0 rprojroot_1.3-2  digest_0.6.15   
 [8] psych_1.7.8      R6_2.2.2         cellranger_1.1.0 plyr_1.8.4       backports_1.1.1  reprex_0.2.0     evaluate_0.10.1 
[15] e1071_1.6-8      httr_1.3.1       pillar_1.2.2     rlang_0.2.0.9001 lazyeval_0.2.1   readxl_1.0.0     rstudioapi_0.7  
[22] whisker_0.3-2    callr_2.0.4      rmarkdown_1.10   devtools_1.13.6  foreign_0.8-69   munsell_0.4.3    broom_0.4.2     
[29] compiler_3.4.2   modelr_0.1.1     pkgconfig_2.0.1  mnormt_1.5-5     clipr_0.4.0      htmltools_0.3.6  crayon_1.3.4    
[36] withr_2.1.2      grid_3.4.2       nlme_3.1-131     spData_0.2.8.3   jsonlite_1.5     gtable_0.2.0     magrittr_1.5    
[43] units_0.6-1      scales_0.5.0     cli_1.0.0        stringi_1.1.7    reshape2_1.4.3   bindrcpp_0.2     xml2_1.2.0      
[50] tools_3.4.2      glue_1.2.0       hms_0.4.2        processx_3.1.0   parallel_3.4.2   colorspace_1.3-2 classInt_0.2-3  
[57] rvest_0.3.2      memoise_1.1.0    knitr_1.20       bindr_0.1        haven_1.1.0   

Any thoughts? Perhaps I am missing a crucial argument, but I would expect rows 2 and 3 to return the correct geometry.

Most helpful comment

Interestingly we were able to replicate this error with RPostgreSQL in this issue and the issue this was fixed by #850. Here is a reprex just because I stumbled on this issue:

packageVersion("sf")
#> [1] '0.7.0'

library(tidyverse)
library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.2.3, PROJ 4.9.3
library(RPostgreSQL)
#> Loading required package: DBI

con <- dbConnect(PostgreSQL(), 
                  host = "localhost", 
                  dbname = "postgis_in_action", 
                  port = '5432', 
                  user = "postgres")

#Make up data, with rows 2 and 3 having an NA value
data <- tribble(
  ~id, ~code, ~category, ~status, ~col_that_is_often_na,
  1, "AA", "Amphibian", "Y", "LT",
  2, "AB", "Bird", NA, "SC",
  3, "AF", "Fish", "N", NA
)

# Make up geometry
geom <- st_sfc(
  st_point(c(-111, 33), dim = "XY"),
  st_point(c(-112, 33), dim = "XY"),
  st_point(c(-113, 33), dim = "XY")
)

# Write to PostGIS database
st_write(
  st_as_sf(cbind(data, geom), crs = 4326),
  dsn = con,
  layer = "data",
  overwrite = TRUE
)
#> [1] TRUE

# Reading entire table works as expected
st_read(con, layer = "data")
#> Simple feature collection with 3 features and 5 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: -113 ymin: 33 xmax: -111 ymax: 33
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
#>   id code  category status col_that_is_often_na        geometry
#> 1  1   AA Amphibian      Y                   LT POINT (-111 33)
#> 2  2   AB      Bird   <NA>                   SC POINT (-112 33)
#> 3  3   AF      Fish      N                 <NA> POINT (-113 33)

# So does using a query to select first row
st_read(con, query = "select * from data where id = 1")
#> Simple feature collection with 1 feature and 5 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: -111 ymin: 33 xmax: -111 ymax: 33
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
#>   id code  category status col_that_is_often_na        geometry
#> 1  1   AA Amphibian      Y                   LT POINT (-111 33)

# This instance fixed
st_read(con, query = "select * from data where id = 2")
#> Simple feature collection with 1 feature and 5 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: -112 ymin: 33 xmax: -112 ymax: 33
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
#>   id code category status col_that_is_often_na        geometry
#> 1  2   AB     Bird   <NA>                   SC POINT (-112 33)

# As well as this one
st_read(con, query = "select * from data where id = 3")
#> Simple feature collection with 1 feature and 5 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: -113 ymin: 33 xmax: -113 ymax: 33
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
#>   id code category status col_that_is_often_na        geometry
#> 1  3   AF     Fish      N                 <NA> POINT (-113 33)

Created on 2018-09-27 by the reprex package (v0.2.1)

All 4 comments

Fascinating! @etiennebr any idea?

Thanks for reporting this. I suggest you try the RPostgres driver rather than the RPostgresql. I don't have access to a computer right now, so I can't say for sure, but theRPostgresql driver has additional logic to resolve geometry columns that seems to fail in this situation.

Yes! Thank you @etiennebr, this works just as expected with RPostgres. I'm closing the issue since this appears to be a RPostgreSQL problem, not sf. I'm leaving a working reprex here for completeness. Thank you both for your quick response as always.

library(tidyverse)
library(sf)
library(RPostgres)

# Connect to PostGIS database with RPostgres NOT RPostreSQL
con <- dbConnect(
  RPostgres::Postgres(),
  user = 'postgres',
  dbname = 'postgis',
  host = 'localhost',
  password = 'admin'
)


# Make up data, with rows 2 and 3 having an NA value
data <- tribble(
  ~id, ~code, ~category, ~status, ~col_that_is_often_na,
  1, "AA", "Amphibian", "Y", "LT",
  2, "AB", "Bird", NA, "SC",
  3, "AF", "Fish", "N", NA
)

# Make up geometry
geom <- st_sfc(
  st_point(c(-111, 33), dim = "XY"),
  st_point(c(-112, 33), dim = "XY"),
  st_point(c(-113, 33), dim = "XY")
)

# Write to PostGIS database
st_write(
  st_as_sf(cbind(data, geom), crs = 4326),
  dsn = con,
  layer = "data",
  overwrite = TRUE
)
#> Note: method with signature 'DBIObject#sf' chosen for function 'dbDataType',
#>  target signature 'PqConnection#sf'.
#>  "PqConnection#ANY" would also be valid

# Reading entire table and each individual row now works as expected with RPostgres package
st_read(con, layer = "data")
#> Simple feature collection with 3 features and 5 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: -113 ymin: 33 xmax: -111 ymax: 33
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
#>   id code  category status col_that_is_often_na        geometry
#> 1  1   AA Amphibian      Y                   LT POINT (-111 33)
#> 2  2   AB      Bird   <NA>                   SC POINT (-112 33)
#> 3  3   AF      Fish      N                 <NA> POINT (-113 33)

st_read(con, query = "select * from data where id = 1")
#> Simple feature collection with 1 feature and 5 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: -111 ymin: 33 xmax: -111 ymax: 33
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
#>   id code  category status col_that_is_often_na        geometry
#> 1  1   AA Amphibian      Y                   LT POINT (-111 33)


st_read(con, query = "select * from data where id = 2")
#> Simple feature collection with 1 feature and 5 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: -112 ymin: 33 xmax: -112 ymax: 33
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
#>   id code category status col_that_is_often_na        geometry
#> 1  2   AB     Bird   <NA>                   SC POINT (-112 33)


st_read(con, query = "select * from data where id = 3")
#> Simple feature collection with 1 feature and 5 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: -113 ymin: 33 xmax: -113 ymax: 33
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
#>   id code category status col_that_is_often_na        geometry
#> 1  3   AF     Fish      N                 <NA> POINT (-113 33)

Created on 2018-07-27 by the reprex package (v0.2.0).

Interestingly we were able to replicate this error with RPostgreSQL in this issue and the issue this was fixed by #850. Here is a reprex just because I stumbled on this issue:

packageVersion("sf")
#> [1] '0.7.0'

library(tidyverse)
library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.2.3, PROJ 4.9.3
library(RPostgreSQL)
#> Loading required package: DBI

con <- dbConnect(PostgreSQL(), 
                  host = "localhost", 
                  dbname = "postgis_in_action", 
                  port = '5432', 
                  user = "postgres")

#Make up data, with rows 2 and 3 having an NA value
data <- tribble(
  ~id, ~code, ~category, ~status, ~col_that_is_often_na,
  1, "AA", "Amphibian", "Y", "LT",
  2, "AB", "Bird", NA, "SC",
  3, "AF", "Fish", "N", NA
)

# Make up geometry
geom <- st_sfc(
  st_point(c(-111, 33), dim = "XY"),
  st_point(c(-112, 33), dim = "XY"),
  st_point(c(-113, 33), dim = "XY")
)

# Write to PostGIS database
st_write(
  st_as_sf(cbind(data, geom), crs = 4326),
  dsn = con,
  layer = "data",
  overwrite = TRUE
)
#> [1] TRUE

# Reading entire table works as expected
st_read(con, layer = "data")
#> Simple feature collection with 3 features and 5 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: -113 ymin: 33 xmax: -111 ymax: 33
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
#>   id code  category status col_that_is_often_na        geometry
#> 1  1   AA Amphibian      Y                   LT POINT (-111 33)
#> 2  2   AB      Bird   <NA>                   SC POINT (-112 33)
#> 3  3   AF      Fish      N                 <NA> POINT (-113 33)

# So does using a query to select first row
st_read(con, query = "select * from data where id = 1")
#> Simple feature collection with 1 feature and 5 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: -111 ymin: 33 xmax: -111 ymax: 33
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
#>   id code  category status col_that_is_often_na        geometry
#> 1  1   AA Amphibian      Y                   LT POINT (-111 33)

# This instance fixed
st_read(con, query = "select * from data where id = 2")
#> Simple feature collection with 1 feature and 5 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: -112 ymin: 33 xmax: -112 ymax: 33
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
#>   id code category status col_that_is_often_na        geometry
#> 1  2   AB     Bird   <NA>                   SC POINT (-112 33)

# As well as this one
st_read(con, query = "select * from data where id = 3")
#> Simple feature collection with 1 feature and 5 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: -113 ymin: 33 xmax: -113 ymax: 33
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
#>   id code category status col_that_is_often_na        geometry
#> 1  3   AF     Fish      N                 <NA> POINT (-113 33)

Created on 2018-09-27 by the reprex package (v0.2.1)

Was this page helpful?
0 / 5 - 0 ratings