Sf: st_write tries to convert encoding to ISO-8859-1

Created on 21 Dec 2018 · 5 Comments · Source: r-spatial/sf

When using st_write to write a shapefile, it always converts the encoding to ISO-8859-1. My session information is:

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)

Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.19.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C
 [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods   base

 other attached packages:
 [1] sf_0.7-1       nvimcom_0.9-75 colorout_1.2-0

 loaded via a namespace (and not attached):
  [1] compiler_3.5.1 magrittr_1.5   class_7.3-14   DBI_1.0.0      tools_3.5.1    units_0.6-2
  [7] Rcpp_1.0.0     grid_3.5.1     e1071_1.7-0    classInt_0.2-3 spData_0.2.9.6

Here is a reproducible example:

> library(sf)
Linking to GEOS 3.5.1, GDAL 2.1.2, PROJ 4.9.3
>
> nc <- st_read(system.file("shape/nc.shp", package="sf"))
Reading layer `nc' from data source `/usr/local/lib/R/site-library/sf/shape/nc.shp' using driver `ESRI Shapefile'
Simple feature collection with 100 features and 14 fields
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
epsg (SRID):    4267
proj4string:    +proj=longlat +datum=NAD27 +no_defs
> nc$test <- "èáéíñ字" # this is added to create the warning
> st_write(nc, "nc.shp", delete_layer = TRUE)
Deleting layer `nc' using driver `ESRI Shapefile'
Writing layer `nc' to data source `nc.shp' using driver `ESRI Shapefile'
features:       100
fields:         15
geometry type:  Multi Polygon
Warning message:
In CPL_write_ogr(obj, dsn, layer, driver, as.character(dataset_options),  :
                   GDAL Message 1: One or several characters couldn't be converted correctly from UTF-8 to ISO-8859-1.
                   This warning will not be emitted anymore.

Is there a way to force it to save with UTF-8 encoding? Thanks in advance.

ErickChacon

All 5 comments

st_write(nc, "nc.shp", layer_options = "ENCODING=UTF-8", delete_layer = TRUE)

I just found out that I can use the above code to solve the problem. However, I still do not understand why, by default, it tries to change the encoding to ISO-8859-1.
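
To check that the layer option really preserves the text, one can write the file and read it back. The following is only a sketch: the temporary path and the variable names `shp`, `back` and `utf8_roundtrip_ok` are made up for illustration, and the result depends on GDAL having been built with iconv support.

```r
library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Same characters as "èáéíñ字", written as escapes so the script
# does not depend on the encoding of the source file
nc$test <- "\u00e8\u00e1\u00e9\u00ed\u00f1\u5b57"

# Fresh temporary path (hypothetical), so delete_layer is not needed
shp <- tempfile(fileext = ".shp")
st_write(nc, shp, layer_options = "ENCODING=UTF-8", quiet = TRUE)

# Read the file back and compare the attribute with what was written
back <- st_read(shp, quiet = TRUE, stringsAsFactors = FALSE)
utf8_roundtrip_ok <- all(back$test == nc$test)
print(utf8_roundtrip_ok)
```

If the comparison comes back FALSE, the text was altered somewhere between writing and reading, which would point back at the encoding handling of the shapefile driver.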

ErickChacon on 21 Dec 2018

👍2

Maybe this is a shapefile property or a GDAL shapefile driver property; see https://www.gdal.org/drv_shapefile.html

edzer on 22 Dec 2018

You may also consult the rgdal vignette: https://cran.r-project.org/web/packages/rgdal/vignettes/OGR_shape_encoding.pdf. On Linux, your GDAL should have been built with iconv support, but you can check as shown in the vignette. Because DBF is unpredictable when handling multi-byte characters, many now prefer to use SQLite or, better, GPKG, both of which handle UTF-8 properly.
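
As a rough illustration of the GPKG route, the same data can be written to a GeoPackage without any encoding option. This is a sketch, not a recommendation from the thread authors: the temporary path and the names `gpkg`, `back_gpkg` and `gpkg_roundtrip_ok` are assumptions, and the driver is guessed by st_write from the `.gpkg` extension.

```r
library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Same characters as "èáéíñ字", written as escapes so the script
# does not depend on the encoding of the source file
nc$test <- "\u00e8\u00e1\u00e9\u00ed\u00f1\u5b57"

# Fresh temporary GeoPackage (hypothetical path); no layer_options needed
gpkg <- tempfile(fileext = ".gpkg")
st_write(nc, gpkg, quiet = TRUE)

# Read back and compare the attribute with what was written
back_gpkg <- st_read(gpkg, quiet = TRUE, stringsAsFactors = FALSE)
gpkg_roundtrip_ok <- all(back_gpkg$test == nc$test)
print(gpkg_roundtrip_ok)
```

A GeoPackage is also a single file, unlike the several files (.shp, .dbf, .shx and others) that make up one shapefile.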

rsbivand on 22 Dec 2018

I didn't want to bring this up but fully agree. Having to read shapefiles is something you can't always avoid, but writing them is something you can avoid, and should avoid, by all means.

edzer on 22 Dec 2018

👍2

I can see that it is due to GDAL properties. Thank you @edzer and @rsbivand for the links and the suggestion. I will try out the other alternatives.

ErickChacon on 22 Dec 2018
