Suppose I have an .Rmd file like the one below:

````
---
title: "Untitled"
output:
  officedown::rdocx_document: default
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# Chapter1 {#ch1}

# Chapter2 {#ch2}

Refer to \@ref(ch1).

When \@ref(ch1) is surrounded by multibyte strings (e.g., Chinese characters), it would possibly encounter errors.

Pure multibyte + ref

上下\@ref(ch1)

Mixed multibyte/singlebyte + ref

上a下\@ref(ch1)

ref + multibyte

\@ref(ch1)。
````

The error is:

```
Error in nchar(u, itype) : invalid multibyte string, element 1
Calls:... regmatches<- -> regmatches -> Map -> mapply ->
```

Can you please look into this issue? Thanks.

```````
sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 20180)
Matrix products: default
locale:
[1] LC_COLLATE=Chinese (Simplified)_China.936
[2] LC_CTYPE=Chinese (Simplified)_China.936
[3] LC_MONETARY=Chinese (Simplified)_China.936
[4] LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_China.936
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] officer_0.3.12 officedown_0.2.0 flextable_0.5.10
[4] ggplot2_3.3.2 tidyr_1.1.1 knitr_1.29
[7] dplyr_1.0.2 reticulate_1.16
loaded via a namespace (and not attached):
[1] Rcpp_1.0.5 lattice_0.20-41 prettyunits_1.1.1
[4] sysfonts_0.8.1 ps_1.3.4 utf8_1.1.4
[7] rprojroot_1.3-2 assertthat_0.2.1 digest_0.6.25
[10] R6_2.4.1 backports_1.1.9 evaluate_0.14
[13] pillar_1.4.6 gdtools_0.2.2 rlang_0.4.7
[16] curl_4.3 uuid_0.1-4 data.table_1.13.0
[19] callr_3.4.3 Matrix_1.2-18 rmarkdown_2.3
[22] desc_1.2.0 labeling_0.3 devtools_2.3.1
[25] stringr_1.4.0 munsell_0.5.0 tinytex_0.25
[28] compiler_4.0.2 xfun_0.16 pkgconfig_2.0.3
[31] systemfonts_0.2.3 base64enc_0.1-3 pkgbuild_1.1.0
[34] rvg_0.2.5 htmltools_0.5.0 tidyselect_1.1.0
[37] tibble_3.0.3 bookdown_0.20 fansi_0.4.1
[40] crayon_1.3.4 showtextdb_3.0 withr_2.2.0
[43] grid_4.0.2 jsonlite_1.7.0 gtable_0.3.0
[46] lifecycle_0.2.0 magrittr_1.5 scales_1.1.1
[49] zip_2.1.0 cli_2.0.2 stringi_1.4.6
[52] farver_2.0.3 fs_1.5.0 remotes_2.2.0
[55] testthat_2.3.2 xml2_1.3.2 ellipsis_0.3.1
[58] generics_0.0.2 vctrs_0.3.2 tools_4.0.2
[61] showtext_0.9 glue_1.4.1 purrr_0.3.4
[64] processx_3.4.3 pkgload_1.1.0 yaml_2.2.1
[67] colorspace_1.4-1 sessioninfo_1.1.1 memoise_1.1.0
[70] usethis_1.6.1
```````
Your issue is related to the fact that you are not working with a UTF-8 encoded file.
R, R Markdown and Windows do not work well together when the encoding is not UTF-8.
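
Both sides of this, the session locale and the file itself, can be checked with a few lines of base R. This is only a sketch: `check_utf8()` is a hypothetical helper name, not a function from officedown or knitr.

```r
# Hypothetical helper: report whether the R session uses a UTF-8 locale
# and whether every line of a file is valid UTF-8.
check_utf8 <- function(path) {
  lines <- readLines(path, warn = FALSE)  # bytes are read without re-encoding
  valid <- validUTF8(lines)               # TRUE for each line that is valid UTF-8
  list(
    locale_is_utf8 = isTRUE(l10n_info()[["UTF-8"]]),
    file_is_utf8   = all(valid),
    bad_lines      = which(!valid)        # line numbers that are not valid UTF-8
  )
}

# Usage:
# check_utf8("your/rmd/file")
```

A file can pass this check while the locale does not, which matches the situation of a UTF-8 file on a system running a GBK code page.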
Yes, @davidgohel, you are right. Although the .Rmd file is in UTF-8, the OS is running with GBK encoding. When I change to `bookdown::word_document2`, the knitr engine manages to compile the file, but I still get ?? where the bookmark is supposed to appear.
You don't need to try new output format functions.
The result shown below was produced on Windows with a French locale, but I made sure the file was encoded as UTF-8 (I use `readr::guess_encoding()`; if the file is not UTF-8 encoded, I can convert it to UTF-8 with `fpeek::peek_iconv()`).
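
For a file that turns out not to be UTF-8, the conversion can also be sketched with base R alone. `convert_to_utf8()` is a hypothetical helper, and the source encoding (`"GBK"` in the usage line) is an assumption that has to match the actual file.

```r
# Hypothetical helper: rewrite a text file as UTF-8 using base R only.
# `from` must name the file's real encoding; a wrong guess garbles the text.
convert_to_utf8 <- function(path, from, out = path) {
  lines <- readLines(path, warn = FALSE)            # bytes are kept as-is
  utf8  <- iconv(lines, from = from, to = "UTF-8")  # NA where conversion fails
  if (anyNA(utf8)) {
    stop("some lines could not be converted from ", from)
  }
  writeLines(utf8, out, useBytes = TRUE)            # write the UTF-8 bytes untouched
  invisible(out)
}

# Usage (assumes the file is GBK encoded):
# convert_to_utf8("your/rmd/file", from = "GBK")
```

Writing with `useBytes = TRUE` keeps R from re-encoding the text into the native code page on the way out, which is the step most likely to go wrong on a non-UTF-8 Windows locale.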
Could you show the result of `readr::guess_encoding("your/rmd/file")`?
The results are:
no | encoding | confidence
---|-------------|-----------:
1 | UTF-8 | 1
2 | windows-1252 | 0.28
Hi @madlogos,
I am also a Chinese user. The multibyte problem has bothered me for a long time too. Here is my trick for it:
1. Write `@ref` as usual;
2. Read the file with `readr::read_lines`;
3. Find the `"\\\\@ref\\([^\\)]+\\)"` pattern;
4. Put each `"\\\\@ref\\([^\\)]+\\)"` match on a single line.

For example, `请参考表\@ref(tab: coco)中的数据` ("please refer to the data in Table \@ref(tab: coco)") should be split as
[line 1] 请参考表
[line 2] \@ref(tab: coco)
[line 3] 中的数据
Well, I am not sure if this is an effective solution but it works for me. 😄
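
The splitting step described above might look roughly like this in base R. `split_refs()` is a hypothetical helper name, `perl = TRUE` is an assumption of this sketch rather than something stated above, and the sketch does not skip code chunks, so treat it as an illustration only.

```r
# Hypothetical helper: put every \@ref(...) on a line of its own.
# Lines without a reference (including blank lines) are returned unchanged.
split_refs <- function(lines, pattern = "\\\\@ref\\([^\\)]+\\)") {
  split_one <- function(line) {
    if (!grepl(pattern, line, perl = TRUE)) {
      return(line)
    }
    # Surround each match with newlines, then cut the line at those newlines.
    marked <- gsub(paste0("(", pattern, ")"), "\n\\1\n", line, perl = TRUE)
    pieces <- strsplit(marked, "\n", fixed = TRUE)[[1]]
    pieces[nzchar(pieces)]  # drop empty pieces next to a leading reference
  }
  unlist(lapply(lines, split_one), use.names = FALSE)
}

# Usage:
# lines <- readr::read_lines("your/rmd/file")
# readr::write_lines(split_refs(lines), "your/rmd/file")
```

A single newline inside a paragraph is a soft line break in Markdown, so the split text still belongs to the same paragraph.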
@bishun945 thank you for the workaround. Good stuff.
@madlogos I have tried another solution: just switch your system and MS Word language to English.