I want to convert the tex file into a docx document using the following command:
pandoc -f latex -t docx -o master.docx --bibliography ./proposal.bib master.tex
but pandoc gave the following error:
pandoc: Cannot decode byte '\xf6': Data.Text.Internal.Encoding.streamDecodeUtf8With: Invalid UTF-8 stream
I am sure that the tex file is saved as UTF-8, but the problem still occurs.
Can you reduce your file to a smallest possible test case that exhibits the issue?
I am sure that the tex file is saved as UTF-8
Just in case, try: https://pandoc.org/MANUAL.html#character-encoding
Thank you for your reply. I tried the command line you suggested, but it still failed.
Can you reduce your file to a smallest possible test case that exhibits the issue?
You must be using an older version of pandoc, because recent versions will give the byte position of the decoding error; that may help in tracking this down. Your input must not be valid UTF-8.
Check the bib file too!
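For anyone whose pandoc version does not report the byte position, here is a minimal Python sketch that locates the first invalid UTF-8 byte in a file. It is not part of pandoc; the helper names `first_invalid_utf8` and `check_files` are made up for illustration, and the file names in the usage note are the ones from the command above.

```python
from pathlib import Path
from typing import Optional, Tuple


def first_invalid_utf8(data: bytes) -> Optional[Tuple[int, int, int]]:
    """Return (byte_offset, line_number, byte_value) for the first byte that
    breaks UTF-8 decoding, or None if the data is valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        offset = err.start
        # Line numbers are 1-based: count the newlines before the bad byte.
        line = data.count(b"\n", 0, offset) + 1
        return offset, line, data[offset]
    return None


def check_files(paths):
    """Print one status line per file (hypothetical helper)."""
    for path in paths:
        hit = first_invalid_utf8(Path(path).read_bytes())
        if hit is None:
            print(f"{path}: valid UTF-8")
        else:
            offset, line, byte = hit
            print(f"{path}: invalid byte 0x{byte:02x} at offset {offset} (line {line})")
```

Calling `check_files(["master.tex", "proposal.bib"])` checks both inputs. Any .sty or other file that pandoc reads while parsing would need the same check.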
Closing until we have a reproducible example...
I ran into similar problems and thought I would give some more details for others searching for the same issue.
A (quite) minimal example of the error:
\usepackage{amsaddr}
with amsaddr downloaded from https://www.ctan.org/tex-archive/macros/latex/contrib/amsaddr.
$ pandoc --version
pandoc 2.7.3
Compiled with pandoc-types 1.17.5.4, texmath 0.11.2.2, skylighting 0.8.1
Default user data directory: /localhome/scstr/.local/share/pandoc or /localhome/scstr/.pandoc
Copyright (C) 2006-2019 John MacFarlane
Web: http://pandoc.org
This is free software; see the source for copying conditions.
There is no warranty, not even for merchantability or fitness
for a particular purpose.
$ echo "\usepackage{amsaddr}" | pandoc -t native
pandoc: Cannot decode byte '\xe9': Data.Text.Internal.Encoding.streamDecodeUtf8With: Invalid UTF-8 stream
Clearly the iconv fix suggested at https://pandoc.org/MANUAL.html#character-encoding makes no difference here, since the invalid bytes appear to be in amsaddr.sty rather than in the input. A possible fix is to open amsaddr.sty and remove the bad characters from line 9:
- %% Copyright (C) 2006 by J锟絩锟絤e Lelong <[email protected]>
+ %% Copyright (C) 2006 by Lelong <[email protected]>
Then we have:
$ echo "\usepackage{amsaddr}" | pandoc -t native
[RawBlock (Format "tex") "\\usepackage{amsaddr}"]
The bad characters are in a comment so perhaps you would hope that this does not cause problems. Either way, this is a possible fix for users.
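An alternative to deleting the characters is to re-encode the offending file as UTF-8. The sketch below assumes the file is ISO-8859-1 (Latin-1), which is consistent with the reported byte 0xE9 (é in that encoding) but is not confirmed anywhere in this thread. The function name `latin1_to_utf8` is hypothetical.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode bytes as UTF-8, ASSUMING the input is ISO-8859-1.

    Data that already decodes as UTF-8 is returned unchanged, so running
    this twice on the same file does not double-encode it."""
    try:
        data.decode("utf-8")
        return data  # already valid UTF-8, nothing to do
    except UnicodeDecodeError:
        # Every byte value is defined in Latin-1, so this decode cannot fail.
        return data.decode("latin-1").encode("utf-8")
```

To apply it, read your local copy of amsaddr.sty with `Path(...).read_bytes()`, pass the bytes through `latin1_to_utf8`, and write the result back with `write_bytes`. Keep a backup, since the Latin-1 assumption may be wrong for other files.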