pandoc: Cannot decode byte '\xf6': Data.Text.Internal.Encoding.streamDecodeUtf8With: Invalid UTF-8 stream

Created on 18 Jun 2019 · 8 Comments · Source: jgm/pandoc

I want to convert the tex file into a docx document using the following command:

pandoc -f latex -t docx -o master.docx --bibliography ./proposal.bib master.tex

but pandoc gave the following error:

pandoc: Cannot decode byte '\xf6': Data.Text.Internal.Encoding.streamDecodeUtf8With: Invalid UTF-8 stream

I am sure that the tex file is saved as UTF-8, but the problem still occurs.

more-info-needed

Most helpful comment

I ran into similar problems and thought I would give some more details for others searching for the same issue.

A (quite) minimal example of the error:

\usepackage{amsaddr}

with amsaddr downloaded from https://www.ctan.org/tex-archive/macros/latex/contrib/amsaddr.

$ pandoc --version
pandoc 2.7.3
Compiled with pandoc-types 1.17.5.4, texmath 0.11.2.2, skylighting 0.8.1
Default user data directory: /localhome/scstr/.local/share/pandoc or /localhome/scstr/.pandoc
Copyright (C) 2006-2019 John MacFarlane
Web:  http://pandoc.org
This is free software; see the source for copying conditions.
There is no warranty, not even for merchantability or fitness
for a particular purpose.
$ echo "\usepackage{amsaddr}" | pandoc -t native
pandoc: Cannot decode byte '\xe9': Data.Text.Internal.Encoding.streamDecodeUtf8With: Invalid UTF-8 stream

The iconv fix suggested at https://pandoc.org/MANUAL.html#character-encoding makes no difference here, presumably because the invalid bytes are in amsaddr.sty rather than in the piped input. A possible fix is to open amsaddr.sty and remove the bad characters from line 9:

- %% Copyright (C) 2006 by J�r�me Lelong <[email protected]>
+ %% Copyright (C) 2006 by Lelong <[email protected]>

Then we have:

$ echo "\usepackage{amsaddr}" | pandoc -t native
[RawBlock (Format "tex") "\\usepackage{amsaddr}"]

The bad characters are in a comment so perhaps you would hope that this does not cause problems. Either way, this is a possible fix for users.
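
As an alternative to deleting the characters, the style file could be re-encoded so that the author's name survives. The following is only a sketch: it assumes the file is ISO-8859-1 (Latin-1), which is consistent with the reported byte '\xe9' but is not confirmed anywhere in this thread, and `to_utf8` is a hypothetical helper name, not part of pandoc.

```python
def to_utf8(data: bytes, fallback: str = "latin-1") -> bytes:
    """Return data as UTF-8, transcoding from `fallback` only when needed."""
    try:
        data.decode("utf-8")
        return data  # already valid UTF-8, leave it untouched
    except UnicodeDecodeError:
        # Assumption: the legacy encoding is `fallback`.
        # A wrong guess produces mojibake rather than an error.
        return data.decode(fallback).encode("utf-8")


if __name__ == "__main__":
    import sys

    # Usage: python to_utf8.py amsaddr.sty  (rewrites each file in place)
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            original = fh.read()
        converted = to_utf8(original)
        if converted != original:
            with open(path, "wb") as fh:
                fh.write(converted)
            print(f"{path}: re-encoded as UTF-8")
```

Because the script rewrites files in place, it is safer to run it on a local copy of amsaddr.sty placed next to the document than on the copy in the TeX distribution.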

All 8 comments

Can you reduce your file to a smallest possible test case that exhibits the issue?

I am sure that the tex file is saved as UTF-8

Just in case, try: https://pandoc.org/MANUAL.html#character-encoding

I am sure that the tex file is saved as UTF-8

Just in case, try: https://pandoc.org/MANUAL.html#character-encoding

Thank you for your reply. I tried the command you suggested, but it still failed.

Can you reduce your file to a smallest possible test case that exhibits the issue?

You must be using an older version of pandoc, because recent versions will give the byte position of the decoding error; that may help in tracking this down. Your input must not be valid UTF-8.

Check the bib file too!
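
For anyone on a pandoc version that does not report the byte position, the offending bytes can be located outside pandoc. This is a minimal sketch using only the Python standard library; `find_invalid_utf8` is a hypothetical helper name, and the script should be pointed at every file pandoc may read (the tex file, the bib file, and any local .sty files).

```python
from typing import List, Tuple


def find_invalid_utf8(data: bytes) -> List[Tuple[int, int, int]]:
    """Return (byte_offset, line_number, byte_value) for each invalid sequence."""
    problems = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the rest of the input is valid UTF-8
        except UnicodeDecodeError as err:
            bad = pos + err.start  # err.start is relative to the slice
            line = data.count(b"\n", 0, bad) + 1  # 1-based line number
            problems.append((bad, line, data[bad]))
            pos += err.end  # resume after the invalid sequence
    return problems


if __name__ == "__main__":
    import sys

    # Usage: python find_invalid_utf8.py master.tex proposal.bib
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            for offset, line, byte in find_invalid_utf8(fh.read()):
                print(f"{path}: line {line}, offset {offset}: byte 0x{byte:02x}")
```

A file that prints nothing is valid UTF-8; any reported line is a candidate for the decoding error above.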

Closing until we have a reproducible example...

