I have a Markdown document containing the HTML character entity &rarr;. When I convert it to HTML using pandoc -o myfile.html myfile.md, the entity is converted to a UTF-8 encoded right arrow character, which my browser displays as an ugly jumble (â†’). Other character entities like &amp;, on the other hand, are preserved correctly as inline HTML.
A workaround to this is to include a tag
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
at the beginning of my Markdown document, but that seems a little inelegant, as I can't assume that every Markdown converter will produce UTF-8 encoded output. IMHO, pandoc should either consistently preserve HTML character entities or properly declare the UTF-8 encoding in the HTML output.
I'm using pandoc on Windows:
$ pandoc -v
pandoc 1.11.1
Compiled with citeproc-hs 0.3.8, texmath 0.6.1.3, highlighting-kate 0.5.3.8
...
Pandoc converts all entities to Unicode characters. That is because it needs to handle output formats other than HTML, where HTML entities would have no meaning.
If you use the -s flag to create a standalone document, pandoc will apply its default template, which includes the meta tag specifying UTF-8.
Another option is to use the --ascii flag, which will cause &rarr; to be output as &#8594; (the equivalent numeric character reference), so the output stays plain ASCII.
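To make the mechanics concrete, here is a small Python sketch of what happens to the arrow at each stage. It is an illustration only, not pandoc's actual implementation, and it assumes the browser falls back to Windows-1252 when the page declares no charset (a common default, but browser and locale dependent). All variable names are made up for the example.

```python
import html

entity = "&rarr;"

# Step 1: the named entity is resolved to the Unicode character U+2192.
arrow = html.unescape(entity)

# Step 2: written out as UTF-8, that character becomes three bytes.
utf8_bytes = arrow.encode("utf-8")

# Step 3 (assumption): with no charset declared, the browser decodes
# those bytes as Windows-1252, one character per byte.
jumble = utf8_bytes.decode("cp1252")

# ASCII-only alternative: replace the non-ASCII character with a
# decimal numeric character reference.
ascii_form = arrow.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(repr(arrow), utf8_bytes, repr(jumble), ascii_form)
```

Declaring the charset (as the standalone template does) fixes step 3, while an ASCII-only output sidesteps the encoding question entirely because every byte means the same thing in UTF-8 and Windows-1252.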
Thanks a lot!