Pandoc (Markdown to HTML) converts some character entities to UTF-8

Created on 3 May 2013 · 2 comments · Source: jgm/pandoc

I have a Markdown document containing the HTML character entity `&rarr;`. When I convert it to HTML using `pandoc -o myfile.html myfile.md`, the entity is converted to a UTF-8-encoded right-arrow character, which my browser displays as the garbled sequence â†’. Other character entities, like `&amp;`, on the other hand, are preserved correctly as inline HTML.
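For reference, here is a small Python sketch of how this kind of garbling arises: the three UTF-8 bytes of the arrow are read back as if they were in a single-byte legacy encoding. It assumes the browser fell back to Windows-1252 (`cp1252`) because the page declared no charset; that fallback is a guess about this particular setup, though it reproduces exactly the â†’ described above.

```python
def show_mojibake(text: str, wrong_encoding: str = "cp1252") -> str:
    """Encode text as UTF-8, then (wrongly) decode those bytes with another codec.

    Assumption: the browser guessed Windows-1252 for an undeclared charset.
    """
    utf8_bytes = text.encode("utf-8")
    return utf8_bytes.decode(wrong_encoding)


arrow = "\u2192"  # RIGHTWARDS ARROW, the character that &rarr; stands for
print(arrow.encode("utf-8"))  # b'\xe2\x86\x92'
print(show_mojibake(arrow))   # â†’  (0xE2 -> â, 0x86 -> †, 0x92 -> ’)
```

The damage is reversible: encoding the garbled string back to `cp1252` and decoding it as UTF-8 recovers the arrow, which is why declaring the correct charset fixes the display without touching the content.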

A workaround is to include the tag
`<meta http-equiv="Content-Type" content="text/html; charset=utf-8">`
at the beginning of my Markdown document, but that seems a little inelegant, as I can't assume that every Markdown converter will produce UTF-8-encoded output. IMHO, pandoc should either consistently preserve HTML character entities or properly announce the UTF-8 encoding in the HTML output.
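The workaround boils down to declaring the charset in the document head. Below is a minimal Python sketch of that idea: a hypothetical `wrap_standalone` helper (not part of pandoc) that wraps an HTML body fragment in a page declaring UTF-8. It uses the HTML5 form `<meta charset="utf-8">`, which is equivalent to the `http-equiv` form quoted above.

```python
import html


def wrap_standalone(fragment: str, title: str = "Untitled") -> str:
    """Wrap an HTML body fragment in a minimal page that declares UTF-8.

    Hypothetical helper for illustration only; it is not a pandoc API.
    """
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '<meta charset="utf-8">\n'  # tells the browser how to decode the bytes
        f"<title>{html.escape(title)}</title>\n"
        "</head>\n<body>\n"
        f"{fragment}\n"
        "</body>\n</html>\n"
    )


# The fragment holds a literal U+2192 character, as pandoc emits it.
page = wrap_standalone("<p>A \u2192 B</p>", title="myfile")

# When saving, the bytes must actually be UTF-8 to match the declaration.
data = page.encode("utf-8")
```

Note that the declaration only helps if the file really is written as UTF-8; a mismatch between the declared and actual encoding produces the same kind of garbling.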

I'm using pandoc on Windows:

$ pandoc -v
pandoc 1.11.1
Compiled with citeproc-hs 0.3.8, texmath 0.6.1.3, highlighting-kate 0.5.3.8
...


bencarabelli

Comments

Pandoc converts all entities to Unicode characters. That is because it needs to handle output formats other than HTML.

If you use the -s flag to create a standalone document, pandoc will apply its default template, which includes the meta tag specifying UTF-8.

Another option is to use the --ascii flag, which will cause the → character to be output as `&#8594;` (the equivalent numeric character reference).

jgm on 3 May 2013
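The two behaviours described in this answer can be mimicked with Python's standard library. This is a minimal sketch for illustration, not pandoc's implementation: `html.unescape` stands in for the reader turning entities into Unicode characters, and the `xmlcharrefreplace` codec error handler stands in for what --ascii does in HTML output. It also suggests why `&amp;` looked "preserved": the decoded `&` presumably has to be escaped again for HTML output anyway.

```python
import html


def decode_entities(source: str) -> str:
    """Turn named and numeric character references into Unicode characters."""
    return html.unescape(source)


def to_ascii_html(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric reference."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


decoded = decode_entities("A &rarr; B")
print(decoded)                 # A → B
print(to_ascii_html(decoded))  # A &#8594; B

# "&amp;" decodes to "&", which must be escaped again in HTML output,
# so it ends up looking as if it had been preserved.
print(html.escape(decode_entities("&amp;")))  # &amp;
```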


Thanks a lot!

bencarabelli on 3 May 2013
