Pandoc: Non-breaking spaces in HTML ignored

Created on 23 Feb 2020 · 3 Comments · Source: jgm/pandoc

The user manual states:

A backslash-escaped space is parsed as a nonbreaking space. It will appear in TeX output as ~ and in HTML and XML as \&#160; or \&nbsp;.

While it works for TeX, it doesn't seem to work for HTML in my case.

$ echo "jo\ what " | pandoc -t html
<p>jo what</p>

$ echo "jo\ what " | pandoc -t latex
jo~what

$ echo "jo\ what " | pandoc -t json
{"blocks":[{"t":"Para","c":[{"t":"Str","c":"jo what"}]}],"pandoc-api-version":[1,20],"meta":{}}

pandoc: 2.9.2

docs

All 3 comments

You are right, the manual is a bit misleading there. In most output
formats, including HTML, the non-breaking space is inserted as the
literal Unicode character (U+00A0) rather than as an entity. You can
use the --ascii option to create an entity instead:

% echo "jo\ what" | pandoc -t html | hexdump
0000000 703c 6a3e c26f 77a0 6168 3c74 702f 0a3e
0000010

% echo "jo what" | pandoc -t html | hexdump
0000000 703c 6a3e 206f 6877 7461 2f3c 3e70 000a
000000f

% echo "jo\ what" | pandoc -t html --ascii
<p>jo&nbsp;what</p>
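
Note that hexdump's default format prints 16-bit words in host byte order, so on a little-endian machine each pair of bytes is swapped and the c2 a0 sequence (UTF-8 for U+00A0) shows up split across c26f 77a0 in the first dump. A minimal sketch for viewing the bytes in stream order follows. The printf line is only a stand-in for pandoc's HTML output so that the snippet runs without pandoc installed, and the function names are hypothetical:

```shell
# Stand-in (assumption) for: echo "jo\ what" | pandoc -t html
# \302\240 is the octal form of bytes C2 A0, the UTF-8 encoding of U+00A0.
html_with_nbsp() {
  printf '<p>jo\302\240what</p>\n'
}

# Print stdin as hex, one pair per byte, in stream order on a single line.
bytes_in_order() {
  od -An -tx1 | tr -s ' \n' '  ' | sed 's/^ //; s/ $//'
}

# Succeed if stdin contains the UTF-8 encoded non-breaking space.
has_nbsp() {
  LC_ALL=C grep -q "$(printf '\302\240')"
}

html_with_nbsp | bytes_in_order
# prints: 3c 70 3e 6a 6f c2 a0 77 68 61 74 3c 2f 70 3e 0a

html_with_nbsp | has_nbsp && echo "contains U+00A0"
```

With pandoc installed, the real pipeline can be substituted for html_with_nbsp, for example: echo "jo\ what" | pandoc -t html | bytes_in_order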

Thanks for noticing. Pull requests are always welcome. ;)

Ok, thanks for letting me know. So this means the feature does actually work and only the documentation requires an update? I can have a look at that, but will need some time. ;)

@timtroendle Done :slightly_smiling_face:
