Pandoc: [markdown reader] indentation in HTML blocks is parsed as code block when markdown_in_html_blocks is enabled

Created on 25 Dec 2014 · 5 Comments · Source: jgm/pandoc

This is an example from the documentation:

$ pandoc --version
pandoc 1.13.2
...
$ cat test.markdown
<table>
    <tr>
        <td>*one*</td>
        <td>[a link](http://google.com)</td>
    </tr>
</table>
$ pandoc --from=markdown+raw_html+markdown_in_html_blocks test.markdown 
<table>
<pre><code>&lt;tr&gt;
    &lt;td&gt;*one*&lt;/td&gt;
    &lt;td&gt;[a link](http://google.com)&lt;/td&gt;
&lt;/tr&gt;</code></pre>
</table>

The documentation says that I should get:

$ pandoc --from=markdown+raw_html+markdown_in_html_blocks test.markdown 
<table>
    <tr>
        <td><em>one</em></td>
        <td><a href="http://google.com">a link</a></td>
    </tr>
</table>
Labels: bug, Markdown reader, more-discussion-needed

All 5 comments

Indeed, this seems to be a regression. (I just tried with pandoc 1.9.4.1 and got the right result.) I'm not sure which release broke this, but I suspect the culprit is this change in version 1.13:

    + Revamped raw HTML block parsing in markdown (#1330).
      We no longer include trailing spaces and newlines in the
      raw blocks.  We look for closing tags for elements (but without
      backtracking).  Each block-level tag is its own `RawBlock`;
      we no longer try to consolidate them (though `--normalize` will do so).

Previously we parsed clumps of raw HTML tags as one block. With this change, each tag goes into its own block. That has the side effect that an indented tag is now parsed as an indented code block.

Actually, it's a bit unclear what the behavior _should_ be. If we're really parsing markdown inside HTML tags, then anything indented four spaces should be a code block, which is exactly what we see in 1.13.2.
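
To make the four-space rule concrete, here is a minimal Python sketch that flags the lines of a document indented far enough to be candidates for an indented code block. It is only a rough heuristic and not pandoc's actual parser: it ignores paragraph continuation, list nesting, and every other context rule. The function name `indented_code_candidates` is made up for this illustration.

```python
def indented_code_candidates(text: str) -> list[int]:
    """Return 1-based numbers of non-blank lines indented by 4+ columns.

    Heuristic only: real markdown parsers also consider the surrounding
    context (lists, paragraph continuation, and so on).
    """
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines never start a code block on their own
        expanded = line.expandtabs(4)  # treat a tab as up to four columns
        indent = len(expanded) - len(expanded.lstrip(" "))
        if indent >= 4:
            hits.append(number)
    return hits


sample = """<table>
    <tr>
        <td>*one*</td>
        <td>[a link](http://google.com)</td>
    </tr>
</table>"""

print(indented_code_candidates(sample))  # prints [2, 3, 4, 5]
```

Every line between `<table>` and `</table>` in the documentation example is flagged, which matches the code block seen in the 1.13.2 output.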

This turned out to cause a bug in my htmlTable R package. My fix is to remove the tabs, but that makes the raw output harder to read. If possible, a solution using a comment tag might be an option:

<!-- Start raw html -->
<table class='gmisc_table' style='border-collapse: collapse;' >
    <thead>
    <tr>
        <th> </th>
        <th>Header</th>
    </tr>
    </thead>
    <tbody>
    <tr>
        <td>Row 1</td>
        <td>Value</td>
    </tr>
    </tbody>
</table>
<!-- End raw html -->

Perhaps an extension to turn off automatic code blocks for indented lines would be a good workaround?
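
The indentation-stripping workaround mentioned above could be applied as a preprocessing step before the text reaches pandoc. The following Python sketch is an assumption about how that might look, not the code used in htmlTable; the name `strip_html_indent` is hypothetical. It removes leading spaces and tabs only from lines that begin with an HTML tag, so ordinary indented code is left alone.

```python
import re

# Leading spaces/tabs, but only when the first non-blank text is a tag
# such as "<tr>" or "</td>".
TAG_LINE = re.compile(r"^[ \t]+(?=</?[A-Za-z])")


def strip_html_indent(text: str) -> str:
    """Remove leading whitespace from lines that start with an HTML tag."""
    return "\n".join(TAG_LINE.sub("", line) for line in text.split("\n"))


indented = "<table>\n    <tr>\n        <td>Row 1</td>\n    </tr>\n</table>"
print(strip_html_indent(indented))
```

One caveat: the sketch cannot tell raw HTML apart from HTML that was deliberately placed inside an indented code block, so it would unindent both.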

This appears to happen not just for indented HTML, but within any HTML element. Unindenting the simple example above fixes it, but the problem persists when the extra spaces follow a tag on the same line:

$ pandoc --version
pandoc 1.17.0.2
Compiled with texmath 0.8.5, highlighting-kate 0.6.2.

$ cat test2.markdown
<table>
<tr>
<td>    *one*</td>
<td>    [a link](http://google.com)</td>
</tr>
</table>

$ pandoc --from=markdown+raw_html+markdown_in_html_blocks test2.markdown
<table>
<tr>
<td>
<pre><code>*one*&lt;/td&gt;</code></pre>
<td>
<pre><code>[a link](http://google.com)&lt;/td&gt;</code></pre>
</tr>
</table>

or

$ cat test3.markdown
<table>
<tr>
<td>    *one*</td> <td>    [a link](http://google.com)</td>
</tr>
</table>

$ /usr/bin/pandoc --from=markdown+raw_html+markdown_in_html_blocks test3.markdown
<table>
<tr>
<td>
<pre><code>*one*&lt;/td&gt; &lt;td&gt;    [a link](http://google.com)&lt;/td&gt;</code></pre>
</tr>
</table>

(I encountered this when embedding raw HTML tables, generated by the R xtable package, that had "too many" leading spaces before some of the numbers, and one HTML table row per line.)
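
For generated tables like these, one possible preprocessing step is to trim the padding that follows each opening cell tag, so that no cell content starts with four or more spaces. The Python sketch below is a guess at such a filter, not something xtable or pandoc provides; `trim_cell_padding` is a hypothetical name, and the pattern only handles `<td>` and `<th>` tags.

```python
import re

# An opening <td ...> or <th ...> tag followed by spaces or tabs.
# The \b keeps the pattern from matching <thead> or <tdata>-style names.
CELL_PADDING = re.compile(r"(<t[dh]\b[^>]*>)[ \t]+")


def trim_cell_padding(text: str) -> str:
    """Drop spaces and tabs that directly follow an opening cell tag."""
    return CELL_PADDING.sub(r"\1", text)


row = "<td>    *one*</td> <td>    [a link](http://google.com)</td>"
print(trim_cell_padding(row))
# prints <td>*one*</td> <td>[a link](http://google.com)</td>
```

This changes the whitespace inside the cells, which HTML rendering normally ignores, but it would also alter cells where leading whitespace is significant, such as content styled with `white-space: pre`.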

Being able to embed line-oriented markdown within HTML elements is a nice feature, but wouldn't it make sense to require that such markdown begin at the start of an actual line, given that HTML (roughly) doesn't care about whitespace? E.g.,

$ cat test4.markdown
<table>
<tr>
<td>
    This is a
    multi-line code block
</td> <td>
* [a link](http://google.com)
* another item
</td>
</tr>
</table>

$ /usr/bin/pandoc --from=markdown+raw_html+markdown_in_html_blocks --to=html test4.markdown
<table>
<tr>
<td>
<pre><code>This is a
multi-line code block</code></pre>
</td>
<td>
<ul>
<li><a href="http://google.com">a link</a></li>
<li>another item</li>
</ul>
</td>
</tr>
</table>