Pandoc: do not start a new paragraph after list

Created on 7 Dec 2018 · 17 Comments · Source: jgm/pandoc

After a list, for example

- a
- b

or

1. a
0. b

pandoc starts a new paragraph. If indent=true, then the text line following the list is indented but should not be. Note that this does not occur with indent=false, as is the default, because parskip is used.

All 17 comments

To illustrate, with indent=true, it is

- a
- b
    Hello!

whereas without it is

  • a
  • b

Hello!

Also note that the line following a list nested inside another list is handled correctly, that is, it is not indented.

Regarding the generated TeX code, it is a question of removing the empty line generated by pandoc that follows

\begin{itemize}
...
\end{itemize}

(or enumerate in lieu of itemize).

Perhaps also interesting to note that for example the KOMA document classes automatically insert more vertical spacing after lists (and enumerations), even if parskip is set to zero: https://komascript.de/node/1657

Therefore, no need to add an extra empty line in the TeX source code by pandoc.
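
The idea of removing that empty line can be sketched as a post-processing step in Python. This is only an illustration under assumptions: the function name `join_text_after_lists` is hypothetical, it works on the `.tex` text that pandoc has already written, and it is not a pandoc option or feature.

```python
import re

# End of a list environment followed by one or more blank lines.
_LIST_END = re.compile(r"(\\end\{(?:itemize|enumerate)\})\n(?:[ \t]*\n)+")


def join_text_after_lists(tex: str) -> str:
    r"""Drop the blank line(s) after \end{itemize} / \end{enumerate}.

    Without the blank line, LaTeX does not start a new paragraph after the
    list, so the following text is not indented.
    """
    return _LIST_END.sub(r"\1\n", tex)


if __name__ == "__main__":
    sample = "\\begin{itemize}\n\\item\n  a\n\\end{itemize}\n\nHello!\n"
    print(join_text_after_lists(sample))
```

Note that this sketch treats every list in the file alike, so it cannot distinguish a list that ends a paragraph from one that sits in the middle of a sentence.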

Thank you for submitting a bug report. However, we cannot improve pandoc or help you unless you give us all of the following information:

  • the pandoc version (check using pandoc -v and try to reproduce the bug with the latest released version of pandoc, or even better: the development version or nightly build.)
  • the exact command line used
  • the exact input used
  • the output received
  • the output you expected instead

Finally, ask questions on pandoc-discuss and read the User's Guide.

Version:

➜ ~ pandoc -v
pandoc 2.5
Compiled with pandoc-types 1.17.5.4, texmath 0.11.1.2, skylighting 0.7.4

Command:

    pandoc --from markdown --to latex file.pandoc --output output.tex

Input:

Hello!

- a
- b

Bye!

Output:

Hello!

\begin{itemize}
\tightlist
\item
  a
\item
  b
\end{itemize}

Bye!

Expected:

Hello!

\begin{itemize}
\tightlist
\item
  a
\item
  b
\end{itemize}
Bye!

Thanks, I see now what you were expecting. But I still don't understand why you would expect that. Markdown lists are block elements, just like paragraphs.

You can use pandoc -t native to see how the document is represented in pandoc's internal AST.
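
The same structure can also be inspected programmatically through pandoc's JSON output (`-t json`). The Python sketch below lists the top-level block types of a document. The `SAMPLE` string is a hand-written approximation of what pandoc would emit for the input above, not output captured from a real run, and the helper name `top_level_blocks` is hypothetical.

```python
import json
from typing import List

# Hand-written approximation of `pandoc -f markdown -t json` for the
# "Hello! / list / Bye!" input (an assumption, not a captured run).
SAMPLE = """
{"blocks": [
  {"t": "Para", "c": [{"t": "Str", "c": "Hello!"}]},
  {"t": "BulletList", "c": [
    [{"t": "Plain", "c": [{"t": "Str", "c": "a"}]}],
    [{"t": "Plain", "c": [{"t": "Str", "c": "b"}]}]
  ]},
  {"t": "Para", "c": [{"t": "Str", "c": "Bye!"}]}
 ],
 "pandoc-api-version": [1, 17, 5, 4],
 "meta": {}}
"""


def top_level_blocks(pandoc_json: str) -> List[str]:
    """Return the type tag of every top-level block in a pandoc JSON document."""
    doc = json.loads(pandoc_json)
    return [block["t"] for block in doc["blocks"]]


if __name__ == "__main__":
    print(top_level_blocks(SAMPLE))  # ['Para', 'BulletList', 'Para']
```

The list appears as a sibling of the two paragraphs rather than as a child of either one, which is why the LaTeX writer separates all three with blank lines.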

With indent=true, the additional indentation after a list, caused by the empty line that follows the list in the LaTeX code generated by pandoc, is unnecessary. The vertical space following the list is already sufficient to distinguish the list from the paragraph that follows.

With indent=true, the PDF compiled from the TeX code generated by pandoc shows

- a
- b
    Bye!

whereas without the additional empty line it would show

- a 
- b

Bye!

To me, the latter is aesthetically more pleasing.

From the MANUAL:

Pandoc attempts to preserve the structural elements of a document, but not formatting details such as margin size.

Indentation is also a formatting detail. The pandoc document your markdown represents is a list followed by a paragraph (try -t html to see that better). And in LaTeX, a paragraph is preceded by a blank line. So there's nothing wrong with the current output.

If you want to change the LaTeX layout so that a paragraph following a list is not indented, you can change your LaTeX template or use the pandoc header-includes variable to inject LaTeX layout instructions. See e.g. https://tex.stackexchange.com/questions/112404/reliable-code-for-automatic-noindent-after-specific-environments

The problem is that pandoc currently leaves the author no choice when indent is true: every line that follows a list is indented, although some lists end a paragraph and others do not. For example, the following is not uncommon in academic writing:

If

- a complicated condition, a slew of adjectives,
- another complicated condition, a slew of formulas,

then this and that holds.

Here, no indentation is expected before the line "then this and that holds" in the compiled document.
By contrast, after

start of paragraph 
...
big elaboration
...
So we conclude

- this and
- that.

Start of new paragraph...

an indentation is expected before the line "Start of new paragraph...".

Right now, I do not know how to write a markdown file where, with indent being true, pandoc

  • adds no indentation to the first line following the former list, and
  • adds an indentation to the first line following the latter list.

To me it seems that for example in LaTeX, pandoc always adds an empty line after the list, thus forcing the indentation of the first line following the list, even in the former example where it is undesired.
The solution proposed, using

the pandoc header-includes variable to inject LaTeX layout instructions. See e.g. https://tex.stackexchange.com/questions/112404/reliable-code-for-automatic-noindent-after-specific-environments

would achieve the exact opposite: even with indent being true, the first line following the latter list would no longer be indented.

The pandoc AST I linked to above (here again) doesn't allow a paragraph to contain other Block content like a list; it only allows Inline content. However, you can use raw TeX:

- this and
- that

\noindent
Start of new paragraph...

Okay, thank you again.

Are there technical reasons why a paragraph in pandoc cannot contain a list? After all, when for example TeX permits it, then why not pandoc? If this is not a technical problem, could it be discussed?

From the MANUAL:

Because pandoc’s intermediate representation of a document is less expressive than many of the formats it converts between, one should not expect perfect conversions between every format and every other. Pandoc attempts to preserve the structural elements of a document, but not formatting details such as margin size. And some document elements, such as complex tables, may not fit into pandoc’s simple document model. While conversions from pandoc’s Markdown to all formats aspire to be perfect, conversions from formats more expressive than pandoc’s Markdown can be expected to be lossy.

So it's partly a design choice (markdown doesn't have it), and partly arbitrary. But now it would be very difficult to change the pandoc document AST in such a way, since basically all of pandoc would need to be rewritten to accommodate paragraphs containing other block elements.

Furthermore, pandoc's AST helps to prevent invalid HTML output. HTML requires phrasing content in <p> elements, but lists (<ol> or <ul>) are flow content, not phrasing content.

You can test this with the w3 validator and the following input:

<!DOCTYPE html>
<head><title>test</title>
<p>Hello
  <ul>
    <li>one</li>
    <li>two</li>
  </ul>
</p>

This fails to validate, as the <ul> implicitly closes the preceding <p> element. The final closing tag is therefore out of place.

Okay, thank you. Apparently the creators of HTML were firmly convinced that a paragraph could never include a list, as opposed to the creators of TeX, for example. Not sure about odt, docx, ...

@Konfekt - this is an unfortunate limitation of pandoc, I agree. I often wish for the ability to put a list or blockquote in a paragraph, as you can in LaTeX, and resort to raw \noindent.

Perhaps we could brainstorm about solutions, but not here (better on pandoc-discuss).
