Pandoc: Provide a way to exclude (some) unnumbered headers from TOC

Created on 14 Nov 2014 · 19 Comments · Source: jgm/pandoc

The Pandoc LaTeX Writer adds \addcontentsline after any unnumbered section:

\section*{Section Heading}\label{section-heading}
\addcontentsline{toc}{section}{Section Heading}

This is clearly by design (hence I don't provide an MWE), as lines 662-666 of pandoc/src/Text/Pandoc/Writers/LaTeX.hs show:

$$ if unnumbered
    then "\\addcontentsline{toc}" <>
        braces (text sectionType) <>
        braces txtNoNotes
    else empty

This behavior is not correct (for LaTeX). The star in the LaTeX section command (\section*{}) removes the section number and also removes the entry from the TOC. Why is it explicitly added back? I suppose the answer is that the Pandoc internal data type treats section numbering and TOC entries separately, and other formats (e.g. HTML) expect to see these sections in the TOC. But for that matter, what format besides LaTeX numbers sections anyway? Isn't the unnumbered flag (# Section {-} or # Section {.unnumbered}) intended for LaTeX?

This has come up before on pandoc-discuss: https://groups.google.com/forum/#!searchin/pandoc-discuss/addcontentsline/pandoc-discuss/5yWDF28sDWU/ZzbGM9EU9RkJ

fiddlosopher offers the workaround hack:

\renewcommand{\addcontentsline}[3]{}

I prefer to deal with it via sed:

pandoc -f markdown-auto_identifiers -t latex Document.md | sed -e '/addcontentsline/d' > Document.tex

However, this occasionally fails on long subsection titles, which wrap across lines.

I believe the correct patch is to just remove this code (just as it is ignored in LaTeX Reader), or enable it with a switch for compatibility.

LaTeX writer

rjturner

👍5

Most helpful comment

OK, we need some kind of solution then.

There are definitely cases where you want unnumbered sections to appear in the TOC (actually, in most cases where I'd use unnumbered sections, I would want them in the TOC). So, maybe we could make this sensitive to a no-toc class or something like that. Ideally this would also work with numbered sections, though I'm not sure how to prevent those from getting in the toc.

jgm on 3 Dec 2015

👍5

All 19 comments

This was added in commit 11f74074454d8c58da18ee1b7e2480530baeb7e1.
I don't recall why. Perhaps because I had a test document in which
it was desired that an unnumbered section appear in the TOC?

Even in LaTeX, TOC and numbering are conceptually distinct, even
if the default for section* is to disable both. You can always
put an unnumbered section in the TOC.

I suppose we could add a toc class (like unnumbered) and make
it sensitive to this?

jgm on 14 Nov 2014

Hi John,

I agree with this conclusion. You are right that LaTeX provides two conceptually distinct actions through the single starred section syntax, and this creates some ambiguity for a parser like pandoc. However, I believe the correct resolution to this ambiguity is to do what LaTeX does already: remove both the numbers from sections and the sections from the TOC. I don't know how this interacts with other markup languages (should the LaTeX Writer force starred TOC entries?), which is why, ultimately, I think this should be a switch (default: no TOC as in LaTeX).

Thanks, I would love to see this implemented!
Ryan

rjturner on 17 Nov 2014

For those who find this page later, the following Perl one-liner will remove (possibly multi-line) addcontentsline commands from LaTeX output, provided your headings do not include { and }. It will also remove literal addcontentsline commands from your source file.

perl -00 -pe "s/^\\\addcontentsline(\{[^}]*\}){3}\n//ms"

Use it in a stream, thus:

pandoc -f markdown -t latex Document.md | perl -00 -pe "s/^\\\addcontentsline(\{[^}]*\}){3}\n//ms" > Document.tex

Or in post processing:

perl -00 -pi -e "s/^\\\addcontentsline(\{[^}]*\}){3}\n//ms" Document.tex

rjturner on 22 Nov 2014

👍1
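
The sed and Perl workarounds above break on wrapped titles or on headings that contain { and }. As a minimal sketch of a brace-aware alternative, the following Python script scans for \addcontentsline and skips three balanced {...} groups, so line breaks and nested braces inside the title do not matter. The function names (skip_group, strip_addcontentsline) are made up for this illustration, and it assumes the three arguments follow the command name directly with no whitespace in between, as in the output quoted at the top of this issue.

```python
import sys

COMMAND = "\\addcontentsline"


def skip_group(text, pos):
    """Return the index just past the balanced {...} group starting at pos.

    Returns -1 if text[pos] is not '{' or the group never closes.
    """
    if pos >= len(text) or text[pos] != "{":
        return -1
    depth = 0
    i = pos
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # skip escaped characters such as \{ and \}
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    return -1


def strip_addcontentsline(text):
    """Remove every \\addcontentsline{..}{..}{..} plus one trailing newline."""
    out = []
    pos = 0
    while True:
        start = text.find(COMMAND, pos)
        if start == -1:
            out.append(text[pos:])
            return "".join(out)
        end = start + len(COMMAND)
        for _ in range(3):
            end = skip_group(text, end)
            if end == -1:
                break
        if end == -1:
            # Not a well-formed three-argument call: keep it untouched.
            out.append(text[pos:start + len(COMMAND)])
            pos = start + len(COMMAND)
            continue
        out.append(text[pos:start])
        if text.startswith("\n", end):
            end += 1  # do not leave an empty line behind
        pos = end


if __name__ == "__main__":
    # Hypothetical usage: python strip_addcontentsline.py Document.tex > Stripped.tex
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            sys.stdout.write(strip_addcontentsline(handle.read()))
```

Like the Perl version, this also removes literal \addcontentsline commands that come from your source file, so it is only suitable when no TOC entry should be added by hand.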

I encountered this issue today, and the workaround mentioned by rjturner

\renewcommand{\addcontentsline}[3]{}

does not work, at least in my case: it removes other items from the table of contents as well.

randomizedthinking on 30 Nov 2015

Try putting \renewcommand{\addcontentsline}[3]{} right before the unnumbered sections.

jgm on 30 Nov 2015

Thanks, John! Now the renewcommand is put right before the unnumbered section, but the effect is still not what we want: it removes all items from the table of contents, so basically we end up with an empty TOC. Something to do with how LaTeX handles such things.

randomizedthinking on 2 Dec 2015

OK, we need some kind of solution then.

jgm on 3 Dec 2015

👍5

I suggest an .unlisted class to be used to tell pandoc "don't put this section in TOC"

kuba-orlik on 27 Feb 2017

@kuba-orlik,

from the comments above

So, maybe we could make this sensitive to a no-toc class or something like that. Ideally this would also work with numbered sections, though I'm not sure how to prevent those from getting in the toc.

I think they already decided the class to use. While it is arbitrary, it seems to me it's better than .unlisted. Also see the 2 commits @chdemko did, which are already using no-toc.

ickc on 10 Mar 2017

I'm interested in this as well. It seems @chdemko started doing something related, only to retract his change in the end. I didn't quite get what was done, since there are commits present in his fork that are not present on the closed PR (https://github.com/jgm/pandoc-templates/pull/189). Here he says that a solution has been proposed but I don't see it.

I think a class that removes a Header from the TOC is a good idea; I vote for "unlisted".

saveman71 on 12 Mar 2017

👍2

I think I agree that unlisted is better than no-toc.

jgm on 12 Mar 2017

👍1

> I think I agree that unlisted is better than no-toc.

unlisted might not be unambiguous enough (unlisted where?).

ickc on 12 Mar 2017

👍1

@ickc I think that it might be a good thing, in the way that in the markup language, the header is marked as unlisted, and that's it.

Then, the writer might decide to not list it in the TOC/wherever it actually _lists_ headers.

saveman71 on 12 Mar 2017

This is a bit more complex than I thought. Easy to implement an unlisted that only works with unnumbered, but harder to get one that works with numbered headers too. Probably the former is all that's needed, but people will assume the class can be used independently.

Putting this on the back burner for now.

jgm on 27 Jun 2017

FYI, I've created a filter for LaTeX output which solves this issue: https://pypi.org/project/pandoc-latex-unlisted/

chdemko on 5 Jan 2018

👍4
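
For readers who want to see what a filter-based workaround could look like, here is a minimal sketch of a JSON filter in Python. It is not the code of pandoc-latex-unlisted, and it makes several assumptions: the JSON layout with a top-level "blocks" key (pandoc 1.18 or later), headers marked with both the unnumbered and unlisted classes, and a made-up macro name \oldaddcontentsline for saving the original command. Instead of redefining \addcontentsline for the rest of the document (which emptied the whole TOC in the reports above), it disables the command just before such a header and restores it right after.

```python
import json
import sys

# Raw LaTeX that saves and mutes \addcontentsline; \oldaddcontentsline is a
# hypothetical name chosen for this sketch.
DISABLE = ("\\let\\oldaddcontentsline\\addcontentsline\n"
           "\\renewcommand{\\addcontentsline}[3]{}")
RESTORE = "\\let\\addcontentsline\\oldaddcontentsline"


def raw_latex(code):
    """Build a RawBlock element that only LaTeX-based writers will emit."""
    return {"t": "RawBlock", "c": ["latex", code]}


def is_unlisted(block):
    """True for a Header carrying both the unnumbered and unlisted classes."""
    if block.get("t") != "Header":
        return False
    _level, (_ident, classes, _keyvals), _inlines = block["c"]
    return "unlisted" in classes and "unnumbered" in classes


def hide_unlisted(blocks):
    """Surround each top-level unlisted header with the mute/restore blocks.

    Headers nested inside other blocks (e.g. a Div) are not handled here.
    """
    result = []
    for block in blocks:
        if is_unlisted(block):
            result += [raw_latex(DISABLE), block, raw_latex(RESTORE)]
        else:
            result.append(block)
    return result


def main():
    doc = json.load(sys.stdin)
    doc["blocks"] = hide_unlisted(doc["blocks"])
    json.dump(doc, sys.stdout)


# pandoc passes the output format as the first argument when it runs a filter,
# so the filter only reads stdin when it is invoked that way.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

Assuming the script is saved as unlisted.py and made executable, it could be tried with something like pandoc --filter ./unlisted.py -t latex Document.md. Because the inserted blocks are raw LaTeX, other output formats should simply drop them.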

> This is a bit more complex than I thought. Easy to implement an unlisted that only works with unnumbered, but harder to get one that works with numbered headers too. Probably the former is all that's needed, but people will assume the class can be used independently.

Is numbered but not listed a case supported by LaTeX? This seems like the kind of thing document classes usually try to prevent because it is considered incorrect by typographic norms.

Regardless I agree with the overall rationale behind using another class tag for the additional case.

And while I agree that unlisted is a sensible name, it is important to remember that currently only one class name is predefined for particular handling. This name is hard-coded into the writer. Ideally, the namespace would be completely open to the user, but a single predefined item that already exists presents no major problems. As further granularity of control is requested, however, more of these names will be needed. Before introducing too much clutter, it might be wise to consider a slightly different approach. A class name can still be used, but perhaps could be chosen for the document using a metadata field, according to the user's liking. Any class not named in the metadata field will be treated as an ordinary class.

(A more drastic solution is simply to provide a metadata field that lists all the identifiers for selected chapters.)

brainchild0 on 10 Oct 2019

I think it makes sense to implement the unlisted class, and document that it only works in conjunction with unnumbered.

jgm on 10 Oct 2019

👍1
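
The rule proposed here (unlisted only takes effect together with unnumbered) can be sketched as a small Python function. This is an illustration of the intended behavior, not pandoc's writer code; the function name section_latex and its parameters are hypothetical, and the title is assumed to be already valid LaTeX.

```python
def section_latex(title, label, classes=(), level="section"):
    """Return the LaTeX for one heading under the proposed unlisted rule."""
    if "unnumbered" not in classes:
        # Numbered headings: LaTeX adds the TOC entry itself, so an
        # unlisted class has no effect here.
        return f"\\{level}{{{title}}}\\label{{{label}}}"
    lines = [f"\\{level}*{{{title}}}\\label{{{label}}}"]
    if "unlisted" not in classes:
        # Current behavior: put the starred heading back into the TOC.
        lines.append(f"\\addcontentsline{{toc}}{{{level}}}{{{title}}}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(section_latex("Section Heading", "section-heading",
                        classes=["unnumbered", "unlisted"]))
```

With classes=["unnumbered"] the function reproduces the two lines quoted at the top of this issue; adding "unlisted" drops the \addcontentsline line, and "unlisted" on its own changes nothing.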

> I think it makes sense to implement the unlisted class, and document that it only works in conjunction with unnumbered.

Depending on whether LaTeX and the document classes, given this particular combination, fail outright or simply generate a document without the desired features, it might make sense for Pandoc to always try the combination exactly as given by the user. The documentation might then describe this limitation, which is really a feature of the available classes rather than of LaTeX itself. In principle someone could write a more flexible document class, which would then ideally work with that specific combination from Markdown source.

brainchild0 on 11 Oct 2019

@jgm would you consider implementing this feature for epub/epub3 as well? I'm running pandoc 2.10 here, installed from cabal. HTML output produces the desired unlisted TOC elements, but epub still lists them. For now I abuse CSS for this (#toc-li-114 { display: none; }), but it doesn't affect the epub-toc, only the additional one, and it requires manually looking up the generated id for ol.toc li (luckily the same for epub2 and 3) and adding it to a custom stylesheet. Besides, it still creates an element (# Contributors {epub:type=copyright-page .unlisted .unnumbered }) that I didn't want to have in the first place. Or is there another way/template to do this myself?