Pandoc: no \frontmatter sequence in latex templates

Created on 14 Feb 2019 · 28 Comments · Source: jgm/pandoc

__Current behavior:__ LaTeX document classes depend on \frontmatter, \mainmatter, and \backmatter declarations for proper typesetting. For example, a book will normally be paginated with arabic number _1_ representing the first page of the first numbered chapter, with lowercase Roman numerals for title page, tables, preface, introduction, and so on. Pandoc already understands unnumbered chapters, for example by reading the class tags of chapter titles in Markdown input, but does not place them in a separate front-matter section of the document.

$ pandoc -D latex | grep 'frontmatter'
(no output)

__Desired behavior:__ Partition the document into segments designated in the LaTeX output with the appropriate control sequences \frontmatter, \mainmatter, and \backmatter.

__Affected versions:__ Pandoc 2.6

LaTeX more-discussion-needed templates

All 28 comments

See related #4823

I don't think I'd want to do this automatically in the way you suggest, but making it explicit with a class as in #4823 might work.

Ideally, conversion would occur such that, in most cases, a simple invocation (i.e. default values for options) will produce a correct result, with non-default parameter values available for customization but not usually needed for correction.

From a standpoint of established typographical conventions, it is a serious transgression when the
title page has the same pagination sequence as a book body.

But it is not much better when front matter does the same. Most document authors will benefit from behavior within Pandoc that simply works according to these observations, rather than by default creating output that is, practically speaking, incorrect.

I agree completely that the title page and table of
contents should be in front matter -- in a book.

Of course, the default for pandoc is an article.

We could look into making more things work
automatically when --top-level-division=chapter
is selected.

My point was only that using unnumbered sections as
a guide to front matter/back matter isn't fully
reliable.

Yes, but note LaTeX design principles. It is the job of the document author (e.g. Pandoc) to mark correctly the logical structure of the document. It is the job of the document class to arrange the logical elements following a coherent typographic plan. If the article class were to become confused by the \*matter tags, then I would support the reluctance to add them. Fortunately, and by design, it is not, and fortunately, and by design, a LaTeX document with correct logical control sequences will format correctly in all document classes (with certain exceptions, obviously, unrelated to the current question), _but without such sequences, will not format correctly in certain classes_.

So I can't see any defense for not putting the title page, TOC, etc. in a segment designated as front matter.

Regarding inferring about front, main, or back matter based on _unnumbered_ tagging, would you not think that in nearly every case an unnumbered chapter not preceded by any numbered chapter is appropriate for front matter, and one not followed by any numbered chapter for back matter?

You do know that you can simply insert \frontmatter, \backmatter, and so on in the markdown document, where you want them, and they'll be passed through to the latex? [EDIT: of course this won't work for the toc if it's handled in the template. I will consider your proposal. I hadn't known that the \*matter commands could be used in article without bad effects.]

No I did not, and it is useful, but a converting platform that converts Markdown + LaTeX source to LaTeX output is still less useful than one that converts Markdown source to LaTeX output, because the latter is truer to the actual concept of conversion, compared to mere copying.

If someone sent a manuscript to a publisher, then the human who formats the book typographically would realize without being told that the early chapters titled "Introduction" and "Preface" were front matter, and similarly that appendices were back matter.

Pandoc helps its users best if it behaves like a publisher, making the best inferences from the input set, unless explicitly told otherwise.

If the article class were to become confused by the \*matter tags, then I would support the reluctance to add them. Fortunately, and by design, it is not, and fortunately, and by design, a LaTeX document with correct logical control sequences will format correctly in all document classes (with certain exceptions, obviously, unrelated to the current question)

Actually, this is false. \frontmatter does not work with the article document class:

\documentclass{article}
\begin{document}
\frontmatter
\tableofcontents 
\mainmatter
\section{Introduction}
Some text
\end{document}

yields

! Undefined control sequence.
l.3 \frontmatter

Well, thank you for noting this correction. I was sure that the standard classes were all mutually compatible in their commands for describing logical document structure, so that one class could be swapped for another without any effort.

Obviously, we would not break article support. Currently, for books, the main options are 1) publishing an improperly formatted book, or 2) placing LaTeX commands in the Markdown. The latter has the following problems:

  1. Still breaks articles, if the input document contains the book-class commands.
  2. Requires that the input document be planned in terms of output type and style. This is especially infeasible for cases where Markdown is the intermediary format, generated automatically from another source.
  3. Does not support the simple use case of properly formatted LaTeX output, for a book, from Markdown input that expresses the content and structure of the document without fuss over formatting.

One possibility is a switch for generating book segments, much like many of the other switches already supported (e.g. for generating TOC). If used improperly, of course, this switch would still break output.

Another possibility is inserting the commands conditionally based on an explicit test for document class. Obviously this is too ugly and fragile to consider seriously.

Another possibility, which I would tend to prefer, is calling the commands conditionally, through an if-statement in the LaTeX, based on whether they are defined (by the document class). A clean way to achieve this effect is to define the commands, following the document class declaration, as null commands, unless they are previously defined (by the document class). Then the generated document body is clean-looking, and correct function is automatic and insulated from user decisions and error.
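
A minimal sketch of that last approach, assuming the guards sit in the preamble right after the \documentclass line. \providecommand defines a command only when it does not already exist, so under article the three commands become no-ops, while under a class that already defines them the class definitions are left untouched:

```latex
\documentclass{article}
% Define the partitioning commands as no-ops only if the class has not
% already defined them; existing definitions are not overwritten.
\providecommand{\frontmatter}{}
\providecommand{\mainmatter}{}
\providecommand{\backmatter}{}
\begin{document}
\frontmatter
\tableofcontents
\mainmatter
\section{Introduction}
Some text
\end{document}
```

With this preamble the body no longer needs to know which class is in use; changing article to book in the first line should be the only edit needed to get separately paginated front matter.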

The question of which chapters to include as front matter or back matter is still open. I was thinking about your preference for defining a separate document class compared to my suggestion for inferring what is in almost every case, I have argued, the desired effect. I thought that you might consider changing the default behavior following my suggestion, but then adding an explicit class to reverse the behavior (e.g. .unnumberedmain, ...or something) when desired.

Edit: Numbering and placement in front/main/back matter are separate issues in the general case, so you could just add a class called .mainmatter, which suppresses placement out of main matter based on the default rules relating to numbering and sequencing (i.e. trying to find maximal sequences of unnumbered chapters on the head and tail ends of the entire chapter sequence). Thus, chapters are processed according to whether they are in the .unnumbered class, and if so, whether they are also in the .mainmatter class.

The present fix doesn't do everything you want -- it still leaves everything in the body in the mainmatter.
Dividing up the body would raise more complications, but this is a good start.
With the present framework, you can put things like acknowledgements in metadata and add variables for them to the template under the frontmatter.
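
A rough sketch of what such a template fragment could look like, written in pandoc's $if(...)$ template syntax. The book-class variable is the one discussed later in this thread; acknowledgements is a hypothetical metadata field chosen for illustration, and the fragment is not a copy of pandoc's shipped template:

```latex
$if(book-class)$
\frontmatter
$endif$
$if(toc)$
\tableofcontents
$endif$
$if(book-class)$
$if(acknowledgements)$
% 'acknowledgements' is a hypothetical metadata field, not a built-in variable
\chapter*{Acknowledgements}
$acknowledgements$
$endif$
\mainmatter
$endif$
$body$
```

Setting acknowledgements in the document's metadata block would then place that text in the front matter. Note that guarding on book-class alone inherits the problem reported further down: the report class counts as a book class but does not define \frontmatter.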

Does this work only with the built-in book document class, not custom classes that mimic the book semantics (e.g. the KOMA-Script book class)?

I think some of the suggestions I have made represent a much more comprehensive and robust solution. I hope you would consider some variation of them for the future. I feel that they could be implemented to facilitate the needs of those writing books without introducing any regressions or instability.

Thank you for beginning to address the concerns with the recent commit.

Edit: Is book-class a user-defined field?

The variable is automatically set when the document uses the book or memoir class.

Just to clarify: The book class detection was introduced in https://github.com/jgm/pandoc/commit/b76ba44c52563806f89fbdb0d825e19c32dc9d36. The book-class variable is set by pandoc when one of these classes is used ["memoir","book","report","scrreprt","scrbook"].

Thank you. I have been using scrbook, and have picked up the benefit of title and TOC having a separate sequence of page numbers from the body.

The programmability of LaTeX makes it feasible to achieve these effects in a very robust and reliable way, without any need to explicitly test the type of document class against a prescribed list.

Even though the issue was closed, I hope that the discussion points previously laid down can be revisited in the near future.

Mauro Bieg notifications@github.com writes:

The variable is automatically set when the document uses book or memoir class.

Or when you set --top-level-division=chapter (or part).

Looks like \frontmatter broke the report class:

 Error producing PDF.
 ! Undefined control sequence.
 l.127 \frontmatter

I got this error after upgrading pandoc, for a document that uses the report class.

The present fix doesn't do everything you want -- it still leaves everything in the body in the mainmatter.
Dividing up the body would raise more complications, but this is a good start.
With the present framework, you can put things like acknowledgements in metadata and add variables for them to the template under the frontmatter.

With due respect to the complexity of this issue, when doing a Markdown to PDF conversion, copying content out into variables does defeat the purpose of a programmatic repurposing/conversion (though it may be that the original md->epub might also be organized to have the copyright and dedication pages as variables in a metadata block, I'll look at this).

However, if \frontmatter \mainmatter \backmatter and \appendix were to be referenced in the pandoc conversion process, say by using variables for the particular filenames (--frontmatter=fm.md, etc.), that would provide a neat resolution.

See also: https://tex.stackexchange.com/questions/20538/what-is-the-right-order-when-using-frontmatter-tableofcontents-mainmatter

Can't most of this be solved with the following?:

\providecommand\frontmatter{}

And similarly for the other partitioning commands.

In a related vein, if there is a need to classify the current document class, it can be done according to its traits, in order to avoid a lookup in a static table. E.g. is the \frontmatter command defined?
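
As an illustration of such a trait test, here is a small sketch that uses the e-TeX primitive \ifdefined to record whether the class provides \frontmatter. The flag name \ifhasmatter is made up for this example:

```latex
\documentclass{scrbook}
% Hypothetical flag: true when the class defines the partitioning commands.
\newif\ifhasmatter
\ifdefined\frontmatter
  \hasmattertrue
\else
  \hasmatterfalse
\fi
\begin{document}
\ifhasmatter\frontmatter\fi
\tableofcontents
\ifhasmatter\mainmatter\fi
\chapter{Introduction}
Some text
\end{document}
```

The same flag could guard any other book-only construct, so no list of class names has to be maintained.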

And if, for some reason, we're not detecting frontmatter and backmatter based on contiguous sequences of unnumbered chapters on either end of the document, then can we at least try to find a solution that preserves the symbolic and logical features of the source document? For example, tagging headers with something like endfrontmatter and startbackmatter at the appropriate points in the document? Or referencing header identifiers in the metadata: endfrontmatter: introduction?

Ok, I've got most of what I need working around this issue. Using pandoc markdown I can indeed get \frontmatter, \mainmatter, etc. into the latex processor. The only issue I have at this point is trying to exclude the chapters in \frontmatter (Copyright, Dedication) from being in the Table of Contents. All ways of exclusion appear to use LaTeX syntax that is not available in Markdown. I'm looking for a global way of suppressing \frontmatter chapters from the ToC.

I'm looking for a global way of suppressing \frontmatter chapters from the ToC.

Is that not the default for book classes?

Edit: Maybe it depends on where the TOC is placed. So the Pandoc template puts it before the frontmatter, whereas most would put it after.

I'm actually using scrbook, and have placed the ToC after the Copyright and Dedication page, and before the \mainmatter and Introduction chapter. The ToC is generated on its own page in the correct place, and the \frontmatter and \mainmatter pagination is correct. However, I want to suppress the Copyright and Dedication pages from the ToC. All content is in Markdown (with the exception of the \frontmatter, \tableofcontents \mainmatter \backmatter \glossary tags that are being passed through).
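
For reference, a hand-written sketch of the layout described here, not pandoc output. It assumes plain starred chapters, which are unnumbered and write no ToC entry unless \addcontentsline is called explicitly:

```latex
\documentclass{scrbook}
\begin{document}
\frontmatter
% Starred chapters: no number and no table-of-contents entry.
\chapter*{Copyright}
Copyright text
\chapter*{Dedication}
Dedication text
\tableofcontents
\mainmatter
\chapter{Introduction}
Some text
\end{document}
```

The open question in the thread is how to get an equivalent result from Markdown alone.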

@jeffmcneill you can just use this markdown # Some Heading {-} to exclude a chapter from the table of contents.

@Wandmalfarbe Sorry but that did not work for me.

@Wandmalfarbe Adding {-} (which is syntactical sugar for {.unnumbered}) to headings just keeps them from being counted as a numbered (chapter|section), it does not exclude it from the ToC (this might vary by output format, but at least this is true for TeX formats).

Thanks @alerque, I've just found that reference. It seems that numbering and inclusion in the ToC are definitely distinct in LaTeX. This issue may be related:

There is a Python script that provides this functionality of filtering some unlisted sections from the ToC:

@jeffmcneill With the recent commit 68b09a6d81b24b928b1629ecb3061a51a5ce2352 for #1762, assuming you are willing to try running from a development branch, can you get what you need entirely from portable MD source, or is there still a dependence on embedded LaTeX?

@brainchild0 Yes, I think I could do everything I need without embedded LaTeX, but being able to have \frontmatter, etc. pass through is an advantage, and allows for changing the pagination from roman to arabic numerals. That isn't something possible without embedded LaTeX. I understand that Markdown is more limited, and re-inventing LaTeX based on Markdown class manipulation is not really feasible. Getting control over listing/unlisting and numbering/unnumbered is fine, and those are really the only issues I've ever had that needed some work (both in EPUB and PDF).

p.s. the Nightly Build has been failing for the last four days, so I have not yet been able to test out the latest commit.

That isn't something possible without embedded LaTeX.

Is it not easily possible simply by extending the existing metaphors, in the same spirit as the recent addition of the unlisted tag?

I'm not sure how the metaphors are so different for one versus the other.
