Pandoc: Feed LaTeX directly to the typesetter

Created on 1 Apr 2018 · 8 comments · Source: jgm/pandoc

Apparently some people have picked up the habit of processing any document to PDF with pandoc. Of course, the results are horrible if you try to typeset a LaTeX input file with pandoc:

pandoc test.tex -o test.pdf

Pandoc first parses the LaTeX into its internal representation, only to emit LaTeX again and call the typesetter. In the process, many features of the document get lost, because pandoc is not a TeX engine.

Maybe it would be better to simply feed LaTeX through to the typesetter as is, without even parsing it.

Or is this out of scope?

hmenke

All 8 comments

But then you couldn’t use templates, filters, etc., if you wanted to.

I guess people will just have to learn what pandoc is for and what it isn't...

mb21 on 1 Apr 2018

I agree, we should keep things uniform and parse the LaTeX.
If people use pandoc for a purpose it's not designed for, they'll
get bad results. But that's on them.

Note: If they turn on --verbose, they'll see a long list of warnings
about content that has been skipped. That would probably be sufficient
to tell them that the results will not be good. In
6d862ff9549445ee544d43edbccec439bed3fde6 we downgraded this from
WARNING to INFO status, so it's only displayed with --verbose.
We might reconsider this, but I think the downgrading was a good
idea in most other cases.

Another thing we might consider is issuing a special warning if the
input format is LaTeX and the output format is PDF (and the pdf-engine is
a LaTeX engine).
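Such a warning could be prototyped outside pandoc with a small shell function. This is a minimal sketch, not a pandoc feature: the name `warn_if_tex_to_pdf` is hypothetical, and it assumes the formats are inferred from file extensions alone, ignoring `-f`/`-t` and `--pdf-engine`.

```shell
#!/bin/sh
# Hypothetical helper, not part of pandoc: warn when a LaTeX file is about
# to be converted to PDF, since pandoc would parse the LaTeX (dropping what
# it cannot represent) and then emit new LaTeX for the typesetter.
# Assumption: formats are guessed from file extensions only.
warn_if_tex_to_pdf() {
    input=$1
    output=$2
    # Only LaTeX inputs are of interest; anything else passes silently.
    case "$input" in
        *.tex|*.latex) ;;
        *) return 0 ;;
    esac
    # LaTeX in, PDF out: print the warning and signal it with status 1.
    case "$output" in
        *.pdf)
            echo "warning: $input is LaTeX; pandoc will re-parse it and may lose content." >&2
            echo "warning: consider running a TeX engine directly, e.g. latexmk -pdf $input" >&2
            return 1
            ;;
    esac
    return 0
}
```

For example, `warn_if_tex_to_pdf test.tex test.pdf && pandoc test.tex -o test.pdf` would print the warning and skip the pandoc run. Returning a status instead of exiting leaves that decision to the caller.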

jgm on 1 Apr 2018

For more "Swiss Army knife"-like behavior, maybe the intermediate steps from .tex to .pdf could be exposed as command-line options.

holtzermann17 on 2 Apr 2018

Could you please re-enable the warning, or make pandoc complain in some other way? More questions like this keep popping up on TeX.SX: https://tex.stackexchange.com/questions/425103/pandoc-cannot-load-packages-with-usepackage

I think it would generally be a good idea to display a warning if the user requests an “identity” transformation through pandoc, i.e. if the input format equals the output format. For example, running

pandoc -s -i in.html -o out.html

displays no warnings, but with in.html

<html>
  <body>
    <p>A paragraph</p>
    <unrecognizedtag>
      bla bla bla
    </unrecognizedtag>
  </body>
</html>

yields out.html

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <meta http-equiv="Content-Style-Type" content="text/css" />
  <meta name="generator" content="pandoc" />
  <title></title>
  <style type="text/css">code{white-space: pre;}</style>
</head>
<body>
<p>A paragraph</p>
<p>bla bla bla</p>
</body>
</html>

which is clearly not the identity.
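A rough shell sketch of the proposed identity check follows. Pandoc itself has no such option; the function name `warn_if_identity` is hypothetical, and it compares file extensions only, so `.htm` versus `.html`, or differently cased extensions, count as different formats.

```shell
#!/bin/sh
# Hypothetical helper (not a pandoc option): warn when the input and output
# files share an extension, i.e. an "identity" conversion that still goes
# through pandoc's document model and may silently drop content.
# Compares extensions only; -f/-t flags are not considered.
warn_if_identity() {
    in_ext=${1##*.}    # text after the last dot of the input name
    out_ext=${2##*.}   # text after the last dot of the output name
    if [ "$in_ext" = "$out_ext" ]; then
        echo "warning: $1 -> $2 is not a verbatim copy; pandoc re-parses" \
             "the input, and content it cannot represent may be lost." >&2
        return 1
    fi
    return 0
}
```

Since some identity conversions are deliberate (reformatting Markdown, cleaning up a .docx), a real implementation would likely need a way to silence this warning.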

hmenke on 6 Apr 2018

I posted the question on TeX.SE that @hmenke linked in the last comment, and I found this thread because he commented (thankfully!) on my post with a link here:

Stop using pandoc as a LaTeX engine! github.com/jgm/pandoc/issues/4516 Seriously, where did you get the idea to do that?

Well, the pandoc homepage uses the term "swiss army knife", and according to the manual it happens to do LaTeX -> PDF conversion. I've spent on the order of 10 hours trying to figure out how to do this before finally posting on TeX.SE (the post was written over the course of a few days, so I missed the duplicate that was posted 5 days before mine). Believe me, that included searching around the web and reading the manual.

What I glossed over was "convert files from one markup format into another" (emphasis mine). I didn't find other evidence that proper LaTeX -> PDF conversion was not supported.

I think it would be helpful if the website made this more clear.

jasonszhao on 7 Apr 2018

@jasonszhao It's all in the last paragraph of the first section in the MANUAL:

Because pandoc’s intermediate representation of a document is less expressive than many of the formats it converts between, one should not expect perfect conversions between every format and every other. Pandoc attempts to preserve the structural elements of a document, but not formatting details such as margin size. And some document elements, such as complex tables, may not fit into pandoc’s simple document model. While conversions from pandoc’s Markdown to all formats aspire to be perfect, conversions from formats more expressive than pandoc’s Markdown can be expected to be lossy.

mb21 on 7 Apr 2018

Wow. That was enlightening.

I have a bad habit of fishing around in documentation and not reading manual sections from start to finish when I'm stuck, and it frequently burns me like this. After reading that section of the manual, I'm not so convinced about my argument anymore.

jasonszhao on 7 Apr 2018

People frequently use "identity" transformations on purpose, e.g. to reformat markdown or clean up docx. So I'm not sure about the idea of issuing a warning in all such cases. These are cases where people are using pandoc to do what it's intended to do.

jgm on 9 Apr 2018
