Pandoc: wishlist: server-side KaTeX

Created on 30 Aug 2020 · 9 Comments · Source: jgm/pandoc

KaTeX can be used to render math to HTML in advance. You need node.js and the katex command line tool at render time, but then no JavaScript needs to be executed in the browser (a CSS file is still necessary). I've written a proof-of-concept Lua filter that implements this mode in Pandoc and it works reasonably well:

-- Pandoc filter: if we are generating HTML, replace each Math element
-- with the result of running KaTeX on its contents.  This requires
-- the command-line "katex" program to be installed at rendering time,
-- but does not require any JavaScript to be executed on the reader's
-- browser.  (The built-in --katex mode makes the opposite tradeoff.)
if FORMAT:match 'html' then
   local have_math = false

   function Math(elem)
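      local function trim(s)
         -- Parentheses drop gsub's second return value (the match count),
         -- so trim returns only the string.
         return (s:gsub("^%s+", ""):gsub("%s+$", ""))
      end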

      have_math = true

      local katex_args = {'--no-throw-on-error'}
      if elem.mathtype == 'DisplayMath' then
         table.insert(katex_args, '--display-mode')
      end

      return pandoc.RawInline(
         FORMAT, trim(pandoc.pipe("katex", katex_args, trim(elem.text))))
   end

   function Meta(data)
      -- The "has_math" property will be absent when there is no math
      -- and the string "true" when there is math.
      if have_math then
        data.has_math = "true"
      end
      return data
   end
end

It's quite slow, though, because it has to start a fairly heavyweight program (the node.js interpreter) for every math element. I'm going to experiment with using a JSON filter written in node.js instead, but I wonder whether native support for this mode in Pandoc might be even faster -- it could fork off the katex utility the first time it ran into a math element, and pipe it the math markup as it arrives. Native support would also allow --standalone to know when it should link to the KaTeX CSS instead of the JS.
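For illustration, the JSON-filter idea could look roughly like the sketch below. It is not a tested implementation: it assumes the katex npm package is installed next to the script, and the names renderMathNodes and runFilter are made up for this example. The filter walks pandoc's JSON AST once, replaces every Math node with a RawInline holding pre-rendered HTML, and sets has_math in the metadata the same way the Lua filter does. Because pandoc starts the filter only once per document, node.js is also started only once.

```javascript
// Sketch of a pandoc JSON filter for node.js (assumes `npm install katex`).
// renderMathNodes and runFilter are hypothetical names for this example.

// Recursively walk the pandoc JSON AST. Every Math node
//   {t: "Math", c: [{t: "InlineMath" | "DisplayMath"}, "<tex>"]}
// is replaced with a RawInline node holding the rendered HTML.
// stats.count is incremented once per formula.
function renderMathNodes(node, render, stats) {
  if (Array.isArray(node)) {
    return node.map((child) => renderMathNodes(child, render, stats));
  }
  if (node === null || typeof node !== 'object') {
    return node; // strings, numbers, booleans
  }
  if (node.t === 'Math' && Array.isArray(node.c)) {
    const [mathType, tex] = node.c;
    stats.count += 1;
    const html = render(tex.trim(), mathType.t === 'DisplayMath');
    return { t: 'RawInline', c: ['html', html.trim()] };
  }
  const out = {};
  for (const [key, value] of Object.entries(node)) {
    out[key] = renderMathNodes(value, render, stats);
  }
  return out;
}

// Transform a whole document and record whether it contained math.
function runFilter(doc, render) {
  const stats = { count: 0 };
  const out = renderMathNodes(doc, render, stats);
  if (stats.count > 0) {
    // Mirror the Lua filter: has_math is the string "true" when math is present.
    out.meta = { ...out.meta, has_math: { t: 'MetaString', c: 'true' } };
  }
  return out;
}

// pandoc passes the target format as the first argument to a JSON filter,
// so this part only runs when the script is invoked as a filter.
if (typeof process !== 'undefined' && process.argv[2] !== undefined) {
  const doc = JSON.parse(require('fs').readFileSync(0, 'utf8'));
  let result = doc; // non-HTML targets pass through unchanged
  if (/html/.test(process.argv[2])) {
    const katex = require('katex');
    const render = (tex, displayMode) =>
      katex.renderToString(tex, { displayMode, throwOnError: false });
    result = runFilter(doc, render);
  }
  process.stdout.write(JSON.stringify(result));
}
```

Saved under a hypothetical name such as katex-filter.js, it would be run as `pandoc input.md --filter ./katex-filter.js -o output.html`. The output still needs the KaTeX CSS, which a template could link conditionally on has_math.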

zackw

All 9 comments

> it could fork off the katex utility the first time it ran into a math element, and pipe it the math markup as it arrives

Isn't this something that could, in principle, be done in a filter too?

jgm on 30 Aug 2020

I don’t think this is currently possible from a Lua filter; pandoc.pipe can
only run a subprocess synchronously to completion AFAICT. It’d need to be
more like Python’s subprocess.Popen and frankly I’m not sure that would be
worth the effort.

A JSON filter certainly can do this but then the question is whether the
JSON serialization overhead eats all the performance gain from only
invoking the node interpreter once.

zackw on 31 Aug 2020

Doesn't Lua have primitives allowing you to open a subprocess and pipe things to it?
I may be missing something; @tarleb can comment more helpfully.

jgm on 31 Aug 2020

There are no built-in primitives, but there are libraries which provide this functionality. E.g., the posix library comes with posix.popen, posix.spawn, posix.sys.wait, etc.

This requires pandoc to be compiled against the system's Lua installation. Distro-packages usually do this, as do the official Docker images. The Lua posix package can conveniently be installed via the distro's package manager.

tarleb on 31 Aug 2020

> I'm going to experiment with using a JSON filter written in node.js instead

Yes, sounds like that's the way to go here. You'll need node.js anyway to run KaTeX, so there's no particular advantage in using a Lua filter.

By the way, see also https://pandoc.org/MANUAL.html#math-rendering-in-html for built-in alternatives.

mb21 on 31 Aug 2020

Well, Lua filters should be a bit faster, and more so if there are only a few math elements in a very long document. But I agree that using a node.js filter is probably the better approach here.

tarleb on 31 Aug 2020

Another option: set up a server that does the conversions, and use --webtex (pointing at this server) and --self-contained. Then you don't need a filter at all.
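A rough sketch of what such a server might look like in node.js follows. With --webtex, pandoc appends the URL-encoded formula to the given URL and uses the result as an image source, so the server has to answer with an image. KaTeX itself produces HTML rather than an image, so the renderToSvg function here is purely a placeholder assumption for some renderer that returns SVG markup. The names texFromRequestUrl and makeHandler, and the port and path in the usage note, are likewise invented for this example.

```javascript
// Sketch of a --webtex backend. renderToSvg is a hypothetical function
// (tex: string) => string returning SVG markup; it is NOT part of KaTeX.

// Extract the formula from a request URL of the form "/path?<url-encoded TeX>".
// Returns null when there is no formula or the encoding is invalid.
function texFromRequestUrl(url) {
  const q = url.indexOf('?');
  if (q === -1 || q === url.length - 1) {
    return null;
  }
  try {
    return decodeURIComponent(url.slice(q + 1));
  } catch {
    return null; // malformed percent-encoding
  }
}

// Build a request handler suitable for http.createServer.
function makeHandler(renderToSvg) {
  return (req, res) => {
    const tex = texFromRequestUrl(req.url);
    if (tex === null) {
      res.writeHead(400, { 'Content-Type': 'text/plain' });
      res.end('missing or malformed formula');
      return;
    }
    try {
      const svg = renderToSvg(tex);
      res.writeHead(200, { 'Content-Type': 'image/svg+xml' });
      res.end(svg);
    } catch (err) {
      res.writeHead(422, { 'Content-Type': 'text/plain' });
      res.end(String(err && err.message));
    }
  };
}

// To serve (myRenderer is whatever SVG renderer you plug in):
//   require('http').createServer(makeHandler(myRenderer)).listen(8080);
```

With the server listening on the hypothetical port 8080, the invocation would be along the lines of `pandoc input.md --webtex='http://localhost:8080/render?' --self-contained -o output.html`.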

jgm on 1 Sep 2020

👍3

Sounds a lot like this filter!

MyriaCore on 3 Sep 2020

There is also https://github.com/lierdakil/mathjax-pandoc-filter. It's written in TypeScript so it doesn't fork for every equation. Not sure how MathJax compares with KaTeX, though.