Pandoc: idea: include files (and csv tables)

Created on 29 Jun 2012  ·  53 Comments  ·  Source: jgm/pandoc

As far as I understand, pandoc can process several files
in one way only: you have to list them on the command line. There is
a way to simulate include files with scripting; it's described
in pandoc's official guide.

Markdown is a tiny language, and we should keep it small. So here is an idea
for simulating LaTeX's \input command without extending Markdown syntax:
we can overload the image-inclusion construct. If a file has an image extension,
then it's treated as an image, but if it's .txt, it can be treated as Markdown:

![Show me if there is no such a file](subfile.txt)

I came to this idea while thinking about long tables.
Imagine that someone is writing a research report. There are long
tables produced by an algorithm. The tables are saved in some
standard table format, for example CSV. Then the user can write

![So it goes](table.csv)
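
To make the CSV case concrete, here is a minimal Python sketch of what such an include could expand to. It assumes the first CSV row is the header and that the image description becomes the table caption; the function name `csv_to_pipe_table` is hypothetical and not part of pandoc.

```python
import csv
import io


def csv_to_pipe_table(csv_text, caption=""):
    """Render CSV text as a Markdown pipe table; the first row is the header."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def fmt(row):
        # Pad short rows and escape pipes so cells cannot break the table.
        cells = [cell.strip().replace("|", "\\|") for cell in row]
        cells += [""] * (width - len(cells))
        return "| " + " | ".join(cells) + " |"

    lines = [fmt(rows[0]), "|" + "|".join(["---"] * width) + "|"]
    lines += [fmt(row) for row in rows[1:]]
    if caption:
        # Assumption: the image description is reused as the table caption.
        lines += ["", ": " + caption]
    return "\n".join(lines)


print(csv_to_pipe_table("name,score\nalpha,1\nbeta,2\n", caption="So it goes"))
```

The output is ordinary Markdown, so it could be produced by a preprocessor before pandoc ever sees the document.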

All 53 comments

Not sure if this would fit Pandoc's goal of being a "universal document converter", but you can do this easily with a wrapper around Pandoc. This would of course require some technical skill on your part, but you would gain much more than the features suggested above.

There are a bunch of programming tools in Pandoc Extras, among which I know (and develop) pander. You could easily write a simple brew file that compiles a list of all images in a directory and creates a link for each; reading and printing CSV files would not be a problem either (e.g. by putting a simple read.table(foo) in a brew chunk between <%=...%> tags).

I hope you find this useful.

anton-k, I like the idea; had something similar in mind, when writing
a technical report recently. File-extension-dependent inclusion would be a nice
pandoc-extension for markdown.

Another idea related to that:

It would be great to have more general support for literate programming.

Currently I use the R-knitr-package
for mixing programming languages in technical reports; as an example see
https://github.com/yihui/knitr/blob/master/inst/examples/knitr-lang.Rmd.

Using pandoc directly with a file format, say lmd -- for literate markdown,
would facilitate the workflow considerably.

In that sense knitr works pretty well: you can include different languages,
e.g.

``` {r test-r, engine='R'}
set.seed(123)
rnorm(5)
```

Unfortunately Haskell is currently not supported, as in e.g.

``` {engine='ghc'}
[x^2|x <- [1..10], x > 3 ]
```

With that in mind writing tutorials with REPLs like ghci, irb, R would be
more pleasant.

This could be done easily using the techniques described in the scripting documentation.

I opened a separate issue, #656, for it.

Hi. I'm also looking for this feature :)
I found Marked.app has a nice extension:

  <<[Code title](folder/filename)

The same syntax is also supported by the Leanpub system. I think some "include feature" is a must if you write long texts in markdown.
Now I'm using Marked.app with a custom Markdown processor configured for pandoc, so I can include files that include files and so on. Very useful if you are writing a little book with source code samples :). But it's a bit tedious to have to print to PDF from Marked.app. Having this feature in pandoc would allow for command-line automation :)

@jcangas -> it looks like ThoughtBot has done this before, based on looking at the raw markdown files from their Backbone on Rails book.

@jasonm was the person who worked on the project.

@thewatts, thanks for the clue. It is very easy to go the "do it yourself" way, of course: I have a bit of Ruby that does the magic. But I see value in having it as a standard feature with a standard syntax and no need for external tools...

Found what they use - they have a rakefile that will take and parse the <<[Code title](folder/filename) code, and then add it into the main file.

There is gpp in Pandoc Extras, mentioned by @daroczig, that can be used to include files directly (gpp is a gcc-like preprocessor) and much more. It provides a syntax to preprocess files and execute commands, and file inclusion could be achieved through #include. I'm currently working on a Python wrapper that uses gpp to preprocess special commands in a markdown file before passing it to pandoc (things like file inclusion, code inclusion, color, underline, etc.).
I will soon put it on GitHub, and if people are interested in such a wrapper I will add some more info about it.

@thewatts I also have a rake file doing the same thing :). Well, mine is recursive too. I'm copying it here in case it helps others

# yields every line. Assume root_dir & file are Pathname objects
def merge_mdown_includes(root_dir, file, &block)
  file.each_line do |line|
    if line =~/(.*)<<\[(.*)\]$/
      incl_file = root_dir + $2
      yield $1 if block
      merge_mdown_includes(root_dir, incl_file, &block)
    else
      yield line if block
    end
  end
end

# hint on how to use the routine above:
merge_mdown_includes(root_dir, file) do |line|
   output_file.puts line
end

Instead of adding another preprocessing syntax on top of Pandoc Markdown I use the following syntax to include files:

`filename.md`{.include}

one could also extend this to:

~~~ {.include}
filename.md
~~~

This way the inclusion syntax can act on the abstract syntax tree (AST) of a Pandoc document - one can get the same result from HTML like this (HTML -> Markdown -> Markdown with inclusions -> Target format):

<code class="include">filename</code>

Here is a small hack in the form of a Perl script that I use for now.

while(<>) {
    if (/^`([^`]+)`\{\.include\}\s*$/) {
        if (-e $1 && open my $fh, '<', $1) {
            local $/;
            print <$fh>;
            close $fh;
        } else {
            print STDERR "failed to include file $1\n";
        }
    } else {
        print $_;
    }
}

The final implementation should work on the AST as well to allow inclusion inside other elements, for instance:

* `longlistitem.md`{.include}
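
As an illustration of what "working on the AST" could look like, here is a minimal Python sketch that walks pandoc's JSON representation. It assumes the usual JSON shapes (a Code inline is `{"t": "Code", "c": [[id, classes, keyvals], text]}`), only descends into bullet lists and block quotes, and takes a hypothetical `load_blocks(path)` callback that returns the parsed blocks of a file (for instance by running `pandoc -t json` on it). It is a sketch, not a complete filter.

```python
def is_include(block):
    """True for a Para/Plain whose only inline is `path`{.include}."""
    if block.get("t") not in ("Para", "Plain"):
        return False
    inlines = block["c"]
    if len(inlines) != 1 or inlines[0]["t"] != "Code":
        return False
    (_ident, classes, _keyvals), _text = inlines[0]["c"]
    return "include" in classes


def expand_includes(blocks, load_blocks):
    """Return a new block list with include markers replaced by file blocks.

    load_blocks(path) is a hypothetical callback supplied by the caller.
    """
    result = []
    for block in blocks:
        if is_include(block):
            path = block["c"][0]["c"][1]
            result.extend(load_blocks(path))
        elif block["t"] == "BulletList":
            # Each list item is itself a list of blocks.
            items = [expand_includes(item, load_blocks) for item in block["c"]]
            result.append({"t": "BulletList", "c": items})
        elif block["t"] == "BlockQuote":
            inner = expand_includes(block["c"], load_blocks)
            result.append({"t": "BlockQuote", "c": inner})
        else:
            result.append(block)
    return result
```

Because the replacement happens on blocks rather than on text lines, an include inside a list item stays inside that list item, which a line-based preprocessor cannot guarantee.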

@nichtich Nice idea; converted to python and combined with Makefile:

# Makefile fragment

%.pdf : %.md
	cat $^ | ./include.py | pandoc -o $@

#!/usr/bin/env python
# include.py: replace `file`{.include} with the contents of that file

import re
import sys

include = re.compile(r"`([^`]+)`\{\.include\}")
for line in sys.stdin:
    match = include.search(line)
    if match:
        with open(match.group(1)) as f:
            file_contents = f.read()
        # a function replacement keeps backslashes in the file literal
        line = include.sub(lambda m: file_contents, line)
    sys.stdout.write(line)

See also this discussion on the mailing list.

And here's my take on a Haskell filter that includes CSVs as tables: pandoc-placetable

File extension dependent overloading of the image inclusion is a great idea!
Would love to see it implemented!

I've written a basic Pandoc filter in Haskell that could include referenced Markdown files recursively, meaning the nested includes are also included. (Although only 2 levels deep, for now.) Take a look:

https://github.com/steindani/pandoc-include

To include one or multiple files use the following syntax:

```include
chapter1.md
chapter2.md
#dontinclude.md
```

Hi, @mpickering, may I ask what the status of this is? Is there any branch with work in progress (to see if there is anything to help with)?

I think there are a few different categories of file extensions that can be included:

  1. those file extensions associated with pandoc readers: this would allow including multiple different source formats in the markdown source, e.g. ![](file.docx) would use the pandoc docx reader to read the file into the AST and include it at that position.
  2. RawInline: some might not want the pandoc readers to read it, though. So e.g. ![](file.tex){RawInline="true"} and ![](file.html){RawInline="true"} would include the raw TeX and raw HTML at that position.
  3. CodeBlock: ![](file.md){CodeBlock="true"} and ![](file.py){CodeBlock="true"} would include the files as code blocks.
  4. csv: e.g. pandoc-placetable
  5. media: audio/videos files.
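
The categories above amount to a dispatch on file extension plus attributes. Here is a rough Python sketch of that decision, purely illustrative: the extension tables, the attribute names taken from the list above, and the function name `classify_include` are assumptions for the example, not an existing pandoc interface.

```python
import os

# Hypothetical mapping; the extensions and reader names are illustrative only.
READERS = {".md": "markdown", ".txt": "markdown", ".docx": "docx",
           ".tex": "latex", ".html": "html", ".rst": "rst"}
MEDIA = {".mp3", ".ogg", ".wav", ".mp4", ".webm"}
IMAGES = {".png", ".jpg", ".jpeg", ".gif", ".svg", ".pdf"}


def classify_include(path, attrs=None):
    """Decide how ![](path){attrs} would be handled; returns (kind, detail)."""
    attrs = attrs or {}
    ext = os.path.splitext(path)[1].lower()
    if attrs.get("CodeBlock") == "true":
        return ("codeblock", ext.lstrip("."))
    if attrs.get("RawInline") == "true":
        return ("raw", READERS.get(ext, ext.lstrip(".")))
    if ext == ".csv":
        return ("table", "csv")
    if ext in MEDIA:
        return ("media", ext.lstrip("."))
    if ext in IMAGES:
        return ("image", ext.lstrip("."))
    if ext in READERS:
        return ("reader", READERS[ext])
    # Unknown extensions fall back to ordinary image handling.
    return ("image", ext.lstrip("."))
```

Explicit attributes are checked before the extension, so a .md file can still be pulled in verbatim as a code block.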

Is this feature still under development? This would allow a complete replacement of most static site generators..

I don't think anybody is working on this. My personal opinion is that this is out of scope, as the increase in complexity seems not worth it.

A solution for CSV exists with pandoc-placetable. If one does not want to install additional binaries, pandoc 2 makes it easy to achieve most of what was suggested here via Lua filters. E.g., the filter below would replace a figure with its Markdown content if the image has class "markdown". This is fully portable and doesn't require any software other than pandoc.

function Para (elem)
  if #elem.content == 1 and elem.content[1].t == "Image" then
    local img = elem.content[1]
    if img.classes[1] == "markdown" then
      local f = io.open(img.src, 'r')
      local blocks = pandoc.read(f:read('*a')).blocks
      f:close()
      return blocks
    end
  end
end

Is this feature still under development?

Do you mean including files or tables? Apparently two different (related) issues are mentioned here.

I think the reason it's been taking so long is not mainly the difficulty or feasibility of including files, but the question of whether this should be included in pandoc, and how it should behave (e.g. recursively?).

E.g. @jgm has a pandoc-include example in the tutorial on writing pandoc filters, and it has been distributed as pandoc-include: Include other Markdown files. There's also a panflute filter doing so. So does it need to be done in pandoc itself?

This would allow a complete replacement of most static site generators..

Having a better template system is more important than having a native pandoc-include in this respect. I remember there's an issue about this. Try searching for it and see if you have any comments or suggestions there.

pandoc-include is built against pandoc 1.19, so the newer syntax is not parsed correctly,
e.g. Div classes via ::::{.class} ::::

Currently my workaround is to use paru-insert.rb but it's really rather slow, pushing my build times up by 10s just to include 3 partials..

Try filing an issue over there (or try a pull request).

Do you use other pandoc filters? If you already have Python filters using panflute, panflute has a way to run all panflute filters in one pass to avoid multiple conversions to and from JSON. Often the reason for a slow filter is this unnecessary conversion.

You can also try the now "native" Lua filters. They're fast exactly because they avoid this issue.

You can also use a preprocessor to handle includes separately; e.g., this is exactly how MultiMarkdown handles "transclusion" internally. If I remember correctly, there's a mode in the multimarkdown CLI that only processes the transclusion without parsing the markdown. If so, you can use multimarkdown as a preprocessor and pass its output to pandoc to parse.

I personally set up makefiles to handle stuff like this. So far it's OK, but there are times I'd want a template system on top of it to ease some routines. One advantage of using make is the -j option to build things in parallel (if you write the dependencies correctly). This can dramatically speed up a build. (I once sped up a pandoc project from 2 min+ to ~20 sec using -j 8 on a 4-core CPU.)

+++ Rohit Goswami [Nov 09 17 22:36 ]:

Is this feature still under development? This would allow a complete
replacement of most static site generators..

Well, we have implemented the .. csv-table directive for RST.

Since he mentioned site generators, I'd imagine he's talking about the include feature rather than CSV.

By the way, since pandoc 2.0 allows arbitrary raw blocks (of other formats), this can probably be used to put a CSV table inline as a raw rst block.

Well, we've also implemented includes in RST!

I ended up switching to panflute and replacing all paru filters. This worked out rather well: the build now runs in around 4-5 sec, a 50% speed increase...

Would a minimalist include syntax using attributed spans be possible, rather than overloading the image or code block syntax? For example

[some-file.md]{.include}

Then have a lua filter (similar to one on this thread: https://groups.google.com/forum/#!topic/pandoc-discuss/FMmb1mf2lHU) to do the work?

Span contents are parsed as markdown, so paths with backslashes (e.g., \users\steve\example.md) or spaces (My Documents/foo.md) would be more difficult to input correctly. An inline-code-based method could work better, but I'm not sure if it's desirable.

`some-file.md`{.include}

We're not the first ones thinking about this. At least the following two are already in use:

Multimarkdown File Transclusion:

{{some_other_file.txt}}

iA Writer Content Blocks proposal:

/Lorem Ipsum.txt

The corresponding talk.commonmark thread
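
For comparison, a text-level preprocessor for the `{{file}}` style can be sketched in a few lines of Python. This is not MultiMarkdown's implementation: it assumes a marker must stand alone on its line, resolves paths relative to the including file, and refuses cyclic includes; the function name `transclude` is hypothetical.

```python
import os
import re
import sys

TRANSCLUDE = re.compile(r"^\{\{(.+?)\}\}\s*$")


def transclude(path, _stack=()):
    """Return the text of `path` with {{file}} lines expanded recursively."""
    real = os.path.realpath(path)
    if real in _stack:
        raise ValueError("cyclic include: " + path)
    base = os.path.dirname(real)
    out = []
    with open(real, encoding="utf-8") as handle:
        for line in handle:
            match = TRANSCLUDE.match(line)
            if match:
                # Assumption: paths are relative to the including file.
                child = os.path.join(base, match.group(1))
                out.append(transclude(child, _stack + (real,)))
            else:
                out.append(line)
    return "".join(out)


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.stdout.write(transclude(sys.argv[1]))
```

Since the output is plain Markdown, it can be piped straight into pandoc, and the included text is parsed in the same scope as the main file.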

I have another idea on the syntax to be used: to use the new raw_attribute extension, and add an optional attribute include to it.

```{=markdown include='path/to.md'}
```

``{=gfm include='path/to.md'}

The reason for this suggestion is that there have been a couple of suggestions for overloading different elements (excluding custom syntax for the moment), but all of them have some flaw in their expected behavior:

  1. image link: in most output formats, it outputs a reference to the path of the image rather than inlining the image itself.

  2. code block / inline code: it suggests that what it includes is code (verbatim).

  3. div/span: it suggests you should include the text in a div/span.

(Of course the raw_attribute extension uses code block / inline code, though.)

This has the added benefit of being able to specify the format of the included documents.

To push this idea further, the following now has an obvious meaning that coincides with what is expected in (2):

```{include='path/to.md'}
```

``{include='path/to.md'}

They include some text, without a format, hence verbatim.

@tarleb Ah! Yes I can see how that would be problematical...
Having said that I don't have a pressing need for file inclusion myself, but I can see how it might be a useful feature.

Reading through this issue again, there are indeed a couple of different things going on. But I can see this working out:

  1. Include markdown:

    The recursive implementation in the markdown reader shouldn't be too difficult, I think. The above would parse and include the markdown in the same scope, while this would not:

    {file-scope=true}

  2. Include some file as a code block with syntax highlighting (could be a .md file, so needs different syntax):

    {.cpp include=foo.cpp}

    Possible options, see pandoc-include-code.

  3. Include csv as a table:

    table caption{header=yes}

    {include-as=csv}

    Possible options, see pandoc-placetable (where I unfortunately implemented a different syntax).

  4. Include audio/video (already implemented since pandoc 2.7.3):

(Thanks to ickc for teasing out a few cases already, but note that I don't think the need for including html or tex files into markdown files is great enough... and you can still do it in a lua-filter.)

@mb21 i assume, all of this could be handled by some lua filters? however, i'd love to see this as first-class citizens in pandoc markdown :)

also something like

``` {.cpp include=foo.cpp lines=1-10}
```

or even

``` {.cpp include=foo.cpp lines=1-}
```

or

``` {.cpp include=foo.cpp lines=-10}
```

would be very helpful :)
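
The three forms of the lines attribute are easy to pin down. Here is a small Python sketch of one possible reading, assuming 1-based inclusive line numbers and that a missing bound means "from the first line" or "to the last line"; `select_lines` is a hypothetical helper, not an existing filter option.

```python
def select_lines(text, spec):
    """Return the lines of `text` chosen by a spec such as '1-10', '5-' or '-10'.

    Line numbers are 1-based and inclusive; a bare number selects one line.
    """
    start_s, sep, end_s = spec.partition("-")
    if not sep:
        end_s = start_s  # a single line number such as '7'
    lines = text.splitlines(keepends=True)
    start = int(start_s) if start_s else 1
    end = int(end_s) if end_s else len(lines)
    if start < 1 or end < start:
        raise ValueError("bad line range: " + spec)
    # Slicing tolerates an end beyond the last line.
    return "".join(lines[start - 1:end])


source = "int a;\nint b;\nint c;\nint d;\n"
print(select_lines(source, "2-3"), end="")
```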

@cagix
This Lua filter may work for you (against your wish, for now)

@K4zuki nice :) thanks for the link ...

For including other files, you can use Markdown Preview Enhanced. It supports including other markdown files using:

@import "File.md"

I use Markdown Preview Enhanced to generate LaTeX/PDF documents with pandoc. While Markdown Preview Enhanced is an extension for VS Code and Atom, the underlying parser is in a separate library, mume. So it is quite easy to write a script using node.js to generate your files.

If this ever becomes an official feature of Pandoc, I would recommend against using the image syntax for includes as it wouldn't have a graceful fallback for inclusion-unaware processors, but would instead display as broken images, at least in html.

With includes based on the link syntax the fallback would be a link to (hopefully) the resource that would otherwise have been included. Possibly including a helpful label. (And maybe with attributes visible as plain text.)

With a non-markdown markup like {{some include.md}} the fallback would be to pass through the tag as plain text. (Just like pandoc currently does when the already established MMD transcludes are encountered.)

To me both of these options are preferable over broken images.

(Of course none of the fallbacks are acceptable if you e.g. are publishing a book, but in other contexts like in a wiki they might be.)

@davidsvantesson See https://github.com/gabyx/TechnicalMarkdown
where I work with MPE and Pandoc

There is now an include-files.lua filter in the pandoc Lua filters collection: https://github.com/pandoc/lua-filters/tree/master/include-files. It is built around specially marked code blocks.

@tarleb This is awesome!

No, but it wouldn't be too hard to make it recursive. Please submit a feature request to the lua-filters repo.

@tarleb Do you also know if your filter: https://github.com/pandoc/lua-filters/issues/101
uses the same settings/extensions that are given on the command line in pandoc.read

@gabyx please stop asking questions in this thread. Everybody following this thread gets a notification every time anyone posts here. We should be respectful of those people's time and move discussions to pandoc/lua-filters.

@daniels It's true that image syntax doesn't have the best fallback. But the problem with link syntax is: how do you differentiate between a link to a file and an include of that file? The disadvantage of some special syntax like {{file.md}} is that it's yet another thing to parse, potentially conflicting with other markdown syntax and making parsers more complicated.

I think the include-files filter as it is now, with

```{.include format=xxxx}
file.md
```

is really cool because it's a code block, and it is basically similar to a "command launching and replacing" filter:

```{ .command="python -c" format=markdown}
print("### Hello world")
```

However the .include filter above is recursive.

I had another funny idea (?), just some thoughts:
Could we maybe combine these two things into a recursive filter .command, which executes the command, parses the result with walk_block, and recurses if recursive=true:

So the most general form of this would be:

command -> pandoc parse -> recurse?

Here is an esoteric example:

# Test
```{ .command="python" args="-c" format=markdown recursive=true}
print("# Example \n```{.command=python args="-c" format=markdown}\nprint("3")\n```")
```

and similar for the file transclusion:

```{ .command=transclude format=markdown recursive=true}
a/FileA.md
b/FileB.md
```

The pandoc-command-transclusion.lua filter would then be slightly optimized for .command=transclude, where it would do the same thing as include-files.lua does so far (in case it makes more sense to parse file by file, also to track cyclic includes, rather than first reading all included files into one buffer, parsing that with pandoc, and then recursing).
Of course, the recursive flag would default to true.

Full support for markdown inclusion would probably have to be baked into pandoc.
If you just read the file and parse it as markdown using a filter, then e.g. you can't have reference links or footnotes that are only defined in the included file.

So this is something I'd consider. We already have most of the code we'd need for this kind of thing, since we support includes in reST.

Thanks @jgm:
Ehm, why is there a problem with footnotes? The footnotes are parsed correctly, but then handled wrongly?

Because if it is processed by a filter, the filter only sees the markdown of the included file.

I don't know if it would help if the filter included the markdown as raw_markdown rather than parsing it into the pandoc AST.

Edit: I mean a first pass that includes the file as raw markdown (markdown to markdown), then a second pass from markdown to the target format.
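
The difference between "parse each file, then join" and "join the text, then parse once" can be shown with a toy example in Python. The `resolve` function below is only a stand-in for a reader, not pandoc's parser: it inlines reference links whose definition it can see. The URL and file contents are made up for the illustration.

```python
import re

DEFINITION = re.compile(r"^\[([^\]]+)\]:\s*(\S+)\s*$", re.MULTILINE)
REFERENCE = re.compile(r"\[([^\]]+)\]\[([^\]]+)\]")


def resolve(text):
    """Toy stand-in for a reader: inline every reference link it can resolve."""
    targets = {label.lower(): url for label, url in DEFINITION.findall(text)}

    def inline(match):
        url = targets.get(match.group(2).lower())
        return "[%s](%s)" % (match.group(1), url) if url else match.group(0)

    return REFERENCE.sub(inline, DEFINITION.sub("", text)).strip()


main = "See [the guide][guide].\n"
included = "[guide]: https://example.com/guide\n"

# Filter-style: each file is parsed alone, then the results are joined.
separately = resolve(main) + "\n" + resolve(included)
# Reader-style (or a text preprocessor): join the text first, then parse once.
together = resolve(main + "\n" + included)

print(separately)
print(together)
```

In the first case the link in the main file stays unresolved, because its definition lives only in the included file; in the second case both are in scope at parse time.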

@ickc Thanks for the explanation. I mean the following sequential filters

  • include-files
  • pandoc-citeproc
  • pandoc-crossref

work well. I thought footnotes and other stuff are all processed once the whole AST is complete (all filters have run)? Is this assumption wrong?

I thought footnotes and other stuff are all processed once the whole AST is complete (all filters have run)? Is this assumption wrong?

Yes, it is. Footnotes are included in the AST just as Note + the contents. The AST doesn't contain separate elements for the footnote itself and the footnote reference. So resolution of footnotes has to occur during construction of the AST.

@jgm: If I understood correctly, as long as one keeps footnotes in the same file, so that include-files.lua can resolve them during read(...), everything is fine, and include-files.lua works pretty well so far. However, there are problems, as you stated, especially when filters should also be called within include-files.lua (which is not supported)...

However, is there any consensus or update on the matter of "transclusion as a first-party feature in pandoc"? That might be the best way to tackle this, which is indeed an extremely important feature from my point of view :-)
