Pandoc: --metadata-file with non-markdown contents?

Created on 18 Nov 2019 · 5 Comments · Source: jgm/pandoc

Currently string fields in --metadata-file are interpreted as pandoc markdown, no matter what input format is specified using --from.

But many JSON files found in the wild contain strings in other formats (especially HTML), and it would be very handy to be able to use them with --metadata-file.

Is there a compelling reason for using pandoc markdown no matter what the format of the main document? @mb21

All 5 comments

Nope, we decided to start with markdown because currently the parsing of the YAML is quite intertwined with the markdown reader, but conceptually that was always just the first step. See https://github.com/jgm/pandoc/issues/1960#issuecomment-376457466

It would be easy enough to add something like --metadata-file-format, I suppose, though this is a bit ugly. It would be nice just always to use the source format in parsing string fields in the metadata file. Is there a reason not to? Here are the reasons I can think of:

  1. Breaks some existing workflows that assume current behavior.
  2. Some formats don't really lend themselves to inclusion in YAML metadata, e.g. docx.

> It would be nice just always to use the source format in parsing string fields in the metadata file.

Yes, I agree.

I wouldn't be too concerned about 1), but there should probably be a way to override the format in case of 2). Or do we want to maintain a blacklist of formats (like docx) that we deem unsuitable for metadata formatting?

We could assume markdown when the input format is docx or odt, and otherwise go with the input format (or HTML4 for epub 2, or HTML5 for epub 3).
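To make the proposed rule concrete, here is a rough sketch in Python (not pandoc's actual code, which is Haskell). The function name `metadata_format` and the `epub_version` parameter are made up for illustration, and the strings "html4" and "html5" are just labels taken from the comment above, not necessarily valid pandoc reader names.

```python
import re
from typing import Optional

# Formats the thread singles out as poor fits for metadata strings.
FALL_BACK_TO_MARKDOWN = {"docx", "odt"}


def metadata_format(input_format: str, epub_version: Optional[int] = None) -> str:
    """Pick the format used to parse string fields in a metadata file.

    Hypothetical helper: it only illustrates the rule proposed above.
    """
    # Drop extension modifiers such as "+smart" or "-raw_html" to get the base name.
    base = re.split(r"[+-]", input_format, maxsplit=1)[0]

    if base in FALL_BACK_TO_MARKDOWN:
        return "markdown"
    if base == "epub":
        # The caller has to know the EPUB version; that detection is assumed here.
        if epub_version == 2:
            return "html4"
        if epub_version == 3:
            return "html5"
        raise ValueError("epub_version must be 2 or 3 for epub input")
    # Otherwise the metadata strings follow the main document's format.
    return input_format
```

With this rule, `metadata_format("docx")` gives "markdown" while `metadata_format("html")` stays "html".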

Another possibility (perhaps in addition to the above) would be to allow the default to be overridden with a field format_: markdown or whatever. Pandoc could scan the metadata file for this field before parsing.
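The pre-scan could look something like the following Python sketch, assuming a JSON metadata file. The helper name `split_format_override` is hypothetical, and only the field name `format_` comes from the suggestion above. The field is removed before parsing so that it does not end up as ordinary metadata; whether it should be removed is also an assumption.

```python
import json
from typing import Any, Dict, Tuple


def split_format_override(raw_json: str, default_format: str) -> Tuple[str, Dict[str, Any]]:
    """Return (format to use for string fields, metadata without the override field)."""
    metadata = json.loads(raw_json)
    if not isinstance(metadata, dict):
        raise ValueError("metadata file must contain a JSON object at the top level")

    # Look for the override before any string field is interpreted.
    override = metadata.pop("format_", None)
    chosen = override if isinstance(override, str) else default_format
    return chosen, metadata
```

For example, `split_format_override('{"format_": "markdown", "title": "*Hi*"}', "html")` picks "markdown" and leaves only the `title` field to be parsed.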

Of course this would mean we'd have to generalize the metadata reading code, which is currently pretty tightly bound up with the Markdown parser. I think this would be a good idea in any case (the code could be split out into a separate module and parameterized with a reader).
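As a language-neutral illustration of "parameterized with a reader", here is a small Python sketch (the real change would be in pandoc's Haskell code, and none of these names exist there). The traversal of the metadata structure knows nothing about Markdown; it just applies whatever `read_string` function it is given to each string field. Leaving non-string scalars untouched is a simplification of this sketch.

```python
from typing import Any, Callable


def parse_metadata(value: Any, read_string: Callable[[str], Any]) -> Any:
    """Walk a decoded metadata value and apply the given reader to every string."""
    if isinstance(value, str):
        return read_string(value)
    if isinstance(value, list):
        return [parse_metadata(item, read_string) for item in value]
    if isinstance(value, dict):
        return {key: parse_metadata(item, read_string) for key, item in value.items()}
    # Booleans, numbers and null are passed through unchanged in this sketch.
    return value


# Toy "reader" that only tags the string with a format name, standing in for a real parser.
def fake_html_reader(text: str) -> tuple:
    return ("html", text)


parsed = parse_metadata({"title": "<em>Hi</em>", "tags": ["a", 1]}, fake_html_reader)
```

Swapping `fake_html_reader` for a Markdown or RST reader would change how strings are interpreted without touching the traversal, which is the point of splitting the code out.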
