Pandoc: Allow multiline Markdown headers

Created on 10 Jun 2011 · 16Comments · Source: jgm/pandoc

It would be nice if pandoc markdown could allow multiline headers -- that is, multiline in the input, not necessarily in the output (though that could be controlled by a backslash or double-space). This can be an issue with chapter titles, which sometimes have subtitles, and go well beyond 80 characters.

For level 1 and 2 in setext style, the following syntax seems intuitive:

    Long Header Containing a Clever Title for This Section:
    An Overly Wordy Subtitle Explaining What the Clever Title Meant
    ===============================================================

(with an optional backslash after the colon)

Since pandoc breaks from Markdown.pl on requiring a blank line before a header, this wouldn't seem to introduce any new compatibility issues.

I'm not sure if there's an intuitive way in atx headers, though.

Not a huge deal on the writing end, since I can turn off auto-fill-mode for atx headers in emacs automatically, but it seems like it would be a win as far as readability goes.


Google Code Info:
Issue #: 289
Author: [email protected]
Created On: 2011-02-17T15:22:47.000Z
Closed On:

high enhancement more-discussion-needed

Source

jgm

👍6

Most helpful comment

Just signaling interest for this feature, in case it went forgotten. I like to have all my plaintext files 80 char wide, but the not having newlines in headers makes it quite hard to.

Bercio on 21 Sep 2013

👍3

All 16 comments

The attached patch seems to do the job. Of course, I've just been playing around with it in my own use cases, so there well might be unforeseen problems (though it does pass the tests). But, as long as I was playing around with it, I figured I'd pass it on.

One possible change to the above syntax would be to mandate that additional lines in a header be indented four spaces. This might stop accidental long headers. (Though as pandoc works now, the same accidental running together of header lines would also produce unexpected output, since it requires the blank line before.)

Google Code Info:
Author: [email protected]
Created On: 2011-02-18T19:16:28.000Z

jgm on 10 Jun 2011

I'm worried about performance implications. This patch means that pandoc will have to parse every paragraph twice -- first as part of the header parser, and then (when that fails) as part of the paragraph parser.

This could be fixed, perhaps, by integrating the setext header parser with the para parser.

The para parser could do something like this:

Parse one or more inlines that are not followed by a setext border (i.e., newline + row of = or - + blankline). Then, try to parse a setext border. If that succeeds, return a Header, otherwise return a Para.

This might still have performance implications, though, since you'd have to be looking for the setext border as you move through the paragraph. But it might actually improve performance, because we wouldn't have to try the setext header parser on the first line of every block.

You can benchmark after building using

runghc Benchmark.hs markdown

(for which you need the criterion package installed).

Google Code Info:
Author: [email protected]
Created On: 2011-02-18T20:34:29.000Z

jgm on 10 Jun 2011

Ack -- never mind. Slows way down with long docs. I imagine it's busy looking over all multiline elements trying to find out if there's a header at the bottom.

Google Code Info:
Author: [email protected]
Created On: 2011-02-18T20:36:19.000Z

jgm on 10 Jun 2011

Ah, looks like we were getting the same results at the same time. I'll play aroudn with the para parser and see if I can turn out something that behaves properly over a large doc.

Google Code Info:
Author: [email protected]
Created On: 2011-02-18T20:37:48.000Z

jgm on 10 Jun 2011

I've implemented your suggestion, John, and it works without any performance hit. (Actually came out ahead, though within the std. dev.) There is one outstanding tricky problem, though, that I'd love suggestions on.

Right now, pandoc tries to parse sections, and then tables, and then paragraphs. Therefore, if we put setext parsing in the para parser, h2-level headers, with single dashes, are parsed as tables before the para parser can get to them. Any attempt I've made at reordering created more problems than it fixed.

The issue seems to be that the table parser currently accepts a one-column table, but it will never see it, because a one-column table is parsed as level-2 header. Would it be worth it to try to change the table parser so it requires at least two columns, since that's all it should ever see? If so, I could have a go at it and include it as part of this patchset.

Google Code Info:
Author: [email protected]
Created On: 2011-02-19T21:31:05.000Z

jgm on 10 Jun 2011

Or, to be a bit more precise, the current behavior is that you can have a one-column table, provided it starts with a dashed line above the text. I.e, this is a table:

Foo

Bar

and this is two h2-headers:

Foo

Bar

but this behavior isn't built into the table parser. I guess the question is whether the table parser should mandate that one-column tables have to start with a dashed line on top in order to avoid overlap with the setext h2 parser.

Google Code Info:
Author: [email protected]
Created On: 2011-02-20T14:56:43.000Z

jgm on 10 Jun 2011

Okay, I applied the table fix mentioned above in the simplest possible way. Since it's only a question on simple tables with headers, I put a try' and aguard' in the parser. I imagine there's a more optimized way to do it with a lookahead, or perhaps in the header parser, but I'm being a bit hands off because I don't fully understand the ins and outs of that part of the code. In any case, it passes the test, and there doesn't seem to be a performance hit in the benchmarks -- perhaps because tables are relatively rare.

Second-stab patchset attached.

Google Code Info:
Author: [email protected]
Created On: 2011-02-20T21:35:53.000Z

jgm on 10 Jun 2011

Thanks for all of this. I need to think a bit more about the one-column table issue. It seems desirable to allow one-column tables, but I think it's probably enough if you can do it with a line above. So this seems the right approach.

Google Code Info:
Author: [email protected]
Created On: 2011-02-21T01:17:51.000Z

jgm on 10 Jun 2011

Just had another look at this and realized I maybe was a bit unclear. The limitations in one-column tables are already there, regardless of multiline headers, due to an overlap with the setext definition. So these patches don't actually change table behavior at all.

Apologies if unnecessary -- I got a bit caught up myself trying to figure out what I meant by "the current behavior."

Google Code Info:
Author: [email protected]
Created On: 2011-02-23T18:16:36.000Z

jgm on 10 Jun 2011

Hi John,

Just wondering if you had come to a decision about whether or not to accept this feature (not to say this implementation). Or is it awaiting some free time to think it through -- perhaps the end of the semester?

No particular rush -- the patches have been working for me, without bugs or any noticable performance hit. I've been holding off on further stuff, like tests and strict behavior, but if you were interested, I might turn my attention to it once my semester was over.

--J

Google Code Info:
Author: [email protected]
Created On: 2011-04-20T20:32:58.000Z

jgm on 10 Jun 2011

I haven't had time to think through all the implications yet. I'll try to get to it in the next couple months.

Google Code Info:
Author: [email protected]
Created On: 2011-04-20T20:48:44.000Z

jgm on 10 Jun 2011

Cool -- thanks.

Google Code Info:
Author: [email protected]
Created On: 2011-04-20T20:52:26.000Z

jgm on 10 Jun 2011

Just signaling interest for this feature, in case it went forgotten. I like to have all my plaintext files 80 char wide, but the not having newlines in headers makes it quite hard to.

Bercio on 21 Sep 2013

👍3

Was this closed because it's implemented, or because it's been decided that it won't be implemented?

chronodm on 1 Aug 2016

Won't be implemented.

+++ chronodm [Aug 01 16 14:50 ]:

Was this closed because it's implemented, or because it's been decided
that it won't be implemented?

—
You are receiving this because you modified the open/close state.
Reply to this email directly, [1]view it on GitHub, or [2]mute the
thread.

References

https://github.com/jgm/pandoc/issues/82#issuecomment-236718870

https://github.com/notifications/unsubscribe-auth/AAAL5Clfr-aUn8q1W7IpF2oGmoO-zxG9ks5qbmobgaJpZM4BBXOe