pandoc 🚀 - Support for table column spans, table attributes in AST

+++ Alberto Leal [Oct 16 13 17:35 ]:

I tried looking for this in Pandoc's docs. None of the Markdown table flavours supports column spanning. I don't think any known Markdown flavour supports column spanning except MultiMarkdown.

Are there plans to support this?

Long-term, yes, I'd like to.

jgm on 17 Oct 2013

Are there any plans for this? I'm also interested in this.

jokogr on 22 Oct 2015

+++ jokogr [Oct 22 15 08:00 ]:

Are there any plans for this? I'm also interested in this.

Yes, it would be good to do, but it's a big change as it
requires changes in the underlying document model.

jgm on 22 Oct 2015

Is there anything I could do to speed this up?

jokogr on 23 Oct 2015

+1

adius on 12 Nov 2015

+1

colourwonder on 23 Dec 2015

+1

brianfeister on 10 Mar 2016

Are there any plans for this? I'm also interested in this.

@jokogr, sorry for my obvious reply: providing a patch may help.

ousia on 10 Mar 2016

+++ Pablo Rodríguez [Mar 09 16 22:03 ]:

Are there any plans for this? I'm also interested in this.
@jokogr, sorry for my obvious reply: providing a patch may help.

This would require some major architecture change,
including changes in pandoc-types, all readers and writers.

jgm on 10 Mar 2016

@jgm is there any way to achieve this with a two-step process? Compile multimarkdown to an intermediate state and then process that result with pandoc?

brianfeister on 10 Mar 2016

+++ Brian Feister [Mar 10 16 08:50 ]:

@jgm is there any way to achieve this with a two-step process?
Compile multimarkdown to an intermediate state and then process that
result with pandoc?

No, the problem is very simple. Pandoc's internal document
model doesn't allow colspans or rowspans. It's on the list
of things to improve.

jgm on 10 Mar 2016

Side note: this issue should probably be given the label AST change.

While trying to make a suggestion here, I stumbled on issue #3154: pandoc "almost" has a fifth table extension, namely using native HTML as a table. If that were true:

then once the AST is changed to allow colspan and rowspan, we could start using them immediately, before settling on a syntax (or syntaxes) for them. For example, in Markdown-to-LaTeX conversion, pandoc could consume the colspan and rowspan attributes and emit \multicolumn and \multirow.

The reason for this suggestion is that settling on a syntax is often tricky and requires a lot of discussion (except possibly for "mmd_colspan", since it already exists). If only the AST change is required (which is a prerequisite for any new syntax anyway), getting started would be easier.

ickc on 8 Oct 2016

👍2

An AST change to tables would require changes in both
readers and writers. You're right that we would not
necessarily need to support a native Markdown table
syntax with rowspans and colspans right away. We
could just implement HTML tables. But we'd still need
to change ALL the writers to handle rowspans and colspans,
since these would be in the basic table model. That's
already somewhat daunting (I suppose Markdown could fall
back to HTML, but RST couldn't).

jgm on 9 Oct 2016

@jgm said,

But we'd still need
to change ALL the writers to handle rowspans and colspans,
since these would be in the basic table model.

I don't know much about the design of the AST. Can a new AST including column & row span be a superset of the current AST?

If so, the transition period can be made smoother, _i.e._ a gradual rollout of the feature rather than changing the AST and all the writers & readers at the same time:

The AST can be changed first. Since it is a superset of the original, every reader/writer would still work.
The column/row span extension can be activated by a feature flag on each writer and reader, documented in a matrix like the one from OpenZFS.
In general, writers have higher priority than readers for implementing the feature flag (except for the markdown reader), since every existing reader already generates a valid AST.
The extension is activated only when both the from-format reader and the to-format writer have the feature flag.

ickc on 12 Oct 2016

One of the reasons this feature is important is that in scientific settings, whenever one needs to compare multiple groups, some kind of subheader is needed in the table. Having that would firmly establish pandoc in the space of scientific writing.

lf-araujo on 12 Oct 2016

+++ lf_araujo [Oct 11 16 22:34 ]:

One of the reasons this feature is important is that in scientific
settings, whenever one needs to compare multiple groups, some kind of
subheader is needed in the table. Having that would firmly establish
pandoc in the space of scientific writing.

Actually there are two issues here, right?

column spans
the ability to have multiple rows in a header

jgm on 12 Oct 2016

No, changing the table AST would definitely require changes
to all writers and readers, immediately. The writers would
all have to know what to do when they encounter colspans.

jgm on 12 Oct 2016

I don't understand. Can we put a switch in the reader so that, when a pandoc command is run and the output writer is known not to understand colspans, the reader does not parse colspans? It seems that some kind of switch is already used, say when an extension is turned off on the command line.

I can see that a problem might occur if the input format is AST or JSON, where the reader cannot (or at least cannot easily) switch off the colspan/rowspan features. But people should know what they are doing if AST/JSON is used (and we can show them an error message).

ickc on 12 Oct 2016

Actually there are two issues here, right?

Yes. The ability to format subheaders would be also needed.

lf-araujo on 12 Oct 2016

+++ ickc [Oct 12 16 01:15 ]:

I don't understand. Can we put a switch in the reader so that, when a
pandoc command is run and the output writer is known not to understand
colspans, the reader does not parse colspans? It seems that some kind
of switch is already used, say when an extension is turned off on the
command line.

No, readers are always independent of writers. But you're
not seeing the problem. Even if we had a switch that
depended on the output format, the readers would still need
to be rewritten because of changes in the Pandoc type
(specifically the Table constructor).

jgm on 12 Oct 2016

I see. Then that is really a huge task. And I guess that, since changing the AST causes compatibility issues, one doesn't want to change it often, which means the strategy is probably to make several important AST changes at once, which would make it an even bigger challenge.

And since an AST change will break backward compatibility, is it safe to say it will only happen in pandoc v2.0? In that case, should a milestone be set up (even with no deadline), with those important AST changes added to it (among other things)?

ickc on 12 Oct 2016

Thanks for the attention. I will leave two models of tables that are prevalent in papers. The first should be approachable in future iterations of pandoc, the second one, however, is a little more tricky and may not.

| Area                      |  Subjects       |      Controls   |
|---------------------------|-----------------|-----------------|
|                           |SD | se | p-value|SD | se | p-value|
|===========================|=================|=================|
| Standardised coefficients                                   |||
|===========================|=================|=================|
| Left fusiform area        | 1 | 2 | .05     | 3 | 4 |  .05    |
| Right insula              | 5 | 6 | .05     | 7 | 8 |  .05    |
| Left insula               | 5 | 6 | .05     | 7 | 8 |  .05    |
| Right fusiform area       | 1 | 2 | .05     | 3 | 4 |  .05    |
|===========================|=================|=================|
| Factor loadings                                             |||
|===========================|=================|=================|
| X                         | 1 | 2 | .05     | 3 | 4 |  .05    |
| Y                         | 5 | 6 | .05     | 7 | 8 |  .05    |
| Z                         | 5 | 6 | .05     | 7 | 8 |  .05    |

The equals signs were used to represent the places that usually have lines to separate the subheader.

The second, trickier table occurs when one wants to span two cells vertically. This is not essential; I am including it as an example of a common type of table (a table describing the population, in this case).

| **Variables**                  |  **Healthy subjects (mean)** |   **Patients (mean)**   | **p-value**  |
|--------------------------------|------------------------------|-------------------------|--------------|
| **Age**                        |       11                     |        28.51            |    .01^(U)^  |
| **Gender**                                                                                          ||||
| Male (%)                       |        12%                   |      99%                |              |
| Female (%)                     |        13%                   |         88%             |  .99^(a)^    |
| **Time from onset (days)**     |  NA                          |        111              |              |
| **Education (mean, in years)** |  10                          |     5                   |   .11 ^(U)^  |

What typically happens is that the cell containing the value .99 is merged with the cell above, since that statistic concerns both Male and Female. I hope I am being clear.

lf-araujo on 13 Oct 2016

👍1

Pandoc currently allows at most one header row, which must be at the top. A rule is inserted below it in default LaTeX output.

One could try to separate conceptually between _being a header cell_ and _having a rule under_, so that a cell could have one of these properties without the other. Perhaps the idea in @lf-araujo's example is that a rule of hyphens --- divides the header from the rest, while a rule of equals === indicates a rule? But do the hyphens also cause a rule to be rendered?

jgm on 7 Nov 2016

Here's a proposal for an AST change:

type Rowspan = Int
type Colspan = Int
data Cell = Cell Rowspan Colspan [Block]
data Row = Row [Cell] | Header [[Cell]]
type Caption = [Block]
-- constructor for Table in Block:
   | Table Attr Caption [Alignment] [Maybe Double] [Row]

Improvements:

Can have cells spanning columns
Can have cells spanning rows
Can have multi-row headers and secondary headers
Tables can have attributes (e.g. id, class)
Column widths are optional for each column
Captions can have block-level content

Thoughts on this?

Of course, if we implemented this, one should not expect support for all features in all readers/writers. Some tables representable this way may not be renderable in some formats.
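
To make the proposal easier to experiment with, here is a minimal standalone sketch of the types above. Attr, Alignment and Block are simplified stand-ins (assumptions, not the real pandoc-types definitions), Table is wrapped in its own type instead of being a Block constructor, and lineWidth, groupedHeader and exampleTable are hypothetical names used only for illustration.

```haskell
-- Simplified stand-ins (assumptions) for the real pandoc-types definitions.
type Attr = (String, [String], [(String, String)])  -- id, classes, key-value pairs

data Alignment = AlignLeft | AlignRight | AlignCenter | AlignDefault
  deriving (Show, Eq)

newtype Block = Plain String   -- just enough to hold cell text
  deriving (Show, Eq)

-- The proposed types, with Table as a separate type for this sketch.
type Rowspan = Int
type Colspan = Int
data Cell = Cell Rowspan Colspan [Block] deriving (Show, Eq)
data Row = Row [Cell] | Header [[Cell]] deriving (Show, Eq)
type Caption = [Block]
data Table = Table Attr Caption [Alignment] [Maybe Double] [Row]
  deriving (Show, Eq)

-- Number of grid columns a line of cells occupies, counting colspans.
lineWidth :: [Cell] -> Int
lineWidth cells = sum [c | Cell _ c _ <- cells]

-- Two-line header from the example table earlier in the thread:
-- "Subjects" and "Controls" each span three sub-columns.
groupedHeader :: Row
groupedHeader = Header
  [ [Cell 1 1 [Plain "Area"], Cell 1 3 [Plain "Subjects"], Cell 1 3 [Plain "Controls"]]
  , Cell 1 1 [] : [Cell 1 1 [Plain s] | s <- subcols ++ subcols]
  ]
  where subcols = ["SD", "se", "p-value"]

-- A table with a multi-row header, a secondary header and one body row.
exampleTable :: Table
exampleTable = Table ("", [], []) []
  (replicate 7 AlignDefault)
  (replicate 7 Nothing)          -- no explicit column widths
  [ groupedHeader
  , Header [[Cell 1 7 [Plain "Standardised coefficients"]]]
  , Row (Cell 1 1 [Plain "Left fusiform area"]
          : [Cell 1 1 [Plain v] | v <- ["1", "2", ".05", "3", "4", ".05"]])
  ]
```

Loading this in GHCi and evaluating lineWidth on each header line is a quick way to see that both lines cover the same seven grid columns.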

jgm on 26 Feb 2017

Looks good on quick glance. Just crossed my mind that we could additionally add [Attr] [Attr] to Table in order to have attributes on the rows and columns as well (the alignments and widths could instead be considered attributes on the columns, not sure if this would clean up or complicate the readers and writers). Then again, this might go too far...

mb21 on 26 Feb 2017

@jgm said:

Column widths are optional for each column

What does this mean? I think currently unspecified widths will be just [0.0, 0.0, ...]. Does it mean in the future it can be [0.0, 0.2, 0.0, 0.3, ...]? What would this mean?

Some tables representable this way may not be renderable in some formats.

How should we handle an output format that doesn't support colspan and rowspan? It would be nice to have some way to output a table without rowspan/colspan while trying to preserve its structure. The simple way would be to put empty cells in the remaining positions of the rowspan/colspan. But then maybe some sort of keyword convention could be placed there to indicate to human readers that the cell is a continuation of the previous one. I'm particularly interested in this because the pantable CSV "reader & writer" is currently "lossless" w.r.t. AST to CSV and vice versa.

Other questions are:

it seems this proposal can allow header row after non-header row?
Any plan to support the vertical header mentioned in #1359 (e.g. through transpose specified in attributes)?
@mb21's suggestion of allowing attributes on rows and columns might be useful too. Although it might seem like overkill for now, it would make the design "feature-complete", and I'm sure some people would find it useful (since people are already doing it in HTML). As for the Markdown syntax, the cells in the first row/column could carry the standard pandoc attribute syntax.
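
The padding fallback mentioned above (filling the rest of a span with empty or marker cells) could look roughly like this. It is only a sketch under simplifying assumptions: cells hold plain strings, only colspans are expanded (rowspans would need state carried from row to row), and the Cell type and padColspans name are hypothetical.

```haskell
-- rowspan, colspan, plain-text contents (a simplification of the proposal)
data Cell = Cell Int Int String
  deriving (Show, Eq)

-- Flatten one row for a format without colspans: every spanning cell is
-- followed by (colspan - 1) filler cells. The filler can be "" or some
-- agreed marker telling human readers the cell continues the previous one.
padColspans :: String -> [Cell] -> [String]
padColspans filler = concatMap expand
  where
    expand (Cell _ c txt) = txt : replicate (c - 1) filler
```

For instance, `padColspans "" [Cell 1 1 "body row 2", Cell 1 3 "Cells may span columns."]` yields four plain cells, the last two empty.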

ickc on 26 Feb 2017

I can't help with the AST, but I have been generating tables with multimarkdown to latex and importing them into my mds and later processing with pandoc.

There are two problems, one is the row span for which I don't think there is an easy solution. The second problem is the column span for which I suggested a layout previously. This design can be further simplified to:

| Area                      |  Subjects       |      Controls   |
|                           |SD | se | p-value|SD | se | p-value|
|---------------------------|-----------------|-----------------|
| Standardised coefficients                                   |||
|---------------------------|-----------------|-----------------|
| Left fusiform area        | 1 | 2 | .05     | 3 | 4 |  .05    |
| Right insula              | 5 | 6 | .05     | 7 | 8 |  .05    |
| Left insula               | 5 | 6 | .05     | 7 | 8 |  .05    |
| Right fusiform area       | 1 | 2 | .05     | 3 | 4 |  .05    |
|---------------------------|-----------------|-----------------|
| Factor loadings                                             |||
|---------------------------|-----------------|-----------------|
| X                         | 1 | 2 | .05     | 3 | 4 |  .05    |
| Y                         | 5 | 6 | .05     | 7 | 8 |  .05    |
| Z                         | 5 | 6 | .05     | 7 | 8 |  .05    |

So there is no need for equals signs; hyphens can do the trick. The first run of hyphens to appear should mark the end of the first header. Each following run of hyphens should mark the beginning or the end of a section heading within the table (or serve only as markup for printing \midrule in LaTeX).

As for the vertical cell span, which is also very important in publication, unless someone comes up with a readable way of representing it in plain text, it probably should be left out of the changes for now.

lf-araujo on 26 Feb 2017

@lf-araujo, I think you're talking about pipe tables. In grid tables I think rowspan and colspan will come naturally. Although pandoc currently has 4 table syntaxes, not all of them support all features supported by the AST. In fact, only the grid table syntax supports everything the AST is capable of. And this can reasonably be expected to remain true once rowspan/colspan is implemented.

Personally, I have no easy way to write grid tables (when I need all features supported by pandoc's AST), since I don't use Emacs. That's why I wrote a filter, pantable, to do something similar but in CSV instead. But it will be challenging to support colspan/rowspan in CSV with pantable. That's why I asked the question about this above.

ickc on 26 Feb 2017

👍1

+++ lf_araujo [Feb 26 17 14:43 ]:

As for the vertical cell span, which is also very important in
publication, unless someone comes up with a readable way of
representing it in plain text, it probably should be left out of the
changes for now.

I actually do have some ideas there, but the AST can change
even if we don't have a way of representing in plain text.
After all, one might use pandoc to convert from HTML to
DocBook, for example, and both formats have easy ways of
representing rowspans. Pandoc can revert to raw HTML when
rendering a table with rowspans in Markdown (as it does
now).

jgm on 27 Feb 2017

+++ ickc [Feb 26 17 13:28 ]:

@jgm said:
Column widths are optional for each column
What does this mean? I think currently unspecified widths will be just
[0.0, 0.0, ...]. Does it mean in the future it can be [0.0, 0.2, 0.0,
0.3, ...]? What would this mean?

It would mean that explicit width attributes would be left
off for these columns. In HTML, that's easy. In LaTeX, it
means we don't use a parbox. In Markdown, we'd have to
calculate a width (probably in the brain-dead way of just
dividing available space by the number of columns for
which we need widths).
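
A minimal sketch of that width fallback, assuming widths are fractions of the total available width (so a full table sums to 1.0); fillWidths is a hypothetical helper name, not an existing pandoc function.

```haskell
-- Replace each unspecified width with an equal share of whatever space
-- the explicitly sized columns leave over.
fillWidths :: [Maybe Double] -> [Double]
fillWidths ws = map (maybe share id) ws
  where
    known   = sum [w | Just w <- ws]          -- space already claimed
    missing = length [() | Nothing <- ws]     -- columns needing a width
    share
      | missing == 0 = 0
      | otherwise    = max 0 (1 - known) / fromIntegral missing
```

With one column fixed at half the width and two left open, each open column gets a quarter.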

Some tables representable this way may not be renderable in some
formats.
How should we handle an output format that doesn't support colspan and rowspan?

In Markdown we can revert to raw HTML (as we do now).
In other formats we may simply have to omit the table,
perhaps with a placeholder like [table]. But that wouldn't
be any worse than what we do now, would it?

it seems this proposal can allow header row after non-header row?

Yes, by design, for subheaders.

Any plan to support the vertical header mentioned in #1359 (e.g.
through transpose specified in attributes)?

No, I don't see a clean way to do that.

@mb21's suggestion of allowing attributes on rows and columns
might be useful too. Although it might seem like overkill for now,
it would make the design "feature-complete", and I'm sure some
people would find it useful (since people are already doing it in
HTML). As for the Markdown syntax, the cells in the first
row/column could carry the standard pandoc attribute syntax.

jgm on 27 Feb 2017

Any plan to support the vertical header mentioned in #1359 (e.g. through transpose specified in attributes)?

No, I don't see a clean way to do that.

It wouldn't be a semantically clean way, but with attributes for the columns we could at least make it bold or add a class to it.

mb21 on 27 Feb 2017

jgm [27 Feb]:

I actually do have some ideas there, but the AST can change
even if we don't have a way of representing in plain text.
After all, one might use pandoc to convert from HTML to
DocBook, for example, and both formats have easy ways of
representing rowspans. Pandoc can revert to raw HTML when
rendering a table with rowspans in Markdown (as it does
now).

I agree. While I see the advantage to reliably providing like-for-like conversions, pragmatically I think it's seriously worth considering fallbacks.

Rowspan / colspan table attributes are a necessary part of academic / research paper formats; being able to use Pandoc to convert these is something that would have positive effect.

I'd be very keen to see this feature available ASAP.

luke-sift on 5 May 2017

Here is a plan of action that will allow us to integrate the new table features little by little, without doing everything in one massive push:

[ ] Decide on new AST type.
[ ] Make changes to pandoc-types (including changes in Builder, JSON, etc.). But, for now, keep the signature of Builder.table the same.
[ ] Modify pandoc.cabal and stack.yaml so pandoc builds against the new pandoc-types commit. Build will fail.
[ ] Add a temporary function that destructures a Table into the same fields we currently have in Table (with appropriate fallbacks). Use this to quickly convert the current writers to work with the new Table type. Readers should already work, because Builder.table has the same signature. A few changes in auxiliary functions may be needed. At this point, pandoc should compile against the new pandoc-types, but it will have no new table features. The idea is that we'll add these gradually.
[ ] Modify the signature of Builder.table so it can construct new-style tables with all the features. Modify the readers to use the new Builder.table, but not yet to include any new table features. Pandoc should again compile.
[ ] At this point we can work on individual readers and writers, converting them to use the new table features.
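
The temporary destructuring function from the fourth step might be sketched as follows. This is an illustration under stated assumptions, not the planned implementation: the types are simplified stand-ins, the caption is kept as blocks rather than converted to inlines, rowspans are ignored, and the fallbacks chosen here (first header line becomes the header, later header lines become body rows, unspecified widths become 0) are just one possible choice. LegacyTable, linesOf and toLegacy are hypothetical names.

```haskell
-- Simplified stand-ins for pandoc-types (assumptions).
newtype Block = Plain String deriving (Show, Eq)
data Alignment = AlignLeft | AlignRight | AlignCenter | AlignDefault
  deriving (Show, Eq)
type Attr = (String, [String], [(String, String)])

-- New-style table, following the revised proposal in this thread.
data Cell = Cell Int Int [Block] deriving (Show, Eq)   -- rowspan, colspan, contents
data Row = Row [Cell] | Header [[Cell]] deriving (Show, Eq)
data Table = Table Attr [Block] [(Alignment, Maybe Double)] [Row]
  deriving (Show, Eq)

-- Old-style fields: caption, alignments, widths, header cells, body rows.
type LegacyTable = ([Block], [Alignment], [Double], [[Block]], [[[Block]]])

-- All lines of cells contained in a row or header, in order.
linesOf :: Row -> [[Cell]]
linesOf (Row cs)    = [cs]
linesOf (Header hs) = hs

toLegacy :: Table -> LegacyTable
toLegacy (Table _ capt cols rows) =
  (capt, map fst cols, map (maybe 0 id . snd) cols, hdr, body)
  where
    -- A spanning cell is followed by empty cells; rowspans are ignored.
    flatten cells = concat [bs : replicate (c - 1) [] | Cell _ c bs <- cells]
    allLines = concatMap linesOf rows
    (hdr, body) = case rows of
      (Header (h : _) : _) -> (flatten h, map flatten (drop 1 allLines))
      _                    -> ([], map flatten allLines)
```

A writer that has not been updated yet could call toLegacy and keep its existing code for the five old fields.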

jgm on 21 May 2017

👍8 🎉1

We should try to ensure that there's only one way to represent a given table in the data type, and that bad tables are impossible to represent. See #3648.

jgm on 25 May 2017

The following types were proposed earlier:

type Rowspan = Int
type Colspan = Int
data Cell = Cell Rowspan Colspan [Block]
data Row = Row [Cell] | Header [[Cell]]
type Caption = [Block]
-- constructor for Table in Block:
   | Table Attr Caption [Alignment] [Maybe Double] [Row]

Ideally we could get more guarantees into the types, though I'm not sure how. This representation

does not require that the number of alignment specifiers = the number of width specifiers, or that either = the number of columns in a row.
does not require that rows all have the same number of columns (taking into account colspans)
does not require that columns all have the same number of rows (taking into account rowspans).

For example, with this setup, we can represent a table

Row [Cell colspan=1 rowspan=2 A, Cell colspan=1 rowspan=1 B]
Row [Cell colspan=2 rowspan=1 C]

That doesn't really make sense. Maybe we can't do much better, though, without dependent types. At least we could switch to

   | Table Attr Caption [(Alignment, Maybe Double)] [Row]
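
Short of encoding these invariants in the types, they could be checked at runtime. The following is a minimal sketch of such a check, assuming cells are placed left to right into the first free grid column; placeRow, wellFormed and the plain-string Cell are hypothetical, and the check only covers the row/column consistency points listed above.

```haskell
-- rowspan, colspan, label
data Cell = Cell Int Int String
  deriving (Show, Eq)

-- The Int list records, per grid column, how many rows (counting the
-- current one) are still covered by a cell that started in an earlier row.
-- Placing a row succeeds only if its cells exactly fill the free columns.
placeRow :: [Int] -> [Cell] -> Maybe [Int]
placeRow cols []
  | all (> 0) cols = Just cols          -- nothing left uncovered
  | otherwise      = Nothing            -- row is too short
placeRow cols (Cell r c _ : rest)
  | r < 1 || c < 1 = Nothing
  | length slot == c && all (== 0) slot =
      fmap (\done -> covered ++ replicate c r ++ done) (placeRow after rest)
  | otherwise = Nothing                 -- cell overflows or overlaps a span
  where
    (covered, free) = span (> 0) cols   -- skip columns covered from above
    (slot, after)   = splitAt c free

-- A table body is well formed if every row fits and no rowspan
-- reaches past the last row.
wellFormed :: Int -> [[Cell]] -> Bool
wellFormed ncols = go (replicate ncols 0)
  where
    go pending []       = all (== 0) pending
    go pending (r : rs) = case placeRow pending r of
      Nothing     -> False
      Just filled -> go (map (subtract 1) filled) rs

-- The problematic example from above, and a repaired variant.
badRows, goodRows :: [[Cell]]
badRows  = [[Cell 2 1 "A", Cell 1 1 "B"], [Cell 1 2 "C"]]
goodRows = [[Cell 2 1 "A", Cell 1 1 "B"], [Cell 1 1 "C"]]
```

Here `wellFormed 2 badRows` is False, because A still occupies the first column of the second row and C needs two columns, while `wellFormed 2 goodRows` is True.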

jgm on 25 May 2017

I guess it's kind of the same issue as with typed attributes: are we willing to use special GHC features (like dependent types or view patterns) to make the code more type-safe (while keeping it generic), or do we favour a simpler and more accessible code base?

Personally, I'm fine with using more advanced GHC features if it helps with future maintenance...

mb21 on 25 May 2017

Slight improvement

type Rowspan = Int
type Colspan = Int
data Cell = Cell Rowspan Colspan [Block]
data Row = Row [Cell] | Header [[Cell]]
type Caption = [Block]
-- constructor for Table in Block:
   | Table Attr Caption [(Alignment, Maybe Double)] [Row]

jgm on 29 May 2017

+1

skmohammadi on 31 May 2017

It may be worthwhile to cross check whatever syntax you decide on against docutils grid table implementation. They document their data structure pretty well and have supported this multi-cell spanning functionality for a while without much grief: https://sourceforge.net/p/docutils/code/HEAD/tree/trunk/docutils/docutils/parsers/rst/tableparser.py#l91

alyjak on 1 Jun 2017

For convenience I copy the docutils comment here:

 Here's an example of a grid table::

        +------------------------+------------+----------+----------+
        | Header row, column 1   | Header 2   | Header 3 | Header 4 |
        +========================+============+==========+==========+
        | body row 1, column 1   | column 2   | column 3 | column 4 |
        +------------------------+------------+----------+----------+
        | body row 2             | Cells may span columns.          |
        +------------------------+------------+---------------------+
        | body row 3             | Cells may  | - Table cells       |
        +------------------------+ span rows. | - contain           |
        | body row 4             |            | - body elements.    |
        +------------------------+------------+---------------------+

    Intersections use '+', row separators use '-' (except for one optional
    head/body row separator, which uses '='), and column separators use '|'.

    Passing the above table to the `parse()` method will result in the
    following data structure::

        ([24, 12, 10, 10],
         [[(0, 0, 1, ['Header row, column 1']),
           (0, 0, 1, ['Header 2']),
           (0, 0, 1, ['Header 3']),
           (0, 0, 1, ['Header 4'])]],
         [[(0, 0, 3, ['body row 1, column 1']),
           (0, 0, 3, ['column 2']),
           (0, 0, 3, ['column 3']),
           (0, 0, 3, ['column 4'])],
          [(0, 0, 5, ['body row 2']),
           (0, 2, 5, ['Cells may span columns.']),
           None,
           None],
          [(0, 0, 7, ['body row 3']),
           (1, 0, 7, ['Cells may', 'span rows.', '']),
           (1, 1, 7, ['- Table cells', '- contain', '- body elements.']),
           None],
          [(0, 0, 9, ['body row 4']), None, None, None]])

    The first item is a list containing column widths (colspecs). The second
    item is a list of head rows, and the third is a list of body rows. Each
    row contains a list of cells. Each cell is either None (for a cell unused
    because of another cell's span), or a tuple. A cell tuple contains four
    items: the number of extra rows used by the cell in a vertical span
    (morerows); the number of extra columns used by the cell in a horizontal
    span (morecols); the line offset of the first line of the cell contents;
    and the cell contents, a list of lines of text.
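
For comparison with the Rowspan/Colspan proposal in this thread, here is a small sketch of how a docutils-style row could be mapped onto span-carrying cells. The tuple layout follows the description above; DocutilsCell, Cell and fromDocutilsRow are hypothetical names, and the line offset is simply dropped.

```haskell
-- A docutils cell: (morerows, morecols, line offset, lines of text),
-- or Nothing for a slot covered by another cell's span.
type DocutilsCell = Maybe (Int, Int, Int, [String])

-- rowspan, colspan, lines of text
data Cell = Cell Int Int [String]
  deriving (Show, Eq)

-- docutils stores the number of *extra* rows and columns, so each span
-- is the stored value plus one; covered slots disappear.
fromDocutilsRow :: [DocutilsCell] -> [Cell]
fromDocutilsRow row =
  [Cell (moreRows + 1) (moreCols + 1) ls | Just (moreRows, moreCols, _, ls) <- row]
```

Applied to "body row 2" from the example, the four-slot row collapses to two cells, the second with a colspan of 3.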

jgm on 1 Jun 2017

Helpful comment on the commonmark forum about the need for row headers to support accessibility (screen readers). So maybe we need to think harder about how to add that.

jgm on 12 Jun 2017

Some more information (and examples) about accessible tables: https://www.w3.org/WAI/tutorials/tables/

davidar on 12 Jun 2017

Is there any place I can see the specification for this planned feature (so I can start implementing it and sending patches), or is the design still WIP?

adamryczkowski on 5 Jul 2017

We haven't yet settled on a specification. I'd like to get it right before we do the coding, because it will be a pain to revise it in the future if we don't. But, help on this is most definitely welcome (both in the design and in the coding phase). If you have a concrete suggestion for the data types, after reading the above discussion, feel free to make it here. When we get to the coding phase I would love to have help.

I would like to do this for pandoc 2.0, just trying to resolve some other issues first.

jgm on 5 Jul 2017

If you are on the planning stage, I'll use that opportunity and post some suggestions for the potential enhancement to the markdown syntax you support.

adamryczkowski on 5 Jul 2017

Here it is: https://github.com/jgm/pandoc/issues/3782

Pandoc is very well integrated into the R statistical environment, and is the tool of choice for automatically generated scientific reports. I believe multi-cell tables would be welcomed by many (if not most) data scientists on this planet. :-)

adamryczkowski on 5 Jul 2017

Here it is: #3782

Why don't you just add it to this issue? I guess #3782 should be closed because it is a duplicate of this.

ickc on 5 Jul 2017

👍1

Just documenting multimarkdown's colspan syntax:

|             |          Grouping           ||
First Header  | Second Header | Third Header |
 ------------ | :-----------: | -----------: |
Content       |          *Long Cell*        ||
Content       |   **Cell**    |         Cell |
To indicate that a cell should span multiple columns, then simply add additional pipes (|) at the end of the cell, as shown in the example. If the cell in question is at the end of the row, then of course that means that pipes are not optional at the end of that row…. The number of pipes equals the number of columns the cell should span.
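
As a rough illustration of that rule, the sketch below splits a single table row into (colspan, text) pairs by counting the pipes that follow each cell. It is not MultiMarkdown's or pandoc's parser: escaped pipes, pipes inside code spans and the alignment row are deliberately ignored, and splitOnPipe, trim and parseRow are hypothetical helper names.

```haskell
import Data.Char (isSpace)

-- Split a line at every '|' character.
splitOnPipe :: String -> [String]
splitOnPipe s = case break (== '|') s of
  (piece, [])       -> [piece]
  (piece, _ : rest) -> piece : splitOnPipe rest

-- Remove leading and trailing whitespace.
trim :: String -> String
trim = strip . strip
  where strip = reverse . dropWhile isSpace

-- Each cell spans one column plus one more for every directly adjacent
-- extra pipe (which shows up as an empty piece after splitting).
parseRow :: String -> [(Int, String)]
parseRow line = collect inner
  where
    pieces = splitOnPipe line
    -- Drop the empty piece in front of an optional leading pipe ...
    noLead = case pieces of
      ("" : rest) -> rest
      _           -> pieces
    -- ... and the blank piece after a trailing pipe.
    inner
      | not (null noLead) && all isSpace (last noLead) = init noLead
      | otherwise                                      = noLead
    collect []       = []
    collect (p : ps) =
      let (extra, rest) = span null ps
      in (1 + length extra, trim p) : collect rest
```

A whitespace-only piece counts as an empty cell, while a completely empty piece (two pipes in a row) extends the span of the cell before it.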

mb21 on 14 Jul 2017

About the Haskell structure, it may be interesting to see how GraphViz does it...

mb21 on 14 Jul 2017

Looking at the definition of Table compared to the other Block types, it has a load of parameters, a few of them optional (caption and header) and a few with default values (alignment and column widths).
I had a quick look at the HTML and LaTeX writers; neither seems to actually pattern match on these values (which @jgm seems to have as a concern in #684 regarding how a general Attributes type should be defined).
The pattern matching I did find (by browsing quickly through the two writers) was on nested elements inside, for example, Block elements.

What I am trying to say here is that at least the optional Table parameters could easily be placed inside some sort of Attribute instead. I would even go as far as to state that the two parameters with default values could be as well.
That would not complicate the code in my opinion.

Handling the caption would go from something like the first snippet below (the current code) to something like the second:

blockToHtml opts (Table capt aligns widths headers rows') = do
  captionDoc <- if null capt
                   then return mempty
                   else do
[...]

blockToHtml opts (Table attr rows') = do
  captionDoc <- if isNothing (lookup "tableCaption" attr)
                   then return mempty
                   else do
[...]

I know this is a wee bit simplified, as the caption is a list of Inline, and the lookup on the Attr type that @mb21 proposed on 26 Feb returns a String. However, that could be solved by storing the inlines as the HTML string (or LaTeX string, etc.) and then perhaps changing the lookup function to return a Maybe instead.

Regarding the non-optional elements: the writer seems to assume that they have the correct lengths? So I guess it could just as well assume that they are available in the attributes. But of course, if you want to use dependent types at some point to assert the correct length of the lists, then I assume they would have to stay as they are now rather than be stored inside the attributes.

One issue, however, is that one would need to know which attributes are "special" and as such should not be written to the final output. For example, in the HTML writer the "tableCaption" attribute should be removed from the list before the table tag is generated with the remaining attributes.
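
A minimal sketch of that last point, assuming attributes are an (identifier, classes, key-value pairs) triple of plain strings; specialKeys, splitSpecial and captionOf are hypothetical names, and "tableCaption" is just the key used in the example above.

```haskell
-- Simplified attribute triple: identifier, classes, key-value pairs.
type Attr = (String, [String], [(String, String)])

-- Keys the writer interprets itself instead of emitting them verbatim.
specialKeys :: [String]
specialKeys = ["tableCaption"]

-- Separate the special pairs from the attributes that should still be
-- written on the table tag.
splitSpecial :: Attr -> ([(String, String)], Attr)
splitSpecial (ident, classes, kvs) = (special, (ident, classes, ordinary))
  where
    special  = [kv | kv@(k, _) <- kvs, k `elem` specialKeys]
    ordinary = [kv | kv@(k, _) <- kvs, k `notElem` specialKeys]

-- Look up the caption; Nothing means the table has no caption.
captionOf :: Attr -> Maybe String
captionOf (_, _, kvs) = lookup "tableCaption" kvs
```

A writer would call splitSpecial once, render the special pairs in its own way, and pass only the remaining attributes to the code that prints the tag.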

reenberg on 31 Jul 2017

See #2978; it would be useful to have a way to specify a "short caption."

jgm on 15 Aug 2017

I don't have time to do this in the near future, so I'm removing it from the pandoc 2.0 milestone so as not to hold up the release.

jgm on 20 Aug 2017

That's a real shame about missing 2.0.

What sort of version bump would such a change to the AST require — is 2.0 → 2.1 acceptable, or would it have to wait for a 3.0?

dbaynard on 23 Aug 2017

👍2

It can be done in 2.1, which is sort of the plan now...
Originally the main thrust of 2.0 were the API changes in
pandoc itself; no major changes in pandoc-types were
planned.

+++ dbaynard [Aug 23 17 01:46 ]:

That's a real shame about missing 2.0.

What sort of version bump would such a change to the AST require — is
2.0 → 2.1 acceptable, or would it have to wait for a 3.0?


jgm on 23 Aug 2017

❤1 👍1

I think it's a good strategy:

Too much change in one release puts too much burden not only on the pandoc developers but also on the developers of the "pandoc ecosystem", e.g. the wrappers/interfaces.
Supporting table column/row spans is (probably) backward compatible in syntax (i.e. old documents and CLI scripts created for pandoc 2.0 should still be valid).
Wait for pandoc-types 2.0?

ickc on 23 Aug 2017

I'm willing to help (I have haskell experience, though not lua), but circumstances do not allow it for another 6 weeks.

@jgm I note in your proposal above that you are adding Attr to Table

-- constructor for Table in Block:
   | Table Attr Caption [(Alignment, Maybe Double)] [Row]

Do you want to resolve this at the same time as #684 (Attr for all block elements?)?

dbaynard on 25 Aug 2017

For #684, I don't think it has been settled yet. As with many issues here, I think the bottleneck is not implementing it but discussing and deciding exactly which syntax/feature/etc. is needed. The last time @jgm spoke about #684, he was still not convinced that all elements should receive attributes. The mentality seems to be that if we can get away with some elements having no attributes, then we shouldn't grant them.

And it is kind of similar for this issue too. Not only is the AST not settled (is it?), but neither are the syntaxes for using this feature (say, in pandoc markdown).

And just to note the level of complexity of this issue: all reader/writer pairs have to support it (even if for some formats it has to be rendered as HTML), so it is quite non-trivial. I'd imagine it would be easier if done by a couple of people, but then someone needs to coordinate it: for example, settling the AST first, then making a list of formats, and then saying that for these formats HTML will be used and for these others (markdown, rst, etc.) grid tables will be used. And I imagine the LaTeX pair will be kind of difficult, because the from-format can have a lot of different package variants, and we may need to be careful about the suitability of longtable in the to-format (not to mention the other issues with rendering tables using only longtable).

Lastly, I guess @jgm wants to release pandoc 2.0 ASAP. Six weeks might be too long for that (guessing from comments on the polyglot HTML writer).

ickc on 26 Aug 2017

👍1

+++ dbaynard [Aug 25 17 20:37 ]:

Do you want to do this at the same time as [2]#684 (Attr for all block
elements?)?

I'm still undecided about the Attr for all block elements.
I think it would make sense to do the table change first.

jgm on 26 Aug 2017

If helpful, this is the expected compatibility matrix (assuming || is adopted)?

| Type | Row Spans? | Col Spans? |
|:----------------:|:----------:|:----------:|
| Simple Tables | No | No |
| Multiline Tables | No | Yes (?) |
| Grid Tables | Yes | Yes |
| Pipe Tables | No | Yes (?) |

I don't know if this is appropriate for this thread, but would it make sense to promote grid tables as enabled by default, as well as the default output for the markdown writer? It would make the table to table conversion from say TeX or html a little more convenient if you can assume that an input row and colspan table will always convert to an output row and colspan capable format.

svenevs on 29 Aug 2017

I have some ideas about a syntax allowing row spans in pipe
tables, so we might eventually be able to support that.

I don't know if this is appropriate for this thread, but would it make
sense to promote grid tables as enabled by default, as well as the
default output for the markdown writer? It would make the table to
table conversion from say TeX or html a little more convenient if you
can assume that an input row and colspan table will always convert to
an output row and colspan capable format.

The current behavior will try to find a table format that is
supported by the enabled extensions and can handle the
features of the table. So, e.g. if the table contains block
level content, a grid table is used.

jgm on 29 Aug 2017

Another point of reference for this discussion: grid tables are a pain to write in most cases and also create diffs that are pretty hard to review. If you are looking for text formats that allow row and column spans: The Linux Kernel team created a restructured text extension to create what they call a "flat table" which allows for column and row spans as well as multiple header rows. Here's the spec: https://return42.github.io/linuxdoc/linuxdoc-howto/table-markup.html#rest-flat-table

If you guys adopted something similar to a flat table type to cover these advanced table markup requests, there may be some potential wins both in ease of development and in ease of use for pandoc markdown users.

alyjak on 15 Nov 2017

The Linux Kernel team created a restructured text extension to create what they call a "flat table" which allows for column and row spans as well as multiple header rows.

One suggestion is that after pandoc adds the column/row span feature to its AST (pandoc 2.1?), the rst reader could add support for this extension. People who want to use this feature "in markdown" could then use the new pandoc 2.0 extension raw_attribute to inline an rst table in markdown. The downside is that markdown syntax won't be allowed within the table. But from what I observe, the pandoc community holds a high standard on which markdown extensions should be added, according to the "markdown test". From what I read at your link, that table format is not markdown-ish at all, so most probably it will not be considered as a markdown table extension in pandoc.

I feel your pain in writing tables in markdown too. Currently, the grid table is the only one (out of 4) in pandoc that can utilize every table feature in the AST. That's why I wrote pantable, to write my markdown tables in CSV format instead (which also covers all of pandoc's table features in the AST, and has a reader/writer pair to jump back and forth between native table and CSV table). By the way, it would be a challenge to support col/row spans in this CSV format, but it's my intention to do that because I can't imagine writing tables in any of the native extensions.

ickc on 15 Nov 2017

👍1

Nice! A csv format that supports all the features of the AST would be perfect! I've always imagined trying to make that work but haven't had the time to try and get something working. I'm watching pantable now :+1:

alyjak on 15 Nov 2017

😄1

Any plan to support the vertical header mentioned in #1359?

Helpful comment on the commonmark forum about the need for row headers to support accessibility (screen readers)

HTML solved this with the th (table header) element, which is used in place of td (cells), not in place of tr (rows). See e.g. this example. Yet of course, there is also the thead element which works more like the proposed ADT.

DocBook has the rowheader attribute for this case.

If pandoc would use a HeaderCell as well, the HTML Writer could simply wrap the first n rows that only contain HeaderCells in a thead. It could also wrap the last n rows in the table with only HeaderCells in a tfoot.

We should try to ensure that there's only one way to represent a given table in the data type, and that bad tables are impossible to represent. See #3648.

As mentioned, I guess there are only two ways:

dependent types (@jgm, have you ruled this out categorically already?)
make sure everything goes through the builder (which would do some runtime corrections if necessary, like filling up missing columns)
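A minimal sketch of the builder option, assuming a much simplified table type. The names `SimpleCell`, `SimpleTable`, `padRow` and `mkTable` are made up for illustration and are not part of pandoc-types. The idea is that the raw constructor would stay hidden and every table would be built through `mkTable`, which pads short rows at run time.

```haskell
-- Hypothetical, simplified types; these are NOT the pandoc-types definitions.
newtype SimpleCell = SimpleCell String
  deriving (Eq, Show)

-- | Column count plus rows; every row is expected to have that many cells.
data SimpleTable = SimpleTable Int [[SimpleCell]]
  deriving (Eq, Show)

-- | Pad a row with empty cells (or truncate it) so that it has exactly n cells.
padRow :: Int -> [SimpleCell] -> [SimpleCell]
padRow n row = take n (row ++ repeat (SimpleCell ""))

-- | Smart constructor: the column count is taken from the widest row,
-- and all shorter rows are filled up with empty cells.
mkTable :: [[SimpleCell]] -> SimpleTable
mkTable rows = SimpleTable n (map (padRow n) rows)
  where
    n = maximum (0 : map length rows)
```

In a real module only `mkTable` would be exported, so filters and readers could not construct a ragged table directly. Whether padding, truncating, or failing is the right correction is a design choice this sketch does not settle.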

I'm still pondering adding Attr to Cell. If only to give filters etc. an escape-hatch. And for complex use-cases that require the headers attribute in HTML, which "contains a list of space-separated strings, each corresponding to the id attribute of the <th> elements that apply to this element".

Incorporating all of the above, would lead to the following AST:

type Rowspan = Int
type Colspan = Int
type Caption = [Block]
type ShortCaption = [Inline]
type Colwidth = Maybe Double
data Cell = Cell       Attr Rowspan Colspan [Block]
          | HeaderCell Attr Rowspan Colspan [Block]
          | NoCell -- if this slot is taken by another cell's row/colspan (idea from docutils snippet above)
type Row = [Cell]
-- constructor for Table in Block:
   | Table Attr Caption ShortCaption [(Alignment, Colwidth)] [Row]

(That is, unless we get the general possibly-floating container discussed in #3177, which could wrap the table. Then we could place the Caption and ShortCaption in that.)

Edit: An alternative to Cell ... | HeaderCell ... would be Cell CellType ...

mb21 on 28 Dec 2017

👍2

Maybe we could get away with using LiquidHaskell to verify the table invariants. (I have close to no experience with LiquidHaskell, so I don't know whether it's a good choice for this use-case.)

tarleb on 28 Dec 2017

@tarleb LiquidHaskell might indeed work for this, whereas I'm not sure dependent types would (although I've no experience either), because we don't know at compile-time how long the table rows are going to be – we only know that all rows should have the same length.

mb21 on 30 Dec 2017

Perhaps a structure that guarantees that the tables make sense can be composed without dependent types with something like this:

data Orientation = Vertical | Horizontal
data Split a = Split Orientation Rational a a
             -- ^ orientation, ratio, values
data Grid = GridSplit (Split Grid)
          | GridCell String

Edit: I've slightly messed it up at first, but the idea is the same.
Edit 2: Though there would be more than one way to represent a table, and it won't work for all kinds of tables.

defanor on 1 Jan 2018

Just for the record, a simple dependent-types-based demo, without row- and col-spans though:

{-# LANGUAGE DataKinds, GADTs, StandaloneDeriving #-}

-- List with typed length
-- from https://www.schoolofhaskell.com/user/konn/prove-your-haskell-for-great-safety/dependent-types-in-haskell
-- for production, we could use https://hackage.haskell.org/package/sized instead
data Nat = Z | S Nat -- type-level natural numbers
data List n a where
  Nil  :: List Z a
  (:-) :: a -> List n a -> List (S n) a
infixr 5 :-
deriving instance Eq a => Eq (List n a)
deriving instance Show a => Show (List n a)

-- `Table n` and `Row n` have `n` columns
data Alignment = AlignLeft
               | AlignRight
               | AlignDefault deriving (Eq, Show)
type Colwidth = Maybe Double
data Cell = Cell String
  deriving (Show, Eq)
type Row n = List n Cell
data Table n = Table (List n Alignment) (List n Colwidth) [Row n]
  deriving (Show, Eq)

-- Sample
r1 = Cell "foo" :- Nil
r2 = Cell "bar" :- Nil
r3 = Cell "foobar" :- Cell "bar" :- Nil
t = Table (AlignDefault :- Nil) (Just 1.0 :- Nil) [r1, r2]
-- the following do not type-check:
-- Table (AlignDefault :- AlignDefault :- Nil) (Just 1.0 :- Nil) [r1, r2]
-- Table (                AlignDefault :- Nil) (Just 1.0 :- Nil) [r1, r3]

mb21 on 2 Jan 2018

❤1

there are a lot of valid contributions in this thread, and this is a complex and multi-faceted problem. i suggest to add a document to the repo about the design of the feature, and develop the discussion through pull requests to that document. advantages:

we will have a reference about the current status of the design at any time
the document can stay also afterwards as documentation about the code
discussion about the design can be threaded by specific changes to the design document

i hope you will agree that this is desirable, i find it difficult to keep all the comments here in mind in order to get an idea about where we want to go, it makes context switching more demanding. if we all agree about the method, it will be just a matter of who has availability to summarise this thread in a design document the soonest. i might work on this in the near future as we are facing the same problem

danse on 14 Feb 2018

👍3

add a document to the repo

The wiki on GitHub can be used for this.

ickc on 15 Feb 2018

The wiki on GitHub can be used for this.

How would non-collaborators contribute? This article outlined a nice automated workflow. It seems pretty legit, the API token is encrypted, etc.

svenevs on 15 Feb 2018

i was thinking of something more straightforward, like a new .md file in doc or in a design folder. in the past i experimented with "documentation driven development", that is designing a feature by writing the documentation for the user. in our case, a user is also the developer of a writer/reader who will be interested in adding support for spans, and is thus interested in the structure of the data model and the rationale behind that

danse on 15 Feb 2018

How would non-collaborators contribute?

Oh, I don't know if the pandoc wiki is setup to allow anyone to edit. I think it is but can anyone confirm?

ickc on 15 Feb 2018

I think it is but can anyone confirm?

Oh my...it is...that seems kinda dangerous. I (very briefly) added a second exclamation point at the top of the home-page and the edit was accepted. FWIW I don't think that should be editable by just anybody...

I personally think that a github-markdown document would be ideal, it could allow for task-lists etc. ~~Either way, I think a maintainer / collaborator needs to create the Wiki or markdown document first before it can be edited by others.~~ Not true, "outsiders" could create it.

Edit: One downside to the wiki is it's more difficult to understand who created what, and what happens with multiple editors working at once. That quick test did this

commit c030c0401d91087e6ef059b96bc7316b0449476c
Author: Stephen McDowell <[email protected]>
Date:   Thu Feb 15 15:44:12 2018 -0800

    Updated Home (markdown)

commit 2170707e2ea0d41bf06ef3e0773d1cd4fa968153
Author: Stephen McDowell <[email protected]>
Date:   Thu Feb 15 15:43:59 2018 -0800

    Updated Home (markdown)

GitHub makes it a little too easy to do that. Sorry for cluttering that, I was expecting a new screen like when you edit a file online for a PR, but it seems it just made the commit right away :scream:

Benefit of Wiki: the maintainers of pandoc don't need to approve every PR that goes toward this specific discussion. That in its own right might be justification enough.

svenevs on 16 Feb 2018

I think at least it requires a GitHub account? And since it is a git repository, the commits can be reversed.

It is dangerous, because pandoc.org points to the pandoc extras page of the wiki, and anyone can edit or even delete that page.

But I guess it has been working for pandoc for years and nobody has done any damage so far. So I guess it should only be changed when the hypothetical bad actor appears? (Note that someone did do something to the gitit demo wiki that forced @jgm to take it down. That's why we don't have a demo for gitit anymore.)

ickc on 16 Feb 2018

😕1

Yeah you do need a GitHub account, and there is a history attached so it can definitely be revived if the bad actor appears.

So what are your thoughts about this initial mock-up skeleton? I don't really understand too much about the (magic) behind pandoc, but if you think it's a good start then once the page is added anybody watching this issue can jump in

svenevs on 16 Feb 2018

thanks @svenevs! i copy your idea of collecting references and with this post i propose a summary of what happened above, hoping that it will help.

summary of the above

i think contributions can be grouped in three categories:

markdown syntax
plan of action
data model proposals, comparisons, ideas, constraints

personally, given the title of this issue, i think that discussions on markdown syntax should go elsewhere, it's a way more specific problem. the plan of action gives us a way to evolve the data model but there isn't agreement about what that data model will look like. overthinking is a good idea in this case given the cost of changing the design later.

so basically the "design document" i was thinking about consists of a few lines of haskell code that have been pasted here in different versions, and what's the agreement about it?

this is it, as far as i understand the point now is just to evolve this data model definition. i hope that this post can save time to others going through this long issue

danse on 20 Feb 2018

i've been thinking about data models that can give us geometrical
consistency without the need for dependent types. we don't want data
instances featuring cells that overlap, or areas of a table that are
not covered by any cell, or rows or columns that overflow outside
table boundaries.

i had the intuition that such constraints could be enforced by a
recursively defined table structure, but i found out that a table like
the following cannot be represented that way:

a b b    letters are repeated to represent
a c d    cells spanning multiple rows or columns
e e d

i keep this comment for the next person that will have a similar idea
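A small sketch of why the table above resists a recursive split, under two assumptions that are mine: the grid is written with one label per slot, and equal labels in neighbouring slots belong to the same cell. The names `cleanCuts` and `splittable` are hypothetical. A recursive structure needs at least one straight cut across the whole table that crosses no cell, and this table has none.

```haskell
import Data.List (transpose)

-- | Indices i such that a straight horizontal cut between row (i - 1) and
-- row i crosses no cell, i.e. no column carries the same label on both sides.
cleanCuts :: Eq a => [[a]] -> [Int]
cleanCuts rows =
  [ i
  | (i, upper, lower) <- zip3 [1 ..] rows (drop 1 rows)
  , and (zipWith (/=) upper lower)
  ]

-- | True if the grid can be divided by at least one horizontal or
-- vertical cut that does not cross a spanning cell.
splittable :: Eq a => [[a]] -> Bool
splittable grid =
  not (null (cleanCuts grid)) || not (null (cleanCuts (transpose grid)))

-- | The table from the comment above, one label per grid slot.
example :: [String]
example = ["abb", "acd", "eed"]
```

For `example`, every candidate cut is blocked by one of the cells a, b, d or e, so `splittable example` is `False`. A plain two-column grid such as `["ab", "ab"]` can be cut vertically.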

danse on 23 Feb 2018

👍1

I've stumbled upon the same issue with that "gridsplit" approach. Not sure if it can be avoided, but here's another sketch with dependent types (which doesn't account for rowspans and colspans being 0, but counts them otherwise):

data Cell : (colspan, rowspan: Nat) -> Type where
  MkCell : (cs, rs: Nat) -> String -> Cell cs rs

data RowSpan : Type where
  RS : (colspan, rowspan : Nat) -> RowSpan

data Row : (width: Nat) -> (rowspans: List RowSpan) -> Type where
  AddCell : Cell cs rs -> Row n rss -> Row (n + cs) (RS cs rs :: rss)
  EmptyRow : Row 0 []

rowSpanWidth : RowSpan -> Nat
rowSpanWidth (RS cs _) = cs

rowSpanIter : RowSpan -> Maybe RowSpan
rowSpanIter (RS cs Z) = Nothing
rowSpanIter (RS cs (S x)) = Just (RS cs x)

rowSpansWidth : List RowSpan -> Nat
rowSpansWidth = sum . map (\(RS cs _) => cs)

rowSpansIter : List RowSpan -> List RowSpan
rowSpansIter = mapMaybe rowSpanIter

data Table : (width, height: Nat) -> (rowspans: List RowSpan) -> Type where
  AddRow : Row rw rss
         -> Table w h prs
         -> {auto prf: (rw + rowSpansWidth (rowSpansIter prs) = w)}
         -> Table w (S h) (rowSpansIter $ rss ++ prs)
  EmptyTable : (rw: Nat) -> Table rw 0 []

data ValidTable : (width, height: Nat) -> Type where
  MkTable : Table w h rs -> {auto prf: rowSpansIter rs = []} -> ValidTable w h

trickyTable : ValidTable 3 3
trickyTable = MkTable $ 
            AddRow (AddCell (MkCell 2 1 "e") EmptyRow) $
            AddRow (AddCell (MkCell 1 1 "c") (AddCell (MkCell 1 2 "d") EmptyRow)) $
            AddRow (AddCell (MkCell 1 2 "a") (AddCell (MkCell 2 1 "b") EmptyRow)) $
            EmptyTable 3

Edit: though might be easier to add cells without any proofs first, and wrap everything in the end, counting all the widths and heights there.
Edit 2 (2018-06-19): this doesn't account for exact placements of row-spanning cells, still allowing invalid tables; a check for that should be added too.

defanor on 23 Feb 2018

I'm working with tables now and i'm wondering whether Table could be a record, i think that it would make my life simpler. Any drawbacks? I don't see this in either of the proposals i collected above.

I'm not good at reading dependent types but also the last proposal from @defanor doesn't seem to use a record-like type.

I might be wrong, but using a record will also make it easier for us in the future to add properties without the need for extensive refactorings.

danse on 15 Jun 2018

I'm working with tables now and i'm wondering whether Table could be a
record, i think that it would make my life simpler.

At least with full dependent types (I used Idris above, by the way) it
can be done rather easily: a structure can be defined in whatever way is
handy, and then it can be wrapped into another one, which would verify
it (as ValidTable in the example above – though most of the
bookkeeping is done before it, and the check in it is trivial, but all
the counting can be moved there instead). But I doubt that it would be
as straightforward in Haskell.

defanor on 15 Jun 2018

Given the other requirements (a single way to represent a table,
representing valid tables only), I guess the minimum that is required is
type polymorphism, which would make manipulations awkward: there seems
to be 2^(w - 1) valid single-row table layouts of width w (or
2^(w - 1)^h valid w*h tables without rowspans, which make things
harder), and an ADT with such a number of inhabitants would require type
polymorphism to represent, I think. But those would be bad news at least
for parsing.

Edit: On the other hand, valid single-row table layouts of arbitrary width can be encoded simply with [Int], and w should be arbitrary in that 2^(w - 1), turning it into 2^(1 - 1) + 2^(2 - 1) + 2^(3 - 1) + ...; apparently somehow it works out with infinite sums, and maybe the whole table layouts can be approached that way.

Edit 2: For rows alone it's also easy to come up with data Row a = Singular a | Stretch (Row a) | Cut a (Row a) for x = a + x + ax, which matches those series with a=1 (valid rows full of units matching just the valid row layouts), but for whole tables it may not be the easiest way to come up with a suitable structure.
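The 2^(w - 1) count for single-row layouts can be checked with a short enumeration. This is only an illustration: `rowLayouts` is a made-up name, and a layout is represented as the list of colspans from left to right, in line with the [Int] encoding mentioned in the first edit.

```haskell
-- | All ways to divide a row of width w into cells, each layout given as
-- the list of colspans from left to right (every colspan is at least 1).
rowLayouts :: Int -> [[Int]]
rowLayouts 0 = [[]]
rowLayouts w =
  [ c : rest
  | c <- [1 .. w]              -- colspan of the first cell
  , rest <- rowLayouts (w - c) -- layouts of the remaining width
  ]
```

For width 3 this yields `[[1,1,1],[1,2],[2,1],[3]]`, which is 2^(3 - 1) = 4 layouts. Each layout corresponds to choosing which of the w - 1 inner column borders are kept.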

defanor on 17 Jun 2018

a user in pandoc-discuss mentioned the value of allowing multiple headers ... not sure whether it's already been mentioned above, maybe we want to collect the use cases for the new data model somewhere else in order to make the design easier

danse on 19 Jul 2018

Multiple headers have been discussed above.

By the way, some people might need a footer as well.

ickc on 19 Jul 2018

Am I understanding correctly that there are two distinct problems here?

A model to represent extended table features in the AST qua data structure.
How to implement such a model in Haskell.

bpj on 28 Nov 2018

@bpj, well, the AST data structure is implemented in Haskell, so it's kind of one problem.

mb21 on 29 Nov 2018

I think models (or just some HTML table properties) outside of Haskell
could give an idea on how to implement those in Haskell, but just any
model wouldn't necessarily be useful.

Speaking of which, I've tried yet another approach now: to define an
empty table (grid), and then fill it with cells (rectangles), tracking
vacant cells. It is pretty simple with dependent types (in Idris again),
and has the desired properties:

data Table : (width, height: Nat) -> (vacant: List (Nat, Nat)) -> Type where
  Empty : {w, h : Nat}
        -> {auto p1 : LTE 1 w}
        -> {auto p2 : LTE 1 h}
        -- a bit more code could be written to allow tables with no
        -- cells
        -> Table w h [(x,y) | x <- [0..w - 1], y <- [0..h - 1]]
  Fill : (x, y, w, h: Nat)
       -> {auto p1 : LTE 1 h}
       -> {auto p2 : LTE 1 w}
       -> Table w' h' l
       -- ensure that the needed cells are still vacant
       -> {auto p3 : True = all (flip Prelude.List.elem l)
                     [(x',y') | x' <- [x .. x + (w - 1)], y' <- [y .. y + (h - 1)]]}
       -- the table must be filled in order, so that there's only one
       -- way to represent it
       -> {auto p4 : foldl Prelude.Interfaces.min (S x, S y) l = (x, y)}
       -> Table w' h'
                (filter (Prelude.Bool.not . flip Prelude.List.elem
                         [(x',y') | x' <- [x .. x + (w - 1)], y' <- [y .. y + (h - 1)]]) l)

Translating it into Haskell would still be challenging (if possible at
all), but perhaps worth trying, unless something better will come up.

defanor on 30 Nov 2018

It's possible to implement a correct-by-construction grid using diagonalization, with cells placed on top.

I've begun an implementation at https://gist.github.com/dbaynard/9736e1e7c78da94f13da3ea6ed45f96f — I'd be grateful for feedback (and contributions).

Briefly, it assumes a table is a grid (represented in diagonal form) with cells (of any size or shape) stored at the first point the diagonal traversal encounters them. The implementation uses a GADT to ensure that only correct tables can be constructed.

I haven't got to the bit where I add the cells, but the algorithm should be quite straightforward to apply.

This by itself imposes no constraints on cells — e.g. header cells in specific places. But it seems that may be desirable.

(I used ghc 8.4.3, no dependencies other than base)

A diagonal traversal of an array (1 → 20) looks as follows:

1  3  6 10 14
2  5  9 13 17
4  8 12 16 19
7 11 15 18 20
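The numbering above can be reproduced with a few lines of plain Haskell. This is a sketch, not code from the gist: the sort key (row + column, then column) is inferred from the 4-by-5 example, and `diagonalOrder` and `diagonalNumbering` are hypothetical names.

```haskell
import Data.List (sortOn)
import Data.Maybe (fromMaybe)

-- | Positions (row, column) of an h-by-w grid in diagonal traversal order:
-- anti-diagonals from the top-left corner outwards, each one walked from
-- its bottom-left end to its top-right end.
diagonalOrder :: Int -> Int -> [(Int, Int)]
diagonalOrder h w =
  sortOn (\(r, c) -> (r + c, c))
    [ (r, c) | r <- [0 .. h - 1], c <- [0 .. w - 1] ]

-- | The 1-based visit number of every position, laid out as rows.
diagonalNumbering :: Int -> Int -> [[Int]]
diagonalNumbering h w =
  [ [ visit (r, c) | c <- [0 .. w - 1] ] | r <- [0 .. h - 1] ]
  where
    numbered = zip (diagonalOrder h w) [1 ..]
    -- every position is in 'numbered', so the default 0 is never used
    visit p = fromMaybe 0 (lookup p numbered)
```

`diagonalNumbering 4 5` gives the four rows shown above.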

The GADT (see the full gist for the rest of the documentation and definition). When I talk about growing the table, I mean while descending the syntax tree. As constructors, these do the opposite, but I found it more helpful to think about them tearing down the data structure.

data T (n :: Nat) (extend :: Extending) a where
  -- | Grow the table height and width by 1, by cons-ing a new diagonal List
  -- of length one greater than the previous
  (:+:) :: List n a -> T (n + 1) extend a -> T n 'Diagonal a
  -- | Grow the table width by 1 at fixed height by cons-ing a new diagonal List
  -- to the right of the previous list
  (:-:) :: Or '[ 'Filling, 'Width] extend => List n a -> T n extend a -> T n 'Width a
  -- | Grow the table height by 1 at fixed width by cons-ing a new diagonal List
  -- below the previous list
  (:|:) :: Or '[ 'Filling, 'Height] extend => List n a -> T n extend a -> T n 'Height a
  -- | Fill the remaining table space by cons-ing a new diagonal list below
  -- and to the right of the previous list
  (:::) :: List n a -> T (n - 1) 'Filling a -> T n 'Filling a
  End :: T 0 'Filling a

dbaynard on 30 Nov 2018

@dbaynard - A representation as a list of rows would map much more easily onto the formats we are targeting. I guess I'd like to understand better what the advantage of the diagonal representation would be.

jgm on 2 Dec 2018

A representation as a list of rows would map much more easily onto the formats we are targeting.

Yes, I can see it would.

I guess I'd like to understand better what the advantage of the diagonal representation would be.

I would too — I need to investigate further whether it is useful. It may be able to guarantee that only valid tables are representable in the AST, yet all tables have the same type (meaning no need for dependent types/liquidhaskell/etc.). Also representations would be unique.

Perhaps even just proposing it may help us to find a solution that meets the criteria in https://github.com/jgm/pandoc/issues/1024#issuecomment-302919730 and subsequent comments, even if it isn't this one.

dbaynard on 3 Dec 2018

@dbaynard and I were pondering this at ZuriHac. Here's our thinking so far.

Basic requirements

col/row-spans
headers:
- multiple rows in a header
- secondary headers
- row headers (e.g. have the first column be headers of the rest of the rows)
footer?
column-widths, alignments (as existing)

Concerning the headers, I feel fairly confident that CellType = DataCell | HeaderCell (analogous to the HTML <td> and <th>) is the best solution. It even allows us to do a simple table footer: The HTML Writer could simply wrap the first n rows containing only HeaderCells in a <thead> and the last n rows containing only HeaderCells in a <tfoot>.
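A rough sketch of how a writer might derive the sections from the cell types, assuming a simplified `Cell` that carries only a type and some text (spans and block content are left out). `isHeaderRow` and `splitSections` are hypothetical helper names, not existing pandoc functions.

```haskell
data CellType = DataCell | HeaderCell
  deriving (Eq, Show)

-- | Simplified cell for illustration: cell type plus plain text content.
data Cell = Cell CellType String
  deriving (Eq, Show)

-- | A non-empty row consisting only of header cells.
isHeaderRow :: [Cell] -> Bool
isHeaderRow row = not (null row) && all isHeader row
  where
    isHeader (Cell t _) = t == HeaderCell

-- | Split rows into (head, body, foot): the leading header-only rows, the
-- trailing header-only rows, and everything in between. If all rows are
-- header-only, they all end up in the head.
splitSections :: [[Cell]] -> ([[Cell]], [[Cell]], [[Cell]])
splitSections rows = (hd, reverse bodyRev, reverse footRev)
  where
    (hd, rest) = span isHeaderRow rows
    (footRev, bodyRev) = span isHeaderRow (reverse rest)
```

A row that mixes a leading HeaderCell with DataCells (a row header) is not header-only, so it stays in the body, which matches the `<th>`-inside-`<tbody>` pattern in HTML.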

What tables should be representable in the AST?

Let's have only rectangular cells (no overlapping cells).

In the spirit of not letting the perfect stand in the way of the good, we were thinking that a nested list of cells (like we currently have) is the best and most pragmatic solution. It should also make implementing the writers easier.

David had the insight that there wouldn't have to be any possibilities of invalid tables in the AST, if there was a well-defined way how to interpret any AST. Similar to how the HTML spec in some cases tells browsers how to insert certain missing (implicit) elements.

Pandoc could still emit warnings on missing cells. Additionally, a writer can choose to pad out missing rows (with cells of rowspan=1, colspan=1) or not.

Interesting HTML cases

<table>
  <tr>
    <td>1</td>
  </tr>
  <tr>
    <td>1</td>
    <td>2</td>
  </tr>
</table>

HTML validator warning:

A table row was 2 columns wide and exceeded the column count established by the first row (1).

<table>
  <tr>
    <td>1</td>
    <td>2</td>
  </tr>
  <tr>
    <td>1</td>
    <td rowspan="2">2</td>
  </tr>
</table>

HTML validator error:

Table cell spans past the end of its row group established by a tbody element; clipped to the end of the row group.

<table>
  <tr>
    <td colspan="2">2</td>
  </tr>
</table>

HTML validator error:

Table column 2 established by element td has no cells beginning in it.

<table>
  <tr><td>1</td></tr>
  <tr></tr>
  <tr><td>1</td></tr>
</table>

HTML validator error:

Row [...] has no cells beginning on it.

<table>
<tr>
  <td>1</td>
  <td rowspan="2">2</td>
</tr>
<tr>
  <td colspan="2">3</td>
</tr>
</table>

HTML validator error:

Table cell is overlapped by table cell.

Some of those in a jsfiddle.

pandoc's options

So what should a pandoc writer do when it encounters the equivalent of the cases above as a pandoc AST? What should the writer do with overlapping or missing cells (in the implicit grid)?

error out
interpret the table in a different way than in HTML, like push the overlapping cell so far right and to the bottom until it fits, possibly making the table larger.
pass the problem along, potentially outputting invalid HTML (or worse, invalid LaTeX which would make PDF output fail)
drop or crop cells in a deterministic way and emit a warning:
- crop cells that would overlap with already filled space
- crop (or drop) cells that would outgrow the column count established by the first row in the table (this is stricter than HTML, where this case is only a warning and the table grows to the right)
- crop cells that would outgrow the row count established by the column with the smallest row count.
- drop rows without any cells

At this point, I'm favouring the last option. This seems consistent with the feedback we got at ZuriHac, where people were like: "don't overthink it, do whatever HTML does, make sure the reader doesn't produce an invalid table and do whatever is easiest in the writer if someone produces an invalid table with a filter."

ADT

Coming back to the AST, which might then look like this:

type Rowspan = Int
type Colspan = Int
type Caption = [Block]
type ShortCaption = [Inline]
type Colwidth = Maybe Double
data CellType = DataCell | HeaderCell
data Cell = Cell CellType Rowspan Colspan [Block]

Table Attr Caption ShortCaption [(Alignment, Colwidth)] [[Cell]]

There are still some things open:

1. Instead of Caption and ShortCaption, the caption could also be handled by wrapping the table in a Figure with a caption (#3177)
2. CellType could also include NoCell (this cell is occupied by a span of another cell)
3. Potentially adding Attr to Cell, if only to give filters an escape hatch, and for complex use-cases that usually boil down to wanting to specify which <th> cell is a heading for which <td> cells. Or alternatively, a polymorphic Cell a (I'll have to reread Trees That Grow). This might be useful to instantiate differently when writing markdown, to save the widths and heights of a cell.

Concerning 2. and 3. we should probably implement at least the HTML and markdown writers to get a feeling for how the AST format would impact implementation. We might put a function that validates/cleans-up a table in Writers.Shared.
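To make the validation idea concrete, here is a minimal sketch of such a check. It assumes a cut-down `Cell` (rowspan, colspan, text) and places cells the way HTML does, each at the first free column of its row. `Problem`, `placeRow` and `validate` are hypothetical names, and the check does not reproduce every validator message listed above (for instance, it does not flag a column in which no cell begins).

```haskell
import Data.List (foldl')

-- | Simplified cell for illustration: rowspan, colspan, text content.
data Cell = Cell Int Int String
  deriving (Eq, Show)

type Pos = (Int, Int) -- (row, column), both 0-based

data Problem
  = Overlap Pos      -- slot claimed by more than one cell
  | SpansPastEnd Pos -- slot below the last row of the table
  | Gap Pos          -- slot inside the grid that no cell covers
  deriving (Eq, Show)

-- | Place the cells of one row from left to right, each at the first
-- column of that row that is still free, and record overlaps.
placeRow :: ([Pos], [Problem]) -> (Int, [Cell]) -> ([Pos], [Problem])
placeRow acc (r, cells) = foldl' place acc cells
  where
    place (occ, probs) (Cell rs cs _) =
      let c0 = until (\c -> (r, c) `notElem` occ) (+ 1) 0
          rect = [ (r', c') | r' <- [r .. r + rs - 1], c' <- [c0 .. c0 + cs - 1] ]
          clashes = [ Overlap p | p <- rect, p `elem` occ ]
      in (occ ++ [ p | p <- rect, p `notElem` occ ], probs ++ clashes)

-- | All problems found in a table given as a list of rows.
validate :: [[Cell]] -> [Problem]
validate rows = overlaps ++ pastEnd ++ gaps
  where
    height = length rows
    (occ, overlaps) = foldl' placeRow ([], []) (zip [0 ..] rows)
    width = maximum (0 : [ c + 1 | (_, c) <- occ ])
    pastEnd = [ SpansPastEnd p | p@(r, _) <- occ, r >= height ]
    gaps =
      [ Gap (r, c)
      | r <- [0 .. height - 1], c <- [0 .. width - 1], (r, c) `notElem` occ ]
```

A clean-up function could be built on the same placement pass, cropping or dropping the offending cells instead of reporting them. The list-based occupancy lookup keeps the sketch within base; a set would be the obvious choice for large tables.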

mb21 on 17 Jun 2019

🎉3

As to what to do with malformed input I would do the most predictable and
least controversial thing: error out.

On Mon, 17 Jun 2019 at 10:41, Mauro Bieg notifications@github.com wrote:

@dbaynard https://github.com/dbaynard and me were pondering this at
ZuriHac. Here's our thinking so far.
Basic requirements

col/row-spans

headers:

multiple rows in a header

secondary headers

row headers (e.g. have the first column be headers of the rest of

the rows)

footer?

column-widths, alignments (as existing)

Concerning the headers, I feel fairly confident that CellType = DataCell
| HeaderCell (analogous to the HTML and ) is the best solution.
It even allows us to do a simple table footer: The HTML Writer could simply
wrap the first n rows containing only HeaderCells in a and the
last n rows containing only HeaderCells in a .
What tables should be representable in the AST?

Let's have only rectangular cells (no overlapping cells).

In the spirit of not letting the perfect stand in the way of the good, we
were thinking that a nested list of cells (like we currently have) is the
best and most pragmatic solution. It should also make implementing the
writers easier.

David had the insight that there wouldn't have to be any possibilities of
invalid tables in the AST, if there was a well-defined way how to interpret
any AST. Similar to how the HTML spec in some cases tells browsers how to
insert certain missing (implicit) elements.

Pandoc could still emit warnings on missing cells. Additionally, a writer
can choose to pad out missing rows (with cells of rowspan=1, colspan=1) or
not.
Interesting HTML cases

1

1 2

HTML validator warning:

A table row was 2 columns wide and exceeded the column count established
by the first row (1).

1 2

1 2

HTML validator error:

Table cell spans past the end of its row group established by a tbody
element; clipped to the end of the row group.

2

HTML validator error:

Table column 2 established by element td has no cells beginning in it.

1

1

HTML validator error:

Row [...] has no cells beginning on it.

1 2

3

HTML validator error:

Table cell is overlapped by table cell.

Some of those in a jsfiddle https://jsfiddle.net/grLwp0ed/.
pandoc's options

So what should a pandoc writer do when it encounters the equivalent of the
cases above as a pandoc AST? What should the writer do with overlapping or
missing cells (in the implicit grid)?

- error out
- interpret the table differently than HTML does, e.g. push the
  overlapping cell right and down until it fits, possibly making the
  table larger
- pass the problem along, potentially outputting invalid HTML (or
  worse, invalid LaTeX which would make PDF output fail)
- drop or crop cells in a deterministic way and emit a warning:
  - crop cells that would overlap with already filled space
  - crop (or drop) cells that would outgrow the column count established
    by the first row in the table (this is stricter than HTML, where this
    case is only a warning and the table grows to the right)
  - crop cells that would outgrow the row count established by the
    column with the smallest row count
  - drop rows without any cells

At this point, I'm favouring the last option. This seems consistent with
the feedback we got at ZuriHac, where people were like: "don't overthink
it, do whatever HTML does, make sure the reader doesn't produce an invalid
table and do whatever is easiest in the writer if someone produces an
invalid table with a filter."
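To get a feeling for the drop-or-crop approach, here is a small sketch that lays cells out on the implicit grid row by row. It is not pandoc code: Block is a String placeholder, and Placed and layout are hypothetical names. It crops colspans that would run into filled slots or past the column count of the first row, crops rowspans that would run past the last row, drops cells that find no free column, and collects a warning for each change. Dropping empty rows is left out.

```haskell
-- Simplified stand-ins for the proposed types; Block is a placeholder.
type Block = String

data CellType = DataCell | HeaderCell
  deriving (Eq, Show)

-- Cell type, rowspan, colspan, contents.
data Cell = Cell CellType Int Int [Block]
  deriving (Eq, Show)

-- | A cell together with the (row, column) of its top-left grid slot.
data Placed = Placed Int Int Cell
  deriving (Eq, Show)

-- | Place cells on the implicit grid, cropping or dropping as needed.
-- Returns the placed cells in document order plus warning messages.
layout :: [[Cell]] -> ([Placed], [String])
layout rows = (reverse placed, reverse warnings)
  where
    nRows = length rows
    -- The first row establishes the column count.
    nCols = case rows of
      []          -> 0
      (first : _) -> sum [max 1 cs | Cell _ _ cs _ <- first]

    -- State: occupied slots, placed cells, warnings (latest first).
    (_, placed, warnings) = foldl placeRow ([], [], []) (zip [0 ..] rows)

    placeRow st (r, cells) = snd (foldl (placeCell r) (0, st) cells)

    placeCell r (c0, (occ, pl, ws)) (Cell ty rs cs bs)
      | c >= nCols = (c, (occ, pl, ("dropped cell in row " ++ show r) : ws))
      | otherwise  =
          (c + cs', (slots ++ occ, Placed r c (Cell ty rs' cs' bs) : pl, ws'))
      where
        isFree x = (r, x) `notElem` occ
        -- First free column at or after the cursor.
        c     = until isFree (+ 1) c0
        -- Free columns available before the grid edge or a filled slot.
        room  = length (takeWhile (\x -> x < nCols && isFree x) [c ..])
        cs'   = min (max 1 cs) room
        rs'   = min (max 1 rs) (nRows - r)
        slots = [(y, x) | y <- [r .. r + rs' - 1], x <- [c .. c + cs' - 1]]
        ws' | cs' /= cs || rs' /= rs = ("cropped cell at " ++ show (r, c)) : ws
            | otherwise              = ws
```

For example, with a first row holding a rowspan=2 cell followed by a plain cell, and a second row holding a single colspan=2 cell, the second-row cell is placed in column 1 and its colspan is cropped to 1, with one warning recorded.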

ADT

Coming back to the AST, which might then look like this:

type Rowspan = Int
type Colspan = Int
type Caption = [Block]
type ShortCaption = [Inline]
type Colwidth = Maybe Double
data CellType = DataCell | HeaderCell
data Cell = Cell CellType Rowspan Colspan [Block]

Table Attr Caption ShortCaption [(Alignment, Colwidth)] [[Cell]]

There are still some things open:

1. Instead of Caption and ShortCaption, the caption could also be
   handled by wrapping the table in a Figure with a caption (#3177
   https://github.com/jgm/pandoc/issues/3177)

2. CellType could also include NoCell (this cell is occupied by a span
   of another cell)

3. Potentially adding Attr to Cell, if only to give filters an
   escape hatch, and for complex use cases
   (https://www.w3.org/WAI/tutorials/tables/multi-level/) that usually
   boil down to wanting to specify which cell is a heading for
   which cells. Or alternatively, a polymorphic Cell a (I'll have to
   reread Trees That Grow). This might be useful to instantiate differently
   when writing markdown, to save the width and height of a cell.

Concerning 2. and 3. we should probably implement at least the HTML and
markdown writers to get a feeling for how the AST format would impact
implementation. We might put a function that validates/cleans-up a table in
Writers.Shared.


bpj on 18 Jun 2019

Great writeup @mb21 (and discussion) — thank you!

I agree that we can use a straightforward list of lists representation, without dependent types, and decide how to handle the edge cases. We created a short list of tables that this representation should not handle (e.g. 3-dimensional tables, cell splits; more below). The "Tables Concepts" page of the WAI Web Accessibility Tutorials was quite helpful.

Or alternatively, polymorphic Cell a (I'll have to reread Trees That Grow). This might be useful to instantiate differently when writing markdown to save width, heights of a cell.

The principle here is: we can reduce code duplication by having the same data structure for tables in the AST and in writers, but writers need different information (e.g. the dimensions of the output structures in characters/pixels). The advantage of _Trees that grow_ is that there is no runtime cost.
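A very reduced sketch of the polymorphic Cell a idea, using a plain type parameter rather than the type-family machinery of _Trees that grow_. Block is a String placeholder (one line of text per block), and Size, AstCell, SizedCell and measure are hypothetical names for illustration only:

```haskell
-- Simplified stand-ins for the proposed types; Block is a placeholder.
type Block = String

data CellType = DataCell | HeaderCell
  deriving (Eq, Show)

-- | A cell with an extra annotation slot that each consumer picks:
-- annotation, cell type, rowspan, colspan, contents.
data Cell a = Cell a CellType Int Int [Block]
  deriving (Eq, Show)

-- | Cells in the AST carry no extra information.
type AstCell = Cell ()

-- | A plain-text writer might track rendered size in characters.
data Size = Size { sizeWidth :: Int, sizeHeight :: Int }
  deriving (Eq, Show)

type SizedCell = Cell Size

-- | Annotate a cell with the size of its contents, treating each
-- placeholder block as a single line of text.
measure :: AstCell -> SizedCell
measure (Cell () ty rs cs bs) = Cell (Size w h) ty rs cs bs
  where
    w = maximum (0 : map length bs)
    h = length bs
```

The same Cell structure is shared by the AST and the writer; only the annotation type differs.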

Cell splits

dbaynard on 18 Jun 2019

👍1

I've not read this thread entirely but I'm currently using pandoc (2.7.2) with wkhtmltopdf and am finding that the same issue is occurring. I wondered if anyone can explain why this won't work when using wkhtmltopdf to generate the html -> pdf conversion?

Thanks very much!

welly on 25 Sep 2019

@welly, a question like this might fit pandoc-discuss better. The issue tracker is for discussing the feature request. Currently the pandoc AST doesn't have a model for this, so it isn't supported in pandoc yet.

ickc on 25 Sep 2019

I'm really crying out for this feature.
Or for any other tool to convert pandoc's gfm output to this format:

|  | 2 | 3 |
| --- | --- | --- |
| a | @cols=2: |
| b |  | test<br>ricky |
| c | @rows=2: | <br>Yes |
| e | No |

rickywu on 3 Jan 2020

I went looking for multi-span rows and columns in pandoc, and ended up here. My interest is not with the science community, but in the writing of government standards. We are currently experimenting with writing the standard texts in markdown and rst, and using pandoc to convert them to PDF (via docbook). But the lack of rowspan support in the tables makes it impossible to represent the layout of the original docx. It would thus be great if pandoc supported spans. Sorry for not having any code to contribute, but I thought it would be useful to know about yet another use case.

petterreinholdtsen on 11 Feb 2020

👍7

I found this after a search about formatting / laying out tables also.
We're looking at CMS options, and being able to generate the following table (as an example) would be useful, as we require some of our tables to be pivoted 90 degrees for simple stuff:

<table>
  <caption>Dates and amounts</caption>
  <thead>
    <tr>
      <th scope="col">Date</th>
      <th scope="col">Amount</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th scope="row">First 6 weeks</th>
      <td>£109.80 per week</td>
    </tr>
    <tr>
      <th scope="row">Next 33 weeks</th>
      <td>£109.80 per week</td>
    </tr>
    <tr>
      <th scope="row">Total estimated pay</th>
      <td>£4,282.20</td>
    </tr>
  </tbody>
</table>

The markup is based on the UK Gov's Design System: https://design-system.service.gov.uk/components/table/ - but we have a use case for these types of tables.

glenpike on 13 Feb 2020

A strong use case for this is the set of 3GPP specs that contain bit fields, e.g. https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=3111
Would love to be able to view them as markdown or org in my Emacs, but all of the bitfields are off ;(

ivan4th on 9 Mar 2020

For those who have been following this issue, we have a PR for new table types here:
https://github.com/jgm/pandoc-types/pull/66
I want to avoid excessive bikeshedding on this issue, but if you have final comments, now is the time. The type allows for column and row spans, short captions, attributes, multiple header rows, footers, intermediate headers, and overriding alignments at the cell level.

jgm on 3 Apr 2020

👍2

Is the Markdown syntax for these new features described in prose/with examples somewhere?

bpj on 3 Apr 2020

There's no markdown syntax. That's a separate issue, and it may be quite a while before that changes. You shouldn't even assume that we will provide a markdown syntax capable of representing all these distinctions. For now the focus is just on providing types capable of representing more complex tables, which can certainly be represented in other formats.

jgm on 3 Apr 2020

👍2

@jgm is there a timeframe for a binary release with these great improvements?? and is there a good way to contribute for input/output format handling?

lrosenthol on 30 Apr 2020

This can be closed now: we have these features in the AST.
We don't yet have support for them in readers and writers, though.
I've opened some issues for adding these to some of the most commonly used formats (you'll find them among the most recent issues).

jgm on 30 Apr 2020

🎉4 👍3 ❤2

Will documentation on this AST be available? And are there any pointers for filter frameworks to pick this up (such as pandocfilters/panflute)?

Thanks @despresc for the effort of implementing it and @jgm, @mb21 for the code reviews. This is great progress!

ickc on 1 May 2020

👍2

There will be regular API docs once the new pandoc-types is released.

jgm on 1 May 2020

👀1

I've opened some issues for adding these to some of the most commonly used formats (you'll find them among the most recent issues).

#6344
#6317
#6316
#6315
#6314
#6313
#6312
#6311

reagle on 15 May 2020

👍3
