Pandoc: Comments in docx writer

Created on 23 Jun 2016  ·  12 Comments  ·  Source: jgm/pandoc

We added track-changes comments to the docx reader with 8bb739f7ff353722981fe442ae0c137910604850. (See discussion in #2884). It would be nice to add them to the writer, just as we have with insertions and deletions.

Note that since commented-upon sections can extend across blocks, we have two spans:

<span class="comment-start" id="3" author="XYZ" date="DATE">comment</span>This is the text that is commented on.<span class="comment-end" id="3"></span>

Not the prettiest markup ever, but it seems to be the best at capturing the meaning of the source.
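Because the start and end markers are separate spans that can sit in different blocks, only their shared id ties them together. The following is a minimal sketch, not part of pandoc, of how one might check that pairing over a whole document. It assumes pandoc-types >= 1.20 (where identifiers are `Text`); the names `spanIds` and `unmatchedComments` are hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.List ((\\))
import Data.Text (Text)
import Text.Pandoc.Definition
import Text.Pandoc.Walk (query)

-- | Collect the ids of all spans carrying the given class, in document order.
-- (Hypothetical helper, not a pandoc API.)
spanIds :: Text -> Pandoc -> [Text]
spanIds cls = query go
  where
    go :: Inline -> [Text]
    go (Span (ident, classes, _) _) | cls `elem` classes = [ident]
    go _ = []

-- | Ids that open a comment without closing it, and ids that close a
-- comment that was never opened. Both lists are empty for a well-paired document.
unmatchedComments :: Pandoc -> ([Text], [Text])
unmatchedComments doc = (starts \\ ends, ends \\ starts)
  where
    starts = spanIds "comment-start" doc
    ends   = spanIds "comment-end" doc
```

Since `query` walks every block, a comment that starts in one paragraph and ends in another is still reported as matched.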

enhancement Docx writer

All 12 comments

Can you give the OOXML that should be emitted for the sample above?

It might be easier to look in tests/docx/comments.{docx,native}. The native of the first paragraph there maps to the following markdown:

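~~~html
I want <span class="comment-start" id="0" author="Jesse Rosenthal" date="2016-05-09T16:13:00Z">I left a comment.</span>some text to have a
comment <span class="comment-end" id="0"></span>on it.
~~~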

The corresponding OOXML, with indentation, is in two files:

word/document.xml:

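~~~xml
<w:p w:rsidR="0008235A" w:rsidRDefault="0008235A">
  <w:r>
    <w:t xml:space="preserve">I want </w:t>
  </w:r>
  <w:commentRangeStart w:id="0"/>
  <w:r>
    <w:t xml:space="preserve">some text to have a comment </w:t>
  </w:r>
  <w:commentRangeEnd w:id="0"/>
  <w:r>
    <w:rPr>
      <w:rStyle w:val="CommentReference"/>
    </w:rPr>
    <w:commentReference w:id="0"/>
  </w:r>
  <w:r>
    <w:t>on it.</w:t>
  </w:r>
</w:p>
~~~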

and word/comments.xml:

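~~~xml
<w:comment w:id="0" w:author="Jesse Rosenthal" w:date="2016-05-09T16:13:00Z" w:initials="jkr">
  <w:p w:rsidR="0008235A" w:rsidRDefault="0008235A">
    <w:pPr>
      <w:pStyle w:val="CommentText"/>
    </w:pPr>
    <w:r>
      <w:rPr>
        <w:rStyle w:val="CommentReference"/>
      </w:rPr>
      <w:annotationRef/>
    </w:r>
    <w:r>
      <w:t>I left a comment.</w:t>
    </w:r>
  </w:p>
</w:comment>
~~~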

Is there any progress on this issue?

This looks straightforward enough, and it would be handy to retain these things on round-trip.
I've added this to the pandoc 2.0 milestone.

Hm, so if I follow the formatting, I should be able to make a filter to convert something like this bracketed span [here is some text]{.comment comment="blah blah"} into something that the DOCX writer can understand?

+++ Ian [Aug 13 17 01:51 ]:

Hm, so if I follow the formatting, I should be able to make a filter to
convert something like this bracketed span [here is some text]{.comment
comment="blah blah"} into something that the DOCX writer can understand?

Yes, your filter must generate start/end spans with the
right attributes and unique ids, like this:

[Para [Str "I",Space,Str "want",Space,Span ("0",["comment-start"],[("author","Jesse Rosenthal"),("date","2016-05-09T16:13:00Z")]) [Str "I",Space,Str "left",Space,Str "a",Space,Str "comment."],Str "some",Space,Str "text",Space,Str "to",Space,Str "have",Space,Str "a",SoftBreak,Str "comment",Space,Span ("0",["comment-end"],[]) [],Str "on",Space,Str "it."]]

@iandol did you end up coming up with a Markdown to Word comment filter? I did a quick search through your (very interesting) repo but couldn't find it. I'd be interested in it if you've created one. Thanks!

@kschach What is your intent? Pandoc almost fully supports track changes, including comments, when converting from and to docx. There is also a Lua filter that extends this to LaTeX/PDF and HTML and can filter the changes out for other output formats.

Thanks for the reply! Is there a Pandoc Markdown syntax for adding Word comments? My goal is to write thesis drafts in Markdown and include comments for my advisor to read in Word, without having to add HTML tags if possible. So something more plain-text friendly, à la what iandol posted ([here is some text]{.comment comment="blah blah"}), would be helpful if possible. Thanks!

@kschach: I didn't write a filter in the end, sorry. I use Scrivener for all my writing, and it transforms its native RTF comments into the markup automatically on Compile, so there was no need for a Pandoc filter.

The problem is that pandoc does not have a spec or AST support for comments. See #2873 for related discussion. You may want to try the pandoc preprocessor pancritic.

Nested comments are still problematic, but not with the current syntax that the docx reader produces. I thought about simplifying my Lua filter to automatically add the author, ID, and date attributes if they are not provided in the Markdown. For the date attribute, though, that makes little sense unless you want to track the time of the comment/modification or do a round-trip conversion from docx back to your md.
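The idea of filling in missing attributes can be sketched in Haskell as well. This is not the Lua filter mentioned above, only a rough illustration of the same idea, assuming pandoc-types >= 1.20. The names `withDefault`, `fillCommentAttrs`, and `fillDefaults` are hypothetical, and the default author and date are passed in by the caller rather than looked up.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import Text.Pandoc.Definition
import Text.Pandoc.Walk (walk)

-- | Add a key/value pair only when the key is not already present.
-- (Hypothetical helper, not a pandoc API.)
withDefault :: (Text, Text) -> [(Text, Text)] -> [(Text, Text)]
withDefault (k, v) kvs = case lookup k kvs of
  Just _  -> kvs
  Nothing -> kvs ++ [(k, v)]

-- | Fill in author and date on comment-start spans that lack them;
-- every other inline is returned unchanged.
fillCommentAttrs :: Text -> Text -> Inline -> Inline
fillCommentAttrs author date (Span (ident, classes, kvs) body)
  | "comment-start" `elem` classes =
      Span (ident, classes, withDefault ("date", date) (withDefault ("author", author) kvs)) body
fillCommentAttrs _ _ x = x

-- | Apply the defaults to every comment-start span in a document.
fillDefaults :: Text -> Text -> Pandoc -> Pandoc
fillDefaults author date = walk (fillCommentAttrs author date)
```

To use it as a filter, one could wrap `fillDefaults` with `toJSONFilter` from `Text.Pandoc.JSON` and supply the date from the system clock. Attributes already present in the source are left alone, so a docx round trip keeps its original authors and timestamps.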

@kschach I have something. It's a Haskell filter; are you set up for that?

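I'll have to separate it from the rest of the module. Here's the outline, with the imports filled in (PB is Text.Pandoc.Builder). I'd need a short while to filter out the superfluous stuff.

{-# LANGUAGE DataKinds, FlexibleContexts, OverloadedStrings #-}
{-# LANGUAGE PatternSynonyms, TupleSections #-}

-- Assumes pandoc-types >= 1.20 (Text-based AST) plus the mtl and tagged packages.
import Control.Monad.State (MonadState, evalState, state)
import Data.Tagged (Tagged, untag)
import Data.Text (Text)
import qualified Data.Text as T
import qualified Text.Pandoc.Builder as PB
import Text.Pandoc.JSON (Inline (..), Pandoc, toJSONFilter)
import Text.Pandoc.Walk (walkM)

main :: IO ()
main = toJSONFilter docFilter

-- Thread one counter through the whole document so every comment gets a unique id.
docFilter :: Pandoc -> Pandoc
docFilter doc = evalState (walkM docxComment doc) 0

-- Input spans, e.g. [here is some text]{.comment comment="blah blah"}
pattern Comment on c = Span ("", ["comment"], [("comment", c)]) on
pattern Todo on t = Span ("", ["todo"], [("todo", t)]) on
pattern TodoEx on t = Span ("", ["todo", "experiment"], [("todo", t)]) on

-- Output spans understood by the docx writer.
-- TODO add date and author
pattern DocxCommentBegin i c = Span (i, ["comment-start"], []) c
pattern DocxCommentEnd i = Span (i, ["comment-end"], []) []

type CommentCount = Tagged "Comment" Int

docxComment
    :: MonadState CommentCount m
    => Inline
    -> m Inline
docxComment (Comment on c) = docxComment' on c
docxComment x = pure x

-- Wrap the commented text in a start/end span pair and bump the counter.
docxComment'
    :: MonadState CommentCount m
    => [Inline]
    -> Text
    -> m Inline
docxComment' on c = state $ \i -> (,i+1) . Span PB.nullAttr . mconcat $
    [ [DocxCommentBegin (ident i) . PB.toList . PB.str $ c]
    , on
    , [DocxCommentEnd (ident i)]
    ]
  where
    -- Render the bare counter value ("0", "1", ...) as the span id.
    ident n = T.pack (show (untag n))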