The new docx reader converts docx track changes into HTML. I suggest the following changes:
- use `<ins>` and `<del>` instead of `<span class="insertion">` and `<span class="deletion">`, as they are standard HTML tags and will render in most browsers.
- don't use a wrapper `<span>` including author and date for combined insertions/deletions; rather, use author and date attributes on both the `<ins>` and `<del>` tags.

This is probably desirable but technically difficult to get right because there isn't a specific place in the AST for insertions and deletions.
On Sun, Aug 24, 2014 at 2:17 PM, Martin Fenner wrote:
> The new docx reader converts docx track changes into HTML. I suggest the following changes:
> - use `<ins>` and `<del>` instead of `<span class="insertion">` and `<span class="deletion">`, as they are standard HTML tags and will render in most browsers.
> - don't use a wrapper `<span>` including author and date for combined insertions/deletions; rather, use author and date attributes on both the `<ins>` and `<del>` tags.
>
> View it on GitHub: https://github.com/jgm/pandoc/issues/1560
We have the Spans in the AST. I think it would just be a matter of changing
the HTML writer so that it renders these spans as `<del>` and `<ins>`
elements rather than as `<span>` elements. Probably the author and date
attributes should be included on whatever tags are used.
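To make the writer-side idea concrete, here is a minimal sketch of that special case. It is written in Python for brevity (pandoc's actual HTML writer is Haskell), and `render_span`, `TRACK_CHANGE_TAGS`, and the `deletion` class name are assumptions for illustration, not pandoc's API. Attribute names are copied through unchanged.

```python
from html import escape

# Hypothetical mapping from track-change span classes to HTML tags.
TRACK_CHANGE_TAGS = {"insertion": "ins", "deletion": "del"}


def render_span(classes, attributes, inner_html):
    """Render a span-like node as HTML.

    classes: list of class names; attributes: list of (key, value) pairs;
    inner_html: the already-rendered contents of the span.
    """
    tag = "span"
    remaining = list(classes)
    for cls, candidate in TRACK_CHANGE_TAGS.items():
        if cls in remaining:
            # Special case: a track-change class selects the tag and is
            # dropped from the class list.
            tag = candidate
            remaining.remove(cls)
            break
    parts = [tag]
    if remaining:
        parts.append('class="%s"' % escape(" ".join(remaining), quote=True))
    for key, value in attributes:
        parts.append('%s="%s"' % (key, escape(value, quote=True)))
    return "<%s>%s</%s>" % (" ".join(parts), inner_html, tag)
```

With this sketch, `render_span(["insertion"], [("author", "Jane")], "foo")` gives `<ins author="Jane">foo</ins>`, while a span with any other class is still rendered as a `<span>`. Note that a span a user wrote by hand with the class `insertion` would be converted as well, since the class is the only signal available.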
+++ mpickering [Aug 24 14 06:39 ]:
> This is probably desirable but technically difficult to get right because there isn't a specific place in the AST for insertions and deletions.
I get a bit worried about the sustainability of this approach in general, especially as it causes unexpected results if a user has an "insertion" class, but also because you lose type guarantees and it makes the code more difficult to understand if someone is unfamiliar with it.
It does add some complexity. @mfenner, why not just add some
CSS rules for these span elements, so they're displayed as
you like?
+++ mpickering [Aug 24 14 08:40 ]:
> I get a bit worried about the sustainability of this approach in general, especially as it causes unexpected results if a user has an "insertion" class, but also because you lose type guarantees and it makes the code more difficult to understand if someone is unfamiliar with it.
One way to do this without an AST change would be to special-case these spans in the HTML writer.
I think a dedicated element would work better in the long run. Considering, for example, how the image/figure hack turned out, I think it's better to include more AST elements.
I've broadened the scope of this issue. If we decide to handle insertions/deletions better across pandoc, there are a few formats to consider.
In HTML and markdown, pandoc 2.4 currently does:

    $ echo '<ins>foo</ins>' | pandoc -f html -t native
    [Plain [Span ("",["underline"],[]) [Str "foo"]]]

    $ echo '<del>foo</del>' | pandoc -f html -t native
    [Plain [Strikeout [Str "foo"]]]

    $ echo '~~foo~~' | pandoc -f markdown -t native
    [Para [Strikeout [Str "foo"]]]

Looking at CriticMarkup, from this closed issue:
| critic markup | HTML | LaTeX |
| --- | --- | --- |
|{--[text]--}|<del>[text]</del>|\st{[text]}|
|{++[text]++}|<ins>[text]</ins>|\underline{[text]}|
|{~~[text1]~>[text2]~~}|<del>[text1]</del><ins>[text2]</ins>|\st{[text1]}\underline{[text2]}|
|{==[text]==}|<mark>[text]</mark>|\hl{[text]}|
|{>>[text]<<}|<aside>[text]</aside>|\marginpar{[text]}|
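
As a rough illustration of the HTML column of this table, the mapping can be sketched as a handful of regex substitutions. This is Python, the names `CRITIC_RULES` and `critic_to_html` are made up for the example, and it is purely textual: it does no HTML escaping and knows nothing about the surrounding markdown structure.

```python
import re

# One (pattern, replacement) pair per row of the table above.
CRITIC_RULES = [
    (re.compile(r"\{~~(.*?)~>(.*?)~~\}", re.DOTALL), r"<del>\1</del><ins>\2</ins>"),
    (re.compile(r"\{--(.*?)--\}", re.DOTALL), r"<del>\1</del>"),
    (re.compile(r"\{\+\+(.*?)\+\+\}", re.DOTALL), r"<ins>\1</ins>"),
    (re.compile(r"\{==(.*?)==\}", re.DOTALL), r"<mark>\1</mark>"),
    (re.compile(r"\{>>(.*?)<<\}", re.DOTALL), r"<aside>\1</aside>"),
]


def critic_to_html(text):
    """Replace each CriticMarkup construct with its HTML equivalent."""
    for pattern, replacement in CRITIC_RULES:
        text = pattern.sub(replacement, text)
    return text


print(critic_to_html("a {--b--} {++c++} {~~x~>y~~}"))
```

Because the substitutions work on raw text, a change that starts in one block and ends in another produces tags that straddle block boundaries, which is exactly the case that is hard to represent in a tree.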
See also the LaTeX changes package.
And of course the existing docx --track-changes option. (In the code, look for AcceptChanges in the Docx reader.)
If we go with an AST change, it might seem like we simply could:
- replace the `Strikeout` element with a `Deleted` element
- add an `Inserted` element

However, `Strikeout` is an `Inline` element, so it's not clear how to handle changes that span multiple blocks. From pandoc-discuss:
> It's not that easy. What kind of native output would this produce?
>
>     - My first list item.
>     - My second {-- list item.
>     - My third --} combined list item.
>
> Presumably a list with three items, the second and third of which
> contain these special StartDelete and StopDelete markers.
> But that doesn't tell you the structure you want after applying
> the edits -- which is a list with two items. Special logic
> for doing this sort of transformation would need to be included
> somewhere. And there are much more complex cases I could come up
> with.
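
To show what that "special logic" might look like in the simplest case, here is a toy sketch in Python. It models a list as a list of items, each item being a flat list of words and markers. This is not the pandoc AST, and `START_DELETE`, `STOP_DELETE`, and `apply_deletions` are hypothetical names. When a deletion is still open at the start of an item, the boundary before that item counts as deleted, so the surviving words are appended to the previous item.

```python
START_DELETE = object()  # hypothetical marker standing in for "{--"
STOP_DELETE = object()   # hypothetical marker standing in for "--}"


def apply_deletions(items):
    """Accept all deletions, merging items whose boundary was deleted."""
    result = []
    deleting = False
    for item in items:
        if deleting and result:
            # The deletion crosses the item boundary: continue the previous item.
            current = result[-1]
        else:
            current = []
            result.append(current)
        for token in item:
            if token is START_DELETE:
                deleting = True
            elif token is STOP_DELETE:
                deleting = False
            elif not deleting:
                current.append(token)
    return result


items = [
    ["My", "first", "list", "item."],
    ["My", "second", START_DELETE, "list", "item."],
    ["My", "third", STOP_DELETE, "combined", "list", "item."],
]
for words in apply_deletions(items):
    print("- " + " ".join(words))
```

Even this toy version has to decide what a deleted boundary means. With nested lists, mixed block types, or insertions that add boundaries, the merge rule is no longer obvious, which is the point of the objection above.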
It seems the fundamental question is whether this should be modelled in the pandoc AST like the HTML `<ins>` and `<del>` elements (which are block-level elements, but nonetheless force you to serialize the changes to a tree), or whether it should be more like a plain-text diff kind of thing, where we have starting and ending markers that can cross nodes in the pandoc AST (that's more what CriticMarkup does). But if it's the second, you can just use an external preprocessor to diff your changes and handle the markers (like pancritic), and pandoc wouldn't have to have anything to do with it.
So, input welcome on how different formats handle this. Especially, I'm unclear on how docx --track-changes and LaTeX handle:
> It's not that easy. What kind of native output would this produce?
>
>     - My first list item.
>     - My second {-- list item.
>     - My third --} combined list item.
As mentioned in that thread, the CriticMarkup spec actually recommends against these kinds of AST-breaking changes:
> Avoid Newlines in CriticMarkup
>
> Wrap Markdown Tags Completely
>
> While it may support incomplete Markdown tags in the future, the CriticMarkup processor currently chokes on them. Avoid this:
>
>     I really love *italic {~~fonts*~>font-styles*~~}.
>
> Instead, wrap the asterisks completely:
>
>     I really love {~~*italic fonts*~>*italic font-styles*~~}.
The one exception to this seems to be the insertion/deletion of paragraph breaks. I'm not sure if there's a clean way to handle that case.
Don’t read too much into the current CriticMarkup “spec”, though. It is not a proper parser, just a bunch of regexes. (FYI, I’m maintaining a fork of the reference implementation of CriticMarkup in the form of pancritic, which provides Python 3 support among other things.)
Multiline changes (i.e. ones crossing newlines) shouldn’t be a big problem, but “boundary crossing” between markdown markup and CriticMarkup is.
If it is decided to support CriticMarkup (as a syntax, not its spec) in pandoc, then we will eventually have to decide how it should behave (i.e. write a spec), which will most likely differ from the current one.
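
As a small illustration of what "boundary crossing" means in practice, here is a heuristic sketch in Python that flags CriticMarkup spans containing an odd number of `*` in any of their parts, as in the "Avoid this" example quoted earlier in the thread. The names `CRITIC_SPAN` and `crossing_spans` are made up for this example. It only looks at asterisk emphasis and does not check that opening and closing delimiters match, so it is nowhere near a spec.

```python
import re

# Matches any CriticMarkup span; the body is captured in group 2.
CRITIC_SPAN = re.compile(
    r"\{(--|\+\+|~~|==|>>)(.*?)(--|\+\+|~~|==|<<)\}", re.DOTALL
)


def crossing_spans(text):
    """Return the spans in which some part has an unbalanced '*' count."""
    flagged = []
    for match in CRITIC_SPAN.finditer(text):
        # A substitution {~~old~>new~~} has two parts; other spans have one.
        parts = match.group(2).split("~>")
        if any(part.count("*") % 2 for part in parts):
            flagged.append(match.group(0))
    return flagged


print(crossing_spans("I really love *italic {~~fonts*~>font-styles*~~}."))
print(crossing_spans("I really love {~~*italic fonts*~>*italic font-styles*~~}."))
```

A real specification would have to say what happens in the flagged case (reject it, extend the span, or split the emphasis), and would need to cover every inline construct, not just asterisks.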
Is it much work to spec out these issues? CriticMarkup is becoming a standard among Markdown editors, but it is difficult to pass it through standard pandoc.