Pandoc: vimwiki reader appends .html to internal links

Created on 1 Apr 2019 · 30 comments · Source: jgm/pandoc

I'm trying to switch my wiki (many dozens of pages) from vimwiki to gfm. When trying pandoc, I noticed two potential issues. Here is a short test file demonstrating these points (test.txt).

Testing the issue
* [[mythings|My Things]]
* [[https://www.airbnb.com/reservation/itinerary?code=BYW9D5|AirBNB Apartment]]
* [[http://www.macraigor.com/hwproducts.htm|Hardware Products]]

Converting this using pandoc -f vimwiki -t gfm test.txt > test.md.txt results in the following test.md.txt:

Testing the issue

  - [My Things](mythings.html)
  - [AirBNB
    Apartment](https://www.airbnb.com/reservation/itinerary?code=BYW9D5)
  - [Hardware Products](http://www.macraigor.com/hwproducts.htm)

Links to other wiki pages are converted to have an html extension. But I would have expected, perhaps erroneously, that they would be converted to a markdown extension. And more often than not, but not always, unexpected line breaks happen in the tags for html links (see the last two lines of the test). This newline isn't valid markdown format and breaks the expected behavior.

I would have expected the output to be something like this:

Testing the issue
* [My Things](mythings.md)
* [AirBNB Apartment](https://www.airbnb.com/reservation/itinerary?code=BYW9D5)
* [Hardware Products](http://www.macraigor.com/hwproducts.htm)

My pandoc version is

pandoc 2.6
Compiled with pandoc-types 1.17.5.4, texmath 0.11.2, skylighting 0.7.5

running on macOS Mojave 10.14.4.

Generally, pandoc was not designed as a markdown formatter or prettifier. There are a few options, such as --wrap=none (see http://pandoc.org/MANUAL.html#general-writer-options and http://pandoc.org/MANUAL.html#markdown-variants), but I don't think you can change the list marker without post-processing.

This newline isn't valid markdown format and breaks the expected behavior.

Yes it is valid commonmark, which gfm is a superset of.

For future questions, please use the pandoc-discuss mailing list instead of this issue tracker. Thanks!

Noted. I wasn't expecting pretty, and I don't care about the list marker. Sorry to add to the confusion there. Looks like I was wrong: having a line break in the middle of a tag is valid GFM, although it doesn't work with vimwiki. That's an issue with vimwiki, not pandoc.

My biggest issue is internal links won't work, because they point to non-existent html files, not markdown files. I thought that was indeed a bug, and that's why I posted here. I'll move this to the discussion list.

Sure, for the links you'll probably want to write a small pandoc lua filter that rewrites the links to whatever you need them to be... there's no way for pandoc to know that.

And more often than not, but not always, unexpected line breaks happen in the tags for html links (see the last two lines of the test). This newline isn't valid markdown format and breaks the expected behavior.

I guess you're talking about the link text that wraps after "[AirBNB" in the output above.

That's perfectly valid Markdown. However, if you don't like this kind of wrapped text, you can use --wrap=preserve.

Sure, for the links you'll probably want to write a small pandoc lua filter that rewrites the links to whatever you need them to be... there's no way for pandoc to know that.

I can understand that there's no way for pandoc proper to know. But the vimwiki reader module should know, as this is standard vimwiki format for local wiki-links. At first blush, I think it would be better if the reader's logic chose to do nothing to the link, rather than append .html to the link name. But there may be other use cases I'm not considering.

Thanks, guys, for pointing out the --wrap option, it indeed solves that issue.

But the vimwiki reader module should know, as this is standard vimwiki format for local wiki-links. At first blush, I think it would be better if the reader's logic chose to do nothing to the link, rather than append .html to the link name. But there may be other use cases I'm not considering.

Ah yes, I didn't see that. Makes sense, and would be consistent with what we have in other wiki readers:

~ echo '[[foo]]' | pandoc -f vimwiki                      
<p><a href="foo.html">foo</a></p>

~ echo '[[foo]]' | pandoc -f mediawiki          
<p><a href="foo" title="wikilink">foo</a></p>

~ echo '((foo))' | pandoc -f tikiwiki
<p><a href="foo">foo</a></p>

We would basically have to drop the last line in Vimwiki.hs#procLink... currently it works as you say only if the link ends in a slash, otherwise it appends .html. Maybe @ycpei, who wrote the vimwiki reader, can comment?
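
The rule described above can be paraphrased in a few lines of Python. This is only a sketch of the behaviour as reported in this thread, not a translation of Vimwiki.hs: the function names are hypothetical, the "projects/" target is made up for illustration, and treating anything containing "://" as an external URL is an assumption based on the http links in the original report passing through unchanged.

```python
def old_vimwiki_target(target):
    """Behaviour reported for pandoc 2.6: append ".html" unless the link ends in "/"."""
    # Assumption: targets with a URL scheme are external and are left untouched.
    if "://" in target or target.endswith("/"):
        return target
    return target + ".html"


def proposed_vimwiki_link(target):
    """Proposed behaviour: keep the target as written and tag the link with a title."""
    return target, "wikilink"


print(old_vimwiki_target("mythings"))     # mythings.html
print(old_vimwiki_target("projects/"))    # projects/
print(proposed_vimwiki_link("mythings"))  # ('mythings', 'wikilink')
```

The second function mirrors what the mediawiki reader already does in the example above: the target is passed through and the title "wikilink" marks the link for later processing.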

@mb21 It's been a while since I wrote the vimwiki reader, but if I remember correctly, I probably wrote it this way to reproduce the behaviour of Vimwiki the vim plugin (including its built-in vimwiki-> html converter) for consistency.

For example, if we have a test.wiki with the following

[[hello]]

and we have a corresponding hello.html in the wiki_html sub dir of the vimwiki wiki folder (which is the path of the vim plugin output after converting a vimwiki hello.wiki file),

pandoc -f vimwiki -t html wiki/test.wiki -o wiki_html/test.html should produce (as the built-in converter does)

<p><a href="hello.html">hello</a></p>

rather than

<p><a href="hello">hello</a></p>

also because the latter will result in a dead link.

Am I missing anything here?

Yes, I see your reasoning. In my case, I am trying to switch my entire vimwiki itself into markdown, and was trying to use pandoc + vimwiki-reader in order to change markdown styles. Just a one-time conversion. Once I have my vimwiki in markdown format, making html will be easy with pandoc.

@thestumbler I see. How do you convert markdown files with internal links like <p><a href="foo">foo</a></p> to html files where the internal links work? Without the .html in href they would be dead links no?

In vimwiki an internal link by default is [[foo]], which points to foo.xxx, where xxx is the wiki file extension set by the user. (I have mine set to .txt; other people use .md, .wiki, .markdown, etc.) When you execute Vimwiki2HTML or VimwikiAll2HTML, the vimwiki program generates html output using its default conversion algorithm. This conversion puts the .html extension in the link as would be expected, although I don't know the detailed logic.

I just did a test to double-check, and the markup [[radio|Radio Stations]] converts to <a href="radio.html">Radio Stations</a>.

The mediawiki reader adds a title wikilink to wiki links, see also #1983.

If we make vimwiki reader do the same thing, then the question becomes:

How does one transform links with the title wikilink by appending .html when using the html writer?

I have limited knowledge of pandoc usage, so I had a look at the pandoc manual, but I could not find anything relevant.

If there is a simple answer to the question, then perhaps we can modify the vimwiki reader code so that it adds a title wikilink to wiki links instead of appending .html, and add a line in the documentation explaining how to replicate the behaviour of vim plugin converter (not sure if it is worth it though), otherwise I suggest that we leave the code as it is.

Hmmmm, I had not considered that the vimwiki-reader pandoc module was primarily intended for html conversion, since that was natively provided by the vimwiki program. I assumed its main role was to convert to different markdown formats. Once you're in a more standard markdown format, getting html is straightforward with existing converters. I guess it's not an easy fix.

How does one transform links with the title wikilink by appending .html when using the html writer?

I suppose you're expected to write a simple lua filter that changes the URL if there is a wikilink title...

Adding the wikilink title seems a good idea; that makes it possible to use filters.

@jgm Sure, I'm personally fine with such a change if it is implemented, but how much will this impact the users? At least can we get a sample filter in the documentation in case anyone wants to convert from vimwiki to html?

The impact on users would be minimal, since link title isn't used for much.

@jgm Sure most people wouldn't care about link titles, but when I talked
about impact I was referring to the scenario when you convert say a group
of wiki files with wiki links pointing to each other to html. After the
proposed change to the vimwiki reader, the wiki links will become broken
links in html by default. Before the change they were not broken links. Also
see my comment above. Now I don't know how many users would use pandoc to
convert vimwiki to html (I don't), but on the safe side perhaps we should
include some instructions on how to get htmls without broken links no?

Not sure I understand. I was not proposing any changes
to the link's target (URL). The change I'm proposing
is simply to add a link title 'wikilink'. That would
allow filters to be used to adjust URLs so that links
are preserved. (How this is to be done may differ from
one wiki setup to another.)

I see - thanks for clarifying. I was referring to the change of both adding a link title wikilink and removing the trailing .html in the link target. I'm OK with the change of just adding the link title without modifying the link target, since it is almost purely cosmetic.

Good, agreed. @ycpei do you want to create a PR for this or should I just go ahead and change it?

removing the trailing .html in the link target.

that would, however, make it more consistent with the other wiki readers, as mentioned in my comment above... but yes, it's a trade-off I suppose.

@mb21 sorry, I missed this point before.
I agree with you that it would be good to be consistent with the way the other wiki formats are treated.
Adding the .html suffix makes sense if you're converting to HTML, but not if you're converting to other formats (e.g. a different wiki format).
So, I revise my recommendation: I think we should not add the .html suffix (to be consistent with other wiki readers), and we should add the wikilink title.

@jgm The change is not about adding .html, but to remove it. Doing so will
result in broken links - see my comments above.

If we are to remove the .html, can we at least add a sample filter in the documentation in case anyone wants to convert from vimwiki to html using pandoc?

@jgm The change is not about adding .html, but to remove it. Doing so will
result in broken links - see my comments above.

Looking at the original issue:

* [[mythings|My Things]]

becomes

  - [My Things](mythings.html)

So, the vimwiki reader is evidently adding an .html extension, right?
The suggestion is: don't add this .html extension, just use mythings as the URL, and add wikilink as link title. This would make the vimwiki reader behave like the other wikiformat readers.
Note the difference in current behavior:

% pandoc -f mediawiki
[[mythings|My Things]]
^D
<p><a href="mythings" title="wikilink">My Things</a></p>
% pandoc -f vimwiki
[[mythings|My Things]]
^D
<p><a href="mythings.html">My Things</a></p>

If you need the html extension for your application, it can easily be added using a lua filter that matches on links with the title wikilink. You'll probably want to do this for output to HTML. But for many other output formats, e.g. markdown or mediawiki, you may want another extension or no extension.
For example, for output to mediawiki, you want no extension, but currently you get the .html:

% pandoc -f vimwiki -t mediawiki
[[mythings|My Things]]
^D
[[mythings.html|My Things]]
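
To make the "different extension per output format" idea concrete, here is a minimal sketch in Python that works on pandoc's JSON representation of a document, where a link is stored as {"t": "Link", "c": [attr, inlines, [url, title]]}. The EXTENSIONS table and the name fix_wikilinks are hypothetical choices for illustration (".md" for gfm simply follows what the original report expected), and the sample document is built by hand rather than produced by pandoc.

```python
import json

# Hypothetical mapping from pandoc output format to the suffix wiki links need.
EXTENSIONS = {"html": ".html", "gfm": ".md", "mediawiki": ""}


def fix_wikilinks(node, ext):
    """Recursively append `ext` to the target of every Link titled 'wikilink'."""
    if isinstance(node, list):
        for child in node:
            fix_wikilinks(child, ext)
    elif isinstance(node, dict):
        if node.get("t") == "Link":
            url, title = node["c"][2]
            if title == "wikilink":
                node["c"][2] = [url + ext, title]
        for child in node.values():
            fix_wikilinks(child, ext)
    return node


# Hand-written AST for the single link [[mythings|My Things]].
link = {"t": "Link",
        "c": [["", [], []],
              [{"t": "Str", "c": "My"}, {"t": "Space"}, {"t": "Str", "c": "Things"}],
              ["mythings", "wikilink"]]}
doc = {"pandoc-api-version": [1, 17, 5, 4], "meta": {},
       "blocks": [{"t": "Para", "c": [link]}]}

for fmt in ("html", "gfm", "mediawiki"):
    # Deep-copy via JSON so each format starts from the unmodified document.
    fixed = fix_wikilinks(json.loads(json.dumps(doc)), EXTENSIONS[fmt])
    print(fmt, fixed["blocks"][0]["c"][0]["c"][2])
```

Links without the wikilink title, such as the external http links in the original report, are left alone. To use something like this as a real filter, the script would read the JSON document from standard input, write the result to standard output, and take the output format from its first command-line argument, which is how pandoc calls programs passed with --filter.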

@jgm I see - that's a good point. I didn't think about conversion from
vimwiki to other wiki formats. Thanks for the explanation.

Updated to pandoc-2.7.2. New behavior promptly broke all my vimwikis (I use default vimwiki syntax). Lua filter provided in the release notes on github solved the problem by adding .html extension to wikilinks.

Please add this lua filter to the documentation. Currently this information is hard to discover.

The changelog is exactly where you should look when you're wondering what has changed from one version of pandoc to the next. This change was also mentioned prominently in the release announcement.

@jgm Thank you for making the change and including the instructions in the
commit message.

On Fri, 5 Apr 2019, John MacFarlane closed #5414 (https://github.com/jgm/pandoc/issues/5414) via commit 4f572dd: https://github.com/jgm/pandoc/commit/4f572ddf6941b8fb0ad7a5835216c708998444f0

Finally got back to wrapping up my vimwiki conversion. I just wanted to make these remarks, in case anyone else does this. After the update, converting the simple vimwiki link below now gives the following by default:

echo '[[mythings|My Things]]' | pandoc -f vimwiki -t gfm
[My Things](mythings "wikilink")

Applying the lua filter from the release notes results in:

echo '[[mythings|My Things]]' | pandoc -f vimwiki -t gfm --lua-filter=fixlinks.lua
[My Things](mythings.html "wikilink")

For converting vimwiki to markdown format, I edited the lua script:

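function Link(el)
    -- The vimwiki reader now marks wiki links with the title 'wikilink'.
    if el.title == 'wikilink' then
        -- Line from the release-notes filter, disabled so the target stays as written:
        -- el.target = el.target .. ".html"
        -- Drop the marker title so it does not appear in the markdown output.
        el.title = ''
    end
    return el
end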

and the results are as follows:

echo '[[mythings|My Things]]' | pandoc -f vimwiki -t gfm --lua-filter=fixlinks2.lua
[My Things](mythings)

It now works perfectly for my application. Thanks for making this fix.
