Encoded whitespace is valid in folder names on Windows. I first thought this was a Windows bug. The %20 could be read as a variable in cmd, which is why I escaped it with a double %% (https://ss64.com/nt/syntax-esc.html).
Command:
pandoc -o ./output.html --template="D:\\example2\\my%%20templates\\template.html" ./docs/index.md
Output:
But now I can see in the output that the encoded whitespace gets decoded. It looks like pandoc handles it as a real whitespace, but it's not.
> pandoc -o ./output.html --template="D:\\example2\\my%20templates\\template.html" ./docs/index.md
pandoc: D:\\example2\\my templates\\template.html: openBinaryFile: does not exist (No such file or directory)
/\
[HERE]
On closer inspection, this seems to happen on Unix as well. On macOS:
$ touch "foo%20bar.html"
$ echo test | pandoc -s --template ./foo%20bar.html
Could not find data file templates/./foo%20bar.html
Probably something wrong in downloadOrRead...?
Yes, this should only happen for URLs, not regular filenames.
I tracked down the URI-unescaping for local file
reads. It was introduced here:
https://github.com/jgm/pandoc/commit/b80577b395cc2ea9c3b50910f7e0e77975112613
in response to
https://github.com/jgm/pandoc/issues/1427
Note that pandoc's markdown reader parses ![](My Image.png) as
[Para [Image ("",[],[]) [] ("My%20Image.png","")]]
with %20 for the space, so we really do need to
URI-unescape the paths even when reading local files.
I'm not sure what the solution is, exactly. Currently we're using common code for fetching templates and fetching images that get referred to in the document itself, and this is extremely convenient. This problem is also not going to be a common one. So a case could be made for doing nothing.
A workaround, if you need a literal %20 in your filename, is to URI-escape it as %2520; I guess for the Windows shell you'd need %%2520!
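To illustrate why %2520 works, here is a minimal Python sketch of a single URI-unescaping pass. Python's urllib.parse is only a stand-in here; pandoc is written in Haskell and its actual code path is not reproduced, so treat this as an assumption-laden illustration rather than pandoc's implementation.

```python
from urllib.parse import quote, unquote

# The folder on disk literally contains "%20" in its name.
literal_name = "my%20templates"

# One unescaping pass (standing in for what pandoc applies to the path)
# turns the literal "%20" into a space, so the lookup misses the folder.
decoded_once = unquote(literal_name)

# Escaping the "%" itself as "%25" survives that single pass intact.
escaped = quote(literal_name, safe="")
round_trip = unquote(escaped)

print(decoded_once)
print(escaped)
print(round_trip)
```

On the Windows shell the leading % would additionally need doubling, as noted above.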
A more extreme alternative would be to keep the literal spaces in the image path in the AST, and not URI-escape until this is needed for HTML rendering. That would make a lot of sense, but perhaps there's a reason I've forgotten for doing it the other way -- and changing this might introduce bugs because of places in the code that expect the old behavior.
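A rough Python sketch of that alternative, purely for illustration: the path is stored literally and percent-encoded only when HTML is produced. The function name render_img and the choice of which characters to leave unescaped are hypothetical, not anything from pandoc's code base.

```python
from html import escape
from urllib.parse import quote


def render_img(src: str) -> str:
    """Hypothetical HTML writer step: percent-encode the stored path only here."""
    # "%" is not in `safe`, so a literal "%20" in a filename becomes "%2520"
    # in the emitted URL and decodes back to the original name.
    url = quote(src, safe="/")
    return f'<img src="{escape(url, quote=True)}" />'


# The "AST" keeps the literal space; local file reads can use it unchanged.
stored_path = "My Image.png"
print(render_img(stored_path))
print(render_img("foo%20bar.png"))
```

One consequence of this sketch is that a target which is already a percent-encoded URL would be encoded twice, which is the path-versus-URL ambiguity discussed in this thread.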
A more extreme alternative would be to keep the literal spaces in the image path in the AST, and not URI-escape until this is needed for HTML rendering.
I agree that this would be the cleanest approach. But it's indeed a change that could potentially break a lot of things... which we should have tests for. Also, how does commonmark handle this? I guess the spec doesn't say to URL-escape path names when parsing into the AST?
I think paths and urls are similar, but have basically different purposes and therefore special characters in paths are allowed which should be escaped in urls.
That's why I think they should be treated differently in general.
Currently commonmark doesn't even allow spaces in the
target. A recent (unreleased) change allows spaces
inside link destinations in pointy brackets only.
Jason Schilling notifications@github.com writes:
I think local paths and urls are similar, but have basically different purposes and therefore special characters in paths are allowed which should be escaped in urls.
That's why I think they should be treated differently in general.
One problem is that it's not generally clear from the source
when you have a URL and when it's a local path. This
could be either a relative URL or a relative path, for example:

One problem is that it's not generally clear from the source when you have a URL and when it's a local path.
Indeed, this bites us here again. While this works:
~ echo 'hi ![](this/foo.png)' | pandoc -o foo.pdf
[WARNING] Could not fetch resource 'this/foo.png': replacing image with description
this fails (note the slash at the beginning of the path):
~ echo 'hi ![](/this/foo.png)' | pandoc -o foo.pdf
pandoc: /this/foo.png: openBinaryFile: does not exist (No such file or directory)
@mb21 I think this is a somewhat different issue; it has to do with the "root" location rather than different rules for escaping in paths and URLs. Note that if you specify a URL on the command line, pandoc will set the "root location" internally and use it when trying to fetch images starting with /. That all works very smoothly, but we might consider providing a way to set the root location manually, as suggested in #4894.
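As a generic illustration of what a "root location" means for targets starting with /, here is how standard URL reference resolution treats the two cases. This uses Python's urllib.parse.urljoin with a made-up base URL (https://example.com/docs/index.html is an assumption); it is not pandoc's internal logic.

```python
from urllib.parse import urljoin

# Hypothetical document URL standing in for the one given on the command line.
base = "https://example.com/docs/index.html"

# A relative target resolves against the document's directory...
relative = urljoin(base, "this/foo.png")

# ...while a target with a leading slash resolves against the site root.
rooted = urljoin(base, "/this/foo.png")

print(relative)
print(rooted)
```

When the input is a local file or stdin there is no such base, which is why /this/foo.png ends up being looked up at the filesystem root.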