The included document contains just a title. Pandoc ignores it, so
converting the document produces empty output.
$ pandoc --version
pandoc 2.1.4
$ pandoc -t native test.docx
[]
The document contains some text styled as Title, like this:
<w:body>
<w:p>
<w:pPr>
<w:pStyle w:val="Title"/>
<w:rPr/>
</w:pPr>
<w:r>
<w:rPr/>
<w:t>Accessibilità delle immagini</w:t>
</w:r>
</w:p>
<w:p>
...
A design problem is deciding what to turn the title into, since the
hierarchy of sections is already used to map the DOCX heading
styles. I see just two alternatives, neither of which is very
appealing:
Shouldn't it be converted to "title" metadata, as if it were in the YAML section of a Markdown document?
It makes sense to derive title metadata from this element, although that does not imply that we want to drop the content itself.
In fact, I think pandoc currently turns the title into metadata, since Title is listed among the metaStyles.
After reading up on metadata, I think the DOCX reader is working as expected. Still, we would like any format conversion to preserve the visible content of the document as much as possible; I think we all agree on this.
I think the possible improvement here is in the writers, specifically the RST writer my team is currently using. I will rename the issue to reflect this.
After renaming, I can't help thinking that the topic is a bit controversial. In a DOCX document, styled paragraphs are central to the visible layout. In pandoc's Markdown, on the other hand, metadata looks more like something we want to hide from the reader's view.
If we want to stick to the idea of metadata as visible information, then we could change the DOCX reader to add styled paragraphs to both the metadata and the document body.
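To make the trade-off concrete, here is a minimal Python sketch (not pandoc's Haskell implementation) of what a reader could do with a Title-styled paragraph like the one above: move its text into metadata and, optionally, keep it in the body as well. `META_STYLES`, `split_meta` and the `keep_in_body` flag are hypothetical names for illustration, and the style-to-field mapping is an assumption rather than pandoc's actual list.

```python
import xml.etree.ElementTree as ET

# Namespace bound to the "w:" prefix in a DOCX word/document.xml part.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# Hypothetical style-to-metadata mapping; pandoc's own list may differ.
META_STYLES = {"Title": "title", "Subtitle": "subtitle"}

# The paragraph from the issue, wrapped in a minimal w:document element.
SAMPLE = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    '<w:body><w:p><w:pPr><w:pStyle w:val="Title"/><w:rPr/></w:pPr>'
    '<w:r><w:rPr/><w:t>Accessibilità delle immagini</w:t></w:r></w:p></w:body>'
    '</w:document>'
)


def split_meta(document_xml, keep_in_body=False):
    """Return (meta, body); body is a list of (style, text) paragraph pairs."""
    root = ET.fromstring(document_xml)
    meta, body = {}, []
    for p in root.iter(W + "p"):
        style_el = p.find(f"{W}pPr/{W}pStyle")
        style = style_el.get(W + "val") if style_el is not None else None
        text = "".join(t.text or "" for t in p.iter(W + "t"))
        if style in META_STYLES:
            meta[META_STYLES[style]] = text
            if not keep_in_body:
                continue  # the paragraph survives only as metadata
        body.append((style, text))
    return meta, body


# Metadata only: the body is empty, mirroring the [] from -t native above.
print(split_meta(SAMPLE))
# Proposed alternative: the title is in the metadata and still in the body.
print(split_meta(SAMPLE, keep_in_body=True))
```

The flag keeps both behaviours side by side, so the design question reduces to which default a reader should use.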
Alternatively, if we want each writer to decide how to render metadata, I would change the RST writer so that it shows something to the user, at least for title and subtitle metadata.
Sounds like you're looking for http://docutils.sourceforge.net/docs/ref/rst/directives.html#metadata-document-title
Basically, the responsibility for rendering metadata lies with the template, not the writer itself, e.g.:
$ pandoc -D html | grep '$title'
<title>$if(title-prefix)$$title-prefix$ – $endif$$pagetitle$</title>
<h1 class="title">$title$</h1>
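A toy Python model of that division of labour, assuming a drastically simplified template language (only `$var$` and non-nested `$if(var)$...$endif$`; pandoc's real template engine does much more): the writer produces just the body, and metadata such as `title` only reaches the output when a template is applied, which is what -s does. `render_template`, `write_output` and `TEMPLATE` are hypothetical names.

```python
import re

# Toy subset of pandoc's template syntax: $var$ and non-nested
# $if(var)$...$endif$. Pandoc's real engine supports far more.
IF_BLOCK = re.compile(r"\$if\(([\w-]+)\)\$(.*?)\$endif\$", re.S)
VAR = re.compile(r"\$([\w-]+)\$")

# Hypothetical minimal template, loosely modelled on the HTML one above.
TEMPLATE = '$if(title)$<h1 class="title">$title$</h1>\n$endif$$body$'


def render_template(template, variables):
    # Keep each conditional block only if its variable is set and non-empty.
    text = IF_BLOCK.sub(
        lambda m: m.group(2) if variables.get(m.group(1)) else "", template
    )
    # Replace the remaining $var$ references; unset variables render as "".
    return VAR.sub(lambda m: str(variables.get(m.group(1), "")), text)


def write_output(body, meta, template, standalone):
    # Without standalone (-s) only the converted body is emitted, so the
    # metadata never shows up, even though the reader parsed it.
    if not standalone:
        return body
    return render_template(template, {**meta, "body": body})


meta = {"title": "Accessibilità delle immagini"}
print(write_output("<p>Body text.</p>", meta, TEMPLATE, standalone=False))
print(write_output("<p>Body text.</p>", meta, TEMPLATE, standalone=True))
```

In this model the only difference between the two calls is whether the template runs, which matches the point that the title is a template concern rather than a writer one.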
Thanks @mb21, pointing to the title directive and the use of templates is right on the mark. For our use case, using the default template fixes the problem perfectly, so we could say that your comment closes the issue.
I'd say the behaviour is still a bit confusing: some very visible content in the input document is not visible at all in the output document unless an option is used. Personally, I would expect the title to be there as a first-level heading by default, and I would be happy to propose a pull request changing this. It's a matter of design and usability, I guess. I thought templates were a way to structure what would be written in the output document anyway, while they seem to have a different role.
Since you mention the _title_ directive, I guess your suggestion is to render the title with that directive? That would help avoid the impression that there is an error, and it would carry the metadata over if the RST is parsed again. The title would still disappear, though, if the RST were then converted to HTML, for example.
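The two RST renderings under discussion can be sketched as follows: a visible overlined and underlined heading, like the one pandoc emits with -s below, versus the docutils `title` directive, which sets the document title as metadata without adding a heading to the body. `rst_title` is a hypothetical helper, not part of pandoc.

```python
def rst_title(title, as_directive=False):
    """Render a document title as reStructuredText (hypothetical helper)."""
    if as_directive:
        # docutils' "title" directive stores the title as document metadata;
        # it does not add a visible heading to the body.
        return f".. title:: {title}\n"
    # Visible heading: overline and underline at least as long as the text.
    rule = "=" * len(title)
    return f"{rule}\n{title}\n{rule}\n"


print(rst_title("Accessibilità delle immagini"), end="")
print(rst_title("Accessibilità delle immagini", as_directive=True), end="")
```

The heading form keeps the title visible after any further conversion; the directive form round-trips as metadata but, as noted above, would not show up as body text in, say, HTML.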
Anyway, I feel like I've already made enough fuss about a problem that is fixed by adding the -s option. Let's see whether this gets any interest from anybody else; otherwise, I guess we can close this.
With -s everything works as expected:
% pandoc test.docx -t rst -s
============================
Accessibilità delle immagini
============================
There is just one confusing thing: one would expect -t native to produce the metadata, even without -s. In fact, the metadata IS being parsed: compare -t json. This is because of the way the 'native' writer works: it only prints metadata if you specify -s. We could change that, but it is somewhat handy for keeping tests short, and it's probably more trouble than it's worth to change.
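One way to confirm that the metadata is parsed even without -s is to read the JSON output and pull out the title. This sketch assumes pandoc's JSON AST shape (a top-level object with `meta` and `blocks`, meta values tagged like `{"t": "MetaInlines", "c": [...]}`); `SAMPLE_JSON` is hand-written for illustration, not captured pandoc output, and only `Str`, `Space` and `SoftBreak` inlines are handled.

```python
import json

# Hand-written example in the shape of pandoc's JSON AST (illustrative only;
# the "pandoc-api-version" field is omitted).
SAMPLE_JSON = """
{"meta": {"title": {"t": "MetaInlines",
                    "c": [{"t": "Str", "c": "Accessibilità"}, {"t": "Space"},
                          {"t": "Str", "c": "delle"}, {"t": "Space"},
                          {"t": "Str", "c": "immagini"}]}},
 "blocks": []}
"""


def inlines_to_text(inlines):
    """Flatten pandoc inline elements to plain text (Str/Space/SoftBreak only)."""
    parts = []
    for el in inlines:
        if el["t"] == "Str":
            parts.append(el["c"])
        elif el["t"] in ("Space", "SoftBreak"):
            parts.append(" ")
        # Other inline types (Emph, Strong, ...) are ignored in this sketch.
    return "".join(parts)


def title_of(doc):
    """Return the plain-text title of a parsed pandoc JSON document, or None."""
    title = doc.get("meta", {}).get("title")
    if title is None or title.get("t") != "MetaInlines":
        return None  # missing title, or a MetaValue kind this sketch skips
    return inlines_to_text(title["c"])


doc = json.loads(SAMPLE_JSON)
print("title:", title_of(doc))
print("blocks:", len(doc["blocks"]))
```

Swapping `json.loads(SAMPLE_JSON)` for `json.load(sys.stdin)` (plus `import sys`) would let you pipe in the output of `pandoc test.docx -t json` directly.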
Is this documented? I skimmed the manual and couldn't really find it.
Are you using -s for a standalone document?
Sorry, I replied without noticing the context above.
This is what you want documented.
To me the current behavior is consistent with the rest of
the formats: title and other metadata are added in the template
and hence only appear with -s. Is RST particularly surprising
in this respect?
I was referring to native, which might warrant a special mention.
@hftf it's listed under the -t and -f options: http://pandoc.org/MANUAL.html#general-options (check out the new alphabetically sorted list!)
I guess it must be really unclear what I'm talking about. The relevant context is the comment by @jgm last week, located right above my first comment, which mentions this behavior, among other things:
This is because of the way the 'native' writer works: it only prints metadata if you specify -s.
I suggested to document that behavior. I searched the manual (for "standalone" and "metadata") and only found this in the YAML section, but to me it implies that it only works for Markdown output.
Btw, it is pretty uncharitable to interpret my comment completely out of context (but thanks for showing me the new sorted list). Of course I already knew about native, and of course I wasn't suggesting re-adding native to the list of formats. (Though if you think I couldn't even see that, perhaps you're implying it's not documented enough?)
@hftf I think your suggestions fall within the scope of #4584. I don't think anybody here is being uncharitable; people are just busy and sometimes can't pay attention to the full context of a message.
I just added this to the manual for --standalone:
"For native output, this option causes metadata to be included; otherwise, metadata is suppressed."
Does that do the trick? If it does, then I think the issue can be closed.
It's helpful; I'd say this issue can be closed.