pandoc 2.3.1 on Ubuntu 16.04 on WSL
An input file (input.md) with custom-style="Source Code" and custom-style="Bullet List 1" divs:
::: {custom-style="Source Code"}
Source Code
:::
::: {custom-style="Bullet List 1"}
Bullet List 1
:::
pandoc -t docx -o docx.docx input.md
Convert back with the -f docx+styles option:
pandoc -f docx+styles -t markdown -o output.md docx.docx
output.md:
::: {custom-style="SourceCode"}
Source Code
:::
::: {custom-style="BulletList 1"}
Bullet List 1
:::
Word treats Source Code vs. SourceCode and Bullet List 1 vs. BulletList 1 as different styles, so there is a problem when running pandoc -t docx -o docx2.docx output.md
This is what we get in styles.xml inside docx.docx:
<w:style w:type="paragraph" w:customStyle="1" w:styleId="BulletList1">
  <w:name w:val="Bullet List 1" />
  <w:basedOn w:val="BodyText" />
  <w:qFormat />
</w:style>
<w:style w:type="paragraph" w:customStyle="1" w:styleId="SourceCode">
  <w:name w:val="Source Code" />
  <w:basedOn w:val="BodyText" />
  <w:qFormat />
</w:style>
<w:style w:type="paragraph" w:customStyle="1" w:styleId="SourceCode">
  <w:name w:val="Source Code" />
  <w:basedOn w:val="Normal" />
As you can see, the w:name has the spaces, and the w:styleId does not.
I'm not sure how custom styles work - maybe @jkr can chime in here.
We filter out the spaces in newParaPropToOpenXml to make a valid styleId:
newParaPropToOpenXml :: String -> Element
newParaPropToOpenXml s =
  let styleId = filter (not . isSpace) s
  ...
I believe this was implemented because Word complains if styleIds have spaces. To the best of my understanding and recollection, document.xml only refers to styles by styleId, so to reference custom styles we need to deal with these.
As I understand it, the difference between w:name and w:styleId is that the former is visible through the UI while the latter is what is referenced internally. Being identical modulo spaces is just a convention: in a US-English Word, the style ID is Header1 and the name is Header 1. But in an internationalized Word, the styleId is Header1 and the name could be <insert some unicode here>.
The easiest, and most consistent thing, would be to document that custom-styles should not have spaces. We could also pop out a warning if spaces are removed in the filter step above.
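To make the warning idea concrete, here is a minimal sketch. The helper name toStyleId and the warning text are hypothetical and not part of pandoc; the only thing taken from the writer is the filter (not . isSpace) step. It returns the derived styleId together with an optional message that a caller could report when the name was changed.

```haskell
import Data.Char (isSpace)

-- | Hypothetical helper (not pandoc's API): derive a styleId from a
-- custom-style name by dropping whitespace, as the writer snippet does,
-- and return a warning message when the name was changed by that step.
toStyleId :: String -> (String, Maybe String)
toStyleId name
  | styleId == name = (styleId, Nothing)
  | otherwise       = (styleId, Just warning)
  where
    styleId = filter (not . isSpace) name
    warning = "custom-style " ++ show name
           ++ " is written with styleId " ++ show styleId
```

A name without spaces passes through unchanged and produces no warning, so documents that already follow the "no spaces" advice would stay silent.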
It's possible that the rethinking around #5052 could help refine this further, but I think that's mainly a separate issue: matching on pre-existing styles.
follow-up: looking at the code again:
newParaPropToOpenXml :: String -> Element
newParaPropToOpenXml s =
  let styleId = filter (not . isSpace) s
  in mknode "w:style" [ ("w:type", "paragraph")
                      , ("w:customStyle", "1")
                      , ("w:styleId", styleId)]
                      [ mknode "w:name" [("w:val", s)] ()
                      , mknode "w:basedOn" [("w:val","BodyText")] ()
                      , mknode "w:qFormat" [] ()
                      ]
Note that the w:name is based on the original string, spaces and all, so both values are available for the reader to match on. From the reader side, this does seem to be a companion issue to the one raised in #5052 (i.e., whether we match on name or styleId).
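The name-versus-styleId choice on the reader side can be sketched as a small lookup. Everything here is an assumption for illustration: StyleKey, StyleTable, and resolveCustomStyle are hypothetical names, not pandoc's reader API, and the table is simply the (styleId, name) pairs from the styles shown earlier in this thread.

```haskell
import Data.Maybe (fromMaybe)

-- | Which field of a w:style element the reader reports (hypothetical).
data StyleKey = ByStyleId | ByName
  deriving (Eq, Show)

-- | Pairs of (w:styleId, w:name), as they would be read from styles.xml.
type StyleTable = [(String, String)]

-- | Map the styleId referenced in document.xml to the string emitted as
-- custom-style. Falls back to the styleId when no name is known.
resolveCustomStyle :: StyleKey -> StyleTable -> String -> String
resolveCustomStyle ByStyleId _     sid = sid
resolveCustomStyle ByName    table sid = fromMaybe sid (lookup sid table)

-- | The two custom styles from the example document.
exampleTable :: StyleTable
exampleTable =
  [ ("SourceCode", "Source Code")
  , ("BulletList1", "Bullet List 1")
  ]
```

Under this sketch, resolving by name would give back "Source Code" for the styleId SourceCode, which is the string the writer expects when the markdown is converted to docx again.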
@jkr, thought I could offer a minor correction:
But in an internationalized Word, the styleId is Header1
This is not usually the case. In all cases I've looked at, if name has Unicode characters, styleId is a seemingly arbitrary short string of a letter and a number, e.g. a7. This holds for Word-generated documents, and Word will rewrite built-in styles (like Header 1) in this manner when, e.g. the document is created in Pandoc and then opened and re-saved in Word.
I found styles extraction by @niszet

I would like to explain this screenshot. It is output from R; I used the officer package and its functions.
The document wwww.docx shown in the picture is a docx file that was generated by pandoc without the --reference-doc option and then saved after being edited in Word.
The style Title is shown as 表題 in MS Word with Japanese settings, so the style ID cannot reuse the original style name. This may be Word's normal behaviour, because it also happens with docx files that were not generated by pandoc. (But in the docx, style_name is still "Title"...)
I think the original docx and the regenerated docx should be the same (regenerated means docx -> md -> docx by pandoc).
Currently, the markdown generated from a docx uses style_id, but style_name is used when the markdown is converted to docx.
So, how about adding an option to select between style_id and style_name? (This is @sky-y's idea, but I would also like to have this option.)
With such an option, I think we could get the same styles in the original and the regenerated docx even if --reference-doc is used.
Fixed.