Issue
I've encountered a type of Docx file (exported from Quip) whose images are not extracted when using --extract-media and -t markdown.
However, if the Docx is opened in Word and then saved again, the images export correctly. This suggests it might be a file formatting issue, but the document renders fine in Word, and when I compared the document.xml in the two files I couldn't spot any distinct difference in the structures.
Test Files
I have attached two files:
Test.docx - the original exported file, containing 2 embedded images
Test2.docx - the original exported file, loaded into and then saved out from Word
Reproduction
pandoc "Test.docx" --verbose --extract-media=test_media --atx-headers -f docx -t markdown -o "Test.md"
Result: No images are exported
Expected: two images to be exported to test_media folder
pandoc "Test2.docx" --verbose --extract-media=test_media2 --atx-headers -f docx -t markdown -o "Test2.md"
Result: 2 images are exported to test_media2 folder, as expected.
[INFO] Extracting test_media2\media\image1.png...
[INFO] Extracting test_media2\media\image2.png...
Environment
Running Pandoc version 2.7.3 on Windows 10, 64-bit.
Attachments
Test.docx
Test2.docx
Thanks for a great tool.
The relevant part from Test.docx:
<wp:docPr id="10" name="media/JIcACABwiXP.png"/>
<a:graphic xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
<a:graphicData uri="http://schemas.openxmlformats.org/drawingml/2006/picture">
<pic:pic xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture">
<pic:nvPicPr>
<pic:cNvPr id="0" name="media/JIcACABwiXP.png"/>
<pic:cNvPicPr/>
</pic:nvPicPr>
<pic:blipFill>
<a:blip r:embed="rId10"/>
<a:stretch>
<a:fillRect/>
</a:stretch>
</pic:blipFill>
<pic:spPr>
<a:xfrm>
<a:off x="0" y="0"/>
<a:ext cx="5352176" cy="4219662"/>
</a:xfrm>
<a:prstGeom prst="rect"/>
</pic:spPr>
</pic:pic>
</a:graphicData>
</a:graphic>
Probably related (or the same issue): https://github.com/jgm/pandoc/issues/1810 and https://github.com/jgm/pandoc/issues/5394
Do you know which tool or Word version generated the docx?
The tool that generated Test.docx was the Salesforce Quip app. Entirely possible their exported docx markup is somehow at fault here, but it did seem like a valid docx so thought I'd report it as an issue here.
Test2.docx was generated by Word for Office 365, V16.0 32-bit, simply by opening Test.docx and then "saving as" Test2.docx - no other modifications to the doc.
The xml for Test2.docx is:
<wp:docPr id="9" name="media/JIcACA7YtNb.png"/>
<wp:cNvGraphicFramePr/>
<a:graphic xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
<a:graphicData uri="http://schemas.openxmlformats.org/drawingml/2006/picture">
<pic:pic xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture">
<pic:nvPicPr>
<pic:cNvPr id="0" name="media/JIcACA7YtNb.png"/>
<pic:cNvPicPr/>
</pic:nvPicPr>
<pic:blipFill>
<a:blip r:embed="rId5"/>
<a:stretch>
<a:fillRect/>
</a:stretch>
</pic:blipFill>
<pic:spPr>
<a:xfrm>
<a:off x="0" y="0"/>
<a:ext cx="5352176" cy="2961313"/>
</a:xfrm>
<a:prstGeom prst="rect">
<a:avLst/>
</a:prstGeom>
</pic:spPr>
</pic:pic>
</a:graphicData>
</a:graphic>
Ah yes, they indeed look similar. The key is probably in the document.xml.rels file, which also contains:
<Relationship Id="rId10" Target="media/JIcACABwiXP.png" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"/>
By the way, the images show up in neither LibreOffice nor Apple Pages...
Useful to know they don't render properly in other apps, since that suggests the document is malformed in some way, in which case it's definitely an issue for Quip.
Posting up the xml for reference...
Test.docx document.xml.rels
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
<Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/settings" Target="settings.xml"/>
<Relationship Id="rId4" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/webSettings" Target="webSettings.xml"/>
<Relationship Id="rId5" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="media/image1.jpg"/>
<Relationship Id="rId6" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/fontTable" Target="fontTable.xml"/>
<Relationship Id="rId7" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/theme" Target="theme/theme1.xml"/>
<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles" Target="styles.xml"/>
<Relationship Id="rId2" Type="http://schemas.microsoft.com/office/2007/relationships/stylesWithEffects" Target="stylesWithEffects.xml"/>
<Relationship Id="rId8" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/numbering" Target="numbering.xml"/>
<Relationship Id="rId9" Target="media/JIcACA7YtNb.png" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"/>
<Relationship Id="rId10" Target="media/JIcACABwiXP.png" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"/>
</Relationships>
Test2.docx document.xml.rels
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
<Relationship Id="rId8" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/theme" Target="theme/theme1.xml"/>
<Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/settings" Target="settings.xml"/>
<Relationship Id="rId7" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/fontTable" Target="fontTable.xml"/>
<Relationship Id="rId2" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles" Target="styles.xml"/>
<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/numbering" Target="numbering.xml"/>
<Relationship Id="rId6" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="media/image2.png"/>
<Relationship Id="rId5" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="media/image1.png"/>
<Relationship Id="rId4" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/webSettings" Target="webSettings.xml"/>
</Relationships>
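For anyone who wants to check the relationships without unzipping by hand, here is a minimal Python sketch. It assumes the usual docx layout, with the relationships part at word/_rels/document.xml.rels and relative targets resolved against word/. The function name image_relationships is just illustrative and is not part of pandoc. It lists each image relationship and reports whether its target is actually present in the archive.

```python
import zipfile
import xml.etree.ElementTree as ET

# Assumed location of the main document's relationships part.
RELS_PATH = "word/_rels/document.xml.rels"
PKG_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
IMAGE_TYPE = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
)


def image_relationships(docx):
    """Return a list of (Id, Target, present_in_archive) for image relationships.

    `docx` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as zf:
        names = set(zf.namelist())
        root = ET.fromstring(zf.read(RELS_PATH))

    rows = []
    for rel in root.iter(PKG_NS + "Relationship"):
        if rel.get("Type") != IMAGE_TYPE:
            continue
        target = rel.get("Target")
        # Targets starting with "/" are package-absolute; others are
        # taken relative to the word/ directory (an assumption of this sketch).
        if target.startswith("/"):
            member = target.lstrip("/")
        else:
            member = "word/" + target
        rows.append((rel.get("Id"), target, member in names))
    return rows
```

Calling image_relationships("Test.docx") and image_relationships("Test2.docx") would show whether any listed image target is missing from the zip. If every target is present in both files, the difference has to lie elsewhere, for example in how the relationships or drawing elements are written.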
I will try to see if I can isolate which difference in the files is the cause of this.
To help with comparing docx files, I wrote a little shell script, https://github.com/jgm/diff-docx
This saves the trouble of unzipping and tidying.
(I've now put this in the tools/ directory of this repository instead of its own repository.)
https://github.com/jgm/diff-docx returns a 404 error.
Ah, it looks like the repository https://github.com/jgm/diff-docx was removed in favor of pandoc/tools/diff-zip.sh (see 83a0104).
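If a shell isn't handy (the original report was on Windows), the same idea can be sketched in Python. This is not the script from the pandoc repository, only a rough equivalent under my own assumptions: it pretty-prints every .xml and .rels part in two docx files and produces a unified diff per part. The names pretty_parts and diff_docx are made up for this example.

```python
import difflib
import zipfile
from xml.dom import minidom


def pretty_parts(docx):
    """Map each XML part name in the archive to its pretty-printed lines."""
    parts = {}
    with zipfile.ZipFile(docx) as zf:
        for name in zf.namelist():
            if name.endswith((".xml", ".rels")):
                pretty = minidom.parseString(zf.read(name)).toprettyxml(indent="  ")
                parts[name] = pretty.splitlines()
    return parts


def diff_docx(old, new):
    """Return unified-diff lines for every XML part that differs."""
    a, b = pretty_parts(old), pretty_parts(new)
    out = []
    # Parts that exist in only one file are diffed against an empty part.
    for name in sorted(set(a) | set(b)):
        out.extend(
            difflib.unified_diff(
                a.get(name, []),
                b.get(name, []),
                fromfile="old/" + name,
                tofile="new/" + name,
                lineterm="",
            )
        )
    return out
```

Something like print("\n".join(diff_docx("Test.docx", "Test2.docx"))) would then show the differences part by part. Expect a fair amount of noise from Word rewriting ids and settings when it re-saves the file.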