Hi
This is my first time using PHPWord. I created a docx file with LibreOffice (don't ask why I'm not using MS Office :) ).
The document contains a block (a single paragraph) to be cloned. The resulting docx file appears to be empty when opened in LibreOffice.
I noticed that the paragraphs in sample_23 provided by PHPWord differ from those in my document created by LibreOffice.
The following XML has been formatted (with Eclipse) to make the structure easier to check.
Sample_23 provided by the PHPWord repository (block tag only):
<w:p w:rsidR="00C0566D" w:rsidRPr="003B08B6" w:rsidRDefault="00C0566D"
    w:rsidP="00C0566D">
    <w:r>
        <w:t>${CLONEME}</w:t>
    </w:r>
</w:p>
Paragraph generated by LibreOffice (block tag only):
<w:p>
    <w:pPr>
        <w:pStyle w:val="Corpsdetexte" />
        <w:pageBreakBefore w:val="false" />
        <w:rPr></w:rPr>
    </w:pPr>
    <w:r>
        <w:rPr></w:rPr>
        <w:t>${itemtypeBlock}</w:t>
    </w:r>
</w:p>
Output file (the cloned blocks appear to be nested):
<w:p>
<w:pPr>
<w:pStyle w:val="Corpsdetexte" />
<w:p>
<w:pPr>
<w:pStyle w:val="Corpsdetexte" />
<w:rPr></w:rPr>
</w:pPr>
<w:r>
<w:rPr></w:rPr>
<w:t>cloned paragraph</w:t>
</w:r>
</w:p>
<w:p>
<w:pPr>
<w:p>
<w:pPr>
<w:pStyle w:val="Corpsdetexte" />
<w:rPr></w:rPr>
</w:pPr>
<w:r>
<w:rPr></w:rPr>
<w:t>cloned paragraph</w:t>
</w:r>
</w:p>
<w:p>
<w:pPr>
<w:p>
<w:pPr>
<w:pStyle w:val="Normal" />
<w:rPr></w:rPr>
</w:pPr>
I began to debug and found the following:
I finally found a regex that works on my document. It is designed to work with the sample_23 document as well.
public function cloneBlock($blockname, $clones = 1, $replace = true)
{
    $xmlBlock = null;
    preg_match(
        // Previous pattern, kept for reference:
        //'/(<\?xml.*)(<w:p.*>\${' . $blockname . '}<\/w:.*?p>)(.*)(<w:p.*\${\/' . $blockname . '}<\/w:.*?p>)/is',
        '/(<\?xml.*)(<w:p( [^>]*)?>([\s]*<.*>)?\${' . $blockname . '}(<.*?>[\s]*)?<\/w:p>)(.*)(<w:p( [^>]*)?>([\s]*<.*>)?\${\/' . $blockname . '}(<.*?>[\s]*)?<\/w:p>)/is',
        $this->tempDocumentMainPart,
        $matches
    );

    // $matches[2]: paragraph holding the begin tag, $matches[6]: XML to clone,
    // $matches[7]: paragraph holding the end tag.
    if (isset($matches[6])) {
        $xmlBlock = $matches[6];
        $cloned = array();
        for ($i = 1; $i <= $clones; $i++) {
            $cloned[] = $xmlBlock;
        }

        if ($replace) {
            // Swap the whole block (both tag paragraphs included) for the clones.
            $this->tempDocumentMainPart = str_replace(
                $matches[2] . $matches[6] . $matches[7],
                implode('', $cloned),
                $this->tempDocumentMainPart
            );
        }
    }

    return $xmlBlock;
}
I submit it here for review, and if it passes the tests I'll make a pull request.
As the regex is not easy to read, the following explains how it works:
(<\?xml.*)
is a greedy XML tag eater, which stops as close as possible to the beginning of the searched tag (see the sub-regex below).
(<w:p( [^>]*)?>([\s]*<.*>)?\${' . $blockname . '}(<.*?>[\s]*)?<\/w:p>)
This second part handles attributes that may be found in <w:p>, and the fewest tags needed to reach the beginning of the block tag.
As few XML tags as possible are matched after the block tag until we reach /w:p.
Note that I left some [\s]* in for debugging purposes; they should not affect a real-life document (but we may consider removing them).
The beginning of a block should be in its own paragraph, strictly alone (I mean: without any other text), if I understand the original regex correctly.
(.*)
This sub-regex matches the XML code to be cloned, up to the paragraph containing the end block tag (see below).
(<w:p( [^>]*)?>([\s]*<.*>)?\${\/' . $blockname . '}(<.*?>[\s]*)?<\/w:p>)
This sub-regex is similar to the one for the begin block tag: it matches from the nearest previous <w:p> to the nearest next /w:p, with the end block tag as the fixed point in the document.
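To make the difference concrete, here is a small standalone sketch that runs both patterns from cloneBlock (the commented-out one and the new one) against a made-up document. The $xml string is a hypothetical, heavily reduced stand-in for a LibreOffice word/document.xml, not a real file, and the script does not use PHPWord at all. It suggests why the clones ended up nested: in the older pattern, <w:p also matches the start of <w:pageBreakBefore and <w:pStyle, so the match for the begin tag can start in the middle of the paragraph.

```php
<?php
// Hypothetical, reduced document: begin-tag paragraph (LibreOffice style),
// one paragraph to clone, and the end-tag paragraph.
$xml = '<?xml version="1.0"?><w:body>'
    . '<w:p><w:pPr><w:pStyle w:val="Corpsdetexte"/><w:pageBreakBefore w:val="false"/>'
    . '<w:rPr></w:rPr></w:pPr><w:r><w:rPr></w:rPr><w:t>${itemtypeBlock}</w:t></w:r></w:p>'
    . '<w:p><w:r><w:t>cloned paragraph</w:t></w:r></w:p>'
    . '<w:p><w:pPr><w:pStyle w:val="Corpsdetexte"/><w:rPr></w:rPr></w:pPr>'
    . '<w:r><w:rPr></w:rPr><w:t>${/itemtypeBlock}</w:t></w:r></w:p>'
    . '</w:body>';

$blockname = 'itemtypeBlock';

// Older pattern (the commented-out line in cloneBlock).
$oldPattern = '/(<\?xml.*)(<w:p.*>\${' . $blockname . '}<\/w:.*?p>)(.*)'
    . '(<w:p.*\${\/' . $blockname . '}<\/w:.*?p>)/is';

// New pattern (the active line in cloneBlock).
$newPattern = '/(<\?xml.*)(<w:p( [^>]*)?>([\s]*<.*>)?\${' . $blockname . '}(<.*?>[\s]*)?<\/w:p>)'
    . '(.*)(<w:p( [^>]*)?>([\s]*<.*>)?\${\/' . $blockname . '}(<.*?>[\s]*)?<\/w:p>)/is';

preg_match($oldPattern, $xml, $old);
preg_match($newPattern, $xml, $new);

// Older pattern: the begin-tag match starts inside the paragraph properties,
// so the leading <w:p><w:pPr><w:pStyle .../> is left behind in the document.
echo substr($old[2], 0, 18), "\n"; // <w:pageBreakBefore

// New pattern: the begin-tag match starts at the paragraph itself.
echo substr($new[2], 0, 12), "\n"; // <w:p><w:pPr>

// New pattern: the content to clone is exactly the middle paragraph.
echo $new[6], "\n"; // <w:p><w:r><w:t>cloned paragraph</w:t></w:r></w:p>
```

With the older pattern, the captured content ($old[3]) also ends with a dangling <w:p><w:pPr> taken from the end-tag paragraph, which looks like the nesting seen in the output file.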
Hope this helps to improve PHPWord.
This regex works better, but I'm afraid the pattern ( http://www.regular-expressions.info/catastrophic.html
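On the backtracking concern: one way to at least detect the problem is to check the result of preg_match, which returns false when PCRE gives up (for example when the backtrack limit is hit), instead of silently treating that as "no block found". The sketch below is only an illustration; findBlock is a hypothetical helper, not part of PHPWord, and wrapping the block name in preg_quote is my own assumption about what would be safe for names containing regex metacharacters.

```php
<?php
/**
 * Hypothetical helper (not part of PHPWord): runs the block regex and
 * distinguishes "no match" from "PCRE failed".
 *
 * @return array|null The preg_match groups, or null if the block is absent.
 */
function findBlock(string $xml, string $blockname): ?array
{
    // Assumption: escape the name so characters such as "." or "+" stay literal.
    $name = preg_quote($blockname, '/');

    $pattern = '/(<\?xml.*)(<w:p( [^>]*)?>([\s]*<.*>)?\${' . $name . '}(<.*?>[\s]*)?<\/w:p>)'
        . '(.*)(<w:p( [^>]*)?>([\s]*<.*>)?\${\/' . $name . '}(<.*?>[\s]*)?<\/w:p>)/is';

    $result = preg_match($pattern, $xml, $matches);

    if ($result === false) {
        // preg_last_error() returns a PREG_* code such as PREG_BACKTRACK_LIMIT_ERROR.
        throw new RuntimeException('PCRE failed with error code ' . preg_last_error());
    }

    return $result === 1 ? $matches : null;
}

// Usage on a tiny made-up document.
$doc = '<?xml version="1.0"?><w:body>'
    . '<w:p><w:r><w:t>${CLONEME}</w:t></w:r></w:p>'
    . '<w:p><w:r><w:t>row</w:t></w:r></w:p>'
    . '<w:p><w:r><w:t>${/CLONEME}</w:t></w:r></w:p>'
    . '</w:body>';

$found = findBlock($doc, 'CLONEME');
$missing = findBlock($doc, 'OTHER');

echo $found[6], "\n";            // <w:p><w:r><w:t>row</w:t></w:r></w:p>
var_dump($missing === null);     // bool(true)
```

This does not make the pattern any cheaper; it only makes a failure visible so the caller can decide what to do with a very large document.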
@btry hey man, your regex is so :fire: :fire: :fire:
Thank you so much
Hi
I'm no longer using PHPWord. It seems some other proposals were made. If they were merged into a release, it should work.
@nicoder, when I designed my regex I was able to repeatedly grow a document full of tables, from a 5-page template to a document of about 40 pages, without any problem. I may still have samples. If you wish to see them, just ask (I would need to redact them for confidentiality reasons).