It appears that when getLibXmlLoaderOptions() is called, it globally affects external xml reading.
This is bad, because just reading excel files with this library has a global side effect, and it breaks Soap.
I created a PR with a failing test: #73
FYI it's because of these lines:
https://github.com/PHPOffice/PhpSpreadsheet/blob/develop/src/PhpSpreadsheet/Settings.php#L346
https://github.com/PHPOffice/PhpSpreadsheet/blob/develop/src/PhpSpreadsheet/Settings.php#L363
For anyone that's looking for a workaround for your SOAP calls or other external xml reading, call libxml_disable_entity_loader(false) before instantiating SoapClient or other things that involve reading external xml
Can confirm. Our Symfony app breaks after using PhpSpreadsheet because it fails to properly read XML files.
@amcsi thx for mentioning a work around, works fine.
It would be appreciated it someone can come up with a proper fix that is covered by unit tests.
Without digging into it I suppose something like remembering previous global settings and re-set them afterwards might be the way to go. What do you think ?
I did a bit or research on this.
Apparently, the incriminated piece of code calls the libxml_disable_entity_loader() function which is global and affects all libxml-bases libraries, so basically, every other library/code that works with XML is impacted by this. The function call was added 3 years ago for security reasons (CVE-2014-2054).
It seems that that call should be left in the codebase, so maybe the best workaround would be to make an XML reader "factory" for each XML library (we have all of them here: SimpleXML, DOMDocument, and XMLReader) that wraps the code required to open the XML file. Here is an example taken from the last link:
$oldValue = libxml_disable_entity_loader(true);
$dom = new DOMDocument();
$dom->loadXML($xml);
libxml_disable_entity_loader($oldValue);
Settings::getLibXmlLoaderOptions() has 40 occurrences in total, so this would be a fairly big change.
Edit: Just wanted to clarify. The change wouldn't be that big per se, but of the four classes involved, only one is covered by unit tests. I'm not sure if this is something that could be changed with confidence without having all the necessary tests in place.
Unfortunately unit tests are lacking and there isn't much we can do except to add new one progressively.
Your suggestion sounds good to me as long as we are sure that those options only have effect when loading the document, and not later when using the objects. If that would be the case, that would mean we would have to keep "our" options until the end of usage of the reader/writer, which may be hard to do. Is there are a to confirm or infirm that ?
Also on a side note and slightly out of scope, but I wouldn't mind getting rid of SimpleXML in favor of DOMDocument. If it's not too much work, not too much risky, and could simplify our code base (future factories), then that could/should be done.
@agopaul Would you really have to unit test every usage? I would think that you could just create a wrapper class around DOMDocument/SimpleXML, use and unit test those. You already assume that DOMDocument and SimpleXML work as expected due to being external classes, so if you just created a wrapper, you could get away with just testing the differences.
Theoretically, that is true, but there is a marginal risk of breaking something due to unexpected behavior of the XML libraries involved.
Also, I'm not 100% sure that a wrapper would do the job: like @PowerKiKi said, we don't know if the options are only used while loading the XML or if they are also used in other parts of the libraries.
Maybe I'll do some testing on the weekend.
In the meantime, if this is breaking your application, a quick workaround would be using the libxml_disable_entity_loader() function after you've used PhpSpreadsheet and re-enable the entity loader.
Coincidently I ran into an error due to this very issue just yesterday when a test suite on a project that I'm working on was failing unexpectedly and sure enough, it was because of this function calls.
Anyway, the whole point of disabling the external entities loader via libxml_disable_entity_loader() is to prevent XXE/XEE attack vectors.
Digging inside the codebase I've found that @MarkBaker actually fixed the XXE/XEE vulnerability in another way after the commit that added the calls to the libxml_disable_entity_loader() function: https://github.com/PHPOffice/PHPExcel/commit/0ab614fd952f82f9b7a9280731daa2300e6b000c
So there are 2 fixes for the same problem.
A few considerations on @MarkBaker's solution:
libxml_disable_entity_loader() is not: https://bugs.php.net/bug.php?id=64938libxml_disable_entity_loader() just disables the external entity loading whereas this implementation just scans for entities and throws an Exception if one is found. That, of course, is because (I assume) none of the XML files parsed by PhpSpreadsheet should use XML entities.I looked up how the Zend Framework team fixed this and they created a library that parses XML safely and they adopted both approaches (libxml_disable_entity_loader() and Mark's). The lib uses one or the other depending on the thread safety of the current PHP context.
Links:
In other words, I think it's safe to say that these libxml_disable_entity_loader() calls inside the Settings class can just be removed.
I think this is probably a safe change and we wouldn't need to have a complete test coverage of the readers to do it. I can work on a PR if needed.
Thanks for your in-depth analysis. If I understand things correctly we can remove libxml_disable_entity_loader() if and only if every single loaded XML is checked with BaseReader::securityScanFile() beforehand. It seems it was done by Mark, but it would be best to double-check. If you can do so, then I'll merge your PR.
Closed via #113
Most helpful comment
Can confirm. Our Symfony app breaks after using PhpSpreadsheet because it fails to properly read XML files.