Is your feature request related to a problem? Please describe.
The Azure Data Lake Storage Gen2 BlobServiceClient.listBlobContainers() call falls into an infinite loop in my project, which already has an older woodstox (woodstox-5.0.3.jar) on the classpath that cannot be removed. With the older woodstox library, the NextMarker element is not deserialized properly into the BlobContainersSegment model: when the list is exhausted, its value is deserialized as "" (empty String) instead of null. The PagedIterable interprets the empty String as "start from the beginning", so successive calls to next() make additional requests to Azure, listing the contents in an infinite loop.
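A minimal, hypothetical sketch of the failure mode, using only the JDK: the paging loop stops when the continuation marker is null or empty. MarkerPagingDemo, ListPage and fetchPage are illustrative stand-ins, not azure-core types. A check of marker == null alone never terminates once the parser returns "", because "" is then treated as "start from the beginning".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical illustration of marker-based paging; not the azure-core implementation. */
final class MarkerPagingDemo {

    /** One page of results plus the marker the service returned (NextMarker). */
    static final class ListPage {
        final List<String> items;
        final String nextMarker;

        ListPage(List<String> items, String nextMarker) {
            this.items = items;
            this.nextMarker = nextMarker;
        }
    }

    /**
     * Collects all items, stopping when the marker is null or empty.
     * maxPages is a safety cap so a misbehaving marker cannot loop forever.
     */
    static List<String> listAll(Function<String, ListPage> fetchPage, int maxPages) {
        List<String> all = new ArrayList<>();
        String marker = null; // null means "first page"
        for (int page = 0; page < maxPages; page++) {
            ListPage result = fetchPage.apply(marker);
            all.addAll(result.items);
            marker = result.nextMarker;
            // Treat "" like null: both mean the listing is exhausted. Checking only
            // marker == null would re-request the first page forever whenever the
            // XML layer turns an empty <NextMarker/> into "".
            if (marker == null || marker.isEmpty()) {
                return all;
            }
        }
        throw new IllegalStateException("Exceeded " + maxPages + " pages; marker never terminated");
    }
}
```

The maxPages cap is a defensive choice for the sketch: if a marker never terminates, the method fails fast instead of issuing requests indefinitely.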
Describe the solution you'd like
I would like JacksonAdapter to expose alternative ways of constructing its XmlMapper to developers (maybe through an SPI), so that I can specify the XMLInputFactory used by the XmlFactory: the one from woodstox-6.0.2, which is a transitive dependency of azure-core. The default behavior calls javax.xml.stream.XMLInputFactory.newInstance(), which resolves to the wrong woodstox library.
An SPI wouldn't be required; the call in JacksonAdapter could simply be changed to new XmlMapper(new com.ctc.wstx.stax.WstxInputFactory()). I could then shade the Azure dependencies, as previously suggested for Jackson, to avoid all conflicts.
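A rough, JDK-only sketch of the requested hook: the XML layer accepts an XMLInputFactory from the caller instead of always running the global StAX lookup. XmlReaderConfig is a hypothetical name, not an azure-core or Jackson type. With jackson-dataformat-xml, the equivalent would be passing the chosen factory to the XmlMapper(XMLInputFactory) constructor, as proposed here.

```java
import java.io.StringReader;
import java.util.Objects;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical XML reader configuration with an injectable StAX factory. */
final class XmlReaderConfig {
    private final XMLInputFactory inputFactory;

    /** Default: the standard StAX lookup (system property, config file, service providers, platform default). */
    XmlReaderConfig() {
        this(XMLInputFactory.newFactory());
    }

    /** Caller-chosen factory, e.g. new com.ctc.wstx.stax.WstxInputFactory() when woodstox is on the classpath. */
    XmlReaderConfig(XMLInputFactory inputFactory) {
        this.inputFactory = Objects.requireNonNull(inputFactory, "inputFactory");
    }

    String inputFactoryClassName() {
        return inputFactory.getClass().getName();
    }

    /** Parses xml with the configured factory and returns the root element's local name. */
    String rootElementName(String xml) {
        try {
            XMLStreamReader reader = inputFactory.createXMLStreamReader(new StringReader(xml));
            try {
                reader.nextTag(); // advance from START_DOCUMENT to the root START_ELEMENT
                return reader.getLocalName();
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
    }
}
```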
Describe alternatives you've considered
I have rebuilt jackson-dataformat-xml with the no-argument XmlMapper() constructor modified to pass a WstxInputFactory to the XmlMapper(XMLInputFactory) constructor. However, I don't really want to build and distribute a custom jackson-dataformat-xml library.
@anuchandy I believe this is similar to another issue you are actively working on. Could you link it?
@gapra-msft @rickle-msft FYI
@alzimmermsft thanks! The other issue was caused by the IntelliJ plugin using a custom classloader with its own classpath, which doesn't contain the woodstox dependency. This issue, based on the description, seems to be caused by the hosting environment forcing the old woodstox-5.0.3.jar, so we may have to use a different approach. But you're right that both issues have the same symptoms from the application's perspective.
@jonjarvis, thanks for reporting this and the proposal. Let me take a look and get back.
@jonjarvis, I was looking into the proposed approach, i.e. changing the initialization in JacksonAdapter to:
new XmlMapper(new com.ctc.wstx.stax.WstxInputFactory())
While this is possible without adding an extra dependency and looks safe, there is a hidden issue. Jackson relies on the SPI lookup specifically so that users can bring in a different StAX implementation for XmlFactory simply by adding it as a dependency. Making the above change would disable that scenario and break any existing customers who use that approach.
Let me explore more options and respond.
@jonjarvis, I looked into the jackson-xml module's code path and think we have a solution to override the old dependency injected by the platform:
Here is a sample project that shows how to shade and relocate the newer woodstox.
storage-wstx.zip
The next step is to instruct XmlFactory to use the relocated wstx types. From scanning the Jackson codebase, I found that this is done by setting the following two properties:
System.setProperty("javax.xml.stream.XMLInputFactory",
"com.microsoft.shaded.com.ctc.wstx.stax.WstxInputFactory");
System.setProperty("javax.xml.stream.XMLOutputFactory",
"com.microsoft.shaded.com.ctc.wstx.stax.WstxOutputFactory");
Let me know how it goes.
@anuchandy - Thank you for the suggestion, but I tried it before filing this ticket and it did not work. Our project has an OSGi layer that changes the FactoryFinder which parses those properties in javax.xml.stream.XMLInputFactory (so that different OSGi bundles can resolve to the correct bundle). Woodstox 5.0.3 is exported to our default app classloader, so the lookup ignores the class in my private classloader.
@jonjarvis thanks for the response.
How about building a shaded storage lib that includes a relocated javax.xml.stream and uses its finder to parse the properties? I've updated the sample with additional shading and relocation: storage-wstx.zip
Then update the properties as follows:
System.setProperty("com.microsoft.shaded.javax.xml.stream.XMLInputFactory",
"com.microsoft.shaded.com.ctc.wstx.stax.WstxInputFactory");
System.setProperty("com.microsoft.shaded.javax.xml.stream.XMLOutputFactory",
"com.microsoft.shaded.com.ctc.wstx.stax.WstxOutputFactory");
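The two variants of these settings differ only in the key prefix, so they can be wrapped in one small helper that takes the relocation prefix: "" for the unshaded keys, "com.microsoft.shaded." for the relocated ones. This is a sketch: StaxFactoryProperties is a hypothetical helper, and it assumes the prefix matches the shade plugin's relocation of javax.xml.stream. It returns the previous values so a caller can restore them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for the StAX factory selection properties, optionally relocated. */
final class StaxFactoryProperties {
    static final String INPUT = "javax.xml.stream.XMLInputFactory";
    static final String OUTPUT = "javax.xml.stream.XMLOutputFactory";

    /**
     * Sets both properties under relocationPrefix and returns their previous
     * values (null if unset), keyed by the full property name.
     */
    static Map<String, String> configure(String relocationPrefix, String inputImpl, String outputImpl) {
        String inputKey = relocationPrefix + INPUT;   // e.g. com.microsoft.shaded.javax.xml.stream.XMLInputFactory
        String outputKey = relocationPrefix + OUTPUT;
        Map<String, String> previous = new LinkedHashMap<>();
        previous.put(inputKey, System.setProperty(inputKey, inputImpl));
        previous.put(outputKey, System.setProperty(outputKey, outputImpl));
        return previous;
    }
}
```

Calling it early, for example during plugin initialization before any Azure client is built, is the safer choice, since the factories appear to be resolved when the XmlFactory is created rather than on every parse.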
@anuchandy - To be clearer: as far as I can tell, the FactoryFinder that OSGi (Apache Felix) somehow injects into the Java runtime completely ignores those System properties. I watched in the debugger as it resolved javax.xml.stream.XMLInputFactory through some OSGi package (I don't recall which), and it used an inline $FactoryFinder class, which I didn't have source code for, instead of the default one. Here's an article that explains how this can happen for various SPIs (https://blog.osgi.org/2013/02/javautilserviceloader-in-osgi.html)
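Instead of stepping through the debugger, a small diagnostic can print which StAX implementation the lookup actually returns in a given context, plus the jar and class loader it came from. This is a sketch that assumes it runs inside the plugin or bundle showing the problem; StaxDiagnostics is a hypothetical name. If the reported location is the platform's woodstox 5.0.3 jar, the environment's finder is winning over the system properties.

```java
import java.security.CodeSource;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;

/** Hypothetical diagnostic: which StAX implementation is resolved here, and from where. */
final class StaxDiagnostics {

    /** Describes an object's implementation class, the jar it came from, and its class loader. */
    static String describe(Object factory) {
        Class<?> type = factory.getClass();
        CodeSource source = type.getProtectionDomain().getCodeSource();
        // Classes loaded by the JDK's bootstrap loader have no CodeSource.
        String location = (source == null || source.getLocation() == null)
                ? "<JDK/bootstrap>"
                : source.getLocation().toString();
        ClassLoader loader = type.getClassLoader(); // null means the bootstrap loader
        return type.getName() + " from " + location + " via " + (loader == null ? "bootstrap" : loader);
    }

    public static void main(String[] args) {
        // Runs the standard StAX lookup in this thread and class loader context.
        System.out.println("XMLInputFactory:  " + describe(XMLInputFactory.newFactory()));
        System.out.println("XMLOutputFactory: " + describe(XMLOutputFactory.newFactory()));
    }
}
```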
I was not successful in packaging my code in an OSGi bundle either (nor did I want to). The only way I was successfully able to get it to work was by changing jackson-dataformat-xml and swapping the jar with my custom one.
Our project has two different models for loading plugins. The traditional and preferred one is an inverted (child-first) classloader system: each plugin we load is placed in its own folder, and a custom classloader searches all jars in that folder first, falling back to the parent classloader only when a class is not found in the plugin folder. The other method of loading plugins is through OSGi, and I've had mixed results with it. Unfortunately, in cases like this, the OSGi capabilities disable the standard behavior.
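The inverted (child-first, or parent-last) loading described here can be sketched with a URLClassLoader that tries the plugin jars before the parent. This is an illustrative sketch, not the project's actual loader; ChildFirstClassLoader is a hypothetical name. JDK packages are always delegated to the parent: java.* must be, and delegating javax.* is a design choice so shared API types such as javax.xml.stream.XMLInputFactory keep a single identity.

```java
import java.net.URL;
import java.net.URLClassLoader;

/** Hypothetical child-first class loader for a plugin folder. */
final class ChildFirstClassLoader extends URLClassLoader {

    ChildFirstClassLoader(URL[] pluginJars, ClassLoader parent) {
        super(pluginJars, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            // java.* can only come from the JDK; javax.* is delegated by choice.
            if (c == null && !name.startsWith("java.") && !name.startsWith("javax.")) {
                try {
                    c = findClass(name); // plugin folder jars first
                } catch (ClassNotFoundException ignored) {
                    // not in the plugin jars; fall back to the parent below
                }
            }
            if (c == null) {
                c = super.loadClass(name, false); // standard parent-first lookup
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

With javax.* delegated, a plugin cannot supply its own javax.xml.stream classes, which is one reason relocating that package under a different name can sidestep whatever finder the parent provides.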
Interesting, thanks for the details on the env.
In the issue description, you mentioned that some shading is possible. If we take that route and use a shaded storage lib, I expected the shaded jackson-dataformat-xml to use the relocated com.microsoft.shaded.javax.xml.stream.FactoryFinder, which looks for the shaded com.microsoft.shaded.com.ctc.wstx.stax.WstxInputFactory|WstxOutputFactory. That is what I was trying to achieve with the latest sample. Does that work in your environment?
@anuchandy - Thanks for the suggestion. I will give it a try today and let you know how it goes.
@jonjarvis sure.
By the way, I was reading about Apache Felix OSGi; as per the documentation, the Java ServiceLoader|FactoryFinder patterns often require special handling in an OSGi environment because these loading facilities are not designed to be modular. I see that OSGi bundles use the Service Loader Mediator concept to discover service providers. I'm very new to the OSGi world, but I wanted to check whether that route is applicable or has been validated.
Even with the Service Loader Mediator route, I'm not sure the mediator is allowed to override the older woodstox injected by the hosting environment. If not, would some variant of shading work? I guess we need to see how OSGi bundles generally solve these kinds of conflicts via shading and relocation.
@anuchandy - Thank you for the extra references. I'm trying to avoid using OSGi in general though. I was able to restructure my maven build (I created a module to generate the shaded jar for azure datalake and azure identity) as suggested using the shade plugin to relocate the stax-api for javax.xml.stream used by jackson-dataformat-xml. I like this solution better than rebuilding jackson-dataformat-xml, so thank you for the suggestion/example.
While this works, I still have reservations about shading all the dependencies into an uber jar (and then explaining the two seemingly random System.setProperty calls). It's not urgent, but I think it would be good to offer developers a way to customize how azure-core creates the JacksonAdapter.
Excellent, happy to hear that you're able to use the Azure Storage and Identity libs in your setup. Yes, relocating the underlying stax-api was the tricky part.
Regarding your ask to let users configure JacksonAdapter, we have some work in progress in this space. See these modules 1, 2. Hopefully, the libraries will soon allow users to provide a configured adapter.
That said, there is one thing about your environment that puzzles me, which I planned to share once you were unblocked, and now you are :).
I tried using woodstox-5.0.3 in a simple console app, and it did deserialize the empty node to null, not to an empty string as you are seeing. I wonder whether the woodstox-5.0.3.jar on your classpath is a modified version (and hence differs from the one in Maven).
Or maybe the vendored stax-api finder (that inline $FactoryFinder class) in your OSGi environment uses a specialized classloader to which the injected woodstox-5.0.3.jar is not visible (perhaps the jar is visible only to the bundle classloader), so Jackson may be falling back to the default XML reader. The default XML reader is known to be unable to identify an empty node and always represents such nodes as an empty string.
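A small JDK-only check of the behavior described here: with a plain StAX reader, <NextMarker/> and <NextMarker></NextMarker> both yield "", and the standard XMLStreamReader API has no method that tells them apart, so a data binder built on it has nothing to map to null. Woodstox implements the Stax2 extension XMLStreamReader2.isEmptyElement(), which can make that distinction; exactly how jackson-dataformat-xml uses it varies by version, so treat that link as an assumption. EmptyElementDemo is a hypothetical name.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical demo of how plain StAX reports empty elements. */
final class EmptyElementDemo {

    /** Returns the text of the first occurrence of element, or null if it does not occur. */
    static String textOf(String xml, String element) {
        try {
            XMLStreamReader reader = XMLInputFactory.newFactory().createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && element.equals(reader.getLocalName())) {
                        // "" for both <X/> and <X></X>; standard StAX cannot tell them apart.
                        return reader.getElementText();
                    }
                }
                return null; // element absent: the only case that naturally yields null here
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
    }
}
```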
In any case, with the current shading method, things should work as expected, but I wanted to let you know before closing this ticket.
@anuchandy - Thank you for your help. I'm really not sure about woodstox-5.0.3. As far as I know, it is not a custom build. Maybe it is something to do with configuration... I know that is the offender because when I remove the bundle, the version 6 library is used and things work correctly. Thanks again!
@jonjarvis thanks for the response.
I'm closing this issue, then. Feel free to open a new issue if you run into more problems. Good day!