Runtime: System.Xml.XPath to support XPath 2, XPath 3 and their XSLT variants

Created on 10 Jul 2015  ·  99 Comments  ·  Source: dotnet/runtime

Motivation

System.Xml.XPath currently conforms with XPath 1.0 [W3C-xpath-1] and XSLT 1.0 [W3C-xslt-1] standards, but not XPath 2.0 [W3C-xpath-2], XPath 3.0 [W3C-xpath-3], XPath 3.1 [W3C-xpath-3.1], XSLT 2.0 [W3C-xslt-2] and XSLT 3.0 [W3C-xslt-3].

Many consumer scenarios need the standards that the BCL lacks, so .NET applications rely on third-party libraries for them. One of the chief scenarios is the Content Query Web Part (CQWP) in SharePoint, where users' XSLT code could be drastically reduced if v2 were supported by System.Xml.XPath. For the most part, backward-compatible fallbacks are available; that is, code written concisely in XSLT 2 can be expressed more verbosely in XSLT 1, and so forth.

Pitfalls

Unfortunately, besides the existing third-party libraries' APIs, I do not have a concrete method list to propose offhand; it requires further brainstorming on whether to auto-select the processor based on the input or to explicitly separate the namespaces (System.Xml.XPath2 and System.Xml.XPath3).

The point to ponder: since XPath 2 and XPath 3 intrinsically provide backward compatibility modes (see XPath 2: J.1.3 Backwards Compatibility Behavior and XPath 3: 3.10 Backwards Compatible Processing), should the API differ from the existing one and let consumers select the standard mode?

api-suggestion area-System.Xml

Most helpful comment

Being top voted != a guarantee that it will be invested in. Software development is more complicated than that, and votes are just one way to get information about customer needs and prioritize them.
Also look above for my explanation of associated costs (super high), security risks and ongoing security maintenance cost - in https://github.com/dotnet/corefx/issues/2295#issuecomment-336193617

It shouldn't be a surprise that similar passion and frustration "why is it not fixed yet" is expressed on almost every high profile issue and on quite a few 2-3 upvoted issues.

Incidentally, we reopened the funding discussion internally a couple of days ago (no guarantee how it will end!) ... just demonstrating that we are not ignoring feedback/upvotes; it is just sometimes more involved than one would think.

All 99 comments

/cc @krwq @KrzysztofCwalina @piotrpMSFT

Sorry, I had never heard of this project and have no idea about its goal, but while searching for XPath2 in .NET I also found these initiatives: https://xpath2.codeplex.com/ and https://qm.codeplex.com/, both made by the same author. The latter is mentioned in http://dev.w3.org/2006/xquery-test-suite/PublicPagesStagingArea/

+1

Please make this happen

We need an API proposal. Does anyone want to do that?

Just wanted to add that this is one of the top 5 issues on UserVoice for .net at the moment. I know many devs who really need this. Lack of support for v2 has bitten me many times in SharePoint and Umbraco. Similarly, I know of developers who have had to use unfamiliar stacks for application integrations (e.g. EDI/BizTalk type projects), simply due to the lack of any support here. While I am not really capable of helping, I and many, many others are very keen to see this happen.

https://visualstudio.uservoice.com/forums/121579-visual-studio-ide/category/31481--net
https://visualstudio.uservoice.com/forums/121579-visual-studio-ide/suggestions/4450357-implement-xslt-3-0-for-net
https://visualstudio.uservoice.com/forums/121579-visual-studio-ide/suggestions/3795831-native-support-for-xpath-2-0-or-xslt-2-0-in-net

@alirobe thanks for the context, that is useful!
To clarify: the ask on UserVoice is for XSLT 3.0. If we can implement it on top of .NET Standard 2.0, then Desktop might also benefit from it (as an out-of-band package). If that is not possible or too costly, then maybe it is an incentive for people to move to .NET Core ;-)

cc @sepidehMS @krwq

@danmosemsft maybe something to focus on after 2.0/2.1? You were collecting a list ...

cc @terrajobst

@karelz no problem! I don't really care where it gets implemented first, I just want a Microsoft API implementation _somewhere..._ I'm sure that there will be enough demand to get that implementation surfaced everywhere. XPath & XSLT all need updating. XPath alone would be a great start, but they do kind of go together. :)

Should it be raised as an issue here? Happy to do that if needed. I'm not familiar with the way MS works on this stuff, but as you've said, an API proposal (followed by a test suite) would be an obvious starting point, and actually fairly project-agnostic.

Also, thanks so much for taking this seriously! This is something users have been crying for Microsoft to do for over 10 years. Anyone who does this will be a hero to tens of thousands of enterprise developers. I'm smiling just thinking of all the work-around code I will be able to delete! :)

@karelz it's already on my list. If I understand @krwq correctly, it seems it could be done while keeping our XML library netstandard 2.0 compliant. It is just a matter of resourcing, and it's great to keep gathering evidence so it can bubble up the list when 2.0 is out the door.

@danmosemsft that's great news that we already have it on the roadmap!

@alirobe no need to file issues in .NET Standard -- .NET Standard is basically the common interface/intersection between Desktop (.NET Framework), .NET Core and Xamarin. If this can be implemented on top of .NET Standard, then all platforms will benefit. If it can't, it will be part of .NET Core future version, waiting for other platforms (Xamarin, Desktop) to catch up or be implemented as out-of-band package for those platforms, before we can add it to .NET Standard.

> Also, thanks so much for taking this seriously!

BTW: We always take customer feedback and votes seriously. It sometimes might not look like that due to communication hiccups, or due to technical limitations (e.g. some changes in Desktop are breaking - a big no-no), but rest assured we do take it seriously (I believe it's true for all Microsoft, but I can at least guarantee it is true on .NET team).
Of course, we can't in all cases commit to dates when things will be delivered (we have to align work with other priorities and other products, like Desktop), and we prefer not to communicate a date when we are not 100% sure, to avoid broken promises.
In some cases we can't even commit if particular APIs will be delivered ever, especially when they are outside of our team/division ownership -- we have to work with partner teams inside Microsoft to come up with plans, and sometimes that takes time, and requires alignment of business priorities (the reality of large corporations). Nevertheless, in all cases, customers and success of .NET platform are in the center of our mind.

@danmosemsft @karelz - If we plan to reuse existing APIs I believe we will need some new switch/enum for tiny behavioral changes around parsing to not break any existing apps. Breaking changes would need to be opt-in. Anything else I believe is currently producing errors and would just start working after work is done. (I'd rather make people be more explicit about which version of XPath they choose)

At minimum this will be a few new properties (hopefully just one property and perhaps a new XPathExpression constructor). We would need to identify the major breaking changes between XPath 1.0 and 3.0 and weigh the advantages and disadvantages of each solution.

@krwq this is an interesting topic. We hit similar spec incompatibilities in the Http space - dotnet/corefx#13036.

If you think about this XPath case, can you imagine adding the spec-version choice as argument to constructor? Or is the relevant functionality exposed (also) as static methods?
If it is via static methods (which is the case of dotnet/corefx#13036), then we either have to add spec-version argument to all of them, or create another class, or something. If that's the case, I'd like to start some general (in principle) API design discussions for these kinds of spec-versioned APIs -- please let me know where you think it falls. Thanks!

@karelz from a quick look, it seems to me that we only have two overloads which take an XPath expression: XPathExpression.Compile. I can see a couple of options:

  • add a static property which would globally change the behavior (I'm not expecting anyone use two different versions of XPath in one project)
  • add new overloads - I do not like this approach as many overloads are simply confusing
  • a new class which would inherit from XPathExpression (i.e. XPath2Expression; XPath3Expression) which would set the flag per instance and never actually expose it directly

I don't think it matters too much which option we choose: most likely people will always want the newest XPath, and I don't expect breaking changes to hit too many people, since even the spec claims that the breaking changes were made because the syntax was confusing.

New class approach is probably the most discoverable since intellisense will suggest those options but likely will be a messier implementation with not much benefit.

IMO static property because it is simplest and you do it once per app with not much downside.

@krwq

> it seems to me that we only have two overloads which take XPath expression: XPathExpression.Compile

There are other methods that take XPath, for example XmlNode.SelectSingleNode or System.Xml.XPath.Extensions.XPathSelectElement.

> add a static property which would globally change the behavior (I'm not expecting anyone use two different versions of XPath in one project)

What if I use two libraries, one wants to use one version of XPath, the other one another version? I can imagine that happening quite easily, so I don't think static property would be a good option. (Or will libraries have to set the property before every XPath operation? That could work if the property was thread-static, but would be annoying.)

> a new class which would inherit from XPathExpression

How would that work for other XPath methods? Considering my two examples, System.Xml.XPath.Extensions.XPathSelectElement could probably work by creating e.g. System.Xml.XPath2.Extensions.XPathSelectElement, since it's an extension method. But I don't see how something similar would work for XmlNode.SelectSingleNode.

Agreed with @svick that we need options per library, that's why we need to design the static APIs carefully ... I'll dig deeper into this case and will try to start the general API design pattern discussion.
@svick do you have any recommendations? (you seem to be quite familiar with the API surface)

@svick - you're right. Those APIs all call XPathExpression.Compile in the end, but your point about two different dependencies using different versions basically kills the static property option.

Possibly we could add overloads which take XPathExpression instead of string although I believe that would become quite annoying to use but maybe it wouldn't be too bad - what do you think?
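To make the "pass an XPathExpression instead of a string" idea concrete, here is a minimal sketch of what is already possible with XPath 1.0 today. It only uses existing calls (XPathExpression.Compile, XmlDocument.CreateNavigator, XPathNavigator.Select); the class and method names CompiledXPathDemo, Compile and CountMatches are made up for illustration and are not a proposed API.

```csharp
using System;
using System.Xml;
using System.Xml.XPath;

public static class CompiledXPathDemo
{
    // Compiles an XPath 1.0 expression once so it can be reused across documents.
    public static XPathExpression Compile(string xpath) => XPathExpression.Compile(xpath);

    // Runs a precompiled node-set expression against an XML string and counts the matches.
    public static int CountMatches(string xml, XPathExpression compiled)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // XmlNode has no Select overload that accepts an XPathExpression,
        // so the call has to go through an XPathNavigator.
        XPathNavigator nav = doc.CreateNavigator();
        return nav.Select(compiled).Count;
    }
}
```

A caller would write something like `CompiledXPathDemo.CountMatches(xml, CompiledXPathDemo.Compile("/r/a"))`, which hints at the extra ceremony compared with the string overloads.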

> you seem to be quite familiar with the API surface

Not really, I just googled for XPath on XmlDocument and XDocument and found the two methods. But I do have some ideas on how the API could look.

The current state is:

```c#
namespace System.Xml {
    public abstract class XmlNode {
        public XmlNode SelectSingleNode(string xpath);
    }
}

namespace System.Xml.XPath {
    public static class Extensions {
        public static XElement XPathSelectElement(this XNode node, string expression);
    }

    public abstract class XPathExpression {
        public static XPathExpression Compile(string xpath);
    }
}
```

There are other methods (like `XPathNavigator.Compile`) and overloads that use XPath; the three methods above should be sufficiently representative, considering it's an instance method, an extension method and a static method.

Option 1: namespaces

Each version of XPath gets its own namespace:

```c#
namespace System.Xml.XPath2 {
    public static class Extensions {
        public static XElement XPathSelectElement(this XNode node, string expression);
        public static XmlNode XPathSelectNode(this XmlNode node, string expression);
    }

    public abstract class XPathExpression {
        public static XPathExpression Compile(string xpath);
    }
}
namespace System.Xml.XPath3 {
    …
}
…
```

Advantages:

  1. The code using these methods would be very clean.
  2. Switching a version of XPath could be mostly done very easily one file at a time just by changing a using.

Disadvantages:

  1. It's not discoverable: if I have code using XPath 1.0, the IDE is not going to be helpful with figuring out how to switch to XPath 2.0 or that it's even possible.
  2. Instance methods (like XmlNode.SelectSingleNode) have to be changed to extension methods and renamed.
  3. This approach bloats the API by quite a lot (each version of XPath requires one new namespace, several new classes and many new methods).

Option 2: version parameter

Each XPath method gets new overloads taking XPathVersion:

```c#
namespace System.Xml {
    public abstract class XmlNode {
        public XmlNode SelectSingleNode(string xpath, XPathVersion version);
    }
}

namespace System.Xml.XPath {
    public static class Extensions {
        public static XElement XPathSelectElement(this XNode node, string expression, XPathVersion version);
    }

    public abstract class XPathExpression {
        public static XPathExpression Compile(string xpath, XPathVersion version);
    }

    public enum XPathVersion {
        XPath10,
        XPath20,
        …
    }
}
```

Advantages:

  1. It's discoverable: you can find out how to switch from XPath 1.0 to XPath 2.0 by looking at the overloads of the method in the IDE.
  2. Adding a new version of XPath requires minimal API surface changes (adding a single enum member).

Disadvantages:

  1. The code using these methods has to specify the version of XPath over and over.
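As a rough illustration of what Option 2 could look like at the call site, the sketch below adds a version-taking overload as an ordinary extension method. Everything new here is hypothetical: the XPathVersion enum and the VersionedXPathExtensions class do not exist in the BCL, and only XPath 1.0 is actually wired up (by delegating to the existing System.Xml.XPath.Extensions.XPathSelectElement); any other version throws.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical enum mirroring the proposal; not part of System.Xml.XPath.
public enum XPathVersion { XPath10, XPath20 }

public static class VersionedXPathExtensions
{
    // Hypothetical overload: the version is an explicit argument on every call.
    public static XElement XPathSelectElement(this XNode node, string expression, XPathVersion version)
    {
        switch (version)
        {
            case XPathVersion.XPath10:
                // Delegate to the existing XPath 1.0 implementation.
                return System.Xml.XPath.Extensions.XPathSelectElement(node, expression);
            default:
                // No XPath 2.0 engine exists in the BCL, so this sketch can only refuse.
                throw new NotSupportedException("Only XPath 1.0 is available in this sketch.");
        }
    }
}
```

A call then reads `root.XPathSelectElement("a", XPathVersion.XPath10)`, which shows both the discoverability (the overload sits next to the existing one) and the repetition listed as the disadvantage.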

Conclusion

From the usage standpoint, I think I prefer option 1, even though it has its issues. I don't much like the option of passing XPathExpression around (suggested by @krwq): it results in very verbose code and I don't see how it is better than option 2, since it still means adding new overloads to all XPath methods.

@svick - thanks for the input

Option 1. Adding a namespace per version - you always need to create new namespace - I do not like that as any changes to XPath standard will make us add new namespace and types.

I'm not a fan of the XPathExpression overload because the syntax will get quite annoying.
The advantage is that you only add that overload once per version, with no need to add any further overloads. The disadvantage is that the string overload would always use XPath1, which will get confusing.

Option 2. I think it is as good as we can get. - my vote goes for that. Easy to add to any existing places - new version is just a new enum. For future updates we can use existing overload

Note that there are likely more than those 2 things built on top of XPathExpression.Compile: I expect we will need to add something to XSLT and to other places we likely missed. Since that is just adding an overload which takes an enum, it doesn't matter too much if we miss one; anyone can easily contribute and fix any gaps.

I have been using XPath2.Net by StefH for a while now. It works very well, although it has some minor disadvantages; the main one (for me) being that it keeps the compiled XPath2 expression and the runtime environment in one object, which is not thread-safe.

It (obviously) uses a separate namespace, and I have never experienced that as a problem. I would think that those who know XPath2 (or 3) have no problem using that exclusively. It is almost completely compatible with XPath1. Therefore, I would favor option 1 (adding a namespace). Once you get used to it, you will never want to look back (which a version parameter forces you to do).

Option 3 could be what XPath2.Net does, add a XPath2Expression class, and XNode.XPath2Select() etcetera (see the XPath2.Net documentation).

What I would very much like to see is the possibility to define variables that can be used in the XPath expression. For example (XPath2.Net):
public object Evaluate(IContextProvider provider, IDictionary<XmlQualifiedName, object> vars)

Another feature that I like a lot is the ability to have user-defined functions. In XPath2, these are added to a function table, like
functionTable.Add(XmlReservedNs.NsXQueryFunc, "generate-id", 0, XPath2ResultType.String, (context, provider, args) => ...);

In my application, I repeat a set of XPath computations often (as in 100,000 times or more), and being able to compile the XPath expression is important for efficiency and performance.

@svick @nverwer I think we should get to some conclusions with these.

IMO here is what we should do:

  • Create enum XPathVersion as suggested by @svick in one of the options
  • Create XPathExpression.Compile which takes new enum
  • Any place which takes XPath string as an input we should add more overloads i.e. XPath2Select; XPath2SelectSingleNode etc.

That should give us a combination which is easy to manage (no new namespace) and easily discoverable (and no need to pass an additional arg each time).

Please let me know if you like/dislike this. Once we agree on this we should be able to officially propose new APIs and make a plan for doing the feature work.

PS. @nverwer AFAIK you can define variables for current implementation in .NET too: https://weblogs.asp.net/cazzu/30888 - not super intuitive but definitely possible
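For readers who have not seen that technique, here is a minimal sketch of supplying a variable to an XPath 1.0 expression through a custom XsltContext. The base types and members (XsltContext, IXsltContextVariable, XPathExpression.SetContext) are existing BCL APIs; the names VariableContext, Set, VariableDemo and CountByName are invented for this example, and the sketch deliberately skips custom functions and namespace handling.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

// Minimal XsltContext that resolves $name references from a dictionary.
public sealed class VariableContext : XsltContext
{
    private readonly Dictionary<string, object> _vars = new Dictionary<string, object>();

    public void Set(string name, object value) => _vars[name] = value;

    // Throws KeyNotFoundException if the expression uses an unknown variable.
    public override IXsltContextVariable ResolveVariable(string prefix, string name)
        => new Variable(_vars[name]);

    public override IXsltContextFunction ResolveFunction(string prefix, string name, XPathResultType[] argTypes)
        => throw new NotSupportedException("No custom functions in this sketch.");

    public override bool Whitespace => true;
    public override bool PreserveWhitespace(XPathNavigator node) => true;
    public override int CompareDocument(string baseUri, string nextBaseUri) => 0;

    private sealed class Variable : IXsltContextVariable
    {
        private readonly object _value;
        public Variable(object value) { _value = value; }
        public bool IsLocal => false;
        public bool IsParam => false;
        public XPathResultType VariableType =>
            _value is string ? XPathResultType.String :
            _value is bool ? XPathResultType.Boolean :
            _value is double ? XPathResultType.Number : XPathResultType.Any;
        public object Evaluate(XsltContext xsltContext) => _value;
    }
}

public static class VariableDemo
{
    // Counts <user> elements whose name attribute equals the supplied value.
    public static int CountByName(string xml, string name)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var context = new VariableContext();
        context.Set("name", name);

        XPathExpression expr = XPathExpression.Compile("/users/user[@name = $name]");
        expr.SetContext(context);
        return doc.CreateNavigator().Select(expr).Count;
    }
}
```

Because the value travels as data rather than being concatenated into the expression text, a name containing quotes is compared literally instead of being parsed as XPath.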

@krwq

> Any place which takes XPath string as an input we should add more overloads i.e. XPath2Select; XPath2SelectSingleNode etc.

So, to add a new version of XPath, you would need to add a new overload to all these methods? I'm not sure that's better than having each set of overloads as extension methods in a separate namespace when it comes to managing it.

It would also pollute your completion lists with all these methods you're never going to use (since most people are likely going to stick with a single version of XPath).

@svick we would have to create a namespace for each class using XPath; if we put the extension methods in XPath itself, we would get a circular dependency. One option would be to reuse XPathNavigator or IXPathNavigable (I believe those should be independent of XPath version, possibly except for what I wrote below), add extension methods to them instead of to each class using XPath, and not touch any of the existing methods. The downside is that in some cases you would need to call CreateNavigator.

Another thing we need to think about is that XPathNavigator.Select(string) is virtual, and I'm not sure how that would work once we add more versions. I think I'll need to experiment with these a bit and see what can and can't be done.

Can something like this pattern help?
https://githubengineering.com/scientist/

@alirobe we already use similar pattern to compare different XPathNavigator implementations - this is generally a convenient approach when you need to test something really quickly when having two or more similar implementations (in XPathNavigator case it was XPathDocument vs XPath.XDocument vs XPath.XmlDocument - one of them was considered more mature and less likely to have bugs). In this case I believe the risk is much lower since XPath2 and 3 mostly extend existing standard and there is very little which actually changes

Cool. Good to know it's just a "naming things" problem.

I think we should also consider the impact of adding new code to the size of the applications. AOT toolchains (.NET Native, CoreRT, Xamarin) all use tree shakers to avoid including code which the app won't use. But in order for these to work, the dependency on the new code has to be discoverable at compile time. This typically means that if the only difference is a value of a parameter the tree shaker will not figure it out. For the most part the tree shakers can't figure out actual values for parameters. So having the new functionality in either a new namespace or type would be preferable from this point of view.
This would only apply if we were to implement the new functionality as a separate code base internally. If it would simply extend the existing XPath internals to support the new features it might be next to impossible to avoid the size increase in the apps.

@vitek-karas would the tree shaker be able to figure out this kind of pattern?

    enum Foo { a, b }

    static void Bar(Foo foo)
    {
        if (foo == Foo.a)
        {
            // something pulling deps
        }
        else
        {
            // something pulling more deps
        }
    }

    static void Main()
    {
        Bar(Foo.a);
    }

If not, could you show how you would write simple branching so that the tree shaker removes the unused path?

Unfortunately our tree shakers currently can't figure out branching like that (I'm not 100% sure about ILLinker, but .NET Native definitely will not). We can obviously tweak the tools, but it gets complicated really fast. Usually the code is not as simple as above, and if the value is passed through a field and so on... we run into trouble.
What seems to work is things like:

  • The feature is enabled only if the app calls into a specific method, so something like EnableXPath3. Or similarly for a property setter.
  • The feature is enabled only if the app uses a specific type, so new XPath3Settings or new XPath3Expression...

In all these cases the tree shaker would not include the method/property/type if the app didn't use the feature. With that we could refactor the framework to then only pull the expensive pieces of code from those methods/properties/types, and let the rest go through a simple interface or something similar.

That said, if we plan to build XPath3 as just an extension of the existing XPath engine, this whole tree shaking idea is probably moot, since there would only be one large piece of code (the one XPath engine) and we would need it for all XPath queries regardless of which version they would use.
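The "enabled only if the app calls a specific method" shape described above could be sketched roughly as follows. All names (IXPathEngine, XPathEngines, XPath3Support, the engine classes) are hypothetical, and the engines are stubs; the point is only that nothing except XPath3Support.Enable refers to the expensive type, so there is a static dependency a trimming tool could follow. Whether a particular tree shaker actually removes the type in a given build is not something this sketch verifies.

```csharp
using System;

// Small interface through which the rest of the framework talks to the engine.
public interface IXPathEngine
{
    string Describe();
}

// Stand-in for the existing, always-present XPath 1.0 engine.
internal sealed class XPath1Engine : IXPathEngine
{
    public string Describe() => "XPath 1.0";
}

// Stand-in for the expensive new engine. It is referenced only from
// XPath3Support.Enable, never from a branch on a parameter value.
internal sealed class XPath3Engine : IXPathEngine
{
    public string Describe() => "XPath 3.x (hypothetical)";
}

public static class XPathEngines
{
    internal static IXPathEngine Current = new XPath1Engine();

    public static string Describe() => Current.Describe();
}

public static class XPath3Support
{
    // An app that never calls this method has no reference to XPath3Engine.
    public static void Enable() => XPathEngines.Current = new XPath3Engine();
}
```

Compared with the enum-and-if version earlier in the thread, the choice is expressed as "which method does the app call" rather than "which value does the app pass", which is the distinction the tree shakers are said to handle.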

If anyone else would value and use this support, please thumb up the top post to help us prioritize vs. other ports.

I'm wondering whether an assembly-level attribute could set the desired version when none is specified at the call site, in case this is implemented in the current namespaces. If none is specified, use the 1.0 standard. If your assembly specifies one, then all code executing directly from your assembly will use the version that's specified. When calling into another assembly which specifies a different version, that version will be used for methods and types constructed from that context...

A similar construct could be used with a using statement, akin to a Transaction context. That would at least alleviate the need for 100 extra enumerated parameters all over your code to opt into a higher version.
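One way the using-statement idea might be sketched is an ambient scope backed by AsyncLocal<T>, so the setting flows with the current execution context instead of being a process-wide static. The names XPathLevel and XPathVersionScope are hypothetical, and nothing in System.Xml consults such a scope today; this only shows the scoping mechanics.

```csharp
using System;
using System.Threading;

// Hypothetical version marker; V1 is the default because it has value 0.
public enum XPathLevel { V1, V2, V3 }

// Ambient, nestable scope: the previous level is restored on Dispose.
public sealed class XPathVersionScope : IDisposable
{
    private static readonly AsyncLocal<XPathLevel> s_current = new AsyncLocal<XPathLevel>();
    private readonly XPathLevel _previous;
    private bool _disposed;

    public XPathVersionScope(XPathLevel level)
    {
        _previous = s_current.Value;
        s_current.Value = level;
    }

    // The level an XPath API would consult if it honored the ambient scope.
    public static XPathLevel Current => s_current.Value;

    public void Dispose()
    {
        if (_disposed) return;
        _disposed = true;
        s_current.Value = _previous;
    }
}
```

Usage would look like `using (new XPathVersionScope(XPathLevel.V3)) { ... }`. A library that needs a specific version would still have to open its own scope around its calls, which is close to the "set it before every operation" concern raised earlier in the thread.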

As to the discoverability of the extra namespaces / assemblies, a Roslyn rule+fix could solve that. It could also help resolve minor api incompatibilities when going to a higher version.

Looking at Saxon, they use setLanguageVersion("3.0") to determine version. Assuming there are no copyright / IP issues, it would make sense to keep consistent with their approach, since many will have been using the Saxon engine in lieu of the .Net support, so this makes the switch simpler.

http://www.saxonica.com/documentation9.5/expressions/

Please consider making async methods - for custom XPath/XSLT functions doing IO etc.

Great to see that this is on the agenda.

When thinking about an API for XPath 2.0 or XPath 3.1, do bear in mind that the type system is much richer than XPath 1.0. For example, an XPath 2.0 expression can return a sequence of strings, or a sequence of integers; an XPath 3.1 expression can also return a map or an array (or even a function!). (So an API that's conceived entirely around the idea of navigating a tree of nodes may be conceptually misaligned.) This applies to input values as much as return values: it's important to make it easy to supply parameters for XPath expressions (you want to discourage people from building XPath expressions by string concatenation because of the code injection risk).
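To illustrate the type-system point with the API that exists today: XPathNavigator.Evaluate returns a plain object that, for XPath 1.0, is one of four shapes (number, string, boolean, node-set). The sketch below simply switches over those four; the ResultShapeDemo class and its Describe method are names made up for the example. Sequences of atomic values, maps and arrays from XPath 2.0/3.1 have no slot in this model, which is the misalignment the comment warns about.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Xml.XPath;

public static class ResultShapeDemo
{
    // Evaluates an XPath 1.0 expression and reports which of the four
    // XPath 1.0 result types came back.
    public static string Describe(string xml, string xpath)
    {
        XPathNavigator nav = new XPathDocument(new StringReader(xml)).CreateNavigator();
        object result = nav.Evaluate(xpath);

        switch (result)
        {
            case double number:
                return "number:" + number.ToString(CultureInfo.InvariantCulture);
            case string text:
                return "string:" + text;
            case bool flag:
                return "boolean:" + (flag ? "true" : "false");
            case XPathNodeIterator nodes:
                return "node-set:" + nodes.Count.ToString(CultureInfo.InvariantCulture);
            default:
                return "unknown";
        }
    }
}
```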

Support for newer versions of XSLT would also benefit BizTalk Server and Azure Logic App XML transforms, both of which build upon .NET's support of XSLT.

Is this being worked on?

@stephen-lim not currently being worked on. This seems like a gap that is relatively easy for 3rd party libraries to fill. We would be open to community contributions, if we could agree on an API proposal.
@karelz

Here are some facts I collected from XML area experts in June:

  • The overall XPath2+XPath3 (+XPath3.1) work was costed as 3y investment. Security will be a big deal (all input is untrusted).

    • There is an option to add incrementally useful parts (e.g. DateTime as first-class type, ~100 DateTime functions, RegEx function, built-in functions, primitive types, new data model and versioning), without covering corner cases -- 1y cost.

  • XSLT2 support was costed as another 1y.

Given all these fairly high costs and the fact 3rd party solutions exist (which seems to be more than reasonable workaround), I think it is more valuable for BCL team to invest into areas which do not have any existing alternatives yet. At least for now.
That said, if anyone from community is motivated to contribute towards the effort, we could come up with iterative plan how to enable parts of the work in waves in an experimental package, or something like that.

I believe there are currently no 3rd party options available for .NET Core, and there won't be for the foreseeable future. The most popular one, Saxonica.com for .NET, will not work because it relies on IKVM.NET (Java), and IKVM has already announced that it will not support .NET Core. The developer lost faith in .NET and no longer wants to pursue IKVM. Read his weblog here. The other one is Altova.com, but it runs on COM, so that won't work.

XSLT support is one of the most requested features (1857 votes) in UserVoice, but it's disappointing to hear no one is willing to invest in it.

Please reconsider.

@stephen-lim thanks for your info and link to UserVoice (I was not aware of that). It is definitely valuable information for us and we take it into account during planning.

I want to point out that while Jeroen stopped working on IKVM.NET, it does not preclude its port to .NET Core under different name.

Overall, I think it would be good to find out if any of the 3rd party libraries have intentions to move either to .NET Standard or .NET Core (incl. Saxonica.com).

Regarding XSLT upvotes - let's not forget the upvotes started piling up before .NET Core existed, therefore were targeted at .NET Framework, where the workaround exists in the form of 3rd party libraries.
A better indicator is IMO the number of upvotes on this issue - currently 47, which is pretty high up in the CoreFX repo (at position 4). Although I bet that some of those upvotes are not about .NET Core, but about general availability of the APIs also on .NET Framework, we can ignore that for now.

We face a tough decision: The investment is pretty high and therefore it needs to be either deprioritized (as it is now), or it has to come at the cost of investments elsewhere, e.g. DirectoryServices, logging, fast consistent networking stack, CollectibleAssemblies, general performance improvements, just to mention a few.

Just to clarify: It is still on our backlog (as I hinted in previous reply), it is just not something we plan to prioritize right now. We are open to further feedback and information, and we are open to change our prioritization based on more data.

@stephen-lim are there other options out of the list posted above?

For example, I grabbed the first one, saxon9he-api, and ran apiport and it shows as 100% compatible. I didn't try using it.

In Visual Studio 2017 from August onwards, .NET Core apps will accept nuget packages even if they only claim to support desktop. In such a case it will warn and the onus is on the developer to run apiport and to test their scenarios, but in many cases we've found the libraries work fine and just aren't packaged explicitly for .NET Core 2.0 or .NET Standard 2.0 yet. If and when we find such libraries, we can reach out to help their owners repackage them.

@danmosemsft thanks for sharing the list. A few of them are not XSLT processors. If you read the comments posted in that link, you'll see others have evaluated and none of them will work for .NET Core.

We don't necessarily need XSLT 3.0. Is there a possibility to implement XSLT 2.0 as a start? That should reduce the amount of work needed and still bring a lot of improvements. XSLT 1.0, being the first release, is lacking a lot of features that are specifically addressed in 2.0.

@stephen-lim I only see comments from people who tried to run it on .NET Core 1.x. .NET Core 2.0 has a much larger API surface. I think it is fair to expect they may just work on .NET Core 2.0.

XSLT2 has the cost of 1y (not sure what the diff of XSLT3 is). And it depends on XPath2, so implementing "just that" doesn't make it suddenly cheaper/easier :(.

@karelz One of the problems with Saxonica (the only one that is open source) is that it relies on IKVM, which is a huge binary because it's trying to bridge between the Java and .NET worlds. A lot of folks have been hoping for a native implementation in pure .NET for a very long time. Some of the postings go as far back as 2013. This is the original User Voice request with 802 votes for XSLT 2.0. Later, when XSLT 3.0 came out and MS still hadn't implemented it, a new request was started to push for XSLT 3.0 (the one with 1857 votes) going into .NET Core by that time.

@stephen-lim does Saxonica rely on all of IKVM, or maybe just on its sub-components? (lightweight RefEmit maybe)

@karelz I'm not sure. Here is the published Saxon API that may shed some light and can provide a basis for future implementation.

@karelz Unfortunately, the sort of developers who need XSLT 3.0 and XML stuff done are simply not the sort of people who are typically found on Github giving thumbs to a .NET core issue.
Developers use whatever documentation they're given with regards to XML. I appreciate that almost nobody is an XML enthusiast. It's not a hobby tech. Despite that, this issue is the no. 4 most voted/commented issue here.

This work would be extremely significant. It would unleash _all sorts_ of business application developers to achieve significantly more with their data and processes, and it would have flow-on effects all over Microsoft's own code-bases. For instance; doing more interesting things with WCF configs, web.configs, XAML, Office OOXML (docx xslx pptx et al), various web services, and much more. Microsoft should be pushing forward XML & data interoperability standards with .NET.

When .NET was introduced, XML was most of what drove the application architecture and the vision that came along with it. XML is not just 'some library in .NET'. The idea that we would use a third party library for the parsing the piece of connecting infrastructure that symbolizes the entire original intent of .NET doesn't pass the smell test, at least to me. There's a reason this is a top UserVoice issue. It's painful for everyone. This is an opportunity to address that pain, and justify the shift to core.

This whole discussion around API versioning, to me, just reflects fear. The entire point of .NET core is surely to _move past the fear_, and fix core issues. I’m not sure one could really get much more core to the vision of .NET than XML. Perhaps I’m wrong, but that’s how I remember .net starting out.

Please, get the edge/whatwg team involved, get the spec guys involved, sort the issues out. The failure to deal with this causes fragmentation, and worst of all, it causes Java projects. This is why hundreds and thousands of enterprise integration developers who are living on a burning Oracle/J2EE platform simply can't use Microsoft. These are surely a prime audience for core. I would love to see Microsoft be a leader in this space, rather than a drag on the industry. Let's get up to date with standards, and let's start pushing them forward.

When I glance through the Github and UserVoice, I feel the roadmap decisions are almost arbitrarily based on what's easy and what's cool, and not necessarily what's needed for business. In this case, the people have voted loudly for XSL 2/3 only to be shot down. Why even have a vote in the first place?

It reminds me of users asking Microsoft to build a Web standard compliant IE. It took years to arrive and is now too little, too late. Our team has seriously considered moving the development to Java. .NET will only continue to bleed developers for as long as we still have big gaps like this that cannot be ignored.

@stephen-lim, I have seen a start-up adopt Java + IaaS over Azure as a platform for this reason alone. They created an entire custom crafted stack that basically replicates BizTalk Services. BizTalk would have done everything they wanted, except that the MS stack (admittedly non-core) couldn't handle XML transform (schematron) requirements, which were built into third party systems backed by legislation. This has been the deciding factor in canning or scaling back so many projects that I know could have moved the world forward (and netted licensing profit for MS). Large data interchange contracts are the most heart-breaking point for the stack to let you down. It indicates a proprietary mindset which is surely a thing of the '90s at this point.

A project I was using (Wyamio/Wyam#340) is preparing for a shift to netstandard. This is why they discarded Saxon and no longer support XSLT 2.0. Unfortunately I need this, and for now I can't update to newer versions of the project until I have refactored the Saxon part out of their code. Fortunately their code is very modular.

But I will still miss the step into netstandard, which I find sad 😢

@maxtoroq Thanks for sharing. Exselt is likely dead because it's been kept in Beta since 2013. I tried contacting them several times and they never replied. XmlPrime is probably out of reach for many because their redistribution license is around USD $6000 per annum. As of now, none of them in the list have announced .NET Core compatibility or have a clear path to do so.

In the beginning of 2016, Exselt was still alive. I used it and spoke with its creator, Abel Braaksma at XML Prague. However, I think that the lack of (paid) interest has lowered its priority for Abel. Exselt was used in a workshop at XML Prague, and I thought it was pretty good.

@karelz Silverlight/XAML issues, InfoPath issues, Classic -> Modern SharePoint issues, and BizTalk issues, can all be tracked back to lack of this functionality. Again, all Office documents are XML-based. Think of the developer hours wasted creating new experiences and not updating old ones, because of inadequate transformation tooling. The lack of this functionality is not just bad for third party developers. It's fundamentally hampering the competitiveness of existing Microsoft products.

The interesting part is: product-wise, Microsoft is the largest consumer of XML serialization in the world. Just search how many *proj and *.config files alone exist for MSBuild execution. On top of that, every enterprise product by Microsoft relies on or primarily supports XML data. If IBM and Oracle have invested heavily in XML technologies over the past two decades to continuously implement new standards, Microsoft should too.

VS validates every single project file against an XML schema, yet .NET doesn't support the six-year-old XSD 1.1 standard or any of the other X-technologies beyond their 1.0 versions. If you are curious what XSD 1.1 + XPath 3 can achieve that 1.0 can't, take a look at the biggest feature we miss every day and night in .NET -> "Assertions": https://blogs.infosupport.com/exploring-cool-new-features-of-xsd-1-1/.

The investment is pretty high

Wouldn't it always be the case? Either it will never happen, or it has to start at some point. And if it has to happen at some point, then I think every team in Microsoft that uses XML-based techs can contribute / share the cost for the effort to implement latest recommendations in CoreFX:

https://www.w3.org/standards/techs/xpath
https://www.w3.org/standards/techs/xmlschema

@karelz can you reconsider this request? You have a large number of people requesting this feature. Please help us push this through.

@stephen-lim we are aware that this is in top 2-3 most upvoted issues on CoreFX repo and we repeatedly take it into consideration when planning. If/when we decide to invest in the space, we will update the issue.

Another use-case I didn't find mentioned in this issue: with XPath 2 / XSL 2 support, we would be able to use schematron (http://schematron.com/) for XML data / business rules validation.

There are a few .NET projects that try to fill this gap, but they lack full schematron support or are outdated and no longer maintained.

With X* 2+ support, we automatically get support for schematron file transformations.

XML / XSD / Schematron is heavily used by standards like UBL (Universal Business Language) and derivatives.

Understandably, implementing this is a major undertaking, but having native support in .NET would be a major win in this space for businesses, who now need to rely on something like Saxon, and/or create a bridge between Java tools and libs.

Adding another comment here because there is still no sign of an XPath 3 / XSLT 3 implementation post .NET Standard 2.0 release. Any further progress, @karelz @danmosemsft?

@hmobius we have no work planned here at this time. The libraries team are working on other things this release such as support for winforms/WPF apps, IoT, ML, JSON, UTF8, updated networking stack, lower allocations, etc. I realize this isn't what you want to hear but we are being transparent about priorities.

Any updates on this implementation of XPath 3.0?

Nope, the above still holds. We currently do not have any plans to invest in this area. It may change post-3.0 or later.

@stephen-lim we are aware that this is in top 2-3 most upvoted issues on CoreFX repo and we repeatedly take it into consideration when planning. If/when we decide to invest in the space, we will update the issue.

If this is among the top 2-3 most upvoted issues in the corefx repo, then why is it not being prioritized ahead of all the other stuff that is OUTSIDE the top voted items? It's a bit weird, to say the least. We have been asking for this for years, and I guess we all manage without it (we resort to specific Java apps to solve our needs most of the time), but it is a bit annoying to have to work around.

Being top voted != guarantee it will be invested into. SW development is more complicated than that and votes are just one angle how to get info about customer needs and prioritize them.
Also look above for my explanation of associated costs (super high), security risks and ongoing security maintenance cost - in https://github.com/dotnet/corefx/issues/2295#issuecomment-336193617

It shouldn't be a surprise that similar passion and frustration "why is it not fixed yet" is expressed on almost every high profile issue and on quite a few 2-3 upvoted issues.

Incidentally, we reopened the funding discussion internally a couple of days ago (no guarantee how it will end!) ... just demonstrating we are not ignoring feedback/upvotes, it is just sometimes more involved than one would think.

Given all these fairly high costs and the fact 3rd party solutions exist (which seems to be more than reasonable workaround), I think it is more valuable for BCL team to invest into areas which do not have any existing alternatives yet. At least for now.

The only alternative we could use was Saxon. It is built in Java and uses IKVM to interop with the .NET library. Not only is it slow, but it also means I can't use .NET Standard/Core for my applications. It is not a show stopper but certainly not the desired state to be in.

Seriously this got booted to next year/release? SMH

I reached out to XmlPrime (https://www.xmlprime.com/xmlprime/) and they confirmed that they have completed .NET Core support now. This is a commercial offering, so this isn't a solution for everyone. If you try this - it would be great to post your results back here to help others in the community.

Slightly off-topic, but now I'm curious (since we're in urgent need of such a thing).

The website doesn't seem to be updated yet with any new information (or downloads) about that one, but I'd also be interested in information on it: whether it has an async API, what the performance is, and whether it utilizes the new .NET Core APIs such as Span<T>/Memory<T> and/or pipelines. Especially compared to Saxon.NET (via IKVM on the full .NET Framework) and the .NET XSLT processor.

And when's that one supposed to get released to the public?

P.S. How about a proposal to acquire these guys? :P

It seems Microsoft does not invest in APIs where third parties already provide products (see SFTP). However, I think this is different. XML is a core component used throughout the Microsoft ecosystem, and should be treated as such.

I think XPath 3 also supports JSON, which would be a good addition to the new JSON APIs.


I would also like to see Visual Studio tools supporting higher versions of XSLT.
In the past I used a 3rd party library for .NET Framework, but Visual Studio constantly complained about the XSLT files in my project since it only understood XSLT 1.1 (I think).

XmlPrime is not usable for many projects. Their licensing is so restrictive and expensive that it is unreasonable for many open source projects and small businesses.

Please consider adding support for XSL 3 in .NET core. This is a much requested feature. It's long overdue.

My concern with XmlPrime is their website has not been updated since what appears to be 2018 🤷‍♂ and direct email to their sales email address has gone unanswered so far. If their responsiveness to a potential sale and their attention to detail in regard to their website content is any indication of their product quality, we should all have some reservations about paying for that product.

Actually, XmlPrime's pricing reflects the cost of producing an advanced piece of technology. Be careful what you ask for: Microsoft's reluctance to implement these standards is strongly affected by (some) users' reluctance to pay for them.

What about a cost proposal to work out the code and a "gofundme" campaign to pay for it? I think there's enough demand for it, we all could throw in $100 and this would get done within the year.

All the XML specifications (save for a few by OASIS) were developed by the W3C and were meant to be part of modern web infrastructure such as web browsers. The shift from XHTML probably hampered that effort, but nonetheless people expected this to be core infrastructure (i.e. part of platforms).

What about a cost proposal to work out the code and a "gofundme" campaign to pay for it? I think there's enough demand for it, we all could throw in $100 and this would get done within the year.

I'm happy to fund the $100 but how do you know it's enough to get developed? I think XSL is not a simple implementation. It takes a lot of hard work to build.

It seems pretty clear, given how long this issue has been around, that it really isn't a priority for Microsoft, and that this needs to be an open source effort. It's also clear that implementing XSLT is not a trivial thing. There is a list of projects here, but the only one we might be interested in is a form of XPath2.net. Saxon is open source but only in Java, so maybe there is scope for a port to .NET rather than the transpiled .NET version currently available. The plus side at least is that the test suite is available, as XSLT, XPath (and XQuery) are clearly defined standards.

What about a cost proposal to work out the code and a "gofundme" campaign to pay for it? I think there's enough demand for it, we all could throw in $100 and this would get done within the year.

I'm happy to fund the $100 but how do you know it's enough to get developed? I think XSL is not a simple implementation. It takes a lot of hard work to build.

It most certainly would be an effort to get public support for this. I would think you would start with the individuals who up-voted this issue on Microsoft's user voice site. From there, spreading the initiative among .NET user groups, etc. I would think 1,000 devs/companies offering $100 each would do the trick to get the effort underway and to a working beta release. 🤷‍♂

@michaelhkay

Be careful what you ask for: Microsoft's reluctance to implement these standards is strongly affected by (some) users' reluctance to pay for them.

I don't think that's true. Our primary motivations for doing platform features are:

  1. Is this a core concern for many users?
  2. Would adding it to the platform benefit the feature?
  3. Is this a feature that we likely need as a building block for other platform features?

I'm not aware of cases where pricing of external components has influenced our decision; however, the availability of widely used external libraries (commercial or not) does influence our assessment of how beneficial/harmful our involvement would be.

In the case of XSLT 3, I think our interest (or lack thereof) is informed by the direction of the web/client industry as a whole. Right now, I can't see a world where supporting it would likely become a priority for us.

Thanks Immo. So that sums it up, gang: it's likely not going to happen. An OSS initiative will be the only solution here. Getting technical expertise and developers to dedicate the effort to implement something similar to Saxon or XmlPrime is a relatively large undertaking.

Pretty much, which is why I'm closing this.

@terrajobst

In the case of XSLT 3, I think our interest (or lack thereof) is informed by the direction of the web/client industry as a whole. Right now, I can't see a world where supporting it would likely become a priority for us.

I think that's been my frustration for a long time. In my view XSLT is much less useful for the "traditional" web/client activities than it is as a more generalized standard data transformation framework. I've used XSLT in several projects for that type of role, to good effect. However, the restriction of only having XSLT 1.0 as part of the standard environment limits capabilities and further adoption for those other applications. It's a catch-22.

I've been waiting for XSLT > 1.0 for over 10 years now. Sounds like it's still not going to happen in standard libraries.

@michaelhkay

Be careful what you ask for: Microsoft's reluctance to implement these standards is strongly affected by (some) users' reluctance to pay for them.

I don't think that's true. Our primary motivations for doing platform features are:

  1. Is this a core concern for many users?
  2. Would adding it to the platform benefit the feature?
  3. Is this a feature that we likely need as a building block for other platform features?

I'm not aware of cases where pricing of external components has influenced our decision; however, the availability of widely used external libraries (commercial or not) does influence our assessment of how beneficial/harmful our involvement would be.

In the case of XSLT 3, I think our interest (or lack thereof) is informed by the direction of the web/client industry as a whole. Right now, I can't see a world where supporting it would likely become a priority for us.

If this is the determining factor, then we can argue the case:

  1. Is this a core concern for many users?
    Yes, XSLT 2+ was one of the top 3 most requested features back when it was voted on through the Visual Studio UserVoice. The archived request "Implement XSLT 3.0 for .NET" has 2817 votes.

  2. Would adding it to the platform benefit the feature?
    Absolutely, there is no 3rd party solution available that is open source, free or otherwise affordable for open source projects and small businesses. XSLT 2 or 3 brings a wealth of improvements that fix the shortcomings of XSLT 1.0 and increase productivity.

  3. Is this a feature that we likely need as a building block for other platform features?
    Yes, XSLT is a standard. It is widely used in:

  • SharePoint
  • SQL Server has native support for XML columns and XPath queries. One can even write managed code to return transformed XML using XSL.
  • Many popular CMSs like Umbraco and DNN still use XSLT to transform XML data for display
  • Many large enterprises still use XML (probably more than JSON) and need the ability to manipulate the XML easily.

@stephen-lim your examples show that XML and XSLT are widely used but not v3 specifically.

SQL Server partially supports XPath v2. There isn't wide support for v3 because Windows/ASP.NET software like SharePoint, DNN and Umbraco ultimately relies on the .NET libraries, which only support XSLT v1. On the other hand, you can find many more examples of v2 and v3 support in Java apps.

The short story is that thousands of developers have been asking Microsoft to support v2 for the last 10 years. At one point, Microsoft said they would strongly consider implementing XSLT 2, but that stopped as soon as they started working on LINQ and XQuery. Fast forward to today: the v3 spec is out, and the hope is that Microsoft will add support for v3, if not v2.

@stephen-lim your examples show that XML and XSLT are widely used but not v3 specifically.

Well, to be honest, XSLT 2.0 and XPath 2.0 would be a huge improvement already. XSLT 1.0 is very, very limiting (a major blocker being the lack of user-defined functions: you just have templates, but these can't be used as part of XPath expressions). The same applies to XPath: 1.0 is missing a lot of functions and has no wildcard for namespaces (i.e. no `/*:elementName`).

Sure, XPath 3.0 and XSLT 3 would be awesome (i.e. exception throwing and try/catch from XSLT). But XSLT 1.0 is just seriously lacking too many features to really consider it.

I'm rather tempted to extract the whole XSLT processor as a Java-based microservice, rather than falling back to XSLT 1.0/XPath 1.0 (Saxon.NET via IKVM.NET on .NET Framework is not an option).
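To make the namespace-wildcard point concrete, here is a minimal sketch in Python rather than .NET. It assumes Python 3.8+ and uses the standard library's ElementTree, whose `{*}item` syntax is not XPath 2.0 but behaves analogously to `*:item`. The sample document and the helper names are made up for illustration. Without a wildcard you end up comparing local names by hand, much like XPath 1.0's `*[local-name() = 'item']`:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the same local name 'item' in two different namespaces.
DOC = """<root xmlns:a="urn:example:a" xmlns:b="urn:example:b">
  <a:item>first</a:item>
  <b:item>second</b:item>
  <b:other>ignored</b:other>
</root>"""


def local_name(tag):
    """Strip ElementTree's '{namespace-uri}' prefix from a tag name."""
    return tag.rsplit("}", 1)[-1]


def items_without_wildcard(root):
    # No namespace wildcard: walk the children and compare local names by hand,
    # similar in spirit to XPath 1.0's *[local-name() = 'item'].
    return [child.text for child in root if local_name(child.tag) == "item"]


def items_with_wildcard(root):
    # ElementTree's '{*}item' (Python 3.8+) matches 'item' in any namespace,
    # which is analogous to (but not the same syntax as) XPath 2.0's *:item.
    return [child.text for child in root.findall("{*}item")]


root = ET.fromstring(DOC)
print(items_without_wildcard(root))
print(items_with_wildcard(root))
```

Both functions select the same two elements; the wildcard version simply states the intent in the path expression instead of in surrounding code.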

As far as Saxonica is concerned, we are eagerly awaiting technical details of what Microsoft is proposing to offer under the "Java interoperability" feature promised in the .NET 5 announcement; that will determine our forwards path for Saxon on .NET. If anyone knows of any details that have been published since the May 2019 announcement, please share!

Regarding Saxon and .NET Core, IKVM is obviously shelved. Why not take the runtimes, decompile them to C# using something like DotPeek, refactor to .NET Core, and then implement Saxon to use those libs? I'm sure the IKVM folk wouldn't mind, considering they've abandoned ship?

We're looking at a number of options (which is why we really want to know what .NET 5 will offer), but obviously we're very keen to avoid forking the source code.

Regarding Saxon and .NET Core, IKVM is obviously shelved. Why not take the runtimes, decompile them to C# using something like DotPeek, refactor to .NET Core

Not sure what you mean. IKVM.NET is open source... there is just no one to take it over. The IKVM.NET author already offered to let others take over the project under the condition that it's renamed to something else.

But not sure how much sense that makes anyways, since (as far as I know) it required a lot of changes for each new JRE version, which now ship bi-annually rather than once every 3-5 years

@michaelhkay Completely understand about forking the source code, but I for one would be very interested in working on a port around XSLT/XPath in .NET using the new features we have in C#. I'm curious if, rather than forking the source code, we could fork / port the code for the test suite and work from there.

There are good test suites for XSLT 3.0, XPath 3.1, and XQuery 3.1 on GitHub, and we're happy to share our test drivers. The bulk of the test material is in XML files and is 100% portable; creating a test driver to run the tests on a particular platform is a fairly trivial exercise. The only other requirement is API testing, which is specific to each platform/API/language-binding. But the source code for the product itself is 600K lines of Java so that's a major undertaking.

I would assume the problem with SAXON isn't so much the source code being in Java, as it having dependencies on third party Java libraries, which may be hard to decouple.

No, that's not the case. Saxon's dependencies on third party (non-JDK) libraries are very easily isolated and decoupled. Where such dependencies exist (e.g. on the ICU-J library) you can either port the third party code as if it were part of Saxon, or you can make do without it.

There are good test suites for XSLT 3.0, XPath 3.1, and XQuery 3.1 on GitHub, and we're happy to share our test drivers. The bulk of the test material is in XML files and is 100% portable; creating a test driver to run the tests on a particular platform is a fairly trivial exercise. The only other requirement is API testing, which is specific to each platform/API/language-binding. But the source code for the product itself is 600K lines of Java so that's a major undertaking.

Where are the test cases? Can you give a link?

https://github.com/w3c/xslt30-test (XSLT 3.0)
https://github.com/w3c/qt3tests (XQuery 3.1, XPath 3.1)
https://github.com/w3c/xsdtests (XSD 1.1)

In each case the test suites also include tests for earlier versions, labelled as such in the test metadata.
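As a rough illustration of why a test driver is a small piece of work compared to the engine itself, here is a minimal sketch in Python. The catalog format below is a simplified, hypothetical stand-in (the element names, the `spec` attribute and its values are assumptions, not the actual schema of the W3C suites), and `toy_evaluate` is a placeholder for a real XPath engine:

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified catalog; the real W3C suites use a richer schema.
CATALOG = """<test-set>
  <test-case name="add-1" spec="XP20+">
    <test>1 + 1</test>
    <expected>2</expected>
  </test-case>
  <test-case name="concat-1" spec="XP30+">
    <test>'a' || 'b'</test>
    <expected>ab</expected>
  </test-case>
</test-set>"""


def run_suite(catalog_xml, evaluate, supported_specs):
    """Run every test case through `evaluate` and record one outcome per case."""
    results = {}
    for case in ET.fromstring(catalog_xml).iter("test-case"):
        name = case.get("name")
        # Version labels in the metadata let a driver skip unsupported tests.
        if case.get("spec") not in supported_specs:
            results[name] = "skipped"
            continue
        expression = case.findtext("test")
        expected = case.findtext("expected")
        try:
            actual = evaluate(expression)
        except Exception:
            results[name] = "error"
            continue
        results[name] = "pass" if str(actual) == expected else "fail"
    return results


def toy_evaluate(expression):
    # Stand-in for a real engine: only understands "<int> + <int>".
    left, right = expression.split("+")
    return int(left) + int(right)


results = run_suite(CATALOG, toy_evaluate, {"XP20+"})
print(results)
```

The driver only needs to parse the catalog, filter by the declared spec version, call the engine and compare results; the engine is injected as a function, so the same loop could in principle wrap any implementation under test.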

+1, spec suites are the way to go for realistic and reliable conformance testing. I have some experience with writing spec suite adapters for Sass' and YAML's .NET implementations. If the porting effort happens out in the open, I am willing to contribute. :)

I reached out to XmlPrime (https://www.xmlprime.com/xmlprime/) and they confirmed that they have completed .NET Core support now. This is a commercial offering, so this isn't a solution for everyone. If you try this - it would be great to post your results back here to help others in the community.

A .NET Core trial version of XmlPrime 4.1.3 is now available as a signed NuGet package.

Just send me a message or drop us an email ( [email protected] ) saying what area you would like to test it in and we will send you a download link.

Micah Edwards.
XmlPrime.

@MicahEdwards What are the costs for the full product? You don't display them online. Some pages say that I can purchase licenses online but then when I go to those pages, I'm told that I can't purchase it online.

So I guess the simple question is where can I see a breakdown of your prices? I shouldn't need to contact you to get these - they should just be available on your website.
