We commonly use the following pattern for regression tests of the reporting part of our application:
It's a simple approach that works very well for detecting regressions; diffing the files usually gives a good clue as to what changed.
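A minimal sketch of this kind of golden-file comparison, here in Python rather than C# (the helper name and file layout are hypothetical, not the poster's actual code):

```python
from pathlib import Path

def assert_matches_golden(actual_xml: str, golden_file: Path) -> None:
    """Compare freshly serialized output with a stored reference ("golden") file.

    Hypothetical helper illustrating the regression-test pattern: serialize the
    report, then byte-compare against a checked-in reference document.
    """
    expected = golden_file.read_text(encoding="utf-8")
    assert actual_xml == expected, (
        f"output differs from {golden_file}; diff the two files to see what changed"
    )
```

The comparison is exact, which is what makes it sensitive to incidental changes such as namespace-declaration order.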
Recently, we migrated our test assemblies to .NET Core, and now we are running into an issue: namespace declarations are serialized in a different order between test runs:
Run 1
<ImportSpecification xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
Run 2
<ImportSpecification xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
The namespaces are stored in the class XmlSerializerNamespaces, which internally uses a Dictionary<string, string>. The order in which the namespaces are enumerated depends on the result of GetHashCode(). In .NET Framework, the result of GetHashCode() was consistent between runs; this behaviour has changed in .NET Core, as detailed in this issue.
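The mechanism can be illustrated with a toy hash table in Python (a deliberately simplified stand-in for Dictionary<string, string> and its seeded string hash; all names here are hypothetical, not the real runtime implementation):

```python
def bucket_order(keys, seed, nbuckets=4):
    """Toy hash table: enumeration walks the buckets in order,
    so the observed key order depends on the (seeded) hash."""
    def toy_hash(key):
        h = seed  # a per-process random seed changes every hash value
        for c in key:
            h = h * 31 + ord(c)
        return h

    buckets = [[] for _ in range(nbuckets)]
    for k in keys:
        buckets[toy_hash(k) % nbuckets].append(k)
    # enumerate bucket by bucket, as a real hash-table iterator would
    return [k for bucket in buckets for k in bucket]

# With one seed the prefixes come out one way, with another seed the other way:
bucket_order(["xsd", "xsi"], seed=0)  # -> ["xsd", "xsi"]
bucket_order(["xsd", "xsi"], seed=2)  # -> ["xsi", "xsd"]
```

With a fixed seed the order is stable across runs; with a seed randomized at process start (as string hashing is in .NET Core) it can differ from run to run, which is exactly what the diffs above show.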
In .NET Framework, the behaviour of GetHashCode() can be changed with the App.config setting <UseRandomizedStringHashAlgorithm />. For .NET Core, this is not possible. Is there a way to make the serialization of namespaces stable between runs? This could be achieved by simply storing the order in which namespaces are added to XmlSerializerNamespaces. I'd be happy to provide a pull request.
I suggest changing the method of differencing: canonicalize the XML and then directly compare those outputs. The canonicalization transforms live in the System.Security.Cryptography.Xml namespace because they're used in the digital signing process.
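The same idea is easy to demonstrate in Python (requires 3.8+ for `xml.etree.ElementTree.canonicalize`; in .NET you would use a C14N transform from System.Security.Cryptography.Xml instead). Canonicalization puts namespace declarations into a defined order, so the two runs above compare equal:

```python
import xml.etree.ElementTree as ET

run1 = ('<ImportSpecification '
        'xmlns:xsd="http://www.w3.org/2001/XMLSchema" '
        'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>')
run2 = ('<ImportSpecification '
        'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
        'xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>')

# The raw strings differ only in namespace-declaration order...
assert run1 != run2
# ...but both parse to the same infoset, so their canonical forms are identical.
assert ET.canonicalize(run1) == ET.canonicalize(run2)
```

Comparing canonical forms makes the regression test insensitive to declaration order while still catching real content changes.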
Note that your current approach is fragile anyway, because the ordering of the declarations is not guaranteed to be based on anything. The iteration order is a side effect of internal implementation details of the various manipulation methods (Add, Remove, etc.) and should not have been relied upon in this fashion.
I agree the approach is fragile, because the XmlSerializer makes no guarantees about serialization order. That being said, there are scenarios (like the one I described) where a stable serialization order is a desirable feature.
Sure, but the nature of XML itself is that ordering isn't required for documents to be considered functionally equivalent. That's why canonicalization exists for it: to define a single layout for comparison.
What you're asking for is to revert changes made in Core so that private implementation details match .NET Framework; that isn't sensible, nor something you should expect to happen. No promises were ever made about serialization order, and while it may have been stable on previous versions of .NET Framework, it is not on Core. Now you get to choose how to work around your test's invalid assumption.
I think this is resolved.