Runtime: JsonSerializer polymorphic deserialization support

Created on 28 Jun 2019 · 28 comments · Source: dotnet/runtime

public class FooBase
public class FooA : FooBase
public class FooB : FooBase
List<FooBase> SerializationObject = new List<FooBase> { new FooA(), new FooB() }

Serialized with Newtonsoft.JSON

JsonSerializerSettings Settings = new JsonSerializerSettings
{
    SerializationBinder = new KnownTypesBinder
    {
        KnownTypes = new List<Type> { typeof(FooA), typeof(FooB) }
    },
    TypeNameHandling = TypeNameHandling.Objects
};
List<FooBase> FooBases = new List<FooBase>() { new FooA(), new FooB() };
var json = JsonConvert.SerializeObject(FooBases, Settings);

the resulting JSON will be:
[{"$type":"FooA","NameA":"FooA","Name":"FooBase"},{"$type":"FooB","NameB":"FooB","Name":"FooBase"}]

When using System.Text.Json.JsonSerializer to deserialize the JSON, FooA and FooB both come back as FooBase. Is there a way for JsonSerializer to support inherited classes? How can I make sure the type will be the derived type and not the base class it inherits from?
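For reference, the behavior being asked about can be reproduced with a minimal sketch. This assumes System.Text.Json as shipped with .NET Core 3.x; the FooA/FooB property names are taken from the JSON above:

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public class FooBase { public string Name { get; set; } = "FooBase"; }
public class FooA : FooBase { public string NameA { get; set; } = "FooA"; }
public class FooB : FooBase { public string NameB { get; set; } = "FooB"; }

public static class Demo
{
    public static void Main()
    {
        var list = new List<FooBase> { new FooA(), new FooB() };

        // Serialization is driven by the declared type FooBase,
        // so NameA/NameB never make it into the JSON.
        string json = JsonSerializer.Serialize(list);
        Console.WriteLine(json);

        // Deserialization likewise can only produce FooBase instances;
        // without a type discriminator the derived types are unrecoverable.
        List<FooBase> roundTripped = JsonSerializer.Deserialize<List<FooBase>>(json);
        Console.WriteLine(roundTripped[0].GetType()); // FooBase, not FooA
    }
}
```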

area-System.Text.Json enhancement json-functionality-doc


All 28 comments

Figured it's somehow possible using JsonConverter, which for some reason is completely undocumented (https://docs.microsoft.com/en-us/dotnet/api/system.text.json?view=netcore-3.0), but I have no idea how to deserialize without copying each property by hand as shown in this example: https://github.com/dotnet/corefx/issues/36639. Copying properties by hand seems extremely odd, as classes may have lots of them and maintainability suffers in such a case. Is there a way I can simply deserialize into a specific type in the converter?

Or am I missing something - is there a simple solution like in Newtonsoft.Json?

public class FooAConverter : System.Text.Json.Serialization.JsonConverter<FooBase>
{
    public override bool CanConvert(Type type)
    {
        return typeof(FooBase).IsAssignableFrom(type);
    }
    public override FooBase Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
    {
        reader.Read();
        var typeProperty = reader.GetString();
        if (typeProperty != "$type")
            throw new NotSupportedException();
        reader.Read();
        var typeValue = reader.GetString();
        FooBase value;
        switch (typeValue)
        {
            case "FooA":
                value = new FooA(); //<-- return deserialized FooA?
                break;
            case "FooB":
                value = new FooB(); //<-- return deserialized FooB?
                break;
            default:
                throw new System.Text.Json.JsonException();
        }
        while (reader.Read())
        {
            //How to deserialize without copying every property at this point?
            if (reader.TokenType == JsonTokenType.EndObject)
            {
                return value;
            }
        }
        throw new NotSupportedException();
    }

    public override void Write(Utf8JsonWriter writer, FooBase value, JsonSerializerOptions options)    {    }
}

cc @steveharter

It's pretty awkward that JsonConverter doesn't give you a way to recurse back into the "regular" flow (i.e. "not handled by this particular converter") for deserializing subcomponents.

This seems to make the whole design only useful for very simple cases, with few properties and no nested custom types.

Figured it's somehow possible using JsonConverter, which for some reason is completely undocumented (https://docs.microsoft.com/en-us/dotnet/api/system.text.json?view=netcore-3.0)

It's in the sub-namespace Serialization: https://docs.microsoft.com/en-us/dotnet/api/system.text.json.serialization?view=netcore-3.0

I have no idea how to deserialize without copying each property by hand like it's shown in this example

and

It's pretty awkward that JsonConverter doesn't give you a way to recurse back into the "regular" flow (i.e. "not handled by this particular converter") for deserializing subcomponents

Yes, that is known - the existing converters are bare-bones converters that let you do anything you want, but also require you to do essentially everything for a given type (though you can recurse to handle other types). They were primarily intended for data-type converters, not for objects and collections.

We are working on making this better in the 5.0 timeframe by providing object- and collection-specific converters that will make this much easier, as well as improve performance in certain cases.

but you can recurse to handle other types

@steveharter That seems like it could help. Can you give an example of how to do this?

We are using custom converters like the following for inherited-class support. I'm pretty sure deserialization is abysmal from a perf point of view, but it works, doesn't need much code, and serialization works as it should. So as a workaround you can use something like this:

public class ParameterTransferObjectConverter : JsonConverter<ParameterTransferObject>
{
    public override ParameterTransferObject Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
    {
        using (var doc = JsonDocument.ParseValue(ref reader))
        {
            ParamType typeDiscriminator = (ParamType)doc.RootElement.GetProperty(@"parameterType").GetInt32();
            var type = ParameterManager.GetTransferObjectType(typeDiscriminator);
            // Enhance: doc.RootElement.GetRawText() has likely bad perf characteristics
            return (ParameterTransferObject)JsonSerializer.Deserialize(doc.RootElement.GetRawText(), type, options);
        }
    }

    public override void Write(Utf8JsonWriter writer, ParameterTransferObject value, JsonSerializerOptions options)
    {
        var type = ParameterManager.GetTransferObjectType(value.ParameterType);
        JsonSerializer.Serialize(writer, value, type, options);
    }
}
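The `GetRawText()` call flagged in the comment above forces an intermediate UTF-16 string. One possible variation - a sketch only, reusing the commenter's own `ParameterTransferObject`, `ParamType`, and `ParameterManager` types, which are not shown in full here - writes the element back to UTF-8 and deserializes from the span instead:

```csharp
using System;
using System.Buffers;
using System.Text.Json;
using System.Text.Json.Serialization;

public class ParameterTransferObjectConverter : JsonConverter<ParameterTransferObject>
{
    public override ParameterTransferObject Read(
        ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
    {
        using var doc = JsonDocument.ParseValue(ref reader);
        var typeDiscriminator = (ParamType)doc.RootElement.GetProperty("parameterType").GetInt32();
        var type = ParameterManager.GetTransferObjectType(typeDiscriminator);

        // Re-emit the element as UTF-8 and deserialize from the written span,
        // skipping the UTF-16 string that GetRawText() would allocate.
        var buffer = new ArrayBufferWriter<byte>();
        using (var writer = new Utf8JsonWriter(buffer))
        {
            doc.RootElement.WriteTo(writer);
        }
        return (ParameterTransferObject)JsonSerializer.Deserialize(buffer.WrittenSpan, type, options);
    }

    public override void Write(Utf8JsonWriter writer, ParameterTransferObject value, JsonSerializerOptions options)
    {
        var type = ParameterManager.GetTransferObjectType(value.ParameterType);
        JsonSerializer.Serialize(writer, value, type, options);
    }
}
```

This still pays for the `JsonDocument` parse, but avoids the string round-trip on the hot deserialization path.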

Thanks @ANahr - indeed, performance must be abysmal in comparison, but this allowed me to at least get deserialization working until System.Text.Json matures. In my opinion it's a mistake for Microsoft to encourage people to use it; it should still be flagged as preview.

Marking this as the issue to track polymorphic deserialization for 5.0.
Serialization is tracked in https://github.com/dotnet/corefx/issues/38650.

From @Milkitic in dotnet/corefx#41305

For configuration purposes I did a test on how System.Text.Json handles interface deserialization. I read the API docs and couldn't find anything about this functionality. First I tried IEnumerable<T>. Here is the code:

using System.Collections.Generic;
using System.Text.Json;

namespace dotnet30test
{
    class Program2
    {
        static void Main(string[] args)
        {
            var obj = new TestSerializationObj2
            {
                Names = new HashSet<string> { "1", "2" }
            };
            var sysContent = JsonSerializer.Serialize(obj, new JsonSerializerOptions { WriteIndented = true, });
            var sysNewObj = JsonSerializer.Deserialize<TestSerializationObj2>(sysContent);
        }
    }

    public class TestSerializationObj2
    {
        public IEnumerable<string> Names { get; set; }
    }
}

The Names property is forced to convert from HashSet<string> to List<string>, but it does indeed work.
Then I tried a custom interface, and it throws JsonException. What I think is needed is something like Json.NET's automatic type-name handling. Here is the code:

using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using NewtonJson = Newtonsoft.Json;
using SysJson = System.Text.Json;

namespace dotnet30test
{
    class Program
    {
        static void Main(string[] args)
        {
            var obj = new TestSerializationObj
            {
                CustomObj = new DefaultCustom()
            };

            var newtonContent = NewtonJson.JsonConvert.SerializeObject(obj,
                new NewtonJson.JsonSerializerSettings
                {
                    TypeNameHandling = NewtonJson.TypeNameHandling.Auto,
                    Formatting = NewtonJson.Formatting.Indented
                });
            var newtonNewObj = NewtonJson.JsonConvert.DeserializeObject<TestSerializationObj>(newtonContent,
                new NewtonJson.JsonSerializerSettings { TypeNameHandling = NewtonJson.TypeNameHandling.Auto }); // works

            var sysContent = SysJson.JsonSerializer.Serialize(obj, new SysJson.JsonSerializerOptions { WriteIndented = true, });
            var sysNewObj = SysJson.JsonSerializer.Deserialize<TestSerializationObj>(sysContent); // throws exception
        }
    }


    public sealed class TestSerializationObj
    {
        public Dictionary<string, int> Dictionary { get; set; } = new Dictionary<string, int>() { ["a"] = 0, ["b"] = 1 };
        public ICustom CustomObj { get; set; }
    }

    public class DefaultCustom : ICustom
    {
        public string Name { get; set; } = "Default Implementation";
        public int Id { get; set; } = 300;
    }

    public interface ICustom
    {
        int Id { get; set; }
    }
}

Thanks!

Please try this library I wrote as an extension to System.Text.Json to offer polymorphism: https://github.com/dahomey-technologies/Dahomey.Json

Why has this been removed from the 5.0 roadmap?! We currently cannot deserialize any polymorphic POCO class, nor any class typed as an interface. This is definitely not a minor issue, as both are widely used. It's a blocking issue for anyone currently using Newtonsoft, and it has been a known issue for about a year, so there was plenty of time. I'm sorry to say I'm disappointed if this is something we need to wait another whole year for, shipping with .NET 6 instead of .NET 5. The bad taste also comes from the fact that serialization of these classes is already supported. So we have JSON support that works in one direction only; it does not feel finished and ready to be shipped with .NET 5.

I've just discovered this thread and I decided to publish a solution I've been using in my projects.
It's an implementation of abstraction converter factory very easy to introduce.
It might be helpful until the issue is closed :)
https://github.com/lwardzala/Json.Abstraction

I appreciate that the community are trying to backfill this missing functionality as a stopgap measure, but I have to join the chorus of voices here: it's crazy to release a JSON serialiser without polymorphic serialisation/deserialisation support, for an _object orientated language_, and then push people towards using it instead of Newtonsoft. I get it brings perf/allocation improvements, and that really is very much appreciated, but it also needs to actually be capable of real-world serialisation! :zany_face:

Just because the language is polymorphic doesn't mean messages should be - JSON and XML are NOT polymorphic. At best you use a tagged union, which some languages may interpret as data for a potentially polymorphic object, and that belongs in the app domain rather than the plumbing: model the simple message and build the object from it. If you're really doing OO, where is your SOLID / separation of concerns? Many consumers of messages are not OO, and it's pretty bad practice IMHO to put this on the wire; I have seen it cause tons of wasted time and issues. It gets even better when you need to pass it to some other part of an organization, or to a partner, who likely won't be using the same library or language.

So my 2c, for what it's worth: I consider it a bit like immutable strings - stop developers from doing potentially bad things they really don't need in most cases. A community/extra lib is about the right level: you can pull it in, but you are forced to ask, do I really need this, or is it for legacy's sake / I don't have the time or money to change it now?

KISS

So if I have an API based on contracts that share common properties, I cannot make use of the language features - is that it? I will need to replicate code, potentially increasing technical debt?

C# is (until now) an OOP language, so why are we building libraries that take away the use of basic language features?

If the language natively has a feature, why are we creating this limitation?

Simple example:

public class ApiResult<T>
{
    public T Payload { get; set; }
}

public class PaginationApiResult<T> : ApiResult<IEnumerable<T>>
{
    public int Take { get; set; }
    public int Skip { get; set; }
    public long TotalRecords { get; set; }
}

Another example: I have a contract on my frontend API where I aggregate data from multiple backends (each with its own responsibility). I want the capacity to minimize the number of requests to my frontend API, so reusing models helps: one contract returns a general description of my API model, and another returns a detailed description of that model:

public class PersonGeneralInfo
{
    public string CitizenId { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public DateTime BirthDate { get; set; }
}

public class EmployedPersonInformation : PersonGeneralInfo
{
    public DateTime StartedAt { get; set; }
    public string CompanyName { get; set; }
}

public class PersonHealthInfo : PersonGeneralInfo
{
    public double Height { get; set; }
    public double Weight { get; set; }
    public double BloodPressure { get; set; }
}

public class PersonLocationsInfo : PersonGeneralInfo
{
    public Address Home { get; set; }
    public Address Work { get; set; }
    public IEnumerable<Address> FavoriteRestaurants { get; set; }
}

Many of these consumers are generated from swagger to typescript or C# applications or even Java Applications.

@bklooste you are only partially right. There are a lot of projects where people intentionally use C# on both the server and the client (for example Blazor), and these projects don't have to interact with components written in other languages. If necessary, programmers can design non-polymorphic interfaces, but when it isn't necessary we shouldn't have artificial limitations.

Every real business application uses decimal data type for monetary calculations. This data type is not supported in JavaScript and plenty of other languages. Does that mean we shouldn't use it in our programs? No, we need this data type and we don't care that some decimal numbers cannot be deserialized properly in JavaScript.

We use C# because it is an object oriented language and we also want to use polymorphism in JSON serialization/deserialization.

Honestly, I'm getting tired of this discussion about the language and polymorphism. I totally get the points that "OOP is C#-only" and "on the internet, JSON isn't meant to support polymorphism". But here is the fact: if this JsonSerializer is meant to be a server/internet-only thing where OOP is out of scope, then it should have been put in System.Web, same as the old JavaScriptSerializer. Then nobody here would complain about missing polymorphism support, because it would be perfectly clear that this is a server/internet thing only.

But it isn't in System.Web, so people complain about missing polymorphism support, and they complain for a good reason. Even the old XmlSerializer supports polymorphism without too much additional code.

So please stop and close this tab if you aren't interested in polymorphism support, and simply accept the fact that many developers are. The way Microsoft has announced and designed this JsonSerializer (it's not in System.Web, and it's frequently compared against Newtonsoft.Json, which DOES support polymorphism) gives us the right to complain and ask for this missing feature.

Thanks for your attention and have a nice day, no matter if you agree with me or not.

I'm not even talking about HTTP... Durable Functions, for example, use Newtonsoft to store state between activities, I think - probably due to current limitations such as this one. I 100% agree that serialization should be usable not only in the communication world but also to store state (REST is to transfer state; databases are to store state). For that use case, would I need a persistence model without OOP polymorphism? Without polymorphic support, System.Text.Json will be reducing its use cases instead of being a generic JSON serializer available for any use case.

I am publishing and consuming Kafka messages with their bodies in JSON format and I feel like the security concerns that blocked System.Text.Json from implementing polymorphic deserialization don't apply here.

I'm always happy to be educated about security but the way I see it is that the interactions between the Kafka cluster and my applications are closed to outside actors and the JSON is always going to be trusted.

So my preference would be for System.Text.Json to support this, and have it off by default to prevent people from shooting themselves in the foot if they do have untrusted input. But for trusted input scenarios I'm allowed to turn it on and use it.

Here's my attempt:

```c#
public class TypeMappingConverter<TType, TImplementation> : JsonConverter<TType>
    where TImplementation : TType
{
    [return: MaybeNull]
    public override TType Read(
        ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options) =>
        JsonSerializer.Deserialize<TImplementation>(ref reader, options);

    public override void Write(
        Utf8JsonWriter writer, TType value, JsonSerializerOptions options) =>
        JsonSerializer.Serialize(writer, (TImplementation)value!, options);
}
```

Usage:

```c#
var options =
    new JsonSerializerOptions
    {
        Converters =
        {
            new TypeMappingConverter<BaseType, ImplementationType>()
        }
    };

JsonSerializer.Deserialize<Wrapper>(value, options);
```

Tests:

```c#
[Fact]
public void Should_serialize_references()
{
    // arrange
    var inputEntity = new Entity
    {
        References =
        {
            new Reference
            {
                MyProperty = "abcd"
            },
            new Reference
            {
                MyProperty = "abcd"
            }
        }
    };

    var options = new JsonSerializerOptions
    {
        WriteIndented = true,
        Converters =
        {
            new TypeMappingConverter<IReference, Reference>()
        }
    };

    var expectedOutput =
@"{
  ""References"": [
    {
      ""MyProperty"": ""abcd""
    },
    {
      ""MyProperty"": ""abcd""
    }
  ]
}";

    // act
    var actualOutput = JsonSerializer.Serialize(inputEntity, options);

    // assert
    Assert.Equal(expectedOutput, actualOutput);
}

[Fact]
public void Should_deserialize_references()
{
    // arrange
    var inputJson =
@"{
  ""References"": [
    {
      ""MyProperty"": ""abcd""
    },
    {
      ""MyProperty"": ""abcd""
    }
  ]
}";

    var expectedOutput = new Entity
    {
        References =
        {
            new Reference
            {
                MyProperty = "abcd"
            },
            new Reference
            {
                MyProperty = "abcd"
            }
        }
    };

    var options = new JsonSerializerOptions
    {
        WriteIndented = true
    };

    options.Converters.AddTypeMapping<IReference, Reference>();

    // act
    var actualOutput = JsonSerializer.Deserialize<Entity>(inputJson, options);

    // assert
    actualOutput
        .Should()
        .BeEquivalentTo(expectedOutput);
}

public class Entity
{
    ICollection<IReference>? _References;

    public ICollection<IReference> References
    {
        get => _References ??= new HashSet<IReference>();
        set => _References = value;
    }
}

public interface IReference
{
    public string? MyProperty { get; set; }
}

public class Reference : IReference
{
    public string? MyProperty { get; set; }
}
```

I have implemented something like that, available in one of my GitHub repos. I'm writing something about it on my blog (I don't write that much, but the important stuff, I think, is in that repo).

Something that has yet to be brought up: people with no web expectations at all, who simply use JSON as the excellent human-readable, compact interchange format that it is, for use cases totally outside of web traffic. Limiting our ability to handle polymorphic data - like classes implementing interfaces - is silly and shortsighted.

This is the only thing stopping us from moving to System.Text.Json. We _really_ want to, but we have to have polymorphic support both ways. It might not be "as intended" or "best practice", but we use JSON for serializing most of our business state to/from durable storage. When you are working with incredibly dense object graphs that can change on a daily basis (over 1000 properties across 100+ types), you don't have the luxury of mapping one-property-per-column to your persistence layers and rolling migrators each time. At least not at our scale. JSON serialization of our business models has proven to be, by orders of magnitude, the most sustainable way to manage a huge aspect of our codebase. As a result, we prefer to have full control over serialization at all levels. We have inheritance hierarchies 4-5 types deep that are serializing like a dream w/ Newtonsoft's TypeNameHandling.Auto. No special attributes, helper properties, interfaces, or converter nonsense required.

Due to the level of integration we have with Newtonsoft and the terabytes of existing JSON state we have on customer servers, we will probably have to roll something in-house that is 1:1 compatible. We will probably never be able to use this library. Despite this, I like to bring my case to bear because I am sure there are others who use JSON as a storage format for complex data sets, and of these there are probably some who would love to be able to use polymorphism for better modeling in code.

Certainly for web-based usage, a polymorphic deserialization proposal is suspect at best. I would say even for an internal application, allowing the client to drive type selection is not a good idea. We do not want to use any form of polymorphic serialization on our AspNetCore hosts. Only for private or other trusted back-end code that reads/writes business entities to/from various byte streams.

https://github.com/lwardzala/Json.Abstraction

Thanks, this one did it for me!

[security hat on]

Friendly reminder - unrestricted polymorphic deserialization is a remote code execution vector. Using Newtonsoft.Json with TypeNameHandling set to any value other than _TypeNameHandling.None_ is insecure unless you have also configured a binder. We hold ourselves to that same bar: Microsoft first-party code which uses Newtonsoft.Json as a serializer is forbidden from touching the TypeNameHandling property. The exception process is intentionally onerous, even with a binder in place.

The safe way to support this is to have a fixed mapping of allowed types. In order to be secure, this mapping must be provided _before_ deserialization takes place. For example:

{
    "$type": "Dog",
    "Name": "Fido",
    "Age": 8,
    "Breed": "Terrier"
}

You can imagine the deserializer having access to a Dictionary<string, Type>, where the value of the $type property is used as a lookup key into this dictionary, and the Type value is used as the actual type to be instantiated and to have its members populated.

// assume the allowed mapping is specified via the 'options' parameter below

Options options = new Options();
options.AllowedMappings["Dog"] = typeof(Contoso.Models.Dog);

Animal animal = Deserialize<Animal>(payload, options);
Console.WriteLine(animal.GetType()); // prints 'Contoso.Models.Dog'

Importantly:

  • Type.GetType(string) and similar APIs must __never under any circumstance__ be called with untrusted input. This implies that feeding the value of the $type field into Type.GetType is disallowed.

  • The caller must pre-populate the mapping dictionary _before_ the call to deserialize, or there must be some other deterministic and documented way for the deserializer to discover the set of allowed types based solely on statically available type information from the caller. (This can also be provided by a source analyzer or other AOT mechanism.)

  • The mapping dictionary should avoid being clever in attempting to support things like open generics or collections (e.g., List<>, T[]). This may allow unintended recursive nesting, which can re-introduce security holes into the application.

These guidelines will drive any implementation of a polymorphism-enabled deserializer provided by the library. This is required by the security review process Microsoft follows during development.

[security hat off - please have a pleasant day]
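A minimal sketch of what the allow-list approach above could look like in converter form. The `Animal`/`Dog` types and the `AllowListConverter` name are illustrative only, not a proposed API; the key point is that the discriminator is looked up in a fixed dictionary populated before deserialization, and `Type.GetType` is never reached:

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

public class Animal { public string Name { get; set; } }
public class Dog : Animal { public string Breed { get; set; } }

public class AllowListConverter : JsonConverter<Animal>
{
    // Fixed mapping, known before any payload is read.
    static readonly Dictionary<string, Type> AllowedMappings = new Dictionary<string, Type>
    {
        ["Dog"] = typeof(Dog),
    };

    public override Animal Read(
        ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
    {
        using var doc = JsonDocument.ParseValue(ref reader);
        string discriminator = doc.RootElement.GetProperty("$type").GetString();

        // Unknown discriminators fail closed; untrusted input never
        // reaches Type.GetType or any other type-resolution API.
        if (!AllowedMappings.TryGetValue(discriminator, out Type type))
            throw new JsonException($"Type discriminator '{discriminator}' is not allowed.");

        return (Animal)JsonSerializer.Deserialize(doc.RootElement.GetRawText(), type, options);
    }

    public override void Write(Utf8JsonWriter writer, Animal value, JsonSerializerOptions options)
    {
        // Simplified: serializes the runtime type but does not emit "$type".
        // A full implementation would also write the discriminator, and must
        // guard against recursion when the runtime type is Animal itself.
        JsonSerializer.Serialize(writer, value, value.GetType(), options);
    }
}
```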

Thanks for the guideline! I'll try to review and adapt that in my solution to make it easier to implement and safer as well.

@GrabYourPitchforks would it be OK to provide polymorphism support via something like a TypeResolver that mimics ReferenceResolver but for $type metadata? This way, different polymorphism mechanisms could be provided depending on security requirements (e.g. not every app holds sensitive data, and some live in isolated environments, meaning data is always trusted).
