Runtime: [System.Text.Json] serialize/deserialize any object

Created on 26 Sep 2019 · 18 Comments · Source: dotnet/runtime

I am trying to convert my Newtonsoft.Json code to System.Text.Json and I ran into a problem.
I need to serialize/deserialize any object. I don't know the object's type at compile time.

With Newtonsoft.Json I did this:

JsonSerializerSettings jsonSerializerSettings = new JsonSerializerSettings() { TypeNameHandling = TypeNameHandling.All };

// serialize
object value = GetSomeObject();
string json = JsonConvert.SerializeObject(value, jsonSerializerSettings);

// deserialize
object result = JsonConvert.DeserializeObject(json, jsonSerializerSettings);

The serialized JSON contained a "$type" property with the full name of the type (namespace, class name, assembly name).
With this info the deserializer knew what object to instantiate.

Is this also possible with System.Text.Json? How?

area-System.Text.Json

All 18 comments

I would always recommend specifying the type you're expecting no matter what. Alternatively, wrap your serialized objects in an envelope that contains the type manually.

I don't know the type at compile time.
If I have to store the type manually, then I can't use the JsonSerializer. Or can I?
From what I found, Utf8JsonWriter/Utf8JsonReader are the only way to do that, but this is very complicated. In Newtonsoft.Json these are single method calls.

Is this also possible with System.Text.Json? How?

Having the JSON payload define the type being created during deserialization isn't something we'd want to support in S.T.Json, and certainly not by default. The object you get back with S.T.Json will be a boxed JsonElement, which encapsulates the raw JSON token itself, and you would have to process that to turn it into the concrete type that you need.

I don't know the type at compile time.

Maybe you could try augmenting your object model to have a "type" property, and then use that (after deserialization) to figure out at runtime which type to new up from the JsonElement, but I haven't given that a try. Otherwise, yes, you would have to use the low-level reader/writer. There are no single method call capabilities for this use case within the JsonSerializer.

cc @steveharter, @bartonjs
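As a rough sketch of the "type" property idea above: deserialize to object, inspect the resulting boxed JsonElement, and re-read its raw JSON as a type chosen from a fixed table. The Cat and Dog classes, the KnownTypes table, and the "type" values are hypothetical names introduced only for illustration; they are not part of System.Text.Json.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical model types, used only for illustration.
public class Cat { public string Name { get; set; } }
public class Dog { public string Name { get; set; } }

public static class TypePropertyExample
{
    // Hypothetical, hand-maintained table from the "type" property to a known type.
    private static readonly Dictionary<string, Type> KnownTypes = new Dictionary<string, Type>
    {
        ["cat"] = typeof(Cat),
        ["dog"] = typeof(Dog),
    };

    public static object Materialize(string json)
    {
        // Deserializing to object yields a boxed JsonElement, not a concrete type.
        object boxed = JsonSerializer.Deserialize<object>(json);
        var element = (JsonElement)boxed;

        // Accept only JSON objects whose "type" string is in the table.
        if (element.ValueKind != JsonValueKind.Object ||
            !element.TryGetProperty("type", out JsonElement typeProp) ||
            typeProp.ValueKind != JsonValueKind.String ||
            !KnownTypes.TryGetValue(typeProp.GetString(), out Type target))
        {
            throw new JsonException("Missing or unknown 'type' property.");
        }

        // Re-read the element's raw JSON as the chosen concrete type.
        return JsonSerializer.Deserialize(element.GetRawText(), target);
    }
}
```

Calling TypePropertyExample.Materialize on {"type":"dog","Name":"Rex"} should produce a Dog instance. The payload only selects among types the application already lists, so it never names an arbitrary type to load.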

The decision to leave out TypeNameHandling.All-equivalent functionality was intentional. Allowing the payload to specify its own type information is a common source of vulnerabilities in web applications. In particular, configuring Newtonsoft.Json with TypeNameHandling.All trivially allows the remote client to embed an entire executable application within the JSON payload itself, so that during deserialization the web application will extract and run the embedded code.

For further reading, see https://www.blackhat.com/docs/us-17/thursday/us-17-Munoz-Friday-The-13th-Json-Attacks.pdf and https://www.blackhat.com/docs/us-17/thursday/us-17-Munoz-Friday-The-13th-JSON-Attacks-wp.pdf.

I do understand the security concerns. Maybe the user could configure a list of allowed assemblies where the type can come from. I will certainly do this in my code.

I added $type manually in these methods:

// Requires: using System.IO; using System.Text; using System.Text.Json;
private string Serialize(object value)
{
    using var stream = new MemoryStream();
    using var writer = new Utf8JsonWriter(stream);
    writer.WriteStartObject();

    // Write "$type" first, as "<full type name>, <assembly name>".
    writer.WritePropertyName("$type");
    var typeName = $"{value.GetType().FullName}, {value.GetType().Assembly.GetName().Name}";
    writer.WriteStringValue(typeName);

    // Serialize the value separately, then copy its properties into the outer object.
    var str = JsonSerializer.Serialize(value);
    using var doc = JsonDocument.Parse(str);
    foreach (var property in doc.RootElement.EnumerateObject())
        property.WriteTo(writer);

    writer.WriteEndObject();
    writer.Flush();

    return Encoding.UTF8.GetString(stream.ToArray());
}

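private object Deserialize(string json)
{
    // First pass: parse the document only to find "$type".
    using var doc = JsonDocument.Parse(json);

    if (!doc.RootElement.TryGetProperty("$type", out var typeProp))
        throw new Exception("No '$type' in json.");
    var typeName = typeProp.GetString();
    // ParseTypeName is my own helper (not shown) that resolves the name to a Type.
    var type = ParseTypeName(typeName);

    // Second pass: deserialize the same JSON as the resolved type.
    var obj = JsonSerializer.Deserialize(json, type);
    return obj;
}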

It works, but here I have to convert the object to a string, parse that into a JsonDocument, and then write the document to a Utf8JsonWriter.

var str = JsonSerializer.Serialize(value);
using var doc = JsonDocument.Parse(str);

This is not in a hot path in my app, but I'd still prefer not to do that indirection (without doing too much reflection myself).

The same applies in Deserialize, where I first call JsonDocument.Parse(json) to get the $type and then call JsonSerializer.Deserialize(json, type).

I reiterate:

Allowing the payload to specify its own type information is a common source of vulnerabilities in web applications.

Almost any possible implementation of _ParseTypeName_ I can think of will have a security vulnerability of some type. If you _really_ need some type of polymorphic behavior, one potentially safe way to do this would be to have $type as an integer (not a string), then maintain a mapping 1 -> TypeA, 2 -> TypeB, etc. But even this would have to be tightly controlled.

One possible way to work around the situation you're experiencing now, where you need to disassemble and reconstruct the payload, is to keep the type information separate from the object data. For example, instead of:

{
    "$type": 1,
    "field1": "value1",
    "field2": "etc."
}

Try:

{
    "$type": 1,
    "data": {
        "field1": "value1",
        "field2": "etc."
    }
}

If you maintain the invariant that the $type field must appear first in the stream, it's possible to instantiate the reader, quickly read the $type field, map it to the expected type to be deserialized, then call the deserializer. There shouldn't be much indirection or repeated work done while processing such a payload.
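A minimal sketch of that read path, assuming the envelope shape shown above with an integer "$type" written first and the payload under "data". TypeA and TypeB follow the mapping example in this comment; the EnvelopeReader class, the Field1/Field2 properties, and the camel-case naming policy are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.Json;

// Hypothetical payload types for the 1 -> TypeA, 2 -> TypeB mapping.
public class TypeA { public string Field1 { get; set; } }
public class TypeB { public string Field2 { get; set; } }

public static class EnvelopeReader
{
    // Fixed, application-owned mapping; the payload can only pick an entry from it.
    private static readonly Dictionary<int, Type> TypeMap = new Dictionary<int, Type>
    {
        [1] = typeof(TypeA),
        [2] = typeof(TypeB),
    };

    // Maps JSON names such as "field1" to C# properties such as Field1.
    private static readonly JsonSerializerOptions Options =
        new JsonSerializerOptions { PropertyNamingPolicy = JsonNamingPolicy.CamelCase };

    public static object Read(string json)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(json);
        var reader = new Utf8JsonReader(utf8);

        // Enforce the invariant: the first property must be "$type".
        Expect(ref reader, JsonTokenType.StartObject);
        Expect(ref reader, JsonTokenType.PropertyName);
        if (!reader.ValueTextEquals("$type"))
            throw new JsonException("'$type' must be the first property.");

        Expect(ref reader, JsonTokenType.Number);
        if (!reader.TryGetInt32(out int id) || !TypeMap.TryGetValue(id, out Type target))
            throw new JsonException("Unknown '$type' value.");

        Expect(ref reader, JsonTokenType.PropertyName);
        if (!reader.ValueTextEquals("data"))
            throw new JsonException("Expected 'data' after '$type'.");

        // Move onto the first token of the "data" value, then let the serializer
        // consume exactly that value from the same reader (no second parse).
        if (!reader.Read())
            throw new JsonException("Missing 'data' value.");
        return JsonSerializer.Deserialize(ref reader, target, Options);
    }

    private static void Expect(ref Utf8JsonReader reader, JsonTokenType expected)
    {
        if (!reader.Read() || reader.TokenType != expected)
            throw new JsonException($"Expected {expected}.");
    }
}
```

Because the reader is handed to JsonSerializer.Deserialize once it reaches "data", the payload is walked a single time. An id that is not in the table is rejected before any object is created.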

Ok, thanks! We'll have to think about the type more.

I managed to use a Utf8JsonReader to read just the first property $type and then deserialize to that type.

But how can I optimize serialization? Can I get the JsonSerializer (or anything else) to write to a Utf8JsonWriter?

I'd like to avoid the detour through a string and a JsonDocument, as in this code:

var str = JsonSerializer.Serialize(value);
using var doc = JsonDocument.Parse(str);
foreach (var property in doc.RootElement.EnumerateObject())
    property.WriteTo(writer);

Can I get the JsonSerializer (or anything else) to write to a Utf8JsonWriter?

Yes. There's a static method on JsonSerializer that writes to a Utf8JsonWriter directly (just like how there is a method that works with Utf8JsonReader for deserializing).

https://github.com/dotnet/corefx/blob/48363ac826ccf66fbe31a5dcb1dc2aab9a7dd768/src/System.Text.Json/ref/System.Text.Json.cs#L439

https://docs.microsoft.com/en-us/dotnet/api/system.text.json.jsonserializer.serialize?view=netcore-3.0#System_Text_Json_JsonSerializer_Serialize__1_System_Text_Json_Utf8JsonWriter___0_System_Text_Json_JsonSerializerOptions_
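With that overload, the intermediate string and JsonDocument can be dropped. The sketch below assumes the envelope layout suggested earlier in the thread (integer "$type" plus a "data" property); the EnvelopeWriter class and its typeId parameter are hypothetical names, not part of the library.

```csharp
using System.IO;
using System.Text;
using System.Text.Json;

public static class EnvelopeWriter
{
    // typeId is a caller-supplied discriminator (hypothetical); the caller is
    // expected to keep it in sync with whatever mapping the reading side uses.
    public static string Write(int typeId, object value)
    {
        using var stream = new MemoryStream();
        using var writer = new Utf8JsonWriter(stream);

        writer.WriteStartObject();
        writer.WriteNumber("$type", typeId);

        // The serializer writes the value straight into the open writer,
        // so no temporary string or JsonDocument is needed.
        writer.WritePropertyName("data");
        JsonSerializer.Serialize(writer, value, value.GetType());

        writer.WriteEndObject();
        writer.Flush();

        return Encoding.UTF8.GetString(stream.ToArray());
    }
}
```

For example, EnvelopeWriter.Write(1, someObject) produces an object with "$type" first and the serialized value under "data", which keeps the "$type comes first" invariant on the writing side.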

Thanks, I didn't see that overload. That makes the workaround much better.

That workaround suffices for my current needs. I still think that the type will have to be added in some circumstances, like (de)serializing interfaces and base types (and lists of these). But this is most probably something for .NET 5.

I don't need this issue anymore, but I don't know if you want to keep it for .NET 5. If not, feel free to close it.

There are legitimate use cases for adding the $type property; the breeze js library depends on it, for example. Unsafe deserialization is a separate matter. You can object to supporting that and still let us add the $type property easily.

I understand the security risk for exposed JSON, but for internal handling where the JSON is _not_ exposed to the public, a TypeNameHandling.All would be of great help. It doesn't even have to be on by default, but you have to turn it on somewhere deep in JsonSerializerOptions. That way people will still have the possibility of using it, but knowingly taking the security risk (if any).

I, too, serialize types containing object, and the type of this should be preserved when deserializing.

I have the same issue. I am writing an API to collect data from Microsoft Azure DevOps. Their serialized JSON contains $type, so I am at a loss as to how to deserialize these fields.

Wanted to comment on another use case...it looks like this prevents System.Text.Json from being used with xunit. Without the ability to deserialize into the actual objects and not wrapped JsonElements, I can't output IEnumerable<object[]> from my custom JSON file xUnit DataAttribute class.

If this works the way I think it works, this is an automatic blocker for adopting System.Text.Json.

The argument against supporting polymorphic deserialization because it opens vulnerabilities, followed by the suggestion to use an envelope as a wrapper that does exactly that, makes no sense. It opens the same vulnerabilities and halves the performance, because now you have to (de)serialize the data twice. In this scenario Newtonsoft easily outperforms stjson. (My use case doesn't involve websites anyway, so why?) But maybe C# 9 can help support polymorphic deserialization for data classes (if data classes support inheritance at all). Also, if you allowed this only for a base type that is known to be secure, there wouldn't be any issues.

I understand the security risk for exposed JSON, but for internal handling where the JSON is _not_ exposed to the public, a TypeNameHandling.All would be of great help. It doesn't even have to be on by default, but you have to turn it on somewhere deep in JsonSerializerOptions. That way people will still have the possibility of using it, but knowingly taking the security risk (if any).

I, too, serialize types containing object, and the type of this should be preserved when deserializing.

I also serialize types containing interfaces. The serializer must know what the concrete type was when deserializing. I don't want to set up type mapping etc.

My workaround currently:

JsonSerializer.Serialize((object)data)

For enumerable objects:

// ToList() is needed!
JsonSerializer.Serialize(data.Cast<object>().ToList())
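To illustrate what the cast changes, here is a small sketch using a base class: when the value is passed as its declared base type, only the base type's properties are written, while passing it as object makes the serializer use the runtime type. The Shape and Circle classes and the RuntimeTypeSerialization helper are hypothetical and exist only for this example.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical types for illustration.
public class Shape { public string Name { get; set; } }
public class Circle : Shape { public double Radius { get; set; } }

public static class RuntimeTypeSerialization
{
    // Generic argument is inferred as Shape: only Shape's properties are written.
    public static string AsDeclared(Shape shape) => JsonSerializer.Serialize(shape);

    // Generic argument is object: the serializer uses the runtime type (Circle).
    public static string AsRuntime(Shape shape) => JsonSerializer.Serialize((object)shape);

    // Same idea for sequences: each element is serialized as its runtime type.
    public static string ListAsRuntime(IEnumerable<Shape> shapes) =>
        JsonSerializer.Serialize(shapes.Cast<object>().ToList());
}
```

For a Circle, AsDeclared omits Radius while AsRuntime and ListAsRuntime include it. This only affects what gets written: the output carries no type information, so deserializing it later still requires the target type to be known some other way.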

I found that the same approach works for me. Are there any drawbacks to it, though?
