Orleans: Json serialization, references, wrong ordering

Created on 13 Jun 2019 · 18 comments · Source: dotnet/orleans

Using JSON serialization when you have multiple references to the same object usually creates a graph in which, the first time an object is encountered, it is written with its properties and an $id that identifies that instance. Thereafter, a $ref property in the JSON points back to that previously defined object and a reference is inserted.

I have a situation where the $ref comes _before_ the definition of the object, which means it is being deserialized to null.

This is a real head-scratcher, because I can't reproduce the same problem in isolation using a simpler object graph and JSON.NET. But my fairly complex grain state exhibits the problem _every time_.

Has anyone ever experienced this?

Below is a (snipped) copy of my JSON demonstrating the problem. You'll notice two "$ref": "5" entries that come before the definition of "$id": "5":

{
  "$id": "1",
  "$type": "Redacted.GrainSTateTypeName, Redacted.GrainAssembly",
  "Block": {
    "$id": "2",
    "$type": "Redacted.TypeName, Redacted.Assembly",
    "Terms": {
      "$type": "Redacted.TypeName.SubTypeName[], Redacted.Assembly",
      "$values": [
        {
          "$ref": "5"
        }
      ]
    },
    "Action": "Transition",
    "Driver": {
      "$id": "27",
      "$type": "Redacted.OtherTypeName, Redacted.Assembly",
      "Terms": {
        "$type": "Redacted.TypeName.SubTypeName[], Redacted.Assembly",
        "$values": [
          {
            "$ref": "5"
          }
        ]
      },
     [....snip...]
    "currentStateRecord": {
      "$id": "4",
      "$type": "Redected.TypeName.StateTypeName, Redacted.Assembly",
      "State": 1,
      "Terms": {
        "$type": "Redacted.TypeName.SubTypeName[], Redacted.Assembly",
        "$values": [
          {
            "$id": "5",
            "Name": "qty",
            "$type": "Redacted.TypeName.SubTypeName, Redacted.Assembly",
            "value": 10
          }
        ]
      }
   [...snip...]
   }
}
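For context, serializer settings along these lines are what produce an $id/$ref graph like the one above. This is only a sketch; the `Term` and `GrainState` types here are hypothetical stand-ins for the redacted types, and it requires the Newtonsoft.Json package:

```csharp
using System;
using Newtonsoft.Json;

// Hypothetical stand-ins for the redacted grain state types.
public class Term
{
    public string Name { get; set; }
    public int Value { get; set; }
}

public class GrainState
{
    public Term[] Terms { get; set; }
    public Term Current { get; set; }
}

public static class Demo
{
    public static void Main()
    {
        var settings = new JsonSerializerSettings
        {
            // Emits "$id" on an object's first occurrence and "$ref" thereafter.
            PreserveReferencesHandling = PreserveReferencesHandling.Objects,
            // Emits "$type" metadata so polymorphic members round-trip.
            TypeNameHandling = TypeNameHandling.Auto
        };

        var shared = new Term { Name = "qty", Value = 10 };
        var state = new GrainState { Terms = new[] { shared }, Current = shared };

        Console.WriteLine(JsonConvert.SerializeObject(state, settings));
        // On deserialization, JSON.NET resolves each "$ref" against the "$id"s
        // it has already read, so a document whose properties were reordered in
        // storage can put a "$ref" ahead of its "$id" and break the graph.
    }
}
```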

All 18 comments

I've been tearing my hair out with this issue for a while. And no sooner had I given up and posted this than I found that Postgres rather unhelpfully decides to re-order the properties of a json object.

Now the order of properties _shouldn't_ matter, except that JSON.NET requires the object to be defined before the ref. So Postgres quite unhelpfully kills JSON.NET's ability to deserialize an object graph with references.
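The reordering is easy to see directly in psql: jsonb parses and normalizes the value on ingest, while json stores the input text verbatim.

```sql
-- jsonb normalizes objects: key order is not preserved
SELECT '{"b": 1, "a": 2}'::jsonb;
-- => {"a": 2, "b": 1}

-- json stores the exact input text: key order survives round-trips
SELECT '{"b": 1, "a": 2}'::json;
-- => {"b": 1, "a": 2}
```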

This issue can be closed. But hopefully this helps someone else in future.

@jam13c I may be wrong, but I think you'll find that the column is a jsonb column. If you change it to just a json column it will not be reordered.

I did try that, but the same seems to be happening. The docs say it shouldn't, but it does.

Thanks @ashtonkj

Ah, I needed to update the table, the WriteToStorage function, and the corresponding query in the WriteToStorage data row of the orleansquery table, all of which referred to jsonb.

You might want to consider changing the Postgres ADO.NET implementation to use json rather than jsonb. The combination of jsonb and JSON.NET does not play nicely for the above-mentioned reasons.

I also have an issue because of the key re-ordering in Postgres. (I also lost a lot of hair before I found the reason.)

With JSON.NET it is possible to serialize/deserialize abstract types, but you have to set TypeNameHandling = TypeNameHandling.Auto in the serializer settings. With that, it adds a $type property to the JSON to indicate the concrete type. But this property must come first, otherwise deserialization fails, unless you also set MetadataPropertyHandling = MetadataPropertyHandling.ReadAhead.
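As a sketch, the combination described above looks like this (JSON.NET / Newtonsoft.Json):

```csharp
using Newtonsoft.Json;

var settings = new JsonSerializerSettings
{
    // Write "$type" for members declared as abstract/interface types.
    TypeNameHandling = TypeNameHandling.Auto,
    // Buffer each object so "$type"/"$id" are honored even when they are
    // no longer the first properties (e.g. after a jsonb round-trip).
    MetadataPropertyHandling = MetadataPropertyHandling.ReadAhead
};
```

Note that ReadAhead only copes with metadata position _within_ an object; it does not help when a "$ref" precedes its matching "$id" elsewhere in the document.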

I don't know if there is a specific reason to use jsonb instead of json, but if not, it's probably safer for the serializer if we can store the payload in a json column.

I'm not sure what, if anything, we should do here.

The problem is that, depending on the DB type and the column type, complex deserialization scenarios may fail because JSON.NET expects metadata properties to come first, which is not guaranteed.

So for Postgres, a possible solution is to use a json column instead of jsonb to ensure that the stored value and the fetched value are the same.

> So for Postgres, a possible solution is to use a json column instead of jsonb to ensure that the stored value and the fetched value are the same.

Would that require a change in the SQL scripts for Postgres or in the ADO.NET storage provider?

Yes @sergeybykov - to work with Postgres as my ADO.NET provider, I have had to change both the database column and the query to use json rather than jsonb. No change was needed to the provider code.
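As a sketch, the change looks something like the following. The table, column, and query-key names here assume the stock Orleans PostgreSQL persistence script and may differ between script versions, so treat them as assumptions:

```sql
-- Sketch only: names assume the stock Orleans PostgreSQL persistence
-- script (OrleansStorage.PayloadJson, OrleansQuery) and may differ.

-- 1. Store the payload as json (or text) so Postgres no longer
--    normalizes it the way jsonb does.
ALTER TABLE OrleansStorage
    ALTER COLUMN PayloadJson TYPE json USING PayloadJson::json;

-- 2. Update the stored statements in OrleansQuery that cast the
--    payload parameter to jsonb, e.g. for the write query.
UPDATE OrleansQuery
SET QueryText = replace(QueryText, 'jsonb', 'json')
WHERE QueryKey = 'WriteToStorageKey';
```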

This problem is not limited to the metadata. The way JSON.NET handles references also breaks with Postgres jsonb, as the definition of an object must come before any reference to it in the object graph. This arguably violates the spirit of JSON, since the ordering of properties should not matter, but that is the way JSON.NET has been implemented.

> So for Postgres, a possible solution is to use a json column instead of jsonb to ensure that the stored value and the fetched value are the same.
>
> Would that require a change in the SQL scripts for Postgres or in the ADO.NET storage provider?

I think only the SQL scripts

@veikkoeeva Do you agree?
If so, then a PR for this is welcome.

As I understand @jam13c's description of the issue, just modifying the scripts.

If someone goes to modify the scripts, I don't think there are backwards-compatibility problems for deployments already done. I'll also link https://github.com/dotnet/orleans/issues/5738, since that too looks like it requires just a modification (CockroachDB doesn't implement all of the PostgreSQL protocol). About half a year ago @palpha wrote in https://gitter.im/dotnet/orleans?at=5a54c843ce68c3bc74988667 that the SQL could be more standards-based.

If we modify the script, I wonder if it is useful to use a json column rather than a text column. Most of the other implementations use text / clob / varchar(max) / ...

The only use case is to store a string and to read back the exact same string, without doing anything JSON-specific at the DB level. Isn't it?

@srollinet I can explain some rationale. I'm veering towards keeping them as text values, but my opinion is easily swayed.

SQL Server uses NVARCHAR, and if a check for JSON is needed, it can be defined separately (see here). I assume one reason for this split is saving resources when the checks can be made outside the database. As for the actual script design, it is open-ended on purpose so that users can modify the scripts (as explained in them) and in this case add checks (or schemas for XML).

For MySQL, the LONGTEXT type predates versions that had a native JSON type. I don't know if it would be useful to change this format. Reading https://dev.mysql.com/doc/refman/5.7/en/json.html suggests JSON is stored in a more compact form and queries on sub-elements are more efficient, but Orleans always reads everything at once, and I imagine validating proper JSON takes some resources, as with adding the check separately in SQL Server. Also here, as with SQL Server, the indexing preserves clear-text identifiers in case one wants to access the data "off-Orleans" for subqueries or something else, as well as for conflict resolution (for efficiency, hashed identifiers are indexed and collisions are resolved from the clear-text values). In that case I think everything works fine if someone wants to change the types, since ADO.NET reads them as strings in any case.

PostgreSQL support was contributed later and I haven't done much with it myself, though I'm grateful it was contributed and has found use. :) That noted, maybe PostgreSQL could use text columns for JSON too, but it's better to let those who use that particular database more heavily decide.

@veikkoeeva
Thanks for the explanations :)

> As for the actual script design, it is open-ended on purpose so that users can modify the scripts (as explained in them)

This is probably a good argument for text values, as it will likely fit most users' needs (efficient, because no validation is done, and the data is not altered, so it fixes the deserialization issue). I don't know how other users use it, but I guess in most cases it is used as a black box and the main concern is that it works ;)

Users that have more complex use cases can easily change the column type.

> For MySQL, the LONGTEXT type predates versions that had a native JSON type. I don't know if it would be useful to change this format. Reading https://dev.mysql.com/doc/refman/5.7/en/json.html suggests JSON is stored in a more compact form and queries on sub-elements are more efficient

I would be careful about that; this is exactly why the jsonb type is a problem!

@jam13c Does my PR fix your issue?

I suspect it will. It's not the same change I made locally, which was to make those columns json, but given the conversation above, the use of text should be fine.
