(note: follow-up for #265)
Current implementation of @JsonUnwrapped deserialization requires handling of all other properties first, and only then reconstructing unwrapped objects. This works ok via fields/setters, but can not work with @JsonCreator passed values.
But with a major rewrite it should be possible to make this work. This probably requires more support for actually determining unwrapped-properties a priori, instead of just buffering anything that doesn't match. Or maybe it only requires more logic in handling of creator.
Any update on this? Would be nice with a way to get @JsonUnwrapped working for deserialisation as well.
@OverlyExcessive It works just fine with deserialization if you use setters (or fields). Note that you do NOT have to pass ALL properties via creator; mixing and matching creator and setters/fields is legal.
This is not to say it wouldn't be great to allow passing unwrapped properties via Creator methods, just that this is the one aspect with unwrapped properties that does not work.
@cowtowncoder
You're correct, thanks!
+1 Any update on this?
Implementation of this is necessary to use @JsonUnwrapped in Kotlin.
Hi ! This is my first attempt a contributing to the Jackson code base. I hope you will like my proposal to fix the current issue. It is a work in progress though. Comments and feedback are welcome.
I am working on a Kotlin project and I really want to be able to have an unwrapped property as a parameter of a creator method so, in order to fix this, I started to have a look at the code. I must say that I am a bit surprised by the amount of duplicated code in the deserialization process. That's in part because execution paths branch off very early and we end-up using a different method for each one of the annotation combination.
I made a graph to illustrate what is going on:
deserialize.pdf
I think that relying on a more generic approach would help to simplify the code, which would have many benefits, like the ability to easily implement the requested feature. The goal is to have only one main execution path for all the different scenarios.
@cowtowncoder, you identified the problem pretty well: we obviously cannot wait until the end to construct the unwrapped property since we need sometimes need it in the creator method.
My proposed approach is to build a tree of operations that need to be performed in order to successfully deserialize an object of a given class. That tree can then be used to perform the deserialisation itself.
_propertyBasedCreator is true, we need to have all the properties required to call one of the creator methods. Same thing for other classes, although the list of properties to gather will be zero.@JsonUnwrapped, perform the same analysis recursively. After this step is complete, we will end up with a hierarchy of required operations. Let's consider those two classes, for example:public class Location {
public int x;
public in y;
@JsonCreator Location(
@JsonProperty(value = "x", required=true) int x,
@JsonProperty(value="y", required=true) int y
) { /*...*/ }
}
public class Test {
public String name;
public Location location;
@JsonPropery(required=true) public String someOtherThing;
@JsonProperty public String someOptionalThing;
@JsonCreator Test(
@JsonProperty(value="name", required=true) String name,
@JsonUnwrapped @JsonProperty(value="location", required=true) Location location
) { /* ... */ }
}
We would have this hierarchy of required operations to perform:
Instantiate class Test
|- Call creator function
|- Deserialize property name
|- Instantiate class Location
|- Call creator function
|- Deserialize property x
|- Deserialize property y
Once the operation tree is built, then...
getResult() method on the root object.@JsonUnwrapped properties first.If a requirement is not met, we would be able to generate a hierarchical exception like
Exception: Failed to instantiate class Test
Caused by Exception: Failed to instantiate class Location
Caused by Exception: Failed to deserialize required property "x"
Caused by Exception: Required property "x" is missing
I suggest that we go one step further down the generalization road and have a required deserialization operation to perform for each required property of the class. And optional properties should also be represented by the same kind of operations. Those optional operations can also have dependencies on required operations. For one thing, the class needs to be instantiated first, right ? For the example above, we would have this acyclic directed graph to build:
operations.pdf
Here are the rules that would be used to build the graph:
Tokens will be fed to the deserialization operations like explained in the first part of this document. One advantage, here, is that we don't need a separate buffer; all the needed deserialization operations are already present in the tree, and this is where the value will be stored while waiting for the class to be instantiated. That is because the set operation will be blocked as long the the instantiate operation it depends on has not been performed.
For a given class, a template of the graph would have to be built once for all. After that, we could copy it quickly when performing each new deserialization.
@plcarmel First of all, thank you for investigating this, and writing above analysis.
I agree on parts of amount of duplication being unfortunate; parts of this are for optimization, although earlier choices then do lead to combinatorial explosion. This is further compounded by separate builder-based deserializer and even properties-as-array variant.
Now: on rewrite as suggested... I have 2 main concerns:
Of these, (1) is easy enough to measure, although probably only after implementation is complete enough. It could be that my concerns are overblown.
And perhaps it could be possible to avoid construction of various state objects for cases where more complex processing is not needed.
Regarding (2), I'd be tempted to suggest that if such rewrite was to be done, it should go with Jackson 3.x, to reduce likelihood of major issues with 2.x usage and some of extensions.
I am guessing that changes would be needed to internal APIs, potentially breaking some use cases where users use sub-classing for handlers -- "advanced" usage that is using extension points that have not necessarily been design as such (but that in Java are difficult to really indicate as "do not extend", due to various reasons).
With that, I would be interested in seeing how a prototype could be made. I don't think I have time to work on this directly this, but I would do my best to help, read code and so on.
ps. Something that just occurred to me is that one area where I have actually thought about alternate implementation is non-blocking/async parsing -- jackson-core has such mode, but databind can not really use it. Deserialization in non-blocking mode would by necessity require different approach, feeding (push model) tokens, to be feasible (IMO). This is different from streaming level where pull-like approach is still feasible (and which is why it has been implemented).
So one possible idea would be to start prototyping "alt databind" that would be built using push approach. If and when it worked well for that side, proving sufficient efficiency (non-blocking parsing itself is within 10% of blocking one, based on my testing), perhaps retrofit/rewrite of "standard" input would be more practical?
@cowtowncoder, what you say makes a lot of sense and I sure don't want to break anything on a module that is used by every Java project on earth, nor slow it down.
I think that building a prototype for the push approach is a good idea, as those two things fit together pretty well.
I still have to figure out the details of how I am going to come up with something that can be tested quickly, and with a reasonable effort.
For future reference, here's the feature request on jackson-core for a non-blocking parsing: https://github.com/FasterXML/jackson-core/issues/57
@plcarmel Yes. Also: it goes without saying that any incremental improvements for existing handling, simplifications, are welcome if you find some. Sometimes working on a prototype other orthogonal improvements are found as well.
Wrt async handling: parsing itself is fully implemented for streaming core: but the question is how to make databinding work with it -- right now things would not work since databinding can not really deal with JsonToken.NOT_AVAILABLE indicator.
In addition, Smile format (~= binary json) implements it, and if there was enough interest, implementations should be quite straight-forward for CBOR, and at least possible for most other formats (like protobuf and avro)
I think an easy option to have the deserialization perform well is to optimize the operation graph after it has been created, just like a compiler would do with the graph created from source code.
Large sections of the graph can be replaced with just one node that performs vanilla processing using the current algorithm, for example (provided deserialization is done in blocking mode).
@cowtowncoder
I wrote a prototype, and it works. However, I just realized upon completing it that what I had was an LL parser ... which could have been done more easily and efficiently have I known it from the start.
50 years later, here I come... oh well.
At least I can now formalize my proposal :
For parsing a given class...
(Optional) Optimize the rule table by replacing large number of rules by one single rule for those sequences of terminals that can be parsed easily, i.e. vanilla parsing, for example.
Run the parser
Et voila !
Who amongst us has not rewritten a LALR(1) parser at least once in their life! :-D
Actually it has been 25 years since I last read the Dragon Book so I have to re-check which class of parsers LL was.
Looking back now, thank you for the diagrams btw. Would you mind my reusing of them, if I was to write a short Medium blog post about how "standard" Jackson POJO deserialization works? (if I find time to do it)
deserialize.pdf in particular is very useful visualization.
@cowtowncoder
LL(k) parsers are the simpler and less powerful ones. They start with the S symbol, and then do substitutions based on the symbol at the top of the stack and the k next tokens.
More precisely,
LL parsers apply substitutions that expand the number of terminals, then discard them once they are matched. This is mostly what I did with my tree structure. The tree was the fully expanded set of rules and the job of the parser was to remove each and every single node that was not optional.
Optional nodes were the fully expanded, but not yet applied, or committed to, rules. One thing I realized was that my solution would not be able to parse arrays with everything always fully expanded. An array has an indefinite length (unless constrained), so I started to think about how to implement on-the-fly expansions. At this moment, I started to have flashbacks from my compiler class...
Also, with my trees, dependencies were used to make sure the parser was committed to one sub-tree until fully collapsed. And that collapsing idea mirrors the LL parser action of removing matched terminals from the stack of symbols, until there is nothing left. My solution was more loose because, without specifying the dependencies correctly, the parser would jump between sub-trees, which is undesirable in the context of parsing.
Finally, I had a method in each node that was called canHandleNextToken(), which is basically
the look-ahead used to choose between different substitutions.
LR parsers, as far as substitutions are concerned, work the other way around. Instead of predicting the structure, they push terminals on the stack and try to reduce the number of symbols by applying the rules in the opposite direction. They are bottom-up parsers and are more powerful because they have more information available to chose between substitutions.
For this simple class:
class X {
public int a;
public String b;
}
We would generate rules likes these bellow.
The syntax:
S ->
| look-ahead: *
¯¯¯¯¯¯¯¯¯¯¯
#1 start_object { obj = new X(); },
CLASS_X_PROPERTY_LIST,
end_object
CLASS_X_PROPERTY_LIST ->
| look-ahead: end_object
¯¯¯¯¯¯¯¯¯¯¯
#2 ϵ
| look-ahead: *
¯¯¯¯¯¯¯¯¯¯¯
#3 CLASS_X_PROPERTY,
CLASS_X_PROPERTY_LIST
CLASS_X_PROPERTY ->
| look-ahead: field_name[a]
¯¯¯¯¯¯¯¯¯¯¯
#4 field_name[a],
value_int { obj.a = parser.intValue(); }
| look-ahead: field_name[b]
¯¯¯¯¯¯¯¯¯¯¯
#5 field_name[b],
value_string { obj.b = parser.stringValue(); }
In rule 1, from the start symbol, we go directly to the expected structure of X, since we are expecting to deserialize an object from this class, after all.
Now, let's say we run the parser with the rules above on the input string {Â "b" = "hello world" }. Those would be the steps taken.
Start
stream: [ start_object, field_name[b], value_string, end_object ]
symbols: [ S ]
action performed: { }
output: { }
Apply rule 1; it is the only one for the S symbol
stream: [ start_object, field_name[b], value_string, end_object ]
symbols: [ start_object, CLASS_X_PROPERTY_LIST, end_object ]
action performed: { }
output: {}
Match terminal start_object
stream: [ field_name[b], value_string, end_object ]
symbols: [ CLASS_X_PROPERTY_LIST, end_object ]
action performed: { output.obj = new X() }
output: { obj: X }
Apply rule 3; rule 2 cannot be applied because the look-ahead is not end_object
stream: [ field_name[b], value_string, end_object ]
symbols: [ CLASS_X_PROPERTY, CLASS_X_PROPERTY_LIST, end_object ]
action performed: { }
output: { obj: X }
Apply rule 5; it is the one for field_name[b] (and symbol CLASS_X_PROPERTY)
stream: [ field_name[b], value_string, end_object ]
symbols: [ field_name[b], value_string, CLASS_X_PROPERTY_LIST, end_object ]
action performed: { }
output: { obj: X }
Match terminal field_name[b]
stream: [ value_string, end_object ]
symbols: [ value_string, CLASS_X_PROPERTY_LIST, end_object ]
action performed: { }
output: { obj: X }
Match terminal value_string
stream: [ end_object ]
symbols: [ CLASS_X_PROPERTY_LIST, end_object ]
action performed: { obj.b = parser.getText() }
output: { obj: X }
Apply rule 2
stream: [ end_object ]
symbols: [ end_object ]
action performed: { }
output: { obj: X }
Match terminal end_object
stream: [ ]
symbols: [ ]
action performed: { }
output: { obj: X }
We would have to add a few conveniences to our parser implementation, like the possibility to specify that a rule should be executed only once, to avoid having multiple values for the same property. It does not add more power to the grammar, but we would have a number of rules in the order of n! without the feature, where n is the number of properties.
@cowtowncoder, no problem, you can reuse the diagrams.
Also, my prototype is here: [email protected]:plcarmel/jackson-databind-issue-1467-poc.git
In the tests, there is a function to generate the diagrams for the steps taken. They look like this:





Hi All,
is there any update on the whole topic? I am currently struggling with Kotlin Data-classes and _JsonUnwrapped_ annotations, since the Data-classes are implicitly using _@JsonCreator_ for the constructor.
OT: Nice proposal @plcarmel haven't seen something like that in a while.
Thank you @Lacritz. Unfortunately, I won't have time to work on this anytime soon.
Most helpful comment
Implementation of this is necessary to use
@JsonUnwrappedin Kotlin.