Kotlinx.serialization: Fields name policy converters

Created on 17 Nov 2017 · 20 Comments · Source: Kotlin/kotlinx.serialization

A very common case in serialization is when the API/JSON naming policy conflicts with your code naming. For example, snake case is a quite popular naming style for JSON fields, but Kotlin uses camel case. A format's naming style can also be incompatible with the JVM (for example, field names with dashes).
Gson provides a feature to solve that: https://google.github.io/gson/apidocs/com/google/gson/FieldNamingPolicy.html (an implementation of FieldNamingStrategy)

In general, it's a questionable feature. Moshi doesn't provide it and suggests using the same case for your model classes.

But it can be a big blocker if you want to migrate to kotlinx.serialization, because you have 3 choices:

1. Use @SerialName for each field. This is tedious, error-prone, and requires a lot of refactoring.
2. Rename all your data models to match the API naming convention. This is tedious, requires even more refactoring, violates Kotlin code style guides, and is backward incompatible.
3. Write your own implementation of JSON (or any other format) that provides such a feature. This looks like a reasonable solution, but now you have one more copy of a JSON implementation in your project (because a JSON implementation is already part of kotlinx.serialization), with a different name but the same API.

Possible solutions:

1. Add a naming policy feature to the JSON implementation. Some drawbacks: it doesn't work for other formats, the feature is debatable by nature, and it is not required for all use cases.
2. Provide a universal API that allows defining name converters. A user can provide their own implementation, so it is not necessary to include one in the core of kotlinx.serialization, and it works for all formats without additional effort. By default it just uses field names as is, which is the current behavior.
   Possible problems: if you use more than one serialization format, different formats probably require different strategies, so we would still have to integrate the naming policy into the format implementation. Also, you sometimes want to use a different naming policy for particular requests (from a different API), so it looks like the serialization format should still provide an API to register and use naming policies.
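To make the second option concrete, here is a minimal sketch of what a user-supplied name converter could look like. The names `NamingPolicy`, `Identity` and `SnakeCase` are assumptions made up for this illustration; they are not part of kotlinx.serialization, and the sketch does not show how a format would call them.

```kotlin
// Hypothetical API for illustration only; not part of kotlinx.serialization.
fun interface NamingPolicy {
    fun convert(propertyName: String): String
}

// Default policy: keep property names as they are (the current behavior).
val Identity = NamingPolicy { it }

// camelCase -> snake_case: every upper-case letter becomes '_' + lower-case letter.
val SnakeCase = NamingPolicy { name ->
    buildString {
        for ((i, c) in name.withIndex()) {
            if (c.isUpperCase()) {
                if (i > 0) append('_')
                append(c.lowercaseChar())
            } else {
                append(c)
            }
        }
    }
}

fun main() {
    println(SnakeCase.convert("externalId")) // external_id
    println(Identity.convert("externalId"))  // externalId
}
```

Even this small converter has to make choices: it would turn `userID` into `user_i_d`, so acronyms would need a rule of their own in a real policy.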

gildor

👍66 ❤2

Most helpful comment

It would be nice to give people the option to use casing strategies if their use case demands it. We have a situation where we have existing systems built differently, and so they use different casing strategies. We'd like to use kotlinx.serialization to generate contracts (avro IDLs generated using avro4k) in a global format (snake-cased) to enable the systems to speak with each other, but sometimes those IDLs are being generated from systems that already use camel-casing everywhere. With this restriction, we'd be forced to annotate every field with @SerialName, which as pointed out in the OP is tedious and error-prone. Since this seems like a pretty trivial change, it would be awesome if we could get this option, as it could save us and others in a similar position a lot of pain.

williamboxhall on 13 Oct 2020

👍13

All 20 comments

This feature is really needed!

BoxResin on 11 Sep 2018

@Serializable
data class Data(val a: Int, @Optional val b: String = "42")

What about adding a new class, JsonSerializationStrategy?

This class can be something like

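// Proposed wrapper: carries JSON-specific config next to the real serializer.
// SerializationStrategy is an interface, so it is implemented by delegation.
class JsonSerializationStrategy<T>(
    val serializer: SerializationStrategy<T>,
    val config: JsonSerializationConfig
) : SerializationStrategy<T> by serializer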

Json.stringify will only accept JsonSerializationStrategy.

So, we can set "Fields name policy" in that config class.

Or add a new parameter to the Json class.

Why prefer the first option: a JsonSerializationStrategy can be cached and reused.

raderio on 15 Feb 2019

Has there been any movement on this feature?

rscottcarson on 27 Aug 2019

Speaking as a maintainer of Gson, which was used as justification for this feature: we regret field naming policies and do not recommend their use. Moshi, the Gson successor, does not include field naming policies because we view them as a mistake.

JakeWharton on 28 Aug 2019

The Jackson library also has field naming policies.

raderio on 28 Aug 2019

👍1

I would like to propose a third "solution": intercepting encoders/decoders. Basically, such an encoder/decoder would delegate to the actual one, but use a delegating SerialDescriptor implementation that does the renaming when and where needed. Note that the generated code cannot know whether a name was provided automatically or manually (using @SerialName); perhaps that is something to add a flag for.

Having said that, and related to the Gson issue, I would say there is a reason not to do this, as your serialization can be used for two purposes:

1. Temporary serialization to an opaque format (for deserialization only). In this case names don't really matter, so why bother with having names that are not identical to the Kotlin fields?
2. Permanent serialization to a transparent (defined) format. In this case there is a case for applying @SerialName even when the names are equal, because type member renames should not lead to serial format changes (that would be incompatible).

Intercepting encoders/decoders could be useful for other cases as well (I saw a bug about encrypted data). Perhaps it would be worthwhile to provide a base class for these, but it is also an issue that would be better implemented outside the main runtime library.
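As an illustration of the delegating-descriptor idea, here is a minimal sketch that is not taken from the thread. It assumes a simplified stand-in interface, `Descriptor`, with only three members; the real SerialDescriptor in kotlinx.serialization has more. `SimpleDescriptor` and `RenamingDescriptor` are hypothetical names as well.

```kotlin
// Hypothetical, simplified stand-in for a serial descriptor (illustration only).
interface Descriptor {
    val elementsCount: Int
    fun getElementName(index: Int): String
    fun getElementIndex(name: String): Int
}

// A plain descriptor backed by a list of property names.
class SimpleDescriptor(private val names: List<String>) : Descriptor {
    override val elementsCount: Int get() = names.size
    override fun getElementName(index: Int): String = names[index]
    override fun getElementIndex(name: String): Int = names.indexOf(name)
}

// Delegates to the wrapped descriptor, but overrides the name lookups.
class RenamingDescriptor(
    private val delegate: Descriptor,
    rename: (String) -> String
) : Descriptor by delegate {
    // Renamed element names are computed once, when the wrapper is created.
    private val renamed: List<String> =
        List(delegate.elementsCount) { rename(delegate.getElementName(it)) }

    override fun getElementName(index: Int): String = renamed[index]
    override fun getElementIndex(name: String): Int = renamed.indexOf(name)
}

fun main() {
    val original = SimpleDescriptor(listOf("externalId", "name"))
    val wrapped = RenamingDescriptor(original) { it.uppercase() }
    println(wrapped.getElementName(0))       // EXTERNALID
    println(wrapped.getElementIndex("NAME")) // 1
}
```

The sketch renames every element blindly. As noted above, telling apart names set manually via @SerialName from automatically derived ones would need extra information that this stand-in does not model.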

pdvrieze on 29 Aug 2019

👍4

+1 for the global setting and annotation per class

Lewik on 21 Apr 2020

Any plans for this?

dragneelfps on 19 Sep 2020

> We regret field naming policies and do not recommend their use.

@JakeWharton what was the reasoning behind this? I am not sure how you would have any problems if this is used as an optional feature. My guess would be performance concerns?

dragneelfps on 19 Sep 2020

Performance is not a concern. The serialization model is only built once.

It's more that the magical behavior is needless and you should embrace the naming conventions of the layer you are modeling, even if they break the otherwise normal conventions of the language in which you are doing the modeling. And if you need an alternate name, it can be specified explicitly so as to not break tools like grep.

More at https://publicobject.com/2016/01/20/strict-naming-conventions-are-a-liability/

JakeWharton on 19 Sep 2020

👍2 👎1

> And if you need an alternate name, it can be specified explicitly so as to not break tools like grep

I guess the search tools we use should support name policy converters 😁 Because now it seems that we have to do tedious actions in Kotlin only because of some external factors...

And now if you even write the whole full-stack app in Kotlin, you still don't have a way to make the code more concise, for example, to minify names automatically (as in #908).

SerVB on 19 Sep 2020

Minifying names automatically is horrifying. It's impossible to make deterministic such that you retain compatibility across compilations and deployments. Persistence or blue/green deployment is thus impossible.

I'm happy to fight against that in addition to field name policies, though.

JakeWharton on 19 Sep 2020

williamboxhall on 13 Oct 2020

👍13

What is the plan here? Is there any alternative way we can prevent having to write hundreds of @SerialName entries just because our backend team has a different casing convention (snake_case) than we have in our Mobile app (lowerCamelCase)? It feels like lots of duplicate code ... just check out this one type:
[Screenshot from 2020-12-09 16:44:13; image not preserved]

I absolutely disagree with what is said in the blog post linked to by @JakeWharton by the way. A naming change in an API is a breaking API change and is simply forbidden, but even if it's needed, our backend team isn't supposed to check for it. Instead, they report it to our Mobile team and they will do the check. Everyone there will know that we are converting snake_case to lowerCamelCase – so there is no such problem at all. So, while it might make sense for some teams, I think the official Kotlin serializer shouldn't be opinionated about the use cases here as there is clearly the need for such a thing. At least 55 people seem to agree with me (see the upvotes).

Considering Android apps, there's typically always an iOS counterpart, and in Swift's official JSONDecoder there is absolutely a policy converter. So your backend team might not find all use cases without converting to lowerCamelCase anyways. Let's be realistic and just accept that in no world will a backend developer be able to actually trust such a search without converting cases, as there will always be the potential for clients to do the conversions.

Jeehut on 9 Dec 2020

👍2

The only duplicate code is where you've specified a @SerialName where one isn't needed. Otherwise you're clearly mapping across domains in which case you are required to specify the identifier association on both sides. This is pretty standard fare for layer crossing such as serialization and persistence.

Any kind of automatic field naming conversion policy introduces additional edge cases that have to be considered. What if the JSON contains keys for both external_id and externalId? This is valid JSON, as the keys do not collide. JSON is unordered, so do you take the first key? The last key? That means non-determinism. Do you throw at runtime? What if someone needs both: do they have to drop out of a naming policy for the entire model tree? Or can you control it per-subgraph? You might have the luxury of ignoring these cases as never happening, but the library does not.
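The collision case described above can be reproduced in a few lines. This is a minimal sketch, assuming a naive snake_case to camelCase conversion; `snakeToCamel` and `findCollisions` are hypothetical helper names and say nothing about how any real library would behave.

```kotlin
// Naive snake_case -> camelCase conversion, e.g. "external_id" -> "externalId".
fun snakeToCamel(key: String): String =
    key.split('_')
        .filter { it.isNotEmpty() }
        .mapIndexed { i, part ->
            if (i == 0) part else part.replaceFirstChar { it.uppercase() }
        }
        .joinToString("")

// Groups incoming keys by their converted name and keeps only the groups
// where more than one distinct key maps to the same property name.
fun findCollisions(keys: Collection<String>): Map<String, List<String>> =
    keys.groupBy { snakeToCamel(it) }.filterValues { it.size > 1 }

fun main() {
    val keys = listOf("external_id", "externalId", "name")
    println(findCollisions(keys)) // {externalId=[external_id, externalId]}
}
```

Detecting the collision is the easy part. Whether to throw, take the first key, or take the last one is the policy decision the comment is pointing at.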

An easy way to apply a field naming policy today is to specify your serialization format in an IDL and code gen the model objects on all platforms with appropriate policies applied at generation time through these annotations. It also means you're never out-of-sync on any platform and never have to write these model specifications more than once and never for a specific platform.

Finally, citing "[other library] has this feature" is not an argument for or against the feature in this library. Gson has a _ton_ of features I can point you to, and the majority of them were bad decisions in retrospect. We include field naming policies as one of them, and they're thankfully absent from ~Gson 3.0~ Moshi.

JakeWharton on 9 Dec 2020

😕2

@JakeWharton I appreciate your answer, but you're just repeating the same arguments but not addressing our question: Why can't we have the choice to say "we don't expect any problems like 'external_id' and 'externalId' and we don't care about the library's behavior in that case as it won't happen - feel free to crash in that case if needed!".

I didn't cite other libraries which have this feature to say this is a good feature because others have it, too. I just said that their existence and wide adoption renders the argument of the blog post irrelevant.

Your suggestion to use an IDL just proves again that your view is considering a very restricted set of use cases. I can understand that in some random serializer, but I don't think an official serializer for such a widely applicable language such as Kotlin should be so restricted. I don't know your company, but we don't have the resources to solve this problem with an intermediary layer - our code is working fine, and we don't expect any changes as this is forbidden within an API version as per our requirements. I suspect many others are in a similar situation and using an IDL is just overkill.

Jeehut on 9 Dec 2020

👍3

> not addressing our question

I'm not the library author so I cannot answer the question. I can merely argue against this feature's inclusion as strongly as I can having been one of the maintainers of both Gson and Moshi which take different views on the subject matter.

If I were the library author, my answer would be that you can already do this with @SerialName. This is our answer in Moshi.

Moreover, introducing the policy brings not only edge cases that require careful thought, but also severe indirection in the generated code, such that I'd be concerned about the performance impact. Not to mention it's unclear whether supporting this is even possible in a binary-compatible way in a post-1.0 world.

> I didn't cite other libraries which have this feature to say this is a good feature because others have it, too. I just said that their existence and wide adoption renders the argument of the blog post irrelevant.

It doesn't. The argument is for not building it into anything new.

> your view is considering a very restricted set of use cases

Yeah, not really. @SerialName solves this problem without question. Period. Your issue, and this feature request, is about the associated verbosity of being explicit everywhere.

We (the maintainers of Gson and Moshi) have thought about the implications of supporting this at the library-level at length.

> I don't know your company, but we don't have the resources to solve this problem with an intermediary layer

Consider using one of the many existing open-source tools for it, which should require less work than defining models on multiple clients.

> we don't expect any changes as this is forbidden within an API version as per our requirements.

The stated goal was field mapping and reduction of "boilerplate". An IDL solves both of these, completely eliminating the need to write models on any client. If you write models by hand, it's not unreasonable for the models to need to be explicit "by hand" as well.

The layering of responsibility there is actually quite nice. When you move up a level of abstraction you get to define global transforms such as generated field name conversions.

> I don't think an official serializer for such a widely applicable language such as Kotlin should be so restricted

Your opinion is noted. Mine is the opposite. We don't have to agree, and we won't.

JakeWharton on 9 Dec 2020

@JakeWharton

> @SerialName solves this problem without question. Period. Your issue, and this feature request, is about the associated verbosity of being explicit everywhere.

I don't know what your "this problem" is referring to if not "this feature request". These two sentences contradict each other. I thought this is a discussion about "this feature request" or did I misunderstand how GitHub issues work? 55 people have confirmed they have "this problem" and the OP explains very well how it is still unsolved. Stating that it is already solved is nothing else but downplaying this problem and ignoring our voices.

But don't get me wrong, I do understand that you're trying to drive forward a specific way of thinking to prevent a given set of problems by design. I just disagree on the question if this is the right choice for the audience of this library. Or to put it in your own words:

> Your opinion is noted. Mine is the opposite. We don't have to agree, and we won't.

😇

Jeehut on 9 Dec 2020

Interestingly, in the XML format I wrote, I've introduced the concept of a policy. When creating the format you can set an instance of the interface (there is a default). This is used to determine various aspects, including what tag or attribute name to use. It does make it much easier to be compatible with other serialization frameworks (without automatic renaming, although that is possible). Important here is that this is something that the user of the library provides. The library uses it to build the tree structure representing the serialization metadata, and uses that structure for actual serialization (it adds complexity, but is much more manageable than doing the same in various places of the serialization/deserialization itself).
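The shape of such a user-provided policy can be sketched as follows. This is not the API of the XML format mentioned above; `NamePolicy`, `DefaultPolicy` and `ToyFormat` are hypothetical names, and the "format" here only prints key/value pairs. What it shows is a policy passed in when the format is created, with a default, and a name table built once per type and then reused.

```kotlin
// Hypothetical policy interface supplied by the user of a format.
interface NamePolicy {
    // Default behavior: use the declared property name unchanged.
    fun serialName(declaredName: String): String = declaredName
}

object DefaultPolicy : NamePolicy

// Hypothetical toy format: the policy is fixed at creation time.
class ToyFormat(private val policy: NamePolicy = DefaultPolicy) {
    // Name metadata is built once per type and cached.
    private val cache = mutableMapOf<String, List<String>>()

    private fun namesFor(typeName: String, properties: List<String>): List<String> =
        cache.getOrPut(typeName) { properties.map { policy.serialName(it) } }

    fun encode(typeName: String, values: Map<String, Any?>): String {
        val props = values.keys.toList()
        val names = namesFor(typeName, props)
        return props.indices.joinToString(", ", "{", "}") { i ->
            "${names[i]}=${values[props[i]]}"
        }
    }
}

fun main() {
    // A user-defined policy producing dashed names, e.g. "firstName" -> "first-name".
    val kebab = object : NamePolicy {
        override fun serialName(declaredName: String): String =
            declaredName.replace(Regex("[A-Z]")) { "-" + it.value.lowercase() }
    }
    println(ToyFormat().encode("User", mapOf("firstName" to "Ada")))      // {firstName=Ada}
    println(ToyFormat(kebab).encode("User", mapOf("firstName" to "Ada"))) // {first-name=Ada}
}
```

Because the policy belongs to the format instance, two instances with different policies can coexist, which is one way to handle different conventions per API or per format.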

pdvrieze on 12 Jan 2021

> @SerialName solves this problem without question. Period.

It does not, @JakeWharton, because you yourself suggested:

> you should embrace the naming conventions of the layer you are modeling, even if they break the otherwise normal conventions of the language in which you are doing the modeling.

While that works for libraries that target a single serialization format (like Gson and Moshi), it does not necessarily work for libraries that target multiple serialization formats (like kotlinx.serialization). If "the naming conventions of the layer you are modeling" conflict across formats, the question becomes which convention to align with. A configurable property naming strategy is an easy way to answer that. A more complicated way is to have output-format-specific IDLs.