Kotlinx.serialization: Pain point: a way to minify serial names automatically

Created on 13 Jul 2020 · 5 Comments · Source: Kotlin/kotlinx.serialization

What is your use-case and why do you need this feature?
By default, kx.serialization uses full names for enum items, polymorphic types, and field names. I see two drawbacks:

  1. Decoding is slower because long strings have to be compared.
  2. The serialized content is much larger than it needs to be.

So in my protocol I have SerialNames specified almost everywhere to make names as compact as possible (a sample repo is available in #907):

@Serializable
enum class PaintType {
  @SerialName("a")
  DRAW,

  @SerialName("b")
  FILL,
}
@Serializable
sealed class ImageId {

  @Serializable
  @SerialName("a")
  data class BufferedImageId(
    @SerialName("a")
    val identityHash: Int,
    @SerialName("b")
    val stateHash: Int
  ) : ImageId()

  @Serializable
  @SerialName("b")
  data class PVolatileImageId(
    @SerialName("a")
    val id: Long
  ) : ImageId()
// ...

Such code is inconvenient to maintain: I can barely see the signature of a class. We even have a special test which checks that the serial names in every context form an alphabet (a, b, c, ... with no gaps or duplicates).
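For illustration, a check like that could look roughly like the sketch below. It is not the test from the sample repo: `compactName` and `formsAlphabet` are hypothetical helpers, and the names are passed in as plain lists, whereas a real test would presumably collect them from the serializers.

```kotlin
// Hypothetical helper: the index-th compact name in the sequence a, b, ..., z, aa, ab, ...
fun compactName(index: Int): String {
    require(index >= 0) { "index must be non-negative" }
    val sb = StringBuilder()
    var n = index
    while (true) {
        sb.append('a' + n % 26)   // least significant "letter digit" first
        n = n / 26 - 1            // bijective base-26: there is no zero digit
        if (n < 0) break
    }
    return sb.reverse().toString()
}

// Hypothetical check: the names of one context (one class or one enum) must be
// exactly the first N compact names, with no gaps and no duplicates.
fun formsAlphabet(names: Collection<String>): Boolean {
    val expected = List(names.size) { compactName(it) }.toSet()
    return names.toSet() == expected
}

fun main() {
    println(formsAlphabet(listOf("a", "b")))   // true
    println(formsAlphabet(listOf("a", "c")))   // false: "b" was skipped
    println(formsAlphabet(listOf("a", "a")))   // false: duplicate name
    println(compactName(26))                   // aa
}
```

Every new field or subclass still has to be given the next free letter by hand, which is exactly the maintenance burden described above.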

Describe the solution you'd like

I believe a feature to minify serial names automatically should be available.

A parameter for the @Serializable annotation would be a good temporary solution. Setting a mask in the Gradle script for the FQNs of classes that should be minified would be even better.

feature

All 5 comments

See also: #33

We do not plan to introduce this feature for now, mainly because of the philosophy "what you see in the definition of a class in Kotlin code is what you get in the JSON output". Although minification is an interesting use-case for naming policies, why not use binary formats (cbor, protobuf) if one wants to minimize output size?

@sandwwraith

cbor, protobuf

They don't solve the problem because they contain the full names of classes too. Consider the example:

@Serializable
sealed class MySealed

@Serializable
data class MyTypeWithLongName(val argumentName: String) : MySealed()

fun main() {
  val protoBuf = ProtoBuf()
  val code = protoBuf.dump(MySealed.serializer(), MyTypeWithLongName("abc"))
  println(code.joinToString { "${it.toChar()} ($it)" })
}

Output is:

 (10),  (18), M (77), y (121), T (84), y (121), p (112), e (101), W (87), i (105), t (116), h (104), L (76), o (111), n (110), g (103), N (78), a (97), m (109), e (101),  (18),  (5), 
 (10),  (3), a (97), b (98), c (99)

It will be even bigger in real code, where the class also has a long package name.

However, with the minification that I currently have to do manually, like this:

@Serializable
sealed class MySealed

@Serializable
@SerialName("a")
data class MyTypeWithLongName(val argumentName: String) : MySealed()

fun main() {
  val protoBuf = ProtoBuf()
  val code = protoBuf.dump(MySealed.serializer(), MyTypeWithLongName("abc"))
  println(code.joinToString { "${it.toChar()} ($it)" })
}

The output is about 3 times smaller:

 (10),  (1), a (97),  (18),  (5), 
 (10),  (3), a (97), b (98), c (99)

They don't solve the problem because they contain the full names of classes too. Consider the example:

Not when using the @ProtoNumber annotation. However, the problem of having to annotate every field still persists, but at least integers would be more robust than single characters used as strings.

If output size is really that important, why bother encoding serial names at all instead of serializing sequentially?

serialize sequentially

What do you mean?

@SerVB Sequential serialization is where you don't write any names at all, only the minimum needed (for polymorphism you still need a discriminator, unfortunately). Of course, such serialization is not particularly robust (but neither is using arbitrary letters as names). Formats such as ProtoBuf, JSON and XML are designed to be robust and to allow for backwards (and forwards) compatibility. By the way, in terms of file/message size, repeated names compress very well. The main issue would be parsing speed, but for that you would go to a binary format anyway.
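To make the idea concrete, here is a minimal hand-written sketch of sequential encoding. It is an assumption-laden illustration, not how kotlinx.serialization implements any format: it borrows the class names from the example at the top of the issue, uses plain `java.io` streams, and the discriminator values 0 and 1 are arbitrary choices.

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.io.DataInputStream
import java.io.DataOutputStream

sealed class ImageId
data class BufferedImageId(val identityHash: Int, val stateHash: Int) : ImageId()
data class PVolatileImageId(val id: Long) : ImageId()

// Fields are written in declaration order with no names at all;
// a single discriminator byte (arbitrary values 0 and 1) selects the subtype.
fun encodeSequentially(value: ImageId): ByteArray {
    val bytes = ByteArrayOutputStream()
    DataOutputStream(bytes).use { out ->
        when (value) {
            is BufferedImageId -> {
                out.writeByte(0)
                out.writeInt(value.identityHash)
                out.writeInt(value.stateHash)
            }
            is PVolatileImageId -> {
                out.writeByte(1)
                out.writeLong(value.id)
            }
        }
    }
    return bytes.toByteArray()
}

// The reader must know the exact field order: reordering, adding or removing
// a field silently breaks compatibility with previously written data.
fun decodeSequentially(data: ByteArray): ImageId =
    DataInputStream(ByteArrayInputStream(data)).use { input ->
        when (val tag = input.readUnsignedByte()) {
            0 -> BufferedImageId(input.readInt(), input.readInt())
            1 -> PVolatileImageId(input.readLong())
            else -> error("Unknown discriminator: $tag")
        }
    }

fun main() {
    val original = BufferedImageId(identityHash = 42, stateHash = 7)
    val encoded = encodeSequentially(original)
    println(encoded.size)                             // 9: one tag byte plus two 4-byte ints
    println(decodeSequentially(encoded) == original)  // true
}
```

The output is as small as it can get without compression, but the robustness trade-off mentioned above is visible in the decoder, which relies entirely on field order.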
