Ksql: Support for UNION types in Avro

Created on 19 Sep 2018 · 5 Comments · Source: confluentinc/ksql

Some customers create topics with complex schemas using Avro that include UNION (choice) fields.
It would be useful to be able to query these UNION fields, for example to only show messages whose field is of a particular type.

avro data-accessibility enhancement

All 5 comments

Not supporting Avro UNION types is very limiting for KSQL. We have a scenario where we have a message with some fixed fields and a map of additional fields whose values are constrained by the AVRO UNION type. Specifically, it looks like this:

"fields":[
    {"name":"Key","type":"string"},
    {"name":"Timestamp","type":"long"},
    {"name":"Attributes","type":{"type":"map","values":
       ["string","float","double","int",
         {"type":"long",
          "connect.version":1,
          "connect.name":"org.apache.kafka.connect.data.Timestamp",
          "logicalType":"timestamp-millis"
         }
       ]
     }}
]

We can create a stream in KSQL so long as it doesn't contain the Attributes map. This is because the map type can only be specified as MAP<KeyType, ValueType>, where ValueType needs to be a primitive type. It doesn't support value types that include an AVRO UNION, as above. However, the Attributes map is the main body of the data, so we can't use KSQL to interrogate the data at all. This rules KSQL out for us until this is supported.

The only alternative we would have is to publish our data in a MAP and convert all values to strings. We don't want to lose type information so doing that would be too much of a compromise.
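To illustrate the compromise described above, here is a minimal Python sketch of what converting every map value to a string would do. The function name `stringify_attributes` and the sample attribute values are hypothetical and only serve the illustration; they are not taken from a real topic.

```python
from typing import Dict, Union

# Simplified stand-in for the Avro union of map value types shown earlier.
AttributeValue = Union[str, float, int]


def stringify_attributes(attributes: Dict[str, AttributeValue]) -> Dict[str, str]:
    """Flatten every map value to a string, as the workaround would require."""
    return {key: str(value) for key, value in attributes.items()}


# Hypothetical attribute values for illustration only.
attributes = {"sensor": "42", "reading": 42, "ratio": 0.5}
flattened = stringify_attributes(attributes)

# After flattening, the string "42" and the int 42 can no longer be told apart.
assert flattened["sensor"] == flattened["reading"] == "42"
```

Once the values are flattened, a consumer has no way to recover whether an attribute was originally a string, a number, or a timestamp, which is the type information we do not want to lose.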

Please can you support AVRO UNION types in general but specifically as values in a Map?

cc @MichaelDrogalis @derekjn @apurvam in case we want to prioritize this on our roadmap.

Protobuf and JSON Schema both have an equivalent "oneof" construct.

Unions/oneofs will be more important now that Schema Registry supports references. Using unions with references is to be preferred over using RecordNamingStrategy when storing multiple schema types in the same topic (see https://github.com/confluentinc/ksql/issues/1267).

Should totally support this by just adding the superset of columns from all types in the union.

https://martinfowler.com/eaaCatalog/singleTableInheritance.html

With Schema Registry's new support for schema references more and more users will be using Unions to allow topics to receive different event types, so ksqlDB not supporting Unions/OneOfs is going to become a bigger issue.
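The "superset of columns" idea could be sketched roughly as follows. This is an assumption about how such a mapping might work, not how ksqlDB implements anything: the function `superset_columns`, the two event schemas, and the choice to reject conflicting column types are all hypothetical.

```python
from typing import Dict, List


def superset_columns(union_branches: List[dict]) -> Dict[str, str]:
    """Merge the fields of every record schema in a union into one column set.

    Only primitive field types are handled in this sketch. A column that
    appears in several branches must have the same type in each of them.
    """
    columns: Dict[str, str] = {}
    for record in union_branches:
        for field in record["fields"]:
            name, field_type = field["name"], field["type"]
            if not isinstance(field_type, str):
                raise ValueError(f"non-primitive type for column {name!r}")
            existing = columns.get(name)
            if existing is not None and existing != field_type:
                raise ValueError(
                    f"conflicting types for column {name!r}: "
                    f"{existing} vs {field_type}"
                )
            columns[name] = field_type
    return columns


# Two hypothetical event types that share one topic through a union.
order_created = {
    "type": "record",
    "name": "OrderCreated",
    "fields": [
        {"name": "order_id", "type": "string"},
        {"name": "amount", "type": "double"},
    ],
}
order_shipped = {
    "type": "record",
    "name": "OrderShipped",
    "fields": [
        {"name": "order_id", "type": "string"},
        {"name": "carrier", "type": "string"},
    ],
}

merged = superset_columns([order_created, order_shipped])
print(merged)
```

In a single-table-inheritance layout, every column that does not belong to a given event type would be null for that row, and a discriminator column naming the event type would usually be added alongside the merged columns.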

Here's a blog post describing how to store multiple event types in the same topic using unions/oneofs. Having union support in ksqlDB would allow such topics to be queried.

https://www.confluent.io/blog/multiple-event-types-in-the-same-kafka-topic/
