Ksql: Support multi schema topics

Created on 3 May 2018  路  9Comments  路  Source: confluentinc/ksql

In a recent blog article, the Avro serializer in the Java client has been updated to support multiple schemas in a single topic: https://www.confluent.io/blog/put-several-event-types-kafka-topic/

When trying to use this with KSQL, if I try to create a table or stream with multiple schemas I get the following error message if I don't have a schema in the format {topic}-value:

message: " Could not fetch the AVRO schema from schema registry. Subject not found. io.confluent.rest.exceptions.RestNotFoundException: Subject not found."

Can KSQL support this functionality?

avro data-accessibility enhancement

Most helpful comment

Well, we are 2 years later, and I don't hear much moving on this front.
Are people really only having topics with only one single schema ?

All 9 comments

Hi @unfw . Currently KSQL expects all messages in a topic to have the same schema, and we don't support the updates described in that post.

Marking this an an enhancement request, though it will be quite complicated: a stream / table has a defined schema within KSQL, and those are depended upon by other queries. What would be the semantics if a message comes along with an incompatible schema? Should it silently be dropped? Etc.

Reference to the upstream schema registry PR that added support for multi schema topics https://github.com/confluentinc/schema-registry/pull/680

I have the similar needs and just hit this issue myself.

I have a Kafka Streams app that uses the Confluent Avro serdes for the value. My topics are configured with a value.subject.name.strategy of RecordNameStrategy. I鈥檇 like use the those output topics as backing topics in KSQL but I鈥檓 getting deserialization errors in KSQL, Subject not found.

In my case the individual schema are similar and I think I could design a table schema that would work for all, but I understand the challenge around consuming messages that are incompatible with the table/stream schema. It would be easy to publish messages with schemas that are not compatible.

Regarding semantics, in my case if I were to hit a compatibility issue and messages were silently dropped then I'd have a consistency issue. I've got an event-sourcing app and I'd like to join some aggregates for advanced usage. I can't drop an aggregate. With a bit more config I can force my aggregate topics to use the traditional subject naming strategy, but I'd do that knowing that I can't use the multi-schema event topics in KSQL until there's a path here.

I'm in early dev, and I may stick to Streams for now. But I'm glad I realized this early on. I need to consider my subject naming strategy and the tradeoffs I'm making regarding KSQL usage.

I'd like a solution to this as well -- the ability to put multiple event types into one topic is critical for proper ordering of aggregate events, as explained in the confluent blob post. However, I also understand the issue with compatibility with KSQL semantics.

I think a decent initial capability would be to simply allow the CREATE STREAM syntax to include a particular Avro subject/schema, and drop any data that does not match. Multiple such streams can then be joined as needed by the user, using the usual KSQL semantics.

I suppose the lack of this capability can be worked around via a KStream that simply splits the combined topic into multiple topics, one per schema, in advance of being processed by KSQL -- I haven't tried this yet but I don't see why it wouldn't work.

@apurvam The primary intent of confluentinc/schema-registry#680 was to support multiple schemas per topic. But there are other use cases, where users would still use a single schema but have a different SubjectNameStrategy. For e.g. RecordNameStrategy would let you use a single schema & subject across multiple topics. So, adding support for SubjectNameStrategy in KSQL doesn't really necessitate supporting multiple schemas. We should be able to add this within the current limitations.

Team,
Can i work on this PR, if someone hasn't started yet.
Depending upon 3 strategies mentioned in https://www.confluent.io/blog/put-several-event-types-kafka-topic/ , for value.subject.name.strategy and key.subject.name.strategy, we need to modify SCHEMA_REGISTRY_VALUE_SUFFIX https://github.com/confluentinc/ksql/blob/247735b224b2a4a080c8cf7c959b3402aa6b5ad2/ksql-engine/src/main/java/io/confluent/ksql/util/AvroUtil.java#L163
before sending HTTP request to schemaRegistryClient. Each strategy has their own SCHEMA_REGISTRY_VALUE_SUFFIX value.

I鈥檓 also facing the issue of not being able to consume a topic containing messages of different Avro schema types. I would like to store all events for an aggregate into a single topic, which is the only solution to ensure ordering of events.

Support for the RecordNamingStrategy would be great. I currently have a handful of topics that I can't do anything with in KSQL due to this limitation.

Maybe at a minimum you could just support the RecordNamingStrategy but still have the requirement that only one message type be on the topic. Require the engineer to split the different message types out onto different topics. It's not ideal, but it wouldn't be the worst workaround.

Well, we are 2 years later, and I don't hear much moving on this front.
Are people really only having topics with only one single schema ?

Was this page helpful?
0 / 5 - 0 ratings