Ksql: Handling of Stream/Table schemas if schema of source topic changes

Created on 29 Nov 2018  路  3Comments  路  Source: confluentinc/ksql

(source for discussion: https://stackoverflow.com/questions/53539878/how-to-update-ksql-stream-definition-dynamically-based-on-schema-registry)

The Schema Registry supports schema evolution, as does components such as Kafka Connect JDBC connector. How should KSQL handle it?

For example, topic A is Avro-serialised and has one column.

CREATE STREAM KSQL_STREAM_A WITH (KAFKA_TOPIC='A', VALUE_FORMAT'AVRO');
ksql> DESCRIBE KSQL_STREAM_A;

Name                 : KSQL_STREAM_A
 Field    | Type
--------------------------------------
 ROWTIME  | BIGINT           (system)
 ROWKEY   | VARCHAR(STRING)  (system)
 COL1     | VARCHAR(STRING)
--------------------------------------

If topic A gets a new schema version (e.g. in the database feeding it a new column is added to the table), should we expect KSQL to automagically recognise this and evolve the stream definition?

ksql> DESCRIBE KSQL_STREAM_A;

Name                 : KSQL_STREAM_A
 Field    | Type
--------------------------------------
 ROWTIME  | BIGINT           (system)
 ROWKEY   | VARCHAR(STRING)  (system)
 COL1     | VARCHAR(STRING)
 COL2     | VARCHAR(STRING)
--------------------------------------

What about dependent CSAS/CTAS queries, would they evolve too? Needs some thought, but in principle should there be an option to propogate forward compatible schema changes throughout the streams?

P0 avro data-accessibility enhancement

Most helpful comment

@rmoff Any update on the above issue? we are trying to create dynamic streams.

All 3 comments

@rmoff Any update on the above issue? we are trying to create dynamic streams.

This will be released as part of the 0.12.0 release, though with some limitations (see #5611 with how it will be expressed). Sources will not "automagically" pick up schema evolution, the operation will need to be a manual replace in order to ensure compatibility both of the immediate source that is being upgraded and all downstream queries.

The code for this is merged - you can now update DDL statements with CREATE OR REPLACE (Pending #6078 to have it enabled by default)

We can create a new ticket if we want the streams to update dynamically, though I have my doubts about automagically updating schemas 馃槈 we'll discuss on that ticket.

Was this page helpful?
0 / 5 - 0 ratings