Ksql: Unable to create JSON stream with Schema Registry

Created on 17 Mar 2020 · 8 Comments · Source: confluentinc/ksql

Describe the bug

On the latest master, when KSQL is started with Schema Registry, CREATE STREAM statements with value format JSON fail with the error message "Could not register schema for topic." This is a bug, since the JSON format should work independently of Schema Registry (no schemas should be registered at all).

To Reproduce

On the current master:

  • Enable Schema Registry integration by uncommenting ksql.schema.registry.url=http://localhost:8081 from the server properties file
  • Start the ksqlDB server
  • Create a topic, e.g., locations
  • Start the CLI and issue a CREATE STREAM statement such as CREATE STREAM riderLocations (profileId VARCHAR, latitude DOUBLE, longitude DOUBLE) WITH (kafka_topic='locations', value_format='json');

Expected behavior

The stream should be created successfully.

Actual behavior

The CLI shows the following error message: Could not register schema for topic.

There's nothing obvious in the server logs with the default logging configs.

Additional context

I think this has to do with the recently added support for JSON with Schema Registry. It's as if the JSON format is being interpreted as JSON_SR and something is going wrong when attempting to register a schema. Need to debug further to understand what's going on.

blocker bug

All 8 comments

Looked into this with @agavra and found the issue:

The intended behavior is that both JSON and JSON_SR formats support schema inference when Schema Registry is configured, and both also register schemas to Schema Registry (for inference by other streams/tables down the road). The difference between the two formats is that JSON_SR serializes data with the Schema Registry magic byte prepended, whereas the regular JSON format serializes data without the magic byte (as vanilla JSON).
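To make the difference between the two formats concrete, here is a minimal sketch of the two serializations. It assumes the standard Confluent wire format (a zero magic byte followed by a 4-byte big-endian schema ID, then the payload). The function names, the sample row, and the schema ID 42 are made up for illustration and are not taken from the ksqlDB code base.

```python
import json
import struct

MAGIC_BYTE = 0  # marker byte used by the Confluent wire format


def serialize_json(row: dict) -> bytes:
    """Vanilla JSON: only the UTF-8 encoded document, no header."""
    return json.dumps(row).encode("utf-8")


def serialize_json_sr(row: dict, schema_id: int) -> bytes:
    """JSON_SR-style framing: magic byte, 4-byte big-endian schema ID, then the JSON payload."""
    header = struct.pack(">bI", MAGIC_BYTE, schema_id)
    return header + serialize_json(row)


def looks_like_sr_framed(data: bytes) -> bool:
    """Cheap heuristic: framed records start with the magic byte and carry a 5-byte header."""
    return len(data) > 5 and data[0] == MAGIC_BYTE


# Hypothetical row and schema ID, for illustration only.
row = {"PROFILEID": "c2309eec", "LATITUDE": 37.7877, "LONGITUDE": -122.4205}
plain = serialize_json(row)
framed = serialize_json_sr(row, schema_id=42)
```

A consumer that expects vanilla JSON cannot parse the framed bytes directly, because the first five bytes are not part of the JSON document. That is why the two formats are kept distinct even when both register schemas.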

However, pre-5.5 versions of Schema Registry do not support JSON schemas, so the current behavior breaks compatibility between ksqlDB and older Schema Registry versions. When Schema Registry is configured and a JSON stream is created, ksqlDB tries to register the schema with Schema Registry, which throws an exception (Unrecognized field: schemaType; error code: 422) because JSON schemas are not supported.
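The 422 error is consistent with a registration request whose body carries a schemaType field that older servers do not know about. The sketch below builds such a request for the Schema Registry REST endpoint POST /subjects/{subject}/versions, assuming the default subject naming of "<topic>-value". The exact request ksqlDB sends is not shown in this thread, and old_registry_accepts is a toy stand-in for a pre-5.5 server, not real Schema Registry code.

```python
import json
from typing import Tuple


def subject_for(topic: str) -> str:
    # Default subject naming for value schemas: "<topic>-value".
    return f"{topic}-value"


def registration_request(topic: str, json_schema: dict) -> Tuple[str, str]:
    """Return the (path, body) of a JSON schema registration request."""
    path = f"/subjects/{subject_for(topic)}/versions"
    body = json.dumps({
        "schemaType": "JSON",               # field that pre-5.5 servers do not recognize
        "schema": json.dumps(json_schema),  # the schema itself is sent as an escaped string
    })
    return path, body


def old_registry_accepts(body: str) -> bool:
    """Toy model of a pre-5.5 registry that only knows the "schema" field."""
    known_fields = {"schema"}
    return set(json.loads(body)) <= known_fields


# Hypothetical schema for the riderLocations example.
schema = {"type": "object", "properties": {"profileId": {"type": "string"}}}
path, body = registration_request("locations", schema)
```

In this model, old_registry_accepts(body) is False because of the extra schemaType field, which mirrors the "Unrecognized field: schemaType" message quoted above.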

Options going forward include:

  • A quick fix of removing schema inference for the JSON format, which means the JSON format also won't attempt to register schemas with Schema Registry, and backwards compatibility will be restored.
  • A more involved fix of detecting old Schema Registry versions that do not support JSON and not registering JSON schemas only in this case. Bonus points for also adding a config to control whether the JSON format registers schemas with schema registry, in case users of newer Schema Registry versions just want vanilla JSON and don't want JSON schemas registered.
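The second option could be sketched as a small decision function. Everything here is hypothetical: the function name, the register_plain_json flag (standing in for the proposed config), and the supported_schema_types argument (standing in for whatever mechanism is chosen to detect old Schema Registry versions). It only illustrates the intended decision, not an actual ksqlDB implementation.

```python
from typing import Iterable


def should_register_schema(
    value_format: str,
    registry_configured: bool,
    supported_schema_types: Iterable[str],
    register_plain_json: bool = True,  # hypothetical config flag
) -> bool:
    """Decide whether a CREATE STREAM should register a schema for its value format."""
    if not registry_configured:
        return False

    fmt = value_format.upper()
    if fmt == "JSON":
        # Plain JSON registers only if the user wants it and the registry can store it.
        return register_plain_json and "JSON" in set(supported_schema_types)
    if fmt == "JSON_SR":
        # JSON_SR depends on the registry, so registration is always attempted and
        # an unsupported registry should surface as an error rather than be skipped.
        return True
    raise ValueError(f"format not covered by this sketch: {value_format}")
```

With an old registry (for example, one that reports only AVRO), a plain JSON stream would skip registration and be created as in previous versions, while a JSON_SR stream would still attempt registration and fail.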

Chatting with @agavra, we think it makes sense to pursue the quick fix for the ksqlDB 0.8.0 and CP 5.5.0 releases, and to look into the more involved fix for future releases. WDYT? @MichaelDrogalis @derekjn @apurvam

UPDATE: This is only a problem on master and not 5.5 (see discussion below).

So with the quick fix, the JSON format will behave exactly as in previous versions: no schema inference, no backward compatibility checks, etc.

The JSON_SR format has all those goodies, and only works with CP 5.5.

What is the error message if JSON_SR is used with older CP versions? It seems to me that without version detection, and with lazy registration of schemas, we will not be able to provide good UX, right?

Finally, how did we find this?

> So with the quick fix, JSON format will behave exactly as in previous versions: no schema inference, no backward compatibility checks, etc.
>
> The JSON_SR format has all those goodies, and only works with CP 5.5.

Correct.

> What is the error message if JSON_SR is used with older CP versions? Seems to me without version detection and with lazy registration of schemas we will not be able to provide good UX, right?

Schemas are now registered at topic creation time, not lazily (see https://github.com/confluentinc/ksql/pull/4717), which is why the bug reported in this issue causes the CREATE STREAM statement to fail.

The error message if JSON_SR is used with an older CP version is the same as the one in this bug report: Could not register schema for topic.

> Finally, how did we find this?

After I cut a candidate release image, I tried running through the ksqlDB quickstart as a sanity check. It failed on the first statement because I was using a Docker Compose file with Schema Registry 5.4.1.

Cool. Thanks for the details. The quick fix is fine by me. Though I'm not sure that #4717 is on 5.5.

Confirmed that this isn't an issue in 5.5 (I was sure I had tested exactly this!) - it was introduced by #4717

ksql> CREATE STREAM json (id VARCHAR) WITH (kafka_topic='json', value_format='JSON', partitions=1);

 Message
----------------
 Stream created
----------------
ksql> CREATE STREAM json_sr (id VARCHAR) WITH (kafka_topic='json_sr', value_format='JSON_SR', partitions=1);

 Message
----------------
 Stream created
----------------
ksql> INSERT INTO json (id) VALUES ('id');
ksql> INSERT INTO json_sr (id) VALUES ('id');
Failed to insert values into 'JSON_SR'. Could not serialize row: [ 'id' ]

The second insert fails because the deployed SR is 5.4 (the error message is not great, but there's a separate ticket to fix that).

Good call -- I'll close the change I targeted at 5.5, and only merge the one targeted at master. Thanks for the catch!

Awesome! Thanks @vcrfxia and @agavra !

Closing this issue since the quick fix has been implemented. Created another JIRA to track the more involved fix going forward: https://github.com/confluentinc/ksql/issues/4802
