Confluent-kafka-dotnet: [enhancement] AvroSerializer to work without schema registration (schema.registry.url config) & not to put magic byte and 4 bytes for Schema ID in the beginning of serialized data

Created on 9 May 2018 · 6 Comments · Source: confluentinc/confluent-kafka-dotnet

Description

Enhance AvroSerializer so that it works without schema registration (the schema.registry.url config) and does not put the magic byte and the 4 bytes for the schema ID at the beginning of the binary stream.

How to reproduce

1) AvroSerializer, both specific and generic, always requires a schema registry to be specified. For some projects, the producers and consumers do not need a schema registry URL (for reasons such as the schema never changing).

Confluent.Kafka.Serialization.AvroSerializer does not work in that case: if the schema.registry.url config property is not set, creating the Kafka producer fails with an error.

2) How about making the schema registration process completely optional? Currently Confluent.Kafka.Serialization.AvroSerializer adds 4 bytes to the beginning of the binary stream to indicate the schema ID. Could this be made configurable, so that the schema ID is added to the serialized form only if a schema registry URL is specified?

3) Also, for compatibility with the Apache Avro serializer: a 00 magic byte is added as the first byte of the serialized data to indicate that it comes from the Kafka platform. Could this be made configurable as well? Setting a new config key to false would leave the 00 magic byte out of the serialized data.
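
For readers unfamiliar with the framing discussed in points 2 and 3, here is a minimal sketch of the 5-byte prefix: one magic byte (0) followed by the 4-byte schema ID, then the Avro binary payload. It is written in Python purely to illustrate the byte layout; it is not the library's code. The function names are hypothetical, and the big-endian byte order of the schema ID is stated from the Confluent wire format rather than from this issue.

```python
import struct
from typing import Tuple

MAGIC_BYTE = 0
# 1-byte magic + 4-byte schema ID; big-endian is an assumption taken from
# the Confluent wire format, not from the issue text.
HEADER = struct.Struct(">bI")


def add_confluent_header(schema_id: int, avro_payload: bytes) -> bytes:
    """Prefix a plain Avro binary payload with the magic byte and schema ID."""
    return HEADER.pack(MAGIC_BYTE, schema_id) + avro_payload


def strip_confluent_header(message: bytes) -> Tuple[int, bytes]:
    """Return (schema_id, plain Avro payload) from a framed message."""
    if len(message) < HEADER.size:
        raise ValueError("message is shorter than the 5-byte header")
    magic, schema_id = HEADER.unpack_from(message)
    if magic != MAGIC_BYTE:
        raise ValueError("unexpected magic byte: %d" % magic)
    return schema_id, message[HEADER.size:]
```

A consumer that only has a plain Apache Avro decoder could call something like `strip_confluent_header` first and pass the remaining bytes to that decoder, which is roughly the workaround available today without the requested config switches.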

Checklist

Please provide the following information:

  • [x] Confluent.Kafka nuget version: 0.11.4
  • [ ] Apache Kafka version:
  • [ ] Client configuration:
  • [ ] Operating system:
  • [ ] Provide logs (with "debug" : "..." as necessary in configuration)
  • [ ] Provide broker log excerpts
  • [ ] Critical issue
enhancement

All 6 comments

have you considered protobuf? i think it's probably unwise to be transmitting avro serialized data around completely detached from the associated schema since this is required to make sense of it. by contrast, protobuf can be deserialized without the writer schema present and is probably a better fit for the scenario you describe. it's not too hard to implement a protobuf serializer / deserializer. I have already done this in fact, though it's not been contributed to this project yet, partly because there is the open question of whether there will ever be protobuf integration with schema registry and if so what that might look like.

Avro now has an official specification for this,
https://avro.apache.org/docs/1.8.2/spec.html#single_object_encoding

This would allow the object to be deserialized without custom confluent code. You'd still need the schema of course.
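
As a rough illustration of the single-object encoding linked above: per the Avro specification, a message is a 2-byte marker (C3 01), then the 8-byte little-endian CRC-64-AVRO fingerprint of the writer schema, then the Avro binary payload. The sketch below is Python, not .NET, and the function names are hypothetical. It also assumes the caller passes the schema already in Parsing Canonical Form, since canonicalization is not implemented here.

```python
import struct
from typing import Tuple

SINGLE_OBJECT_MARKER = b"\xc3\x01"
AVRO_EMPTY_FP = 0xC15D213AA4D7A795  # seed value from the Avro spec


def _build_fp_table():
    # Table-driven CRC-64-AVRO, following the reference algorithm in the spec.
    table = []
    for i in range(256):
        fp = i
        for _ in range(8):
            fp = (fp >> 1) ^ (AVRO_EMPTY_FP & -(fp & 1))
        table.append(fp)
    return table


_FP_TABLE = _build_fp_table()


def fingerprint64(data: bytes) -> int:
    """CRC-64-AVRO fingerprint of the given bytes, as an unsigned 64-bit int."""
    fp = AVRO_EMPTY_FP
    for b in data:
        fp = (fp >> 8) ^ _FP_TABLE[(fp ^ b) & 0xFF]
    return fp


def encode_single_object(canonical_schema: str, avro_payload: bytes) -> bytes:
    """Frame an Avro binary payload: marker + little-endian fingerprint + payload."""
    fp = fingerprint64(canonical_schema.encode("utf-8"))
    return SINGLE_OBJECT_MARKER + struct.pack("<Q", fp) + avro_payload


def decode_single_object(message: bytes) -> Tuple[int, bytes]:
    """Return (schema fingerprint, Avro payload); the caller resolves the schema."""
    if len(message) < 10 or message[:2] != SINGLE_OBJECT_MARKER:
        raise ValueError("not an Avro single-object encoded message")
    (fp,) = struct.unpack_from("<Q", message, 2)
    return fp, message[10:]
```

The reader still has to map the fingerprint back to a schema it knows, so this replaces the Confluent header format but not the need for some form of schema management.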

interesting, I was not aware of that. either way, you need some mechanism to manage the schemas. I'm not aware of any existing solution that does this using the official Avro encoding, so you'd need to build it yourself (possibly as part of your application).

Since it's part of the Avro standard, we'd be happy to accept pull requests to add it as a serializer/deserializer configuration parameter.

I do prefer the Confluent Schema Registry way - it's more straightforward and requires less overhead.

Oh, and if you really don't want to use Confluent Schema Registry for some reason, you'd just need to implement ISchemaRegistryClient and pass your custom implementation to the constructor of AvroSerializer / AvroDeserializer.
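
To make the idea concrete, a custom implementation could be as small as an in-process lookup table that hands out schema IDs without any network calls. The sketch below is Python and only illustrates the concept: the class and method names are hypothetical and do not reproduce the actual members of the .NET ISchemaRegistryClient interface, which a real implementation would have to follow.

```python
from typing import Dict


class InMemorySchemaRegistry:
    """Hypothetical local stand-in for a schema registry (no network access)."""

    def __init__(self) -> None:
        self._id_by_schema: Dict[str, int] = {}
        self._schema_by_id: Dict[int, str] = {}

    def register(self, schema_json: str) -> int:
        """Return the ID for a schema, assigning the next free ID if it is new."""
        existing = self._id_by_schema.get(schema_json)
        if existing is not None:
            return existing
        schema_id = len(self._schema_by_id) + 1
        self._id_by_schema[schema_json] = schema_id
        self._schema_by_id[schema_id] = schema_json
        return schema_id

    def get_schema(self, schema_id: int) -> str:
        """Look up a schema by ID; raises KeyError for unknown IDs."""
        return self._schema_by_id[schema_id]
```

IDs assigned this way are only meaningful inside one process, so producers and consumers would need to load the same schemas in the same order, or ship a fixed ID-to-schema table with the application.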

Ah yes, I didn't mean a replacement for Confluent Schema Registry, just as an alternative to writing the magic header.

it's unlikely we'll ever implement this (refer to previous comments). if anyone else is interested please +1 / chime in here.
