Pulsar: Shading Avro in pulsar-client

Created on 6 Mar 2019 · 6 Comments · Source: apache/pulsar

Describe the bug

Because we shade Avro in the pulsar-client, the ReflectData API that derives a schema from a POJO fails to return the correct schema for generated Avro Java classes.

To Reproduce

Given an Avro schema:

{
  "namespace": "example.avro",
  "type": "record",
  "name": "User",
  "fields": [
    {
      "name": "name",
      "type": "string"
    },
    {
      "name": "favorite_number",
      "type": [
        "int",
        "null"
      ]
    },
    {
      "name": "favorite_color",
      "type": "string"
    },
    {
      "name": "age",
      "type": "int",
      "default": 19
    }
  ]
}

Passing the Java class generated from the above schema to the AvroSchema in pulsar-client produces the following result:

System.out.println(new String(org.apache.pulsar.client.impl.schema.AvroSchema.of(User.class).getSchemaInfo().getSchema()));

outputs:

{"type":"record","name":"User","namespace":"example.avro","fields":[{"name":"name","type":["null","string"],"default":null},{"name":"favorite_number","type":["null","int"],"default":null},{"name":"favorite_color","type":["null","string"],"default":null},{"name":"age","type":"int"}]}

which is incorrect, while

System.out.println(ReflectData.get().getSchema(User.class));

outputs the correct schema that matches the original schema:

{"type":"record","name":"User","namespace":"example.avro","fields":[{"name":"name","type":"string"},{"name":"favorite_number","type":["int","null"]},{"name":"favorite_color","type":"string"},{"name":"age","type":"int","default":19}]}

Here is the line in Avro responsible for getting the schema from generated classes:
https://github.com/apache/avro/blob/release-1.8.2/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java#L277
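A plausible reading of the mismatch, offered as an assumption rather than a confirmed diagnosis: the generated class implements Avro's record interfaces under their original package name, while the shaded library tests against relocated copies of those interfaces, so the check fails and the library falls back to plain reflection. The sketch below simulates that with stand-in interfaces only; none of the names are real Avro or Pulsar types.

```java
public class ShadedInterfaceCheckDemo {

    // Hypothetical stand-in for the record interface under its original package name.
    interface UnshadedRecord {}

    // Hypothetical stand-in for the relocated copy of the same interface inside a shaded jar.
    interface ShadedRecord {}

    // Hypothetical stand-in for a generated class: it only knows the unshaded interface.
    static class User implements UnshadedRecord {}

    // Mimics a library deciding how to obtain a schema for a class:
    // use the schema embedded in the class if it is a known record type, else reflect over fields.
    static String schemaSource(Class<?> recordMarker, Class<?> type) {
        return recordMarker.isAssignableFrom(type) ? "embedded schema" : "reflection";
    }

    public static void main(String[] args) {
        // Unshaded library: the marker interface matches, so the embedded schema is used.
        System.out.println(schemaSource(UnshadedRecord.class, User.class));
        // Shaded library: the relocated marker does not match, so reflection is used instead.
        System.out.println(schemaSource(ShadedRecord.class, User.class));
    }
}
```

Under that assumption, the reflected schema would lose details that only the embedded schema carries, such as the default value of "age" and the original order of the union branches.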

Here is a project I created that demonstrates this problem:

https://github.com/jerrypeng/TestAvro

component/client component/schemaregistry type/bug

All 6 comments

@sijie @merlimat I can think of two ways we can solve this problem:

  1. Unshade Avro in the pulsar-client library. This may conflict with an Avro version that the user imports, but is anyone still using an older Avro version, i.e. 1.7 or older, which was released around 2014?

  2. In our code, search for the "SCHEMA$" field in the generated Avro class and get the schema ourselves.

Ideally I would prefer unshading Avro, so people can even replace the Avro version with a suitable and compatible one. However, it carries a lot of unknowns, and those unknowns only surface when you hit them.

So the second approach sounds safer to me.
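The second approach discussed above could be sketched roughly as follows. This is a minimal illustration without any Avro dependency: the nested User class is a hypothetical stand-in for a generated class, and embeddedSchemaJson is an invented helper name, not Pulsar's actual fix. It relies only on the fact that generated Avro classes expose a static SCHEMA$ field whose toString() yields the schema JSON.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Optional;

public class SchemaFieldLookupDemo {

    // Hypothetical stand-in for an Avro-generated class with its static SCHEMA$ field.
    static class User {
        public static final Object SCHEMA$ = new Object() {
            @Override
            public String toString() {
                return "{\"type\":\"record\",\"name\":\"User\",\"namespace\":\"example.avro\",\"fields\":[]}";
            }
        };
    }

    // A plain POJO without SCHEMA$, for which reflection would remain the only option.
    static class PlainPojo {
        String name;
    }

    // Returns the schema JSON embedded in the class, or empty if there is no static SCHEMA$ field.
    static Optional<String> embeddedSchemaJson(Class<?> type) {
        try {
            Field field = type.getDeclaredField("SCHEMA$");
            if (!Modifier.isStatic(field.getModifiers())) {
                return Optional.empty();
            }
            field.setAccessible(true);
            // Read the value as a plain Object so no cast to a (possibly relocated) Schema class is needed.
            Object schema = field.get(null);
            return Optional.ofNullable(schema).map(Object::toString);
        } catch (NoSuchFieldException | IllegalAccessException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(embeddedSchemaJson(User.class).orElse("no embedded schema"));
        System.out.println(embeddedSchemaJson(PlainPojo.class).orElse("no embedded schema"));
    }
}
```

Treating the field value as an Object and passing only its JSON text along sidesteps the class mismatch between shaded and unshaded Avro types; the shaded library could then parse that JSON with its own schema parser.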

I'd like to at least have the option to use the Java Pulsar client with an unshaded Avro. We use Avro extensively with Kafka, and while we migrate to Pulsar, it's imperative that we have compatibility at the Avro message level, which means using the same version of Avro.

Here is an issue we are running into because our messages use Avro 1.9.1 which supports the java.time APIs:

java.lang.RuntimeException: java.lang.NoSuchMethodException: java.time.Instant.<init>()
    at org.apache.pulsar.shade.org.apache.avro.specific.SpecificData.newInstance(SpecificData.java:353) ~[pulsar-client-2.4.1.jar:2.4.1]
    at org.apache.pulsar.shade.org.apache.avro.specific.SpecificData.newRecord(SpecificData.java:369) ~[pulsar-client-2.4.1.jar:2.4.1]
    at org.apache.pulsar.shade.org.apache.avro.reflect.ReflectData.newRecord(ReflectData.java:901) ~[pulsar-client-2.4.1.jar:2.4.1]
    at org.apache.pulsar.shade.org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:212) ~[pulsar-client-2.4.1.jar:2.4.1]
    at org.apache.pulsar.shade.org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175) ~[pulsar-client-2.4.1.jar:2.4.1]
    at org.apache.pulsar.shade.org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153) ~[pulsar-client-2.4.1.jar:2.4.1]
    at org.apache.pulsar.shade.org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179) ~[pulsar-client-2.4.1.jar:2.4.1]
    at org.apache.pulsar.shade.org.apache.avro.reflect.ReflectDatumReader.readField(ReflectDatumReader.java:302) ~[pulsar-client-2.4.1.jar:2.4.1]
    at org.apache.pulsar.shade.org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222) ~[pulsar-client-2.4.1.jar:2.4.1]
    at org.apache.pulsar.shade.org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175) ~[pulsar-client-2.4.1.jar:2.4.1]
    at org.apache.pulsar.shade.org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153) ~[pulsar-client-2.4.1.jar:2.4.1]
    at org.apache.pulsar.shade.org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179) ~[pulsar-client-2.4.1.jar:2.4.1]
    at org.apache.pulsar.shade.org.apache.avro.reflect.ReflectDatumReader.readField(ReflectDatumReader.java:302) ~[pulsar-client-2.4.1.jar:2.4.1]
    at org.apache.pulsar.shade.org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222) ~[pulsar-client-2.4.1.jar:2.4.1]
    at org.apache.pulsar.shade.org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175) ~[pulsar-client-2.4.1.jar:2.4.1]
    at org.apache.pulsar.shade.org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153) ~[pulsar-client-2.4.1.jar:2.4.1]
    at org.apache.pulsar.shade.org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:145) ~[pulsar-client-2.4.1.jar:2.4.1]
    at org.apache.pulsar.client.impl.schema.reader.AvroReader.read(AvroReader.java:52) ~[pulsar-client-2.4.1.jar:2.4.1]
    at org.apache.pulsar.client.impl.schema.StructSchema.decode(StructSchema.java:94) ~[pulsar-client-2.4.1.jar:2.4.1]
    at org.apache.pulsar.client.impl.MessageImpl.getValue(MessageImpl.java:270) ~[pulsar-client-2.4.1.jar:2.4.1]
    at org.apache.pulsar.client.impl.TopicMessageImpl.getValue(TopicMessageImpl.java:143) ~[pulsar-client-2.4.1.jar:2.4.1]
    [...]
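The top frames of the trace suggest that the shaded Avro code tries to create a java.time.Instant through a no-argument constructor, as it would for an ordinary record class. A small sketch in plain Java (no Avro or Pulsar dependency; the class and method names are illustrative) shows why such a lookup fails: Instant declares no no-argument constructor.

```java
import java.lang.reflect.Constructor;
import java.time.Instant;

public class InstantNoArgCtorDemo {

    // Returns true if the class declares a no-argument constructor of any visibility.
    static boolean hasNoArgConstructor(Class<?> type) {
        try {
            Constructor<?> ctor = type.getDeclaredConstructor();
            return ctor != null;
        } catch (NoSuchMethodException e) {
            // This is the exception type that appears at the top of the stack trace.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("StringBuilder: " + hasNoArgConstructor(StringBuilder.class)); // prints "StringBuilder: true"
        System.out.println("Instant: " + hasNoArgConstructor(Instant.class));             // prints "Instant: false"
    }
}
```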

@codelipenghui I notice the issue was closed, but I doubt the fix will handle my situation. While it does at least get the actual underlying schema, the shaded Avro version is still the wrong version. Should I open another issue asking for a variant of the client to be published with an unshaded Avro?

@rocketraman The Avro version was changed to 1.9.1 in Pulsar 2.5.0.

@vzhikserg Thanks, that will work for me, though an underlying Avro upgrade probably would have been a good time to uncouple the Avro version from Pulsar to avoid this same problem in the future...
