Azure-sdk-for-js: [schema registry] Avsc details leak through prototypes given to deserialized objects

Created on 2 Oct 2020  路  8Comments  路  Source: Azure/azure-sdk-for-js

Forking distinct issue from #10950

Decide if it is OK that deserialized objects are given a prototype from avsc causing a slight observable difference between plain data going in and data coming back.

I debugged through this and put a repro of the underlying cause below.

Summary: avsc gives objects it deserializes a prototype that causes the result returned to have 7 additional inherited enumerable properties: clone, compare, isValid, toBuffer, toString, wrap, wrapped. This is what causes chai to throw on deepEqual comparison as it considers all enumerable properties, including functions AFAICT. JSON.stringify(left) === JSON.stringify(right), which was the workaround pending investigation holds because the extra enumerable properties are functions, which JSON.stringify skips.

So now the question remains: is this OK? I am slightly concerned that consumers of schema-registry-avro would take a dependency on these functions, which might prevent us from replacing the avro serializer with another implementation due to the risk of breaking such uses. But maybe that is too paranoid. @xirzec Thoughts?

So far it appears to be non-trivial to strip these properties out (or make them non-enumerable if we prefer) as the return value can be a graph of objects where each of the objects would have them. I'm looking into whether the resolver arg to fromBuffer might be able to allow me to do this cleanly.

Repro of underlying cause

import * as avro from "avsc";
import { assert } from "chai";

const schema: avro.schema.RecordType = {
  type: "record",
  name: "User",
  namespace: "com.azure.schemaregistry.samples",
  fields: [
    {
      name: "firstName",
      type: "string",
    },
    {
      name: "lastName",
      type: "string",
    },
  ],
};

const type = avro.Type.forSchema(schema);
const value = { firstName: "Nick", lastName: "Guerrera"};
const serialized = type.toBuffer(value); 
const deserialized = type.fromBuffer(serialized);

for (const key in value) {
    console.log(key);
}
console.log("");
for (const key in deserialized) {
    console.log(key);
}

assert.deepEqual(value, deserialized);
> node .\index.js
firstName
lastName

firstName
lastName
clone
compare
isValid
toBuffer
toString
wrap
wrapped

C:\Temp\rp\node_modules\chai\lib\chai\assertion.js:141
      throw new AssertionError(msg, {
      ^
AssertionError: expected { Object (firstName, lastName) } to deeply equal { Object (firstName, lastName) }
    at Object.<anonymous> (C:\Temp\rp\index.js:50:15)
    at Module._compile (internal/modules/cjs/loader.js:1075:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1096:10)
    at Module.load (internal/modules/cjs/loader.js:940:32)
    at Function.Module._load (internal/modules/cjs/loader.js:781:14)
    at Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:72:12)
    at internal/main/run_main_module.js:17:47 {
  showDiff: true,
  actual: { firstName: 'Nick', lastName: 'Guerrera' },
  expected: User { firstName: 'Nick', lastName: 'Guerrera' }
}

image

Client Schema Registry

Most helpful comment

Happy to help. Thanks for the kind words :).

All 8 comments

Looks like this logic mostly happens here: https://github.com/mtth/avsc/blob/f3c6b1c104815bf239b08311fab1c5dc1ec1290d/lib/types.js#L2185

It's somewhat vexing to me that the library author has no option to provide plain objects.

Hi there. If you update avsc to 5.5.1, you can use an option to omit these methods:

const avroType = avro.Type.forSchema(JSON.parse(schema), {omitRecordMethods: true});

Note that decoded record values will still have a named constructor for performance reasons. If you'd like to hide this constructor as well, you can do so with a type hook. For example to copy the values into plain objects:

/** Minimal logical type which returns decoded records as plain objects. */
class PlainRecordType extends avro.types.LogicalType {
  _fromValue(obj) { return {...obj}; }
  _toValue(obj) { return obj; }

  /** Returns a type hook wrapping all records and errors with this logical type. */
  static createHook() {
    const visited = new Set();
    return (schema, opts) => {
      const {name, type} = schema;
      if ((type !== 'record' && type !== 'error') || visited.has(name)) {
        return; // Fall back to default processing
      }
      visited.add(name);
      return new PlainRecordType(schema, opts);
    };
  }
}

Sample usage:

const plainType = avro.Type.forSchema({
  type: 'record',
  name: 'Person',
  fields: [{name: 'name', type: 'string'}],
}, {typeHook: PlainRecordType.createHook()});

const buf = plainType.toBuffer({name: 'Ann'});
const val = plainType.fromBuffer(buf);
console.log(val.constructor.name); // Object

You can tweak the logical type's implementation above to match the API you settle on: _fromValue generates the decoded values exposed to users, _toValue determines the data you'd like to accept when encoding. The logical type documentation has more information, including a couple examples.

@mtth Awesome, thanks!

@mtth you rock! Thank you so much.

I put up a PR with omitRecordMethods: https://github.com/Azure/azure-sdk-for-js/pull/11786

Worked like a charm. Thanks again, @mtth!

I didn't do the maneuver to drop the named constructor (yet). I wasn't sure it was necessary and there's some risk of losing perf to copying every object, right? @xirzec Do you think it's OK to leave the constructor being something else than Object?

I didn't do the maneuver to drop the named constructor (yet). I wasn't sure it was necessary and there's some risk of losing perf to copying every object, right? @xirzec Do you think it's OK to leave the constructor being something else than Object?

I'm not worried about that; I think JSON.stringify() will work fine without the Object constructor.

Happy to help. Thanks for the kind words :).

Was this page helpful?
0 / 5 - 0 ratings