One: [onert] Implementing front end code for BatchMatMulV2 custom ops

Created on 20 May 2020 · 17 Comments · Source: Samsung/ONE

I am trying to implement the BatchMatMulV2 op for the cpu backend. As part of this task, I need to add support for this operator in the frontend, specifically in base_loader.h. At present, I see the following function for loading custom operators:

template <typename LoaderDomain, typename SpecificLoader>
void BaseLoader<LoaderDomain, SpecificLoader>::loadCustom(const Operator *op, ir::Graph &subg)

My question has two parts: 1. one specific to BatchMatMulV2, and 2. a general query about custom ops. They are as follows:

  1. How are the parameters of the BatchMatMulV2 operator (adj1 and adj2) represented in the flatbuffer format, and how can I parse them into userdata that probably goes as params?

  2. Should we implement a loadCustom<your_custom_op_name> function for every new custom op?

Could anyone please let me know? Thanks in advance.

help wanted · question · type/discussion

All 17 comments

If you are trying to implement the tflite loader:

How are the parameters of the BatchMatMulV2 operator (adj1 and adj2) represented in the flatbuffer format, and how can I parse them into userdata that probably goes as params?

You should check the type and size of each parameter, then parse them from the custom parameters yourself. There is no schema, so you need to check TensorFlow's implementation (and the tflite converter implementation if needed).
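A minimal sketch of "check the type and size of each parameter, then parse it yourself" follows. It assumes a hypothetical layout in which the custom options hold exactly two bytes, one per flag (the flags are named `adj_x` and `adj_y` after TensorFlow's BatchMatMulV2 attributes). Files produced by the TensorFlow converter usually encode custom options with flexbuffers instead, which this sketch does not decode; `BatchMatMulV2Options` and `parseBatchMatMulV2Options` are illustrative names, not onert APIs.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical parameter struct; field names follow TensorFlow's
// BatchMatMulV2 attributes adj_x / adj_y.
struct BatchMatMulV2Options
{
  bool adj_x = false;
  bool adj_y = false;
};

// Hypothetical layout (assumption): byte 0 = adj_x, byte 1 = adj_y, each 0 or 1.
// Returns std::nullopt if the size or a value does not match that layout.
std::optional<BatchMatMulV2Options>
parseBatchMatMulV2Options(const std::vector<uint8_t> &custom_options)
{
  constexpr std::size_t kExpectedSize = 2;

  // Size check: reject buffers that do not match the expected layout.
  if (custom_options.size() != kExpectedSize)
    return std::nullopt;

  // Type check: each byte must encode a boolean (0 or 1).
  for (uint8_t byte : custom_options)
    if (byte > 1)
      return std::nullopt;

  BatchMatMulV2Options opts;
  opts.adj_x = custom_options[0] == 1;
  opts.adj_y = custom_options[1] == 1;
  return opts;
}
```

Returning `std::nullopt` lets the loader report a malformed model instead of silently reading garbage.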

Should we implement a loadCustom function for every new custom op?

We need to implement functions to parse I/O and parameter for each custom operation.
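One way to organize "one parse function per custom operation" is a lookup table keyed by the op's custom code. Known codes are lowered to an internal IR node, and unknown codes fall back to an opaque custom node whose kernel the user supplies. This is a hypothetical sketch: `IrNode`, `customParsers`, and `lowerCustomOp` are illustrative and do not mirror onert's actual loader, and the two-byte options layout is an assumption.

```cpp
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for an internal IR node.
struct IrNode
{
  std::string op_type;             // e.g. "BatchMatMul", or "Custom" for opaque ops
  std::vector<uint8_t> raw_params; // untouched bytes, kept only for opaque custom ops
  bool adj_x = false;
  bool adj_y = false;
};

using CustomParser = std::function<IrNode(const std::vector<uint8_t> &)>;

// Per-op parser; the two-byte layout is an assumption for illustration.
IrNode parseBatchMatMulV2(const std::vector<uint8_t> &opts)
{
  if (opts.size() != 2)
    throw std::runtime_error("BatchMatMulV2: unexpected custom options size");
  IrNode node;
  node.op_type = "BatchMatMul";
  node.adj_x = opts[0] != 0;
  node.adj_y = opts[1] != 0;
  return node;
}

// One entry per custom op the runtime knows how to lower to its own IR.
const std::unordered_map<std::string, CustomParser> &customParsers()
{
  static const std::unordered_map<std::string, CustomParser> parsers = {
    {"BatchMatMulV2", parseBatchMatMulV2},
  };
  return parsers;
}

IrNode lowerCustomOp(const std::string &custom_code, const std::vector<uint8_t> &opts)
{
  const auto &parsers = customParsers();
  auto it = parsers.find(custom_code);
  if (it != parsers.end())
    return it->second(opts);

  // Unknown code: keep it opaque so a user-provided kernel can handle it.
  IrNode node;
  node.op_type = "Custom";
  node.raw_params = opts;
  return node;
}
```

Adding another lowered op then means writing one parser and one table entry, which matches the point above that each custom op needs its own I/O and parameter handling.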

Custom ops do not yet have a general implementation pattern the way built-in operations do. I'll start by implementing the All (reduction_all) operation to prepare the infrastructure for tests, the loader, IR, and the backend.

If I'm not mistaken, we already have support for custom operations. Do we need yet another one?

Thanks a lot for your detailed replies. I understand there is already support for custom operations, with parameters moved under userdata. I still need some clarity on parsing parameters, generating IR for a new custom op, and providing backend support. For instance, I see that OperationFactory.cc maps onto NNAPI-supported operations, whereas custom ops, by definition, are not covered by NNAPI (please correct me if I am wrong here). As mentioned, I will start by checking the TF implementation for parsing parameters. Thanks again.

I can't say anything about NNAPI, but tflite has a specific opcode for that purpose, BuiltinOperator_CUSTOM, and it's handled in the tflite loader.

userdata in tflite uses the flexbuffers format. At least toco uses it to export TensorFlow operations not yet supported by the tflite interpreter. We, on the other hand, are not required to use flexbuffers. Ignoring endianness and other cross-platform issues (if we know the target platform for the compiled network), we can even use this buffer as raw C struct storage and use it like this in code:

struct OpParams* p = (OpParams*)userdata;
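The one-line cast above conveys the idea, but dereferencing a byte buffer through `OpParams*` is technically undefined behavior in C++ (strict aliasing) and can fault on misaligned buffers on some targets. A hedged alternative copies the bytes into a local struct with `std::memcpy` after a size check. The fields of `OpParams` below are hypothetical, and the approach still assumes the writer and reader share endianness, padding, and type sizes.

```cpp
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical plain-old-data parameter block, written by a converter that
// targets the same platform (same endianness, padding, and field sizes).
struct OpParams
{
  uint8_t adj_x;
  uint8_t adj_y;
};

static_assert(std::is_trivially_copyable<OpParams>::value,
              "OpParams must be trivially copyable to be memcpy'd");

OpParams readParams(const std::vector<uint8_t> &userdata)
{
  if (userdata.size() < sizeof(OpParams))
    throw std::runtime_error("userdata too small for OpParams");

  OpParams p;
  // memcpy sidesteps the alignment and strict-aliasing problems of
  // casting the byte buffer to OpParams* directly.
  std::memcpy(&p, userdata.data(), sizeof(OpParams));
  return p;
}
```

Compilers typically optimize a small fixed-size `memcpy` like this into plain loads, so the safer form costs little.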

If we know the specification of the "custom" operation, it is no longer "custom". Why not just add the op to the schema if we know its specification?

The intention of adding the _custom_ op was to allow users to write _their own_ implementation (in the form of a shared library) _without_ having to modify the project's source code.

@toomuchsalt, @s-barannikov, thanks for the explanation. From what I gather, the frontend work, i.e. generating IR for custom ops, is already taken care of.

The larger context for my questions was the backend implementation for custom ops. So, I looked up the details for custom ops under /cpu/KernelGenerator.cc and found references to KernelBuilder, which is probably where I should start. @hseok-oh, could you please confirm whether this is indeed the right starting point for the backend implementation?

The onert cpu backend already has support for custom operations. All you need to do is provide a symbol to be loaded by the tflite importer. The only thing missing in onert is shape inference for custom operations, as onert (under its old name) had no support for shape inference at that time.

The runtime will define its own internal IR for each custom operation (like other built-in operations), so implementing the backend may not be difficult if the contributor understands the operation well.
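Understanding the operation well mostly means getting its semantics right. Below is a naive reference sketch of BatchMatMulV2 for real-valued inputs: each output batch is `op(x) * op(y)`, where `op` transposes the matrix when the corresponding `adj_*` flag is set (for real numbers the adjoint is the transpose; complex inputs would also need conjugation), and a batch dimension of 1 is broadcast. It is restricted to rank-3 tensors for brevity, whereas TensorFlow broadcasts arbitrary leading batch dimensions. `Tensor3` and `batchMatMul` are illustrative names, not onert's kernel interface.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Minimal rank-3 tensor: shape [batch, rows, cols], row-major storage.
struct Tensor3
{
  std::size_t batch, rows, cols;
  std::vector<float> data;
  float at(std::size_t b, std::size_t r, std::size_t c) const
  {
    return data[(b * rows + r) * cols + c];
  }
};

// out[b] = op(x[b]) * op(y[b]); op() transposes when adj_* is true.
// A batch size of 1 on either side is broadcast against the other.
Tensor3 batchMatMul(const Tensor3 &x, const Tensor3 &y, bool adj_x, bool adj_y)
{
  const std::size_t m = adj_x ? x.cols : x.rows;  // rows of op(x)
  const std::size_t k = adj_x ? x.rows : x.cols;  // cols of op(x)
  const std::size_t k2 = adj_y ? y.cols : y.rows; // rows of op(y)
  const std::size_t n = adj_y ? y.rows : y.cols;  // cols of op(y)
  if (k != k2)
    throw std::invalid_argument("inner dimensions do not match");
  if (x.batch != y.batch && x.batch != 1 && y.batch != 1)
    throw std::invalid_argument("batch dimensions are not broadcastable");

  const std::size_t batch = x.batch > y.batch ? x.batch : y.batch;
  Tensor3 out{batch, m, n, std::vector<float>(batch * m * n, 0.0f)};
  for (std::size_t b = 0; b < batch; ++b)
  {
    const std::size_t bx = x.batch == 1 ? 0 : b; // broadcast x's batch
    const std::size_t by = y.batch == 1 ? 0 : b; // broadcast y's batch
    for (std::size_t i = 0; i < m; ++i)
      for (std::size_t j = 0; j < n; ++j)
      {
        float acc = 0.0f;
        for (std::size_t p = 0; p < k; ++p)
        {
          const float xv = adj_x ? x.at(bx, p, i) : x.at(bx, i, p);
          const float yv = adj_y ? y.at(by, j, p) : y.at(by, p, j);
          acc += xv * yv;
        }
        out.data[(b * m + i) * n + j] = acc;
      }
  }
  return out;
}
```

A naive kernel like this is mainly useful as a test oracle for an optimized backend implementation.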

The harder parts may be the loader and the test framework.

@toomuchsalt , @hseok-oh , thanks again for the explanation. To understand how custom ops are implemented in backend, I looked at tests/custom_op/FillFrom/FillFrom_runner.cc and tests/custom_op/FillFrom/kernels/FillFromKernel.cc. I see a two-step process as follows:

  1. register custom kernels in FillFrom_runner.cc,
  2. when a session is run, KernelGenerator visits the custom ops, fetches the eval from registry and calls eval->run().
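The two-step flow described above (register kernels up front, then look them up by name and run them when the graph is executed) can be sketched as a small factory registry. This is a hypothetical illustration of the pattern, not onert's actual custom-kernel API: `CustomKernel`, `CustomKernelRegistry`, `registerKernel`, and `build` are made-up names.

```cpp
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical kernel interface: one object per node, run once per inference.
struct CustomKernel
{
  virtual ~CustomKernel() = default;
  virtual void run() = 0;
};

using KernelFactory = std::function<std::unique_ptr<CustomKernel>()>;

class CustomKernelRegistry
{
public:
  // Step 1: the runner registers a factory under the op's custom code.
  void registerKernel(const std::string &name, KernelFactory factory)
  {
    _factories[name] = std::move(factory);
  }

  // Step 2: the kernel generator builds a kernel while visiting a custom node.
  std::unique_ptr<CustomKernel> build(const std::string &name) const
  {
    auto it = _factories.find(name);
    if (it == _factories.end())
      throw std::runtime_error("no kernel registered for custom op: " + name);
    return it->second();
  }

private:
  std::unordered_map<std::string, KernelFactory> _factories;
};
```

Storing factories rather than kernel instances lets every node that uses the same custom op get its own kernel object.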

As you mentioned the test framework, I was thinking of adding my files for BatchMatMulV2 (a runner and a kernel) under a new ONE/tests/custom_op/BatchMatMulV2 folder. In doing so, I realized that I may have to generate the tflite file. I looked up the docs for tflchef and am now wondering whether I have to write the recipe manually or whether there is an alternative. Could you kindly let me know? Thanks again for your patience and help in answering my queries.

@dr-venkman Supporting custom operations for internal models is different from FillFrom. As I commented above,

Runtime will define its own internal IR

the runtime will support tflite (and circle) custom ops by using an internal IR and kernel, not a custom kernel.

These operations will not be defined in the circle schema, so there may be no plan to handle them as built-in operations in tflchef or other compiler modules for now.

IMO, I recommend postponing this issue and handling other issues until a reference PR is ready, because this item (custom operations) requires more understanding of the runtime internals and history.

These operations will not be defined in the circle schema

Why not?

@s-barannikov We introduce circle-only operators only when we are sure they are necessary. At this time, I am not sure.

@glistening What can possibly go wrong if we add it to the schema? We could mark it as an "experimental" operation if you have concerns about backward compatibility.

@s-barannikov Thank you for your opinion. Introducing a new operator involves a lot of changes in our project: the nncc compiler, tools, viewer, loader, kernel, documentation, and so on. In addition, I don't want to expose something that is likely to be removed, even as experimental. Also, note that we don't have enough operator slots: there are only 254, and they also have to cover TensorFlow Lite's growing set of operators.

We can work with BatchMatMulV2 without exposing much. We may expose it as a circle built-in operator if it turns out to be necessary.

@dr-venkman I forgot that you were trying to implement BatchMatMulV2. If you want to try implementing it now, I'll stop my task #1753. Otherwise, I'll continue my task.

I'll continue task #1753

@hseok-oh, sorry for my delayed response. I am currently implementing shape inference rules for Transpose. Please go ahead with task #1753.
