One: [onert] Implementing front end code for BatchMatMulV2 custom ops

Created on 20 May 2020 · 17 Comments · Source: Samsung/ONE

I am trying to implement the BatchMatMulV2 op for the cpu backend. As part of this task, I need to add support for this operator in the frontend, specifically in base_loader.h. At present, I see the following function that handles custom operator loading:

template <typename LoaderDomain, typename SpecificLoader>
void BaseLoader<LoaderDomain, SpecificLoader>::loadCustom(const Operator *op, ir::Graph &subg)

My question comes in two parts: (1) a BatchMatMulV2-specific one and (2) a general query on custom ops. They are as follows:

1. How are the parameters of the BatchMatMulV2 operator (adj1 and adj2) represented in the flatbuffer format, and how can I parse them into userdata that probably goes as params?
2. Should we implement a loadCustom<your_custom_op_name> function for every new custom op?

Could anyone please let me know? Thanks in advance.

help wanted · question · type/discussion

dr-venkman

Most helpful comment

@s-barannikov Thank you for your opinion. Introducing a new operator involves a lot of changes in our project: the nncc compiler, tools, viewer, loader, kernel, documentation, and so on. In addition, I don't want to expose something that is likely to be removed, even if it is marked experimental. Also, note that we don't have enough operator slots: there are only 254, and they are shared with TensorFlow Lite's growing set of operators.

We can work with BatchMatMulV2 without exposing much. We may expose it as our circle built-in operator if it turns out to be necessary.

glistening on 26 May 2020

👍2 😕1

All 17 comments

If you are trying to implement this in the tflite loader:

How are the parameters of the BatchMatMulV2 operator (adj1 and adj2) represented in the flatbuffer format, and how can I parse them into userdata that probably goes as params?

You should check the type and size of each parameter, then parse them from the custom parameter data yourself. There is no schema, so you need to check tensorflow's implementation (and the tflite converter implementation if needed).

Should we implement a loadCustom<your_custom_op_name> function for every new custom op?

We need to implement functions to parse the I/O and parameters for each custom operation.

Custom ops don't yet have a general implementation pattern the way built-in operations do. I'll start implementing the All (reduction_all) operation to prepare the infrastructure for tests, the loader, IR, and backend.
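One way to organize per-operation parsing functions is a lookup table keyed by the custom op name, so that loadCustom only dispatches. The following is a minimal sketch, not onert's actual loader: CustomOp, Graph, loadBatchMatMul, loadReduceAll, and customLoaders are hypothetical stand-ins for the real flatbuffer Operator and ir::Graph types.

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for the flatbuffer Operator and ir::Graph types.
struct CustomOp {
  std::string custom_code; // name stored with the custom opcode
};

struct Graph {
  std::vector<std::string> loaded; // records which IR nodes were created
};

using CustomLoader = std::function<void(const CustomOp &, Graph &)>;

// One parsing function per custom op (hypothetical names).
void loadBatchMatMul(const CustomOp &, Graph &g) { g.loaded.push_back("BatchMatMul"); }
void loadReduceAll(const CustomOp &, Graph &g) { g.loaded.push_back("ReduceAll"); }

// Table mapping a custom op name to its parsing function.
const std::unordered_map<std::string, CustomLoader> &customLoaders() {
  static const std::unordered_map<std::string, CustomLoader> table = {
      {"BatchMatMulV2", loadBatchMatMul},
      {"All", loadReduceAll},
  };
  return table;
}

// Dispatches to the matching parser, or fails for unknown custom ops.
void loadCustom(const CustomOp &op, Graph &g) {
  const auto &table = customLoaders();
  auto it = table.find(op.custom_code);
  if (it == table.end())
    throw std::runtime_error("Unsupported custom op: " + op.custom_code);
  it->second(op, g);
}
```

With this layout, supporting a new custom op means adding one parsing function and one table entry, while loadCustom itself stays unchanged.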

hseok-oh on 20 May 2020

If I'm not mistaken we already have support for custom operations. Do we need yet another one?

toomuchsalt on 20 May 2020

Thanks a lot for your detailed replies. I understand there is already support for custom operations, with parameters moved under userdata. I would still need some clarity on parsing parameters, generating IR for a new custom op, and providing backend support. For instance, I see that OperationFactory.cc maps onto NNAPI-supported operations, whereas custom ops, by definition, are not covered by NNAPI (please correct me if I am wrong here). As mentioned, I will start by checking the TF implementation for parsing parameters. Thanks again.

dr-venkman on 20 May 2020

I can't say anything regarding NNAPI, but tflite has a specific opcode for that purpose: BuiltinOperator_CUSTOM. It is handled in the tflite loader.

userdata in tflite is in flexbuffers format. At least toco uses it to export tensorflow operations not yet supported by the tflite interpreter. We, on the other hand, are not required to use flexbuffers. Ignoring endianness and other cross-platform issues (if we know the target platform for the compiled network), we can even use this buffer as raw C struct storage and use it like this in code:

struct OpParams* p = (OpParams*)userdata;
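A slightly more defensive variant of the raw-struct idea checks the buffer size and copies with memcpy instead of casting, which avoids alignment problems. This is only a sketch under assumptions: BatchMatMulParams and parseParams are hypothetical, the one-byte-per-flag layout is invented for illustration, and it only applies if the model producer wrote userdata as this raw struct rather than as flexbuffers.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>

// Hypothetical raw layout: one byte per flag. The field names follow
// TensorFlow's adj_x / adj_y attributes (called adj1 / adj2 in the question).
struct BatchMatMulParams {
  std::uint8_t adj_x;
  std::uint8_t adj_y;
};

// Copies the parameters out of userdata after checking the buffer size.
BatchMatMulParams parseParams(const std::uint8_t *userdata, std::size_t size) {
  if (userdata == nullptr || size < sizeof(BatchMatMulParams))
    throw std::invalid_argument("userdata too small for BatchMatMulParams");
  BatchMatMulParams params;
  std::memcpy(&params, userdata, sizeof(params));
  return params;
}
```

Endianness does not matter for single-byte fields, but it would for wider ones, which is the cross-platform caveat mentioned above.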

toomuchsalt on 20 May 2020

If we know the specification of the "custom" operation, it is no longer "custom". Why not just add the op to the schema if we know its specification?

The intention of adding a _custom_ op was to allow users to write _their own_ implementation (in the form of a shared library) _without_ having to modify the source code of the project.

s-barannikov on 20 May 2020

@toomuchsalt, @s-barannikov, thanks for the explanation. From what I gather, the frontend work, i.e., generating IR for custom ops, is already taken care of.

The larger context for my questions was the backend implementation for custom ops. So, I looked up the details for custom ops under /cpu/KernelGenerator.cc, and got references to KernelBuilder, which is probably where I should start. @hseok-oh , could you please confirm whether this is indeed the right starting point for backend implementation?

dr-venkman on 21 May 2020

The onert cpu backend already has support for custom operations. All you need to do is provide a symbol, which should be loaded by the tflite importer. The only thing missing in onert is shape inference for custom operations, because onert (under its old name) had no support for shape inference at that time.

toomuchsalt on 21 May 2020

Runtime will define its own internal IR for each custom operation (like other built-in operations). So it may not be difficult to implement the backend if the contributor understands the operation well.

The issues may be the loader and the test framework.

hseok-oh on 22 May 2020

@toomuchsalt , @hseok-oh , thanks again for the explanation. To understand how custom ops are implemented in backend, I looked at tests/custom_op/FillFrom/FillFrom_runner.cc and tests/custom_op/FillFrom/kernels/FillFromKernel.cc. I see a two-step process as follows:

1. Register custom kernels in FillFrom_runner.cc.
2. When a session is run, KernelGenerator visits the custom ops, fetches the eval from the registry, and calls eval->run().
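The two steps can be pictured with a toy registry. This is a rough sketch of the register-then-look-up pattern only, not onert's API: CustomKernel, KernelRegistry, registerKernel, build, and CountingKernel are all hypothetical names.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical kernel interface: the generator only needs run().
struct CustomKernel {
  virtual ~CustomKernel() = default;
  virtual void run() = 0;
};

// Hypothetical registry mapping an op id to a kernel factory.
class KernelRegistry {
public:
  using Factory = std::function<std::unique_ptr<CustomKernel>()>;

  // Step 1: the runner registers a factory under the custom op id.
  void registerKernel(const std::string &id, Factory factory) {
    factories_[id] = std::move(factory);
  }

  // Step 2: the generator fetches a kernel instance by id.
  std::unique_ptr<CustomKernel> build(const std::string &id) const {
    auto it = factories_.find(id);
    if (it == factories_.end())
      throw std::runtime_error("No kernel registered for " + id);
    return it->second();
  }

private:
  std::map<std::string, Factory> factories_;
};

// Toy kernel that counts how often it ran, standing in for a real eval.
struct CountingKernel : CustomKernel {
  explicit CountingKernel(int *calls) : calls_(calls) {}
  void run() override { ++*calls_; }

private:
  int *calls_;
};
```

A runner would call registerKernel once per custom op before the session starts, and the generator side would call build(id)->run() when it visits that op.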

As you mentioned test framework, I was thinking of adding my files for BatchMatMulV2 (a runner and a kernel) under a new ONE/tests/custom_op/BatchMatMulV2 folder. In doing so, I realized that I may have to generate the tflite file. I looked up the docs for tflchef, and am now wondering if I have to write the recipe manually, or if there is an alternative. Could you kindly let me know? Thanks again for your patience and help in answering my queries.

dr-venkman on 22 May 2020

@dr-venkman Supporting a custom operation for an internal model is different from FillFrom. As I commented above,

Runtime will define its own internal IR

The runtime will support tflite (and circle) custom ops by using an internal IR and kernel, not a custom kernel.

These operations will not be defined in the circle schema, so there may be no plan for now to handle them as built-in operations in tflchef or other compiler modules.

IMO, I recommend that you postpone this issue and work on other issues until a reference PR is ready, because this item (custom operation) requires more understanding of the runtime internals and history.

hseok-oh on 22 May 2020

These operations will not be defined in the circle schema

Why not?

s-barannikov on 22 May 2020

@s-barannikov We introduce a circle-only operator only when we are sure it is necessary. At this time, I am not sure.

glistening on 25 May 2020

@glistening What can possibly go wrong if we add it to the schema? We could mark it as an "experimental" operation if you have concerns about backward compatibility.

s-barannikov on 26 May 2020

We can work with BatchMatMulV2 without exposing much. We may expose it as our circle built-in operator if it turns out to be necessary.

glistening on 26 May 2020

👍2 😕1

@dr-venkman I forgot that you were trying to implement BatchMatMulV2. If you want to try implementing BatchMatMulV2 now, I'll stop my task #1753. Otherwise, I'll continue my task.

hseok-oh on 3 Jun 2020

I'll continue task #1753

hseok-oh on 5 Jun 2020

@hseok-oh , sorry about my delayed response. I am currently implementing shape inferencing rules for Transpose. Please go ahead with task #1753.

dr-venkman on 5 Jun 2020

😄1
