Incubator-mxnet: [RFC] Custom Operator Part 2

Created on 7 Dec 2019 · 7 comments · Source: apache/incubator-mxnet

Description

Request for comments on the next PR for enhancing custom operator support

  • ~custom GPU operators~ (started in #17270)
  • ~Random number generator resource request~ (#17762)
  • ~sparse data types~ (#17569)
  • migrate the lambda functions in MXLoadLib (src/c_api/c_api.cc) into classes defined elsewhere
  • documentation: add the "library" Python package namespace to the docs at https://mxnet.apache.org/api/python/docs/api/ ?
  • use a struct to reduce the number of arguments to the _opCallFCompute function (see the sketch below)
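A rough sketch of that last item, purely illustrative: pack the long positional argument list into one struct so the C boundary passes a single pointer. The field names below are assumptions for illustration, not the actual lib_api.h layout.

```cpp
#include <cstdint>

// Illustrative only: bundle the long positional argument list of
// _opCallFCompute into one struct, so the C boundary passes a single
// pointer and can grow fields without breaking the signature.
struct OpCallFComputeArgs {
  // operator attributes as parallel key/value arrays
  const char** attr_keys;
  const char** attr_vals;
  int num_attrs;
  // input tensors: data pointers, shapes, ranks, dtypes
  void** in_data;
  const int64_t** in_shapes;
  int* in_dims;
  int* in_types;
  int num_in;
  // output tensors, same layout as inputs
  void** out_data;
  const int64_t** out_shapes;
  int* out_dims;
  int* out_types;
  int num_out;
};

// Hypothetical reduced entry point: one pointer instead of ~15 arguments.
extern "C" int _opCallFCompute(void* fcomp, const OpCallFComputeArgs* args);
```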

References

  • initial PR (Part 1): #15921
Labels: Feature request, RFC


All 7 comments

Hi @samskalicky, thank you for the contribution!
I have several suggestions.

  • custom GPU operators

    1. Provide the CUDA stream in OpResource.

    2. Share the same function between CPU and GPU.

      Users can distinguish the context via MXTensor::dltensor::ctx (see the sketch after this list).

  • Call framework-specific math helpers
    This is important for a custom operator: users may want to call gemm, or even a convolution op, inside a custom op.
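To make point 2 concrete, here is a minimal sketch of one forward function shared by CPU and GPU. It assumes MXTensor exposes its DLPack tensor as a dltensor member and that OpResource hands out the framework's CUDA stream via a get_cuda_stream() accessor; neither is settled API at this point.

```cpp
#include <cuda_runtime.h>
#include <map>
#include <string>
#include <vector>
#include "lib_api.h"  // MXTensor, OpResource, MXReturnValue (proposed API)

// GPU path: a trivial element-wise ReLU kernel.
__global__ void relu_gpu(const float* in, float* out, int64_t n) {
  int64_t i = blockIdx.x * (int64_t)blockDim.x + threadIdx.x;
  if (i < n) out[i] = in[i] > 0.f ? in[i] : 0.f;
}

// One FCompute registered for both devices, dispatching on the DLPack
// context instead of requiring separate CPU and GPU functions.
MXReturnValue forward(std::map<std::string, std::string> attrs,
                      std::vector<MXTensor> inputs,
                      std::vector<MXTensor> outputs,
                      OpResource res) {
  const float* in = inputs[0].data<float>();
  float* out = outputs[0].data<float>();
  int64_t n = inputs[0].size();
  if (inputs[0].dltensor.ctx.device_type == kDLGPU) {
    // assumed accessor: the framework-owned stream for this op's context
    cudaStream_t s = static_cast<cudaStream_t>(res.get_cuda_stream());
    int threads = 256;
    int blocks = (int)((n + threads - 1) / threads);
    relu_gpu<<<blocks, threads, 0, s>>>(in, out, n);
  } else {
    for (int64_t i = 0; i < n; ++i)
      out[i] = in[i] > 0.f ? in[i] : 0.f;
  }
  return MX_SUCCESS;
}
```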

Thanks.

Need to include a fix for the test error https://github.com/apache/incubator-mxnet/pull/15921#pullrequestreview-328686634

@wkcn could you explain your suggestion? Calling gemm back into the framework, which then gets dispatched to GPU or CPU?

We should create a namespace for the contents of lib_api.h, as suggested by @larroy:
https://github.com/apache/incubator-mxnet/pull/15760/files#r311756416
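For illustration, the change would look roughly like this; the name mxnet::ext is just one candidate, not something decided in the linked thread.

```cpp
// lib_api.h (sketch): scope the external-operator API in a namespace
// instead of leaving MXTensor, OpResource, etc. in the global namespace.
namespace mxnet {
namespace ext {

class MXTensor { /* ... */ };
class OpResource { /* ... */ };
class CustomOp { /* ... */ };

}  // namespace ext
}  // namespace mxnet

// Library authors would then write mxnet::ext::MXTensor, or pull the
// namespace in explicitly in their own translation units.
using namespace mxnet::ext;
```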

@larroy Users may need matrix operators and DNN ops (e.g. ReLU, Conv) when writing a custom op. Although they can implement these with third-party libraries, it is more convenient to use MXNet's built-in functions.
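One way this could surface (entirely hypothetical; no such helper exists at the time of writing) is a BLAS-style method on OpResource that dispatches to whichever backend MXNet was built with:

```cpp
// Hypothetical OpResource::gemm helper: C = A * B, dispatched by the
// framework to MKL/OpenBLAS on CPU or cuBLAS on GPU based on tensor context.
MXReturnValue forward(std::map<std::string, std::string> attrs,
                      std::vector<MXTensor> inputs,
                      std::vector<MXTensor> outputs,
                      OpResource res) {
  res.gemm(inputs[0], inputs[1], outputs[0],
           /*transA=*/false, /*transB=*/false);
  return MX_SUCCESS;
}
```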

Custom ops should be able to set the in-place property.
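A made-up registration hook to illustrate the request; built-in operators declare this via FInplaceOption, and the setter name below does not exist in the custom-op API:

```cpp
// Hypothetical: declare that output 0 may reuse input 0's buffer, mirroring
// FInplaceOption on built-in operators. setInplaceOption is made up here;
// the other setters exist in the Part 1 lib_api.h registration chain.
REGISTER_OP(my_relu)
.setParseAttrs(parseAttrs)
.setInferType(inferType)
.setInferShape(inferShape)
.setForward(forward)
.setInplaceOption({{0, 0}});  // {input index, output index} pairs
```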

Speed: all those std::string and std::unordered_map objects don't come cheap.

I compared an integrated fork against the custom-operator version.

Integrated version (https://github.com/kpuatamazon/incubator-mxnet/tree/intgemm, based on 1.6.0), end-to-end Sockeye performance:

real    2m57.962s
user    7m3.986s
sys 0m6.724s

Custom operator version (based on 1.7.x, since custom operator support requires it):

real    3m16.879s
user    7m43.727s
sys 0m8.273s

Conditions:
unset MXNET_ENGINE_TYPE; export OMP_NUM_THREADS=2; numactl -C 0-7 translate.sh
Both were compiled with the MKL backend hack for the remaining fp32 operations.

