As we also care about models that has a lot of operations but each has little calculation, it could be useful to have less overhead by doing the jobs at compilation time as much as possible. getTensorShape(#4544 ) would be the most critical part and it is being worked on. Aside from that, here's what I found out. And these could be subtle, so the priority of this task would not be so high.
_kernel is std::function (introduces indirection)std::bind is used (introduces another indirection)BinaryArithmeticOps has switch statement in itI don't know well but I think making _kenel be a raw function pointer(make nothing to be captured) and avoid using std::bind is better.
I tried some redesign in BinaryOps ( https://github.com/krayzemli/ONE/tree/OptimizedMinMax ) to reuse its vectorization in ElementwiseBinary Min/Max operations, but it only slowed everything down because of broadcast preprocessing overhead, and so I stopped working on it. I can try once again after PR #4611 is merged. In PR #4611, I've removed bind, but there is still _kernel std::function indirection to operation- and datatype- specific functor. So, one more level of indirection can be removed by introducing specialized derivations of BinaryArithmeticLayer class, but this will need introducing a factory function into KernelGenerator, which I believe is not a problem.
However, removing datatype-specific indirection will be an obstacle if we want to support dynamic data type in the future (for now, we do not), which seems to have been mentioned in #4625. There is also one indirection in cker, where BinaryBroadcastFiveFold takes operation- and activation- specific function pointers, which could be changed to templates. Well, I'm not certain whether inderections are really so evil on ARM which has branch prediction mechanism.
I think I could continue my redesign work on BinaryOps after #4611 is merged, given no one else is going to work on binary ops. There is also an issue #4540 related to binary ops, which is likely to interfere with my work if someone starts working on that issue. Also, there is now no support for int32 neon kernels, which would not be very different from float32 kernels.
However, removing datatype-specific indirection will be an obstacle if we want to support dynamic data type in the future (for now, we do not)
I have never thought about that before(We have it in nnfw api but it is just ignored, and that API had been introduced before I got responsible in it). Anyways if data type could be dynamic, it is no problem to have an indirection. I think it is just same with that we have an indirection for tensor buffers too. I'm more worried about std::bind and std::function overhead itself compared to raw function pointers, AFAIK they have more overhead like multiple indirections internally.
I think I could continue my redesign work on BinaryOps after #4611 is merged,
Please go on. 馃槃
Most helpful comment
I tried some redesign in BinaryOps ( https://github.com/krayzemli/ONE/tree/OptimizedMinMax ) to reuse its vectorization in ElementwiseBinary Min/Max operations, but it only slowed everything down because of broadcast preprocessing overhead, and so I stopped working on it. I can try once again after PR #4611 is merged. In PR #4611, I've removed
bind, but there is still_kernelstd::functionindirection to operation- and datatype- specific functor. So, one more level of indirection can be removed by introducing specialized derivations of BinaryArithmeticLayer class, but this will need introducing a factory function intoKernelGenerator, which I believe is not a problem.However, removing datatype-specific indirection will be an obstacle if we want to support dynamic data type in the future (for now, we do not), which seems to have been mentioned in #4625. There is also one indirection in
cker, whereBinaryBroadcastFiveFoldtakes operation- and activation- specific function pointers, which could be changed to templates. Well, I'm not certain whether inderections are really so evil on ARM which has branch prediction mechanism.I think I could continue my redesign work on BinaryOps after #4611 is merged, given no one else is going to work on binary ops. There is also an issue #4540 related to binary ops, which is likely to interfere with my work if someone starts working on that issue. Also, there is now no support for int32 neon kernels, which would not be very different from float32 kernels.