I implemented MobileNet, but the forward pass seems slow on GPU. I use grouped convolution to implement depthwise convolution, setting the number of groups equal to the number of feature maps. The computation saved by depthwise convolution doesn't reduce inference time. The MobileNet implemented in TF seems to be much faster: https://github.com/Zehaos/MobileNet
I ran into the same problem. Will MXNet add a more efficient grouped convolution implementation? I found that TF can run MobileNet at 0.059 s/image on CPU!
I encountered the same problem. The implementation of grouped convolutions is slow.
Generally speaking, the grouped implementation is based on one or more GEMMs (matrix multiplications, one common way of implementing convolution). For example, if num_group = 1 (the default), a single GEMM is executed; when num_group = 2, two GEMMs are executed, and so on, so the number of GEMM calls grows linearly with the number of groups.
I haven't read MXNet's low-level code, so I can't confirm it works exactly as I described above; you would have to read the low-level code to check. But if you want a fast depthwise convolution, you probably need to write a dedicated low-level depthwise operator rather than implementing it in Python.
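To make the cost argument concrete, here is a minimal NumPy sketch (not MXNet's actual kernel; stride 1, no padding, and all shapes and names here are assumptions for illustration) of the per-group im2col + GEMM scheme described above:

```python
import numpy as np

def grouped_conv2d(x, w, num_group):
    """Grouped 2-D convolution, stride 1, no padding.

    x: (C_in, H, W), w: (C_out, C_in // num_group, kH, kW).
    """
    c_in, h, wd = x.shape
    c_out, c_in_g, kh, kw = w.shape
    oh, ow = h - kh + 1, wd - kw + 1
    cog = c_out // num_group              # output channels per group
    out = np.empty((c_out, oh, ow), dtype=x.dtype)
    for g in range(num_group):            # one im2col + GEMM per group
        xg = x[g * c_in_g:(g + 1) * c_in_g]
        # im2col: lay every receptive-field patch out as a column
        cols = np.empty((c_in_g * kh * kw, oh * ow), dtype=x.dtype)
        row = 0
        for c in range(c_in_g):
            for i in range(kh):
                for j in range(kw):
                    cols[row] = xg[c, i:i + oh, j:j + ow].ravel()
                    row += 1
        wg = w[g * cog:(g + 1) * cog].reshape(cog, -1)
        out[g * cog:(g + 1) * cog] = (wg @ cols).reshape(cog, oh, ow)
    return out

# Depthwise case: num_group == C_in, so this runs C_in tiny GEMMs.
x = np.random.randn(32, 28, 28).astype(np.float32)
w = np.random.randn(32, 1, 3, 3).astype(np.float32)
y = grouped_conv2d(x, w, num_group=32)    # shape (32, 26, 26)
```

With num_group equal to the channel count (the depthwise case), each GEMM is tiny, so per-call launch overhead dominates on GPU, which would explain why the reduced FLOPs don't translate into lower latency.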
This issue is closed due to lack of activity in the last 90 days. Feel free to ping me to reopen if this is still an active issue. Thanks!
Also, do please check out our forum (and Chinese version) for general "how-to" questions.
For the record, we now have an efficient depthwise convolution implementation when you set num_filter equal to num_group.
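For anyone landing here, a quick sketch of how that fast path is triggered; the parameter names (num_filter, num_group) are those of MXNet's Convolution operator, and the layer sizes are made up for illustration:

```python
import mxnet as mx

channels = 32
data = mx.sym.Variable('data')
# Depthwise 3x3 conv: num_group == num_filter == number of input channels
# selects the efficient depthwise implementation.
conv_dw = mx.sym.Convolution(data=data, kernel=(3, 3), pad=(1, 1),
                             num_filter=channels, num_group=channels,
                             no_bias=True, name='conv_dw')
# MobileNet follows each depthwise conv with a 1x1 "pointwise" conv.
conv_pw = mx.sym.Convolution(data=conv_dw, kernel=(1, 1),
                             num_filter=64, no_bias=True, name='conv_pw')
```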