import mxnet as mx
from mxnet import autograd, np, npx, gluon, init
from mxnet.gluon import nn
import time
npx.set_np()

# Benchmark BatchNorm over the last axis on GPU.
data = mx.np.random.uniform(size=(32, 100, 100), ctx=mx.gpu())
label = mx.np.ones((32, 100, 100), ctx=mx.gpu())

net = nn.Sequential()
net.add(nn.BatchNorm(axis=-1))
net.initialize(init.Xavier(), ctx=mx.gpu())
loss = gluon.loss.L2Loss()

t = time.time()
for _ in range(5000):
    with autograd.record():
        l = loss(net(data), label)
    l.backward()
mx.nd.waitall()  # synchronize before reading the timer
print('spent: {}s'.format(time.time() - t))
MXNet version: static build from branch v1.7.x, commit 75ab15569bd0f20a90806ce2fc38df08be208ed7
I got around 5 seconds with axis=1 and 30 seconds with axis=-1 on a p3.8xlarge (V100).
Both cases normalize the same amount of data (32 * 100 elements per normalized axis), similar to https://github.com/apache/incubator-mxnet/issues/10095
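For reference, the axis=1 run only changes the BatchNorm axis; a minimal sketch of that variant, reusing data, label, and loss from the script above (net_axis1 is just a name picked for the comparison):

# Same benchmark, but normalizing over axis=1,
# which takes the cuDNN/MKL-DNN fast path discussed below.
net_axis1 = nn.Sequential()
net_axis1.add(nn.BatchNorm(axis=1))
net_axis1.initialize(init.Xavier(), ctx=mx.gpu())

t = time.time()
for _ in range(5000):
    with autograd.record():
        l = loss(net_axis1(data), label)
    l.backward()
mx.nd.waitall()
print('axis=1 spent: {}s'.format(time.time() - t))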
Thanks @ptrendx for pointing out that cuDNN 7.4 (https://docs.nvidia.com/deeplearning/sdk/cudnn-release-notes/rel_7xx.html#rel_741) added a new cudnnBatchNormalization*Ex API that gives much better speed for axis = -1.
The reason is that the MKL-DNN and cuDNN implementations are only used when axis = 1.
The open PR https://github.com/apache/incubator-mxnet/pull/18504 fixes it.
However, we will replace mkldnn_off and cudnn_off attributes with environment variables, so the PR is blocked.
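Until that lands, one possible workaround (my own suggestion, not part of the PR) is to move the normalized axis to position 1 so BatchNorm can use the existing cuDNN/MKL-DNN path, at the cost of two transposes. A sketch, assuming the (32, 100, 100) tensors from the script above:

# Workaround sketch: swap the last axis to position 1 so BatchNorm(axis=1)
# hits the cuDNN/MKL-DNN kernels, then swap back. The extra transposes add
# overhead, so this may or may not be a net win for a given workload.
net_cf = nn.Sequential()
net_cf.add(nn.BatchNorm(axis=1))
net_cf.initialize(init.Xavier(), ctx=mx.gpu())

def batchnorm_last_axis(x):
    x = mx.np.swapaxes(x, 1, 2)   # (N, L, C) -> (N, C, L)
    x = net_cf(x)
    return mx.np.swapaxes(x, 1, 2)  # back to (N, L, C)

with autograd.record():
    out = batchnorm_last_axis(data)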
@wkcn Thanks for your detailed explanation.
So I think there are two phases.
Use cudnnBatchNormalizationForwardTrainingEx for the NHWC case (I checked the source code; we are currently using cudnnBatchNormalizationForwardTraining everywhere). I think the NHWC layout is very important in point cloud algorithms.
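To illustrate why the channel-last layout comes up in point cloud models, here is a hypothetical PointNet-style shared MLP block (the layer sizes and names are only illustrative): per-point features have shape (batch, num_points, channels), so BatchNorm naturally runs on axis=-1.

# Hypothetical point cloud block: features are (batch, num_points, channels),
# so the channel axis is the last one and BatchNorm uses axis=-1.
pc_block = nn.Sequential()
pc_block.add(nn.Dense(64, flatten=False),   # per-point linear layer
             nn.BatchNorm(axis=-1),         # normalize the channel axis
             nn.Activation('relu'))
pc_block.initialize(init.Xavier(), ctx=mx.gpu())

points = mx.np.random.uniform(size=(32, 1024, 3), ctx=mx.gpu())  # xyz coordinates
features = pc_block(points)                 # shape (32, 1024, 64)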
I have verified that the performance is almost the same after the fix https://github.com/apache/incubator-mxnet/pull/18504. Closing the issue.