Apex: Cannot use Tensor Cores

Created on 26 Mar 2019 · 8 Comments · Source: NVIDIA/apex

Hi,
I am on an Ubuntu 16.04 machine with a 2080 Ti, using CUDA 10.0, cuDNN 7.4, Python 3.7, and PyTorch 1.0.1.
I converted the model to use Tensor Cores with the amp module, following this example:

https://nvidia.github.io/apex/amp.html

But when I run my Python program under the nvprof profiler, as described here:
https://devtalk.nvidia.com/default/topic/1047165/how-to-confirm-whether-tensor-core-is-working-or-not-/

I get:


No events/metrics were profiled.


According to the moderator, this should not happen if my Tensor Cores were being used.
Can anyone help me understand why this is happening?
Any help is appreciated.
Thanks

All 8 comments

What was the command line you used to run your script under nvprof?

/usr/local/cuda/bin/nvprof --kernels compute_gemm --metrics tensor_precision_fu_utilization,tensor_int_fu_utilization python myscript.py

Hi @vaibhav0195, @mcarilli: must we change every dimension (N, C, H, W) of a tensor to a multiple of 8 before we can make use of Tensor Cores?

@mcarilli I think just the input and output channels of the conv and the batch size should do the trick.

Convolutions:
For cuDNN versions 7.2 and earlier, @vaibhav0195 is correct: input channels, output channels, and batch size should be multiples of 8 to use Tensor Cores. However, this requirement is lifted for cuDNN 7.3 and later, so with those versions you don't need to make your channel counts or batch size multiples of 8 to enable Tensor Core use.
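The version-dependent rule above can be written as a small check. This is a minimal sketch that only encodes the rule as stated in this comment; the function name `conv_meets_tensor_core_rule` and the `(major, minor)` version tuple are illustrative assumptions, not part of Apex, PyTorch, or cuDNN. Passing the check is a precondition under this rule, not a guarantee that cuDNN will pick a Tensor Core algorithm.

```python
from typing import Tuple


def conv_meets_tensor_core_rule(
    in_channels: int,
    out_channels: int,
    batch_size: int,
    cudnn_version: Tuple[int, int],
) -> bool:
    """Hypothetical check for the convolution rule described in this thread.

    cuDNN 7.2 and earlier: channels and batch size must be multiples of 8.
    cuDNN 7.3 and later: that alignment requirement is lifted.
    """
    if cudnn_version >= (7, 3):
        # Requirement lifted: alignment no longer gates Tensor Core use.
        return True
    return all(d % 8 == 0 for d in (in_channels, out_channels, batch_size))


if __name__ == "__main__":
    # An RGB input (3 channels) fails the rule on cuDNN 7.2...
    print(conv_meets_tensor_core_rule(3, 64, 32, (7, 2)))  # False
    # ...but the same layer passes on cuDNN 7.4, the version in the original post.
    print(conv_meets_tensor_core_rule(3, 64, 32, (7, 4)))  # True
```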

GEMMs (fully connected layers):
For a product A x B, where A has size [I, J] and B has size [J, K], I, J, and K must all be multiples of 8 to use Tensor Cores. This requirement exists for all cuBLAS and cuDNN versions. For bare fully connected layers, this means the batch size, input features, and output features must be multiples of 8. For RNNs, you typically need the batch size, hidden size, embedding size, and dictionary size to be multiples of 8. This is not universal, because it depends on the architecture, for example on what you use for the encoder and decoder.
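The GEMM rule and the usual workaround (rounding a dimension up to the next multiple of 8) can be sketched in plain Python. The helper names `round_up_to_multiple` and `gemm_meets_tensor_core_rule` and the example sizes are hypothetical; the check only restates the multiple-of-8 rule from this comment. Padding adds extra, unused rows or dictionary entries, and how a model handles those is outside this sketch.

```python
from typing import Tuple


def round_up_to_multiple(n: int, multiple: int = 8) -> int:
    """Smallest value >= n that is divisible by `multiple`."""
    return ((n + multiple - 1) // multiple) * multiple


def gemm_meets_tensor_core_rule(a_shape: Tuple[int, int], b_shape: Tuple[int, int]) -> bool:
    """Rule from this thread for A [I, J] x B [J, K]: I, J and K must be multiples of 8."""
    i, j = a_shape
    j2, k = b_shape
    if j != j2:
        raise ValueError(f"inner dimensions differ: {j} vs {j2}")
    return all(d % 8 == 0 for d in (i, j, k))


if __name__ == "__main__":
    # A bare fully connected layer: batch 30, 1000 input features and
    # 33278 output features (e.g. an output layer over a 33278-word dictionary).
    batch, in_features, vocab = 30, 1000, 33278
    print(gemm_meets_tensor_core_rule((batch, in_features), (in_features, vocab)))  # False

    # Padding batch size and dictionary size up to multiples of 8 satisfies the rule.
    batch_p, vocab_p = round_up_to_multiple(batch), round_up_to_multiple(vocab)
    print(batch_p, vocab_p)  # 32 33280
    print(gemm_meets_tensor_core_rule((batch_p, in_features), (in_features, vocab_p)))  # True
```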

@mcarilli Thank you for your clear explanation.

It may also help to set
torch.backends.cudnn.benchmark=True
at the top of your script, which enables PyTorch's autotuner. Each time PyTorch encounters a new set of convolution parameters, it tests all available cuDNN algorithms to find the fastest one, then caches that choice and reuses it whenever it encounters the same set of convolution parameters again. The first iteration of your network will be slower, because PyTorch tests all the cuDNN algorithms for each convolution, but the second and later iterations will likely be faster.
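To make the benchmark-then-cache behavior concrete, here is a toy sketch in plain Python. It is only an analogy: `ToyAutotuner`, the candidate functions, and the parameter tuple used as a cache key are hypothetical stand-ins for cuDNN convolution algorithms and convolution parameters, and this is not how PyTorch's autotuner is actually implemented.

```python
import time
from typing import Callable, Dict, Hashable, List


class ToyAutotuner:
    """Hypothetical benchmark-then-cache selector (an analogy, not PyTorch's code)."""

    def __init__(self, candidates: Dict[str, Callable[..., object]]) -> None:
        self.candidates = candidates          # stand-ins for cuDNN conv algorithms
        self.cache: Dict[Hashable, str] = {}  # parameter set -> fastest candidate name
        self.benchmark_runs = 0               # how many parameter sets were benchmarked

    def run(self, params: Hashable, *args: object) -> object:
        if params not in self.cache:
            # New parameter set: time every candidate once (the slow first
            # iteration), then remember the name of the fastest one.
            self.benchmark_runs += 1
            timings: Dict[str, float] = {}
            for name, fn in self.candidates.items():
                start = time.perf_counter()
                fn(*args)
                timings[name] = time.perf_counter() - start
            self.cache[params] = min(timings, key=lambda n: timings[n])
        # Known parameter set: reuse the cached choice without re-timing.
        return self.candidates[self.cache[params]](*args)


def loop_sum(xs: List[int]) -> int:
    """A deliberately naive alternative to the built-in sum()."""
    total = 0
    for x in xs:
        total += x
    return total


if __name__ == "__main__":
    tuner = ToyAutotuner({"builtin": sum, "loop": loop_sum})
    data = list(range(10_000))
    key = ("sum", len(data))     # plays the role of "convolution parameters"
    print(tuner.run(key, data))  # 49995000 (benchmarks both candidates first)
    print(tuner.run(key, data))  # 49995000 (cached choice, no benchmarking)
    print(tuner.benchmark_runs)  # 1
```

Which candidate wins depends on the machine, which mirrors why the real autotuner measures instead of hard-coding a choice.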

Hi, thanks for your detailed explanation. Is the setting that enables the autotuner,
torch.backends.cudnn.benchmark=True
specific to Apex? Can we use it in more general cases?
Thanks.