Darknet: Ampere sparsity

Created on 7 Oct 2020 · 2 comments · Source: AlexeyAB/darknet

Hi, any thoughts on the new sparsity features of the NVIDIA Ampere GPUs? It looks like they could give a big speed improvement if applicable:

A100 whitepaper
GA102 whitepaper

Feature-request

All 2 comments

Even without sparsity, YOLOv4 training will be 6x faster on Ampere, since TF32 is available on all Ampere GPUs (RTX 3070–3090, Tesla A100). And inference will be 2x faster.
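For context, a minimal sketch of what opting into TF32 looks like on a cuBLAS-based FP32 GEMM path (darknet's GPU GEMM goes through cuBLAS, but this is not an existing darknet option; CUDA 11+ and an Ampere GPU are assumed):

```c
#include <stdio.h>
#include <cublas_v2.h>

int main(void)
{
    cublasHandle_t handle;
    if (cublasCreate(&handle) != CUBLAS_STATUS_SUCCESS) {
        fprintf(stderr, "cublasCreate failed\n");
        return 1;
    }

    /* Allow TF32 Tensor Core math for subsequent FP32 GEMM calls on Ampere.
     * Pre-Ampere GPUs simply keep running plain FP32. */
    cublasStatus_t st = cublasSetMathMode(handle, CUBLAS_TF32_TENSOR_OP_MATH);
    printf("TF32 math mode %s\n",
           st == CUBLAS_STATUS_SUCCESS ? "enabled" : "not available");

    cublasDestroy(handle);
    return 0;
}
```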

And yes, it seems sparsity can be used for pruning: just prune (set to zero) the smallest 2 of every 4 sequential weight values, and it should increase inference speed 2x (so in total, inference will be 4x faster than on Turing). A sketch of this pruning step follows the quote below.

https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/nvidia-ampere-architecture-whitepaper.pdf

NVIDIA has developed a simple and universal recipe for sparsifying deep neural networks for inference using this 2:4 structured sparsity pattern. The network is first trained using dense weights, then fine-grained structured pruning is applied, and finally the remaining non-zero weights are fine-tuned with additional training steps. This method results in virtually no loss in inferencing accuracy based on evaluation across dozens of networks spanning vision, object detection, segmentation, natural language modeling, and translation.
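To make the pruning step concrete, here is a minimal C sketch (not darknet code) of magnitude-based 2:4 structured pruning over a flat weight array. The helper name prune_2_of_4 is hypothetical, and the fine-tuning step the whitepaper describes would still be needed afterwards:

```c
#include <math.h>
#include <stddef.h>

/* Sketch: for every group of 4 consecutive weights, keep the 2 with the
 * largest magnitude and set the other 2 to zero (2:4 structured sparsity).
 * Assumes n is a multiple of 4 for simplicity. */
void prune_2_of_4(float *w, size_t n)
{
    for (size_t g = 0; g + 4 <= n; g += 4) {
        /* Find the two largest-magnitude weights in this group of 4. */
        int keep0 = 0, keep1 = 1;
        if (fabsf(w[g + 1]) > fabsf(w[g + 0])) { keep0 = 1; keep1 = 0; }
        for (int i = 2; i < 4; ++i) {
            float a = fabsf(w[g + i]);
            if (a > fabsf(w[g + keep0]))      { keep1 = keep0; keep0 = i; }
            else if (a > fabsf(w[g + keep1])) { keep1 = i; }
        }
        /* Zero everything except the two kept weights. */
        for (int i = 0; i < 4; ++i) {
            if (i != keep0 && i != keep1) w[g + i] = 0.0f;
        }
    }
}
```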


Cool. The FP16 performance doesn't seem to be that much better than a 2080 Ti's (e.g. this benchmark on ResNet), but maybe that will change when TensorRT support comes out? I don't know.
