Hi, any thoughts on the new sparsity features of the NVIDIA Ampere GPUs? Looks like they could give a big speed improvement where applicable.
Even without sparsity, YOLOv4 training will be 6x faster on Ampere, since TF32 is available on all Ampere GPUs (RTX 3070 - 3090, Tesla A100). And inference will be 2x faster.
And yes, it seems Sparsity can be used for pruning: just prune (set to zero) the smallest 2 of every 4 consecutive weight values, and it should speed up inference another 2x (so in total inference will be about 4x faster than on Turing).
https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/nvidia-ampere-architecture-whitepaper.pdf
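To make the 2:4 pattern concrete, here is a minimal NumPy sketch of what the pruning step does to a weight tensor (this is just for illustration, not NVIDIA's actual tooling; the function name `prune_2_of_4` is made up): in every group of 4 consecutive weights, the 2 smallest-magnitude values are zeroed.

```python
import numpy as np

def prune_2_of_4(weights: np.ndarray) -> np.ndarray:
    """Zero the 2 smallest-magnitude values in every group of 4 consecutive weights.

    Assumes the total number of elements is a multiple of 4 (real code would
    pad or handle the remainder).
    """
    flat = weights.reshape(-1, 4)                      # groups of 4 consecutive values
    drop_idx = np.argsort(np.abs(flat), axis=1)[:, :2]  # 2 smallest-magnitude per group
    mask = np.ones_like(flat, dtype=bool)
    np.put_along_axis(mask, drop_idx, False, axis=1)   # keep only the 2 largest
    return (flat * mask).reshape(weights.shape)

# Example: each group of 4 keeps its 2 largest-magnitude values
w = np.array([0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01], dtype=np.float32)
print(prune_2_of_4(w))   # -> [ 0.9  0.   0.  -0.7  0.   0.3 -0.4  0. ]
```

The resulting 50% sparse weights follow exactly the 2:4 structure that the Ampere sparse Tensor Cores can exploit at inference time.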
NVIDIA has developed a simple and universal recipe for sparsifying deep neural networks for inference using this 2:4 structured sparsity pattern. The network is first trained using dense weights, then fine-grained structured pruning is applied, and finally the remaining non-zero weights are fine-tuned with additional training steps. This method results in virtually no loss in inferencing accuracy based on evaluation across dozens of networks spanning vision, object detection, segmentation, natural language modeling, and translation.
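The recipe itself is just: train dense, mask the weights to 2:4, then keep training with the mask held fixed. A rough PyTorch sketch of those three steps on a toy layer (the helper `make_2_4_mask` and the toy model are mine for illustration, not NVIDIA's ASP tooling):

```python
import torch
import torch.nn as nn

def make_2_4_mask(weight: torch.Tensor) -> torch.Tensor:
    """Boolean mask keeping the 2 largest-magnitude values of each group of 4."""
    flat = weight.detach().abs().reshape(-1, 4)
    keep = flat.argsort(dim=1, descending=True)[:, :2]   # top-2 indices per group
    mask = torch.zeros_like(flat, dtype=torch.bool)
    mask.scatter_(1, keep, True)
    return mask.reshape(weight.shape)

# Toy stand-in for "a network trained using dense weights"
model = nn.Linear(16, 8)
data, target = torch.randn(32, 16), torch.randn(32, 8)
loss_fn = nn.MSELoss()
opt = torch.optim.SGD(model.parameters(), lr=0.01)

# Step 1: dense training (abridged to one step here)
opt.zero_grad()
loss_fn(model(data), target).backward()
opt.step()

# Step 2: apply fine-grained 2:4 structured pruning
mask = make_2_4_mask(model.weight).float()
with torch.no_grad():
    model.weight.mul_(mask)

# Step 3: fine-tune the remaining non-zero weights, re-applying the mask
for _ in range(10):
    opt.zero_grad()
    loss_fn(model(data), target).backward()
    opt.step()
    with torch.no_grad():
        model.weight.mul_(mask)   # keep pruned positions at exactly zero
```

Re-applying the mask after each optimizer step is the simplest way to keep the pruned positions at zero while the surviving weights are fine-tuned; NVIDIA's own tooling handles this (and the mask export for TensorRT) for you.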

Cool. The fp16 performance doesn't seem to be that much better than a 2080 Ti (e.g. this ResNet benchmark), but maybe that will change when the TensorRT support comes out? I don't know.