On every script run, the cuDNN convolution algorithm autotuning pass runs. This can take a few seconds, so I wonder if we could cache the result locally, keyed by a hash of the MXNet + CUDA + cuDNN versions for each device ID (or whatever else could change the algorithm selection)?
[20:48:19] src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:107: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
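To make the request concrete, here is a minimal sketch of what such a cache could look like. Everything in it is illustrative, not MXNet API: the cache path, the key fields, and the helper names are all made up, and the real autotune logic lives in C++ in cudnn_algoreg-inl.h, so an actual fix would land there rather than in Python.

```python
import hashlib
import json
import os

# Hypothetical cache location; MXNet does not currently write this file.
CACHE_PATH = os.path.expanduser("~/.mxnet/cudnn_autotune_cache.json")


def cache_key(mxnet_version, cuda_version, cudnn_version, device_id, conv_signature):
    """Hash everything that could change the algorithm selection:
    library versions, the GPU, and the convolution's shape/dtype signature."""
    raw = json.dumps(
        [mxnet_version, cuda_version, cudnn_version, device_id, conv_signature],
        sort_keys=True,
    )
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def load_cached_algo(key):
    """Return a previously selected algorithm, or None on a cache miss."""
    if os.path.exists(CACHE_PATH):
        with open(CACHE_PATH) as f:
            return json.load(f).get(key)
    return None


def store_algo(key, algo):
    """Persist the algorithm chosen by the performance tests."""
    cache = {}
    if os.path.exists(CACHE_PATH):
        with open(CACHE_PATH) as f:
            cache = json.load(f)
    cache[key] = algo
    os.makedirs(os.path.dirname(CACHE_PATH), exist_ok=True)
    with open(CACHE_PATH, "w") as f:
        json.dump(cache, f)
```

In the meantime, the workaround mentioned in the log itself is to set MXNET_CUDNN_AUTOTUNE_DEFAULT=0 before starting the process, which skips the search entirely at the cost of possibly slower default kernels.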
@eric-haibin-lin: Please label: CUDA, Feature
Just want to +1. I've talked to quite a few MXNet users who could really use this functionality.
+1
Any news?
My team has an implementation of this in a fork. We'll try and contribute it back, but no promises on a timeline.
Any updates @KellenSunderland? That feature sounds very useful.
@KellenSunderland would love to know more, this is very relevant to us.
+1
Two years have passed...
This may be fixed as part of the cuDNN 8 integration? https://docs.nvidia.com/deeplearning/sdk/cudnn-api/index.html
cc @DickJC123