Incubator-mxnet: How to disable MXNET_CUDNN_AUTOTUNE_DEFAULT and bucketing log message without turning off MXNET_CUDNN_AUTOTUNE_DEFAULT?

Created on 3 Oct 2017 · 19 Comments · Source: apache/incubator-mxnet

[16:49:37] src/operator/././cudnn_algoreg-inl.h:112: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
[16:49:37] src/operator/././cudnn_algoreg-inl.h:117: If you see this message in the middle of training, you are probably using bucketing. Consider setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable cudnn tuning.
[16:49:42] src/operator/././cudnn_algoreg-inl.h:112: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
[16:49:42] src/operator/././cudnn_algoreg-inl.h:117: If you see this message in the middle of training, you are probably using bucketing. Consider setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable cudnn tuning.
[16:49:42] src/operator/././cudnn_algoreg-inl.h:112: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
[16:49:42] src/operator/././cudnn_algoreg-inl.h:117: If you see this message in the middle of training, you are probably using bucketing. Consider setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable cudnn tuning.
[16:49:42] src/operator/././cudnn_algoreg-inl.h:112: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
[16:49:42] src/operator/././cudnn_algoreg-inl.h:117: If you see this message in the middle of training, you are probably using bucketing. Consider setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable cudnn tuning.
[16:49:45] src/operator/././cudnn_algoreg-inl.h:112: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
[16:49:45] src/operator/././cudnn_algoreg-inl.h:117: If you see this message in the middle of training, you are probably using bucketing. Consider setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable cudnn tuning.
[16:49:47] src/operator/././cudnn_algoreg-inl.h:112: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
[16:49:47] src/operator/././cudnn_algoreg-inl.h:117: If you see this message in the middle of training, you are probably using bucketing. Consider setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable cudnn tuning.
[16:49:49] src/operator/././cudnn_algoreg-inl.h:112: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
[16:49:49] src/operator/././cudnn_algoreg-inl.h:117: If you see this message in the middle of training, you are probably using bucketing. Consider setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable cudnn tuning.
[16:49:49] src/operator/././cudnn_algoreg-inl.h:112: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
[16:49:49] src/operator/././cudnn_algoreg-inl.h:117: If you see this message in the middle of training, you are probably using bucketing. Consider setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable cudnn tuning.
[16:49:51] src/operator/././cudnn_algoreg-inl.h:112: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
[16:49:51] src/operator/././cudnn_algoreg-inl.h:117: If you see this message in the middle of training, you are probably using bucketing. Consider setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable cudnn tuning.
[16:49:52] src/operator/././cudnn_algoreg-inl.h:112: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
[16:49:52] src/operator/././cudnn_algoreg-inl.h:117: If you see this message in the middle of training, you are probably using bucketing. Consider setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable cudnn tuning.
[16:49:52] src/operator/././cudnn_algoreg-inl.h:112: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
[16:49:52] src/operator/././cudnn_algoreg-inl.h:117: If you see this message in the middle of training, you are probably using bucketing. Consider setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable cudnn tuning.
[16:49:53] src/operator/././cudnn_algoreg-inl.h:112: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
[16:49:53] src/operator/././cudnn_algoreg-inl.h:117: If you see this message in the middle of training, you are probably using bucketing. Consider setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable cudnn tuning.
[16:49:54] src/operator/././cudnn_algoreg-inl.h:112: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
[16:49:54] src/operator/././cudnn_algoreg-inl.h:117: If you see this message in the middle of training, you are probably using bucketing. Consider setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable cudnn tuning.
[16:49:55] src/operator/././cudnn_algoreg-inl.h:112: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
[16:49:55] src/operator/././cudnn_algoreg-inl.h:117: If you see this message in the middle of training, you are probably using bucketing. Consider setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable cudnn tuning.
[16:49:55] src/operator/././cudnn_algoreg-inl.h:112: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
[16:49:55] src/operator/././cudnn_algoreg-inl.h:117: If you see this message in the middle of training, you are probably using bucketing. Consider setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable cudnn tuning.

Bug Operator

All 19 comments

I tried exporting the variable in the shell and setting it through os.environ, but neither works. The code I am using is https://github.com/pangyupo/mxnet_mtcnn_face_detection
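For the os.environ route, ordering may be worth ruling out. The sketch below assumes (nothing in this thread confirms it) that the variable has to be in place before MXNet is imported; the helper name `disable_cudnn_autotune` is made up for illustration.

```python
import os

# Name taken from the log message itself; "0" is the value the message suggests.
AUTOTUNE_VAR = "MXNET_CUDNN_AUTOTUNE_DEFAULT"


def disable_cudnn_autotune(env=None):
    """Set the autotune variable to "0" in `env` (defaults to os.environ)."""
    if env is None:
        env = os.environ
    env[AUTOTUNE_VAR] = "0"
    return env


# Assumption: call this before the first `import mxnet` in the process.
disable_cudnn_autotune()
# import mxnet as mx  # deliberately placed after the call above
```

If the message still appears with this ordering, the environment variable is probably not the only thing controlling autotuning in the model being loaded.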

#7988 ought to do it.

@szha I don't quite understand how to use #7988. How should I solve the problem?

Sorry for the confusion. What I meant was that once that PR is merged the problem should be resolved. Let's wait for that PR to get in first.

I have the same problem.

@yxchng did you solve this problem? I am also using the MTCNN MXNet code.

export MXNET_CUDNN_AUTOTUNE_DEFAULT=0 works for me.

Hi there

I am on Ubuntu 16.04.3 using conda 4.4.2 and Python 2.7.14

I see the same error even after export MXNET_CUDNN_AUTOTUNE_DEFALUT=0

ubuntu@ip-10-0-0-14:$ export MXNET_CUDNN_AUTOTUNE_DEFALUT=0
ubuntu@ip-10-0-0-14:$ python fine_tune_vgg.py --vgg nets/vgg16 --checkpoints checkpoints --prefix vggnet
[20:56:31] src/io/iter_image_recordio_2.cc:153: ImageRecordIOParser2: ../datasets/rec/train.rec, use 1 threads for decoding..
[20:56:36] src/io/iter_image_recordio_2.cc:153: ImageRecordIOParser2: ../datasets/rec/val.rec, use 1 threads for decoding..
[INFO] loading pre-trained model...
[20:56:37] src/nnvm/legacy_json_util.cc:190: Loading symbol saved by previous version v0.8.0. Attempting to upgrade...
[20:56:37] src/nnvm/legacy_json_util.cc:198: Symbol successfully upgraded!
[INFO] training network...
[20:56:37] src/operator/././cudnn_algoreg-inl.h:112: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)

Any news on how to solve this?
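One detail in the transcript above: the exported name ends in DEFALUT, while the log message names MXNET_CUDNN_AUTOTUNE_DEFAULT. A misspelled variable would simply never be read, so it may be worth ruling that out first. A small sketch for spotting near-miss names in the environment follows; the helper `find_misspelled` and the 0.9 similarity cutoff are illustrative choices, not part of MXNet.

```python
import difflib
import os

# The spelling used in the log message.
EXPECTED = "MXNET_CUDNN_AUTOTUNE_DEFAULT"


def find_misspelled(env=None, expected=EXPECTED, cutoff=0.9):
    """Return names in `env` that closely resemble `expected` without matching it exactly."""
    if env is None:
        env = os.environ
    candidates = [name for name in env if name != expected]
    return difflib.get_close_matches(expected, candidates, n=5, cutoff=cutoff)
```

Calling `find_misspelled()` with no arguments inspects the current process environment; a non-empty result suggests a variable was exported under a slightly different name.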

@jrzaurin This bug should have already been fixed by the PR mentioned above. Which version of mxnet are you using? Would you be able to upgrade to 1.0?

@szha

In fact I am using

In [2]: mxnet.__version__
Out[2]: '0.11.0'

which I installed via:
pip install mxnet-cu80==0.11.0

(opencv and graphviz were already installed)

I upgraded to cuDNN v6.0, but I doubt that has anything to do with it (?)

Overall: Ubuntu 16.04.3, conda 4.4.2, Python 2.7.14 and mxnet 0.11. I still see the same message (it is on screen as I write), and training runs very slowly on a tiny dataset (Stanford Cars Dataset).

So any idea is appreciated. Thanks!

The latest version of mxnet is mxnet-cu80==1.0.0.post2. Would you be able to try out this version?
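To check quickly whether an environment is already on a release with the fix, a version comparison along these lines might help. It is only a sketch: it assumes, as suggested above, that 1.0 is the first fixed release, and the helpers `version_tuple` and `is_at_least` are hypothetical names.

```python
import re


def version_tuple(version):
    """Parse the leading numeric components, e.g. '1.0.0.post2' -> (1, 0, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component such as 'post2'
        parts.append(int(match.group()))
    return tuple(parts)


def is_at_least(version, minimum="1.0.0"):
    """True if `version` is at least `minimum`, comparing numeric components only."""
    return version_tuple(version) >= version_tuple(minimum)
```

With MXNet installed, `is_at_least(mxnet.__version__)` gives the answer for the current environment. Later comments in this thread report the message on 1.2.0 and 1.3.1 as well, so a passing check does not guarantee the message is gone.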

@szha

Hey thanks!

I did not know about mxnet-cu80==1.0.0. I updated on my p2 instance (Tesla K80) and on my Mac, and the message is gone, so I assume the problem is solved 👍

It still runs very slowly, but I am sure that is something on my side. So, regarding the MXNET_CUDNN_AUTOTUNE_DEFAULT message, the solution was to upgrade:

pip install mxnet-cu80==1.0.0

and the message disappeared

thanks!

EDIT
@szha

just updated to mxnet-cu80==1.0.1 and:

train_net.py:92: DeprecationWarning: mxnet.model.FeedForward has been deprecated. Please use mxnet.mod.Module instead.
  begin_epoch=10)
/home/ubuntu/anaconda2/lib/python2.7/site-packages/mxnet/model.py:573: DeprecationWarning: Calling initializer with init(str, NDArray) has been deprecated.please use init(mx.init.InitDesc(...), NDArray) instead.
  self.initializer(k, v)
/home/ubuntu/anaconda2/lib/python2.7/site-packages/mxnet/model.py:579: DeprecationWarning: Calling initializer with init(str, NDArray) has been deprecated.please use init(mx.init.InitDesc(...), NDArray) instead.
  self.initializer(k, v)
[16:21:55] src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:107: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)

The first three are just DeprecationWarnings, but the final line is the MXNET_CUDNN_AUTOTUNE_DEFAULT message again, even though the variable is set to 0. Anyway, everything seems to work fine, and it runs in a decent time.

Just mentioning it in case it is useful.

@yxchng , is this still an issue for you?

@lanking520 nope thanks

I'm using mxnet 1.2.0, cuda 9.0 and cudnn 7.0, and still encounter this problem. So what's the final solution? (resetting MXNET_CUDNN_AUTOTUNE_DEFAULT doesn't work).
Thanks!

During autotuning it prints an error message but keeps running. Is this normal? mxnet 1.2.0, cuda 9.0, ubuntu 16.04

[2605:2644:0418/091830.114475:ERROR:upload_data_presenter.cc(73)] Not implemented reached in virtual void extensions::RawDataPresenter::FeedNext(const net::UploadElementReader &)

After export MXNET_CUDNN_AUTOTUNE_DEFAULT=0, this error was gone.

I have the same problem now.

The problem still persists after export MXNET_CUDNN_AUTOTUNE_DEFAULT=0, running on mxnet 1.3.1, cuda 9.0, ubuntu 18.
Help would be really appreciated.

Can you check whether any convolution layer in your model has cudnn_tune set to something other than off? If so, please update your model by setting the cudnn_tune parameter of every such layer to off. Thanks.
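For a network defined in Python, this means passing cudnn_tune='off' to each convolution operator. For a network loaded from a saved symbol JSON file, one option is to patch the file before loading it. The sketch below uses only the standard json module and rests on assumptions about the file layout: a top-level "nodes" list whose entries carry an "op" name and an attribute dictionary stored under "attrs", "param", or "attr" depending on the MXNet version that wrote the file. The function name `set_cudnn_tune_off` and its `ops` parameter are hypothetical.

```python
import json


def set_cudnn_tune_off(symbol_json, ops=("Convolution",)):
    """Return symbol JSON text with cudnn_tune set to "off" on every node whose op is in `ops`."""
    graph = json.loads(symbol_json)
    for node in graph.get("nodes", []):
        if node.get("op") not in ops:
            continue
        # Assumption: the operator parameters live under one of these keys.
        # Nodes without any of them are left untouched.
        for key in ("attrs", "param", "attr"):
            if isinstance(node.get(key), dict):
                node[key]["cudnn_tune"] = "off"
                break
    return json.dumps(graph, indent=2)
```

A typical use would be to read the existing symbol file as text, pass it through `set_cudnn_tune_off`, write the result to a new file, and load that file instead of the original. Keeping the original file untouched makes it easy to compare behaviour before and after.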
