For bugs or installation issues, please provide the following information.
The more information you provide, the more likely people will be able to help you.
Operating System: Ubuntu 16.04
Package used (Python/R/Scala/Julia): Python 3
MXNet version: 0.9.5
Or if installed from source: yes
[14:45:39] src/operator/././cudnn_algoreg-inl.h:65: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
[14:45:39] /home/rincy/mxnet-master/dmlc-core/include/dmlc/./logging.h:304: [14:45:39] /home/rincy/mxnet-master/mshadow/mshadow/././././cuda/tensor_gpu-inl.cuh:58: too large launch parameter: MapReduceKeepDim1[851968,1], [256,1,1]
Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python3.5/dist-packages/mxnet-0.9.5-py3.5.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f9905d2098c]
[bt] (1) /usr/local/lib/python3.5/dist-packages/mxnet-0.9.5-py3.5.egg/mxnet/libmxnet.so(+0x1953772) [0x7f99070b6772]
[bt] (2) /usr/local/lib/python3.5/dist-packages/mxnet-0.9.5-py3.5.egg/mxnet/libmxnet.so(_ZN5mxnet2op14ReduceToAssignIN7mshadow3red3sumENS2_3gpuENS2_6TensorIS5_Li2EfEEfEEvNS6_IT0_Li2ET2_EENS_9OpReqTypeERKT1_+0x449) [0x7f990717e019]
[bt] (3) /usr/local/lib/python3.5/dist-packages/mxnet-0.9.5-py3.5.egg/mxnet/libmxnet.so(_ZN5mxnet2op21ReduceAxesComputeImplIN7mshadow3gpuENS2_3red3sumELb0EEEvRKN4nnvm9NodeAttrsERKNS_9OpContextERKSt6vectorINS_5TBlobESaISE_EERKSD_INS_9OpReqTypeESaISJ_EESI_RKNS6_6TShapeE+0x883) [0x7f9907190613]
[bt] (4) /usr/local/lib/python3.5/dist-packages/mxnet-0.9.5-py3.5.egg/mxnet/libmxnet.so(_ZN5mxnet2op17ReduceAxesComputeIN7mshadow3gpuENS2_3red3sumELb0EEEvRKN4nnvm9NodeAttrsERKNS_9OpContextERKSt6vectorINS_5TBlobESaISE_EERKSD_INS_9OpReqTypeESaISJ_EESI_+0xb8) [0x7f990727ec48]
[bt] (5) /usr/local/lib/python3.5/dist-packages/mxnet-0.9.5-py3.5.egg/mxnet/libmxnet.so(+0x13defa8) [0x7f9906b41fa8]
[bt] (6) /usr/local/lib/python3.5/dist-packages/mxnet-0.9.5-py3.5.egg/mxnet/libmxnet.so(_ZNSt17_Function_handlerIFvN5mxnet10RunContextENS0_6engine18CallbackOnCompleteEEZNS2_11NaiveEngine4PushEPNS2_3OprENS0_7ContextEibEUlS1_S3_E_E9_M_invokeERKSt9_Any_dataOS1_OS3_+0x51) [0x7f9906ab35c1]
[bt] (7) /usr/local/lib/python3.5/dist-packages/mxnet-0.9.5-py3.5.egg/mxnet/libmxnet.so(_ZN5mxnet6engine11NaiveEngine9PushAsyncESt8functionIFvNS_10RunContextENS0_18CallbackOnCompleteEEENS_7ContextERKSt6vectorIPNS0_3VarESaISA_EESE_NS_10FnPropertyEiPKc+0x748) [0x7f9906abc5d8]
[bt] (8) /usr/local/lib/python3.5/dist-packages/mxnet-0.9.5-py3.5.egg/mxnet/libmxnet.so(_ZN5mxnet6engine11NaiveEngine4PushEPNS0_3OprENS_7ContextEib+0x8f) [0x7f9906abd94f]
[bt] (9) /usr/local/lib/python3.5/dist-packages/mxnet-0.9.5-py3.5.egg/mxnet/libmxnet.so(_ZN5mxnet4exec13GraphExecutor6RunOpsEbmm+0x1f9) [0x7f9906b43da9]
Traceback (most recent call last):
  File "/home/rincy/PycharmProjects/ClassifyEvent/mxmodel/dcnn_train.py", line 168, in <module>
    train_model(args, ctx)
  File "/home/rincy/PycharmProjects/ClassifyEvent/mxmodel/dcnn_train.py", line 157, in train_model
    num_epoch=args.num_epoch)
  File "/usr/local/lib/python3.5/dist-packages/mxnet-0.9.5-py3.5.egg/mxnet/module/base_module.py", line 472, in fit
    self.forward_backward(data_batch)
  File "/usr/local/lib/python3.5/dist-packages/mxnet-0.9.5-py3.5.egg/mxnet/module/base_module.py", line 193, in forward_backward
    self.forward(data_batch, is_train=True)
  File "/usr/local/lib/python3.5/dist-packages/mxnet-0.9.5-py3.5.egg/mxnet/module/bucketing_module.py", line 390, in forward
    self._curr_module.forward(data_batch, is_train=is_train)
  File "/usr/local/lib/python3.5/dist-packages/mxnet-0.9.5-py3.5.egg/mxnet/module/module.py", line 538, in forward
    self._exec_group.forward(data_batch, is_train)
  File "/usr/local/lib/python3.5/dist-packages/mxnet-0.9.5-py3.5.egg/mxnet/module/executor_group.py", line 386, in forward
    exec_.forward(is_train=is_train)
  File "/usr/local/lib/python3.5/dist-packages/mxnet-0.9.5-py3.5.egg/mxnet/executor.py", line 133, in forward
    ctypes.c_int(int(is_train))))
  File "/usr/local/lib/python3.5/dist-packages/mxnet-0.9.5-py3.5.egg/mxnet/base.py", line 84, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [14:45:39] /home/rincy/mxnet-master/mshadow/mshadow/././././cuda/tensor_gpu-inl.cuh:58: too large launch parameter: MapReduceKeepDim1[851968,1], [256,1,1]
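The message reports a grid of 851968 x 1 blocks (256 threads each) for `MapReduceKeepDim1`, presumably one block per element kept after the reduction. Assuming the GPU code path rejects any launch whose grid dimension exceeds 65535 blocks (the traditional CUDA per-dimension limit; the exact constant used by this build is an assumption here), the grid is roughly 13 times too large, which would explain why the same model trains on the CPU but fails on the GPU. A small Python sketch of that arithmetic, with hypothetical helper names:

```python
# Hypothetical re-creation of the grid-size check behind the error message.
# ASSUMPTION: each grid dimension is capped at 65535 blocks (legacy CUDA limit).
# This is an illustration only, not mshadow's actual source.
MAX_GRID_DIM = 65535


def launch_ok(grid, max_grid=MAX_GRID_DIM):
    """Return True if every grid dimension is within the cap."""
    return all(g <= max_grid for g in grid)


def launches_needed(n_blocks, max_grid=MAX_GRID_DIM):
    """Minimum number of capped launches needed to cover n_blocks blocks."""
    return -(-n_blocks // max_grid)  # ceiling division


grid_from_log = (851968, 1)          # MapReduceKeepDim1[851968,1]
print(launch_ok(grid_from_log))      # False
print(851968 == 13 * 65536)          # True
print(launches_needed(851968))       # 14
```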
The code is here:
https://github.com/R1ncy/issue/blob/master/gpu_iss.py
When I set ctx=mx.cpu(0), everything works fine and I get the F1 score after every epoch.
But when I change to ctx=mx.gpu(), this error occurs.
I ran example/cnn_text_classification with mx.gpu() to check whether the GPU build works, and the output shows it runs well:
[16:24:17] src/operator/././cudnn_algoreg-inl.h:65: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
/home/rincy/mxnet-master/example/cnn_text_classification/text_cnn.py:93: DeprecationWarning: Calling initializer with init(str, NDArray) has been deprecated.please use init(mx.init.InitDesc(...), NDArray) instead.
initializer(name, arg_dict[name])
Iter [0] Train: Time: 5.578s, Training Accuracy: 56.155 --- Dev Accuracy thus far: 64.100
Iter [1] Train: Time: 5.563s, Training Accuracy: 72.021 --- Dev Accuracy thus far: 70.500
Iter [2] Train: Time: 5.535s, Training Accuracy: 81.119 --- Dev Accuracy thus far: 74.300
Could you please give me some advice on how to solve this problem?
Thanks a lot!
@reminisce Could you take a look?
Replacing all ReduceToAssign with Reduce should solve this.
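One general way a reduction kernel avoids a per-dimension grid cap is to launch a bounded number of blocks and have each block stride over several outputs (a grid-stride loop) instead of launching one block per output. The plain-Python simulation below illustrates that technique only; it is not MXNet's `Reduce` implementation, the helper name is hypothetical, and the 65535 cap is again an assumption:

```python
# Illustrative CPU simulation of a grid-stride row reduction.
# ASSUMPTION: a cap of 65535 blocks per grid dimension. This models the
# general technique only; it is not MXNet's Reduce kernel.
MAX_GRID_DIM = 65535


def grid_stride_row_sums(rows, max_grid=MAX_GRID_DIM):
    """Sum each row while "launching" at most max_grid simulated blocks.

    Returns (sums, grid), where grid is the number of simulated blocks.
    """
    n_out = len(rows)
    grid = max(1, min(n_out, max_grid))   # never exceed the grid cap
    sums = [0.0] * n_out
    for block_id in range(grid):          # each simulated block ...
        # ... handles outputs block_id, block_id + grid, block_id + 2*grid, ...
        for i in range(block_id, n_out, grid):
            sums[i] = sum(rows[i])
    return sums, grid


if __name__ == "__main__":
    rows = [[float(r)] * 4 for r in range(10)]
    sums, grid = grid_stride_row_sums(rows, max_grid=3)
    print(grid)   # 3
    print(sums)   # [0.0, 4.0, 8.0, 12.0, 16.0, 20.0, 24.0, 28.0, 32.0, 36.0]
```

Under the same assumed cap, the 851968 outputs from the error above would be covered by 65535 blocks, each handling at most 14 outputs.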
@piiswrong Ok, I will replace it after my current work is done.
Hi @reminisce,
Is there any progress on this issue? Thanks for your help!
@R1ncy I am working on something else these days. I should be able to get started on this issue in a day or two.
Solved.
Thank you all, @piiswrong @reminisce :) TVT