I am running example/rcnn/demo.py. It succeeded on GPU. Then I tried to run the whole program on CPU, but it halted at this line in rcnn/detector.py:
```
scores = executor.output_dict['cls_prob_reshape_output'].asnumpy()[0]
```
Then I modified the line to:
```
scores_raw = executor.output_dict['cls_prob_reshape_output']
scores = scores_raw.asnumpy()[0]
```
It still halted at `scores_raw.asnumpy()`.
I made a sample test as follows:
```
import mxnet as mx

mx_x = mx.nd.ones((1, 300, 21))
np_x = mx_x.asnumpy()[0]
```
But this succeeds. So what is happening? Does asnumpy() have a bug?
By debugging into the code, I found that it halted at the following call in ndarray.py:
```
check_call(_LIB.MXNDArraySyncCopyToCPU(
    self.handle,
    data.ctypes.data_as(ctypes.c_void_p),
    ctypes.c_size_t(data.size)))
```
I hit the same issue as https://github.com/dmlc/mxnet/issues/3684, also at _LIB.MXNDArraySyncCopyToCPU.
rcnn is a big model; you need to wait longer.
The copy size is only about 300 × 21 × 4 ≈ 24 KB. The GPU version is quick; for the CPU version, I waited more than half an hour...
@precedenceguo
asnumpy() waits on the engine for the actual computation to finish.
Is your machine conducting heavy computation while stuck at this line? If so, please use mxnet-notebook/predict_with_pretrained_model to test an image and see how long that takes.
Thank you. I checked the CPU status and its utilization is almost 0. I will try predict_with_pretrained_model.
@javelinjs I guess the toArray method in Scala is the same as asnumpy in Python.
I moved the context from mx.cpu(0) to mx.cpu(1) and the problem was solved. It seems the bug is related to thread synchronization.
@nopattern Have you tried calling predict with multiple threads?
@zihaolucky How do I do that?
@nopattern Use Python multi-threading and call model.predict and asnumpy there. Or you could use waitToRead together with asnumpy in a single thread.
I also met this problem.
I tried to bind two executors with partially shared parameters. If I use exe1.outputs[0].asnumpy() in the training iteration (to inspect whether there is any problem with the training setup), the training process freezes. If I comment out all the asnumpy() calls, training goes smoothly. If I bind without sharing any parameters, then there is no problem with the asnumpy() call.
I thought this issue was fixed in collection #4713 and PR #4528.
@zihaolucky Hi, how did you resolve this problem? I ran into the same situation. Each time I call NDArray.toArray in Scala, it blocks for at least 10 seconds on a CPU-only CentOS server; however, on my MacBook (CPU only), the time cost is barely noticeable!
@maxenceliu Unfortunately, I haven't resolved this.
@zihaolucky So, did you give up on deploying MXNet on the server in the end and use another platform?
@maxenceliu I used the Scala package in production for a while. With version 0.7 it works okay. Or maybe you should try the naive engine.
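For reference, a minimal sketch of selecting the naive engine via MXNET_ENGINE_TYPE (a documented MXNet environment variable; it must be set before mxnet is imported, and whether it avoids this particular hang is only an assumption):

```python
import os

# The naive engine runs every operation synchronously on the calling
# thread, which rules out dependency-engine deadlocks when debugging.
# The variable must be set before `import mxnet`.
os.environ['MXNET_ENGINE_TYPE'] = 'NaiveEngine'

# import mxnet as mx  # import only after the variable is set
```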
@zihaolucky Now I use version 1.10, and the problem appears again! What do you mean by the naive engine?
@zihaolucky Actually, pre-0.9 versions work okay. It seems there has been a deadlock since the NNVM refactor.
@cloudhan Have you tried version 0.10.1? It still blocks for several seconds. It calls WaitToRead() when copying data to the CPU. I don't understand whether all the dependencies need to be waited on for this copy.
@zihaolucky Has the naive engine not been implemented yet?
@maxenceliu nah, I lost the script and forgot how to reproduce...
I found that I am using a Docker environment on the server. I'm not sure whether Docker could cause this problem.
I think I have resolved this problem, but I still don't know why: #7417
@maxenceliu
Could you list your system environment, including the MXNet version you're using (you mentioned 1.10 and 0.10.1, but neither is a valid version number) or the git commit hash?
Also, please provide the shortest snippet that can reproduce the problem. I'll look into it.
I have met the same problem. It hangs in asnumpy, at the same `ctypes.c_size_t(data.size)))` call into MXNDArraySyncCopyToCPU.
How about using as_in_context(mx.cpu()), like this:
```
executor.output_dict['cls_prob_reshape_output'].as_in_context(mx.cpu()).asnumpy()
```