Incubator-mxnet: asnumpy() of NDArray @cpu halted

Created on 4 Nov 2016 · 28 Comments · Source: apache/incubator-mxnet

I am running example/rcnn/demo.py. It succeeded on GPU. Then I tried to run the whole program on CPU, but the program halted at this line in rcnn/detector.py:

```
scores = executor.output_dict['cls_prob_reshape_output'].asnumpy()[0]
```

Then I modified the line to:

```
scores_raw = executor.output_dict['cls_prob_reshape_output']
scores = scores_raw.asnumpy()[0]
```

It still halted at scores_raw.asnumpy(). I then wrote the following sample code:

```
import mxnet as mx

mx_x = mx.nd.ones((1,300,21))
np_x = mx_x.asnumpy()[0]
```

But that succeeds. So what is happening? Does asnumpy() have a bug?

All 28 comments

By debugging into the code, I found that it halted at
check_call(_LIB.MXNDArraySyncCopyToCPU(
self.handle,
data.ctypes.data_as(ctypes.c_void_p),
ctypes.c_size_t(data.size)))
in file ndarray.py.

I have the same issue in https://github.com/dmlc/mxnet/issues/3684, also at _LIB.MXNDArraySyncCopyToCPU.

rcnn is a big model. You need to wait longer.

The copy size is only about 300*21*4 bytes, roughly 24 KB. The GPU version is quick. For the CPU version, I waited more than half an hour...
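The size estimate in the comment above can be checked with a few lines of arithmetic. This is only a sketch: the shape is taken from the sample code in the original post, and the 4 bytes per element is an assumption (float32, MXNet's default dtype).

```python
from math import prod

# Shape taken from the sample code in the original post.
shape = (1, 300, 21)

# Assumption: the output is float32, i.e. 4 bytes per element.
bytes_per_element = 4

# Total number of bytes that asnumpy() would have to copy to host memory.
copy_bytes = prod(shape) * bytes_per_element
print(copy_bytes)  # 25200
```

A copy of this size should take far less than a second, which suggests the time is not spent in the copy itself.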

@precedenceguo

asnumpy awaits engine for the actual computation.
Is your machine doing heavy computation while stuck at this line? If so, please use mxnet-notebook/predict_with_pretrained_model to test an image and see how long that takes.
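The point that asnumpy waits on the engine can be illustrated with a small analogy in plain Python. This is not MXNet's implementation: `LazyResult`, `slow_forward`, and the single-worker thread pool are hypothetical stand-ins that only mimic the pattern of an operation being queued immediately while the blocking happens at the point where the data is read.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a background execution engine.
_engine = ThreadPoolExecutor(max_workers=1)


class LazyResult:
    """Hypothetical stand-in for an array whose value is computed asynchronously."""

    def __init__(self, fn):
        # Submitting the work returns immediately; nothing has been computed yet.
        self._future = _engine.submit(fn)

    def asnumpy(self):
        # Reading the value blocks until the queued work has finished.
        return self._future.result()


def slow_forward():
    time.sleep(0.2)  # pretend this is the network's forward pass
    return [0.1, 0.9]


start = time.perf_counter()
out = LazyResult(slow_forward)
launch_time = time.perf_counter() - start

start = time.perf_counter()
scores = out.asnumpy()
wait_time = time.perf_counter() - start

print(f"launch: {launch_time:.4f}s, asnumpy: {wait_time:.4f}s")
```

In this sketch nearly all of the elapsed time shows up in the `asnumpy()` call, even though the actual work was queued earlier. A real hang at that line would therefore point at the queued computation or the engine rather than at the copy.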

Thank you. I checked the CPU status and utilization is almost 0. I will try predict_with_pretrained_model.

@javelinjs I guess the toArray method in Scala is the same as asnumpy in Python.

I moved the context from mx.cpu(0) to mx.cpu(1) and the problem was solved. It seems the bug is related to thread synchronization.

@nopattern Have you tried calling predict from multiple threads?

@zihaolucky How?

@nopattern Use Python multithreading: call model.predict and asnumpy from several threads. Or you could also use waitToRead after asnumpy in a single thread.
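When experimenting with threads as suggested above, it can help to tell a real hang apart from a slow call. The helper below is a minimal sketch using only the standard library. `call_with_timeout` is a hypothetical name, and the lambdas are stand-ins for a call such as `scores_raw.asnumpy()`.

```python
import threading
import time


def call_with_timeout(fn, timeout_s):
    """Run fn in a daemon thread and return (finished, result).

    If fn has not returned within timeout_s seconds, finished is False and
    result is None. The worker thread is left running in the background.
    """
    box = {}

    def target():
        box["result"] = fn()

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout_s)
    return (not worker.is_alive(), box.get("result"))


# Stand-in for a call that returns promptly.
print(call_with_timeout(lambda: 42, 1.0))

# Stand-in for a call that blocks longer than we are willing to wait.
print(call_with_timeout(lambda: time.sleep(2), 0.1))
```

This only detects the hang. It does not cancel the blocked call, so it is a diagnostic aid and not a fix.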

I also met this problem.
I tried to bind two executors with partially shared parameters. If I call exe1.outputs[0].asnumpy() in the training iteration (to inspect whether there is any problem with the training setup), the training process freezes. If I comment out all asnumpy() lines, training goes smoothly. If I bind without any shared parameters, there is no problem with the asnumpy() call.

I thought this issue was fixed in collection #4713 and PR #4528.

@zihaolucky Hi, how did you resolve this problem? I ran into the same situation. Each time I call NDArray.toArray in Scala, it blocks for at least 10 seconds when I run it on a CPU server with CentOS. However, on my MacBook (CPU only), its time cost can be noticed!

@maxenceliu Unfortunately, I haven't resolved this.

@zihaolucky So in the end you gave up deploying MXNet on the server and used another platform?

@maxenceliu I used the Scala package in production for a while. With version 0.7 it works okay. Or maybe you should try the naive engine.
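For readers unfamiliar with the naive engine: MXNet selects its execution engine from the `MXNET_ENGINE_TYPE` environment variable, and the value `NaiveEngine` selects a synchronous, single-threaded engine that is commonly used for debugging. The snippet below is a sketch of how that might be tried from Python. The import guard is only there so the snippet also runs where MXNet is not installed, and whether this avoids the hang discussed in this thread is not confirmed here.

```python
import os

# Set the variable before mxnet is imported, so the library sees it at start-up.
os.environ["MXNET_ENGINE_TYPE"] = "NaiveEngine"

try:
    import mxnet as mx
except ImportError:
    mx = None  # MXNet is not installed; the environment variable is still set

if mx is not None:
    # Same minimal check as the sample code in the original post.
    x = mx.nd.ones((1, 300, 21))
    print(x.asnumpy()[0].shape)
```

The same setting can be applied without touching the code by exporting the variable in the shell before starting the program.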

@zihaolucky Now I use version 1.10, and the problem appears again?! What do you mean by naive engine?

@zihaolucky Actually, pre-0.9 versions work okay. It seems there has been a deadlock since the NNVM refactor.

@cloudhan Have you tried version 0.10.1? It still blocks for several seconds. It calls WaitToRead() when copying data to the CPU. I don't understand whether all the dependencies have to be waited on for this copy.

@zihaolucky Has the naive engine not been implemented?

@maxenceliu nah, I lost the script and forgot how to reproduce...

I found that I am using a Docker environment on the server. I'm not sure whether Docker could lead to this problem.

I think I have resolved this problem but still don't know why. #7417

@maxenceliu
Could you list the system env, including the mx version you're using (you mentioned 1.10 and 0.10.1 but neither is a valid version number) or the git commit version?
Also please provide the shortest snippets that can reproduce the problem. I'll look into it.

I have met the same problem.
in asnumpy (self=, data=)
ctypes.c_size_t(data.size)))

How about using as_in_context(mx.cpu()), like this:
executor.output_dict['cls_prob_reshape_output'].as_in_context(mx.cpu()).asnumpy()
