Hi there!
Consider this case:
It takes T1 seconds to execute an operator mx.nd.some_op.
I have the code:
a = mx.nd.some_op(xxxx)
time.sleep(T2)
print(a.asnumpy())
Because of lazy evaluation, the execution time is (T1 + T2) seconds.
Using wait_to_read does not help here, since wait_to_read blocks until the computation finishes.
Is there any way to disable lazy evaluation and execute the operator immediately? Thank you!
Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: Feature
I wrote a test case:
import mxnet as mx
import time

N = 4000
a = mx.nd.zeros((N, N))
b = mx.nd.zeros((N, N))

while True:
    tic = time.time()
    c = mx.nd.dot(a, b)
    # time.sleep(5)
    tic2 = time.time()
    c.wait_to_read()
    # print the time spent blocking in wait_to_read and the total time per iteration
    print("wait_to_read", time.time() - tic2)
    print(time.time() - tic)
The output is as follows:
('wait_to_read', 2.236266851425171)
2.23689508438
('wait_to_read', 2.1589441299438477)
2.15961289406
('wait_to_read', 2.1335580348968506)
2.13375878334
It takes about 2 seconds to execute the operators.
However, after uncommenting time.sleep(5), the output is:
('wait_to_read', 6.985664367675781e-05)
5.00382494926
('wait_to_read', 2.384185791015625e-05)
5.00761890411
('wait_to_read', 2.5987625122070312e-05)
5.0087518692
This shows that wait_to_read took only about 2e-5 seconds.
When do the operators execute?
@wkcn So it seems the concept of lazy evaluation is not fully applicable to all ops, given that the op gets executed even while time.sleep(5) is running. Try adjusting the wait time to see what happens.
MXNET_ENGINE_TYPE=NaiveEngine?
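For reference, a minimal sketch of how the engine can be selected (not part of the original comment; the environment variable has to be set before mxnet is imported, and the documented values include NaiveEngine, ThreadedEngine, and ThreadedEnginePerDevice, the default):

import os
# Select the execution engine; this is read when the mxnet library is loaded,
# so it must be set before the import below.
os.environ["MXNET_ENGINE_TYPE"] = "NaiveEngine"

import mxnet as mx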
Test Code:
import mxnet as mx
import time

N = 4000
a = mx.nd.zeros((N, N))
b = mx.nd.zeros((N, N))

while True:
    tic = time.time()
    c = mx.nd.dot(a, b)
    time.sleep(5)
    tic2 = time.time()
    c.wait_to_read()
    print("wait_to_read", time.time() - tic2)
    print(time.time() - tic)
Engine | time.sleep(5) called | wait_to_read time (s) | total time (s)
---|---|---|---
ThreadedEnginePerDevice | Yes | 4e-5 | 5.01
ThreadedEnginePerDevice | No | 0.705 | 0.708
NaiveEngine | Yes | 4e-5 | 5.7
NaiveEngine | No | 5.96e-6 | 0.68
@wkcn I am not sure I understand your question. As your numbers show, the op seems to execute in parallel with the sleeping thread. So when you execute op(T1) + sleep(T2) you don't see T1 + T2 but something close to T2 if T2 > T1. For example, the op execution time without sleep was 2 seconds, and with sleep it was 5 seconds. Likewise, execution without sleep was 0.708 seconds, and with a 5-second sleep it was close to 5 seconds. Are you asking whether you can somehow avoid the wait_to_read call? Yes, you can avoid it, but there is no guarantee that the computation has finished unless you are using the naive engine.
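For reference, a small sketch of the usual synchronization points in the ndarray API (the shapes are arbitrary; wait_to_read, mx.nd.waitall, and asnumpy are the standard blocking calls):

import mxnet as mx

a = mx.nd.zeros((1000, 1000))
b = mx.nd.zeros((1000, 1000))

c = mx.nd.dot(a, b)   # returns immediately; the work is queued asynchronously

c.wait_to_read()      # blocks until c has been computed
mx.nd.waitall()       # blocks until all queued work on the engine has finished
arr = c.asnumpy()     # copying to NumPy also waits for c implicitly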
@anirudh2290 Yes. Before my experiment, I wrongly thought the total time was op(T1) + sleep(T2).
If the total time were T1 + T2, we would want to disable lazy evaluation to speed up execution.
However, my experiment found that the operator executes in parallel with the sleeping thread. It was confusing because I did not know when the operator executed.
It is interesting that the operator is executed immediately, and there is no lazy evaluation for mx.nd.dot.
Closing the issue.
@lanking520 @eric-haibin-lin @anirudh2290
Thank you so much!
@wkcn I would rather say the operators are executed eagerly. The number of ops executed in parallel is controlled by MXNET_GPU_WORKER_NTHREADS and MXNET_CPU_WORKER_NTHREADS (see https://github.com/apache/incubator-mxnet/blob/master/docs/faq/env_var.md).
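As a rough sketch (assuming these variables are read at import time, as described in the linked env_var.md), they can be set the same way as the engine type above:

import os
# Number of worker threads the threaded engine uses per CPU / GPU device.
os.environ["MXNET_CPU_WORKER_NTHREADS"] = "4"
os.environ["MXNET_GPU_WORKER_NTHREADS"] = "2"

import mxnet as mx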
@eric-haibin-lin Thank you!