In my code, I have five threads, and for each thread I create an Executor as follows:
std::unique_ptr
exec_.reset(network_.SimpleBind(ctx, args_,grad_store,grad_req_type));
But when I run the program, I find that the CPU load is very high. I want to know whether each thread creates a separate Executor engine, and whether each Executor engine spawns many threads of its own.
Also, if I want to create just one Executor engine and have all the threads share it, how should I code that in my program?
@mxnet-label-bot add [Performance, c++, question]
Adding labels for better visibility. You might try asking this question on https://discuss.mxnet.io/ instead. Questions there tend to get better visibility, as GitHub is primarily used to track issues.
@songziqin
This is based on my understanding of the C++ API.
It is possible to create an Executor engine in one thread and share it across many threads. However, the executor engine runs the forward pass on only one input at a time. That is, for a given input, the 3 operations that make up an inference call should be performed atomically for the inference to be correct.
With this approach, your application will be able to process multiple inputs, but the inference operations will be serialized.
With some modifications, the inception_inference.cpp example can be used to process multiple images. A Predictor object can be created by a single thread and shared across multiple threads. The calls to "PredictImage()" need to be synchronized so that the Executor in the Predictor object processes one image at a time.
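To illustrate the synchronization idea, here is a minimal sketch that does not use MXNet at all. `StubExecutor`, `SharedPredictor`, and `RunSharedPredictorDemo` are hypothetical names made up for this example, and the three steps inside `Predict()` (set the input, run the forward pass, read the output) are only my assumption about what needs to happen atomically. In a real program the stub would be replaced by the Executor held by your Predictor.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for an executor: one input buffer, one output buffer.
// This is NOT the MXNet API; it only mimics "set input, forward, read output".
class StubExecutor {
 public:
  void SetInput(const std::vector<float>& in) { input_ = in; }
  void Forward() {
    output_.clear();
    for (float v : input_) output_.push_back(2.0f * v);  // placeholder "network"
  }
  std::vector<float> Output() const { return output_; }

 private:
  std::vector<float> input_;
  std::vector<float> output_;
};

// One executor shared by all threads; the mutex makes each inference atomic.
class SharedPredictor {
 public:
  std::vector<float> Predict(const std::vector<float>& input) {
    std::lock_guard<std::mutex> lock(mutex_);  // held for all three steps
    exec_.SetInput(input);
    exec_.Forward();
    return exec_.Output();
  }

 private:
  std::mutex mutex_;
  StubExecutor exec_;
};

// Starts num_threads workers against a single SharedPredictor and returns
// true if every call returned the result that belongs to its own input.
bool RunSharedPredictorDemo(int num_threads, int calls_per_thread) {
  assert(num_threads > 0 && calls_per_thread > 0);
  SharedPredictor predictor;
  std::vector<char> ok(num_threads, 1);  // one slot per thread
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&predictor, &ok, t, calls_per_thread] {
      const std::vector<float> input(4, static_cast<float>(t));
      const std::vector<float> expected(4, 2.0f * static_cast<float>(t));
      for (int i = 0; i < calls_per_thread; ++i) {
        if (predictor.Predict(input) != expected) ok[t] = 0;
      }
    });
  }
  for (auto& w : workers) w.join();
  for (char flag : ok) {
    if (!flag) return false;
  }
  return true;
}
```

Calling `RunSharedPredictorDemo(5, 100)` would mirror the five-thread setup from the question. Because the lock covers the whole call, the threads take turns, so throughput is that of a single executor; the benefit is that only one executor exists.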
I hope this helps.
@mxnet-label-bot add [Pending Requester Info]
@songziqin - I hope @leleamol's response addressed your concern. Let us know if you are still facing the issue.
Closing the issue for now; please reopen it if the issue still exists.