Incubator-mxnet: Multi-threaded execution leads to high CPU load

Created on 13 Mar 2019 · 3 comments · Source: apache/incubator-mxnet

In my code I have five threads, and in each thread I create an Executor as follows:

std::unique_ptr<mxnet::cpp::Executor> exec_;
exec_.reset(network_.SimpleBind(ctx, args_, grad_store, grad_req_type));

When I run the program, I find that the CPU load is very high. I want to know whether each thread creates a separate Executor engine, and whether each Executor engine spawns many threads. Also, if I want to create just one Executor engine and have all the threads share it, how should I write that in my program?

C++ Performance

All 3 comments

@mxnet-label-bot add [Performance, c++, question]

Adding labels for better visibility. You might try asking this question on https://discuss.mxnet.io/ instead. Questions there tend to get better visibility as github is primarily used to track issues.

@songziqin
This is based on my understanding of the C++ API.
It is possible to create the Executor engine in one thread and share it across many threads. However, the Executor engine will run the forward pass on only one input at a time. That is, for a given input, the following three operations must be performed atomically for correct inference:

  1. Setting the input for the Executor.
  2. Running the forward pass with Executor->Forward().
  3. Retrieving the output from the Executor.

With this approach your application will be able to process multiple inputs, but the inference operations will be serialized.
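
A minimal sketch of what that kind of synchronization could look like, assuming the mxnet-cpp API, a single Executor created up front (for example via SimpleBind as in the question), and a network whose input argument is named "data" — the variable names and the argument key are illustrative, not from this thread:

#include <mutex>
#include <vector>
#include "mxnet-cpp/MxNetCpp.h"

// One shared executor, created once by the main thread (e.g. via network_.SimpleBind(...)).
mxnet::cpp::Executor *exec_ = nullptr;
std::mutex exec_mutex;

// Every worker thread calls this; the lock makes the three steps above atomic.
std::vector<float> Predict(const std::vector<float> &input_data,
                           const mxnet::cpp::Context &ctx,
                           const mxnet::cpp::Shape &input_shape) {
  std::lock_guard<std::mutex> lock(exec_mutex);

  // 1. Set the input for the Executor (assumes the input argument is called "data").
  mxnet::cpp::NDArray input(input_shape, ctx, false);
  input.SyncCopyFromCPU(input_data.data(), input_data.size());
  input.CopyTo(&exec_->arg_dict()["data"]);

  // 2. Run the forward pass (false = inference, not training).
  exec_->Forward(false);

  // 3. Copy the output back before releasing the lock.
  std::vector<float> output(exec_->outputs[0].Size());
  exec_->outputs[0].SyncCopyToCPU(&output, output.size());
  return output;
}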

With some modifications, the inception_inference.cpp example can be used to process multiple images. The Predictor object can be created by a single thread and shared across multiple threads. The calls to PredictImage() need to be synchronized so that the Executor inside the Predictor processes one image at a time.
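
If you go the inception_inference.cpp route, the sharing pattern could look roughly like the sketch below. Predictor and PredictImage() come from that example; the mutex wrapper and the helper functions are assumptions for illustration, not part of the example as shipped:

#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Assumes the Predictor class from cpp-package/example/inference/inception_inference.cpp,
// constructed once by the main thread exactly as in that example and then shared.
std::mutex predict_mutex;

void Worker(Predictor *predictor, const std::string &image_file) {
  // Serialize access so the Executor inside the Predictor sees one image at a time.
  std::lock_guard<std::mutex> lock(predict_mutex);
  predictor->PredictImage(image_file);
}

// Fan the image files out to worker threads that all share the same Predictor.
void RunAll(Predictor *predictor, const std::vector<std::string> &image_files) {
  std::vector<std::thread> workers;
  for (const auto &file : image_files)
    workers.emplace_back(Worker, predictor, file);
  for (auto &t : workers)
    t.join();
}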

I hope this helps.

@mxnet-label-bot add [Pending Requester Info]

@songziqin - I hope @leleamol's response addressed your concern. Let us know if you are still facing the issue.

Closing the issue for now; please reopen if the issue still exists.
