Keras: use multiple CPU cores for Keras on CPU

Created on 20 Mar 2018 · 17 comments · Source: keras-team/keras

Dear all,
I would like to use 10 CPU cores to run my Keras model.
However, when I run my code, only two or three cores are at 100%; the others are idle.
(screenshot: CPU usage, 2018-03-20 11:54)

Anyone know the way to distribute the work to all cores?
Thank you.

Most helpful comment

I am having the same issue with different versions of Python and Keras on different machines.
I can't configure it to use all the resources.

All 17 comments

If you are using TF as the backend, you can try the following:

import tensorflow as tf
import keras

config = tf.ConfigProto(device_count={"CPU": 8})
keras.backend.tensorflow_backend.set_session(tf.Session(config=config))

Let me know if that works
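As a side note, `device_count={"CPU": N}` controls the number of CPU *devices* TensorFlow registers, not the number of threads; in TensorFlow 1.x, CPU parallelism is usually tuned with the intra-op/inter-op thread settings instead. A minimal sketch of that session configuration (thread counts of 8 are an assumption; 0 lets TensorFlow choose):

```python
import tensorflow as tf
from keras import backend as K

config = tf.ConfigProto(
    intra_op_parallelism_threads=8,  # threads used inside a single op (e.g. a matmul)
    inter_op_parallelism_threads=8,  # threads used to run independent ops concurrently
)
K.set_session(tf.Session(config=config))
```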

"8" means the number of distributed cores?

correct

I tried with

if options.nthread > 0:
    config = tf.ConfigProto(device_count={"CPU": options.nthread})
    kr.backend.tensorflow_backend.set_session(tf.Session(config=config))
    sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
    print(sess)

but the result does not improve.
(screenshot: CPU usage, 2018-03-20 22:28)

Any leads on this problem?

Any update? My code on an AWS deep learning machine also has this problem.

I am having the same issue with different versions of Python and Keras on different machines.
I can't configure it to use all the resources.

Same issue again. Perhaps certain operations are single-core only (e.g., convolution)?

Dear sir, how do I solve this problem? @hainguyenct

I have the same problem.
I use TensorFlow 2.0 RC with the Keras backend.
But I faced the same issue on TensorFlow 1.4 using TensorFlow alone, without Keras.

Same problem here

I think I found the solution. Keras actually uses multiple cores out of the box, but you may have a bottleneck in your generators.

Steps to fix this:

  1. Set the workers=N parameter.
  2. Make your generators thread-safe.
  3. Don't use global variables in your generators (because of the GIL).
  4. Ensure your generators are fast enough.

Example:

from enum import Enum
from multiprocessing.pool import ThreadPool

import numpy


class DataPurpose(Enum):
    TRAINING = 1
    VALIDATION = 2


class RandomBlockSource:
    # You may use global variables only at object instantiation time.
    # If you need a global collection here, copy it.

    my_list = global_list.copy()

    def get_data(self, purpose: DataPurpose):
        pass  # TODO


def multithreaded_get_training_block(i):
    block_source = RandomBlockSource()
    return block_source.get_data(DataPurpose.TRAINING)


def multithreaded_get_validation_block(i):
    block_source = RandomBlockSource()
    return block_source.get_data(DataPurpose.VALIDATION)


@threadsafe_generator
def training_generator(batch_size):
    pool = ThreadPool()
    while True:
        inputs, outputs = numpy.zeros((batch_size, INPUT_BLOCK_HEIGHT, INPUT_BLOCK_WIDTH, 1)), \
                          numpy.zeros((batch_size, OUTPUT_BLOCK_HEIGHT, OUTPUT_BLOCK_WIDTH, 1))

        # The bottleneck.
        pairs = pool.map(multithreaded_get_training_block, range(batch_size))

        for i, pair in enumerate(pairs):
            inputs[i][:, :, 0] = pair[0]
            outputs[i][:, :, 0] = pair[1]

        yield inputs, outputs


@threadsafe_generator
def validation_generator(batch_size):
    pool = ThreadPool()
    while True:
        inputs, outputs = numpy.zeros((batch_size, INPUT_BLOCK_HEIGHT, INPUT_BLOCK_WIDTH, 1)), \
                          numpy.zeros((batch_size, OUTPUT_BLOCK_HEIGHT, OUTPUT_BLOCK_WIDTH, 1))

        # The bottleneck.
        pairs = pool.map(multithreaded_get_validation_block, range(batch_size))

        for i, pair in enumerate(pairs):
            inputs[i][:, :, 0] = pair[0]
            outputs[i][:, :, 0] = pair[1]

        yield inputs, outputs

history = model.fit_generator(
    training_generator(16), steps_per_epoch=200, epochs=epochs,
    validation_data=validation_generator(8), validation_steps=4, validation_freq=1,
    workers=8)

The threadsafe_generator decorator is described here:
https://stackoverflow.com/questions/46509007/keras-thread-safe-generator-for-model-fit-generator-with-python-3-6-x
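For reference, a minimal version of such a decorator (one possible implementation, not necessarily identical to the linked answer) serializes the generator's `__next__` behind a lock, so multiple Keras worker threads cannot advance it concurrently:

```python
import threading


class ThreadSafeIterator:
    """Wraps an iterator so that next() calls are serialized by a lock."""

    def __init__(self, iterator):
        self.iterator = iterator
        self.lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        with self.lock:
            return next(self.iterator)


def threadsafe_generator(gen_func):
    """Decorator: make a generator function return a thread-safe iterator."""
    def wrapper(*args, **kwargs):
        return ThreadSafeIterator(gen_func(*args, **kwargs))
    return wrapper


@threadsafe_generator
def counter(n):
    for i in range(n):
        yield i
```

Decorating `training_generator` and `validation_generator` this way is what makes `workers > 1` safe when the generator itself is not re-entrant.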

Also, instead of using complex generators, you may create an index on the data set in a database and then:

select ... order by random() limit batch_size
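A minimal sketch of that idea with sqlite3; the `samples` table and its columns are hypothetical names for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, x REAL, y REAL)")
conn.executemany(
    "INSERT INTO samples (x, y) VALUES (?, ?)",
    [(float(i), 2.0 * float(i)) for i in range(100)],
)

batch_size = 16
# Each call returns a fresh random batch; the database does the shuffling.
batch = conn.execute(
    "SELECT x, y FROM samples ORDER BY RANDOM() LIMIT ?", (batch_size,)
).fetchall()
```

Note that the function is `RANDOM()` in SQLite and `random()` in PostgreSQL, and that this sorts the whole table per batch, which can itself become slow on very large data sets.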

Dear sir, how do I solve this problem? @hainguyenct. I have the same problem.

TensorFlow Serving for Keras may solve this problem.

I can confirm that it does speed up the process, but the bottleneck remains.

