Keras: use multiple CPU cores for Keras on CPU

Created on 20 Mar 2018 · 17 comments · Source: keras-team/keras

Dear all,
I would like to use 10 CPU cores to run my Keras model.
However, when I run my code, only two or three cores are at 100%; the others are idle.
(screenshot: CPU usage, 2018-03-20 11:54)

Anyone know the way to distribute the work to all cores?
Thank you.

Most helpful comment

I am having the same issue with different versions of Python and Keras on different machines.
I can't configure it to use all the resources.

All 17 comments

If you are using TF as the backend, you can try the following:

import tensorflow as tf
import keras

config = tf.ConfigProto(device_count={"CPU": 8})
keras.backend.tensorflow_backend.set_session(tf.Session(config=config))

Let me know if that works
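As a side note, `device_count={"CPU": N}` controls the number of CPU *devices* TensorFlow registers, not the number of threads; in TensorFlow 1.x, CPU parallelism is usually tuned with the intra-op/inter-op thread settings instead. A minimal sketch of that session configuration (thread counts of 8 are an assumption; 0 lets TensorFlow choose):

```python
import tensorflow as tf
from keras import backend as K

config = tf.ConfigProto(
    intra_op_parallelism_threads=8,  # threads used inside a single op (e.g. a matmul)
    inter_op_parallelism_threads=8,  # threads used to run independent ops concurrently
)
K.set_session(tf.Session(config=config))
```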

"8" means the number of distributed cores?

correct

I tried with

if options.nthread > 0:
    config = tf.ConfigProto(device_count={"CPU": options.nthread})
    kr.backend.tensorflow_backend.set_session(tf.Session(config=config))
    sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
    print(sess)

but the result does not improve.
(screenshot: CPU usage, 2018-03-20 22:28)

Any leads on this problem?

Any update? My code on an AWS deep learning machine also has this problem.

I am having the same issue with different versions of Python and Keras on different machines.
I can't configure it to use all the resources.

Same issue again. Perhaps certain operations are single-core only (e.g., convolution)?

Dear sir, how do I solve this problem? @hainguyenct

I have the same problem.
I use TensorFlow 2.0 RC with the Keras backend.
But I faced the same issue on TensorFlow 1.4 using TensorFlow alone, without Keras.

Same problem here

I think I found the solution. Keras actually uses multiple cores out of the box, but you may have a bottleneck in your generators.

Steps to fix this:

  1. Set the workers=N parameter.
  2. Make your generators thread-safe.
  3. Don't use global variables in your generators (because of the GIL).
  4. Ensure your generators are fast enough.

Example:

from enum import Enum
from multiprocessing.pool import ThreadPool

import numpy


class DataPurpose(Enum):
    TRAINING = 1
    VALIDATION = 2


class RandomBlockSource:
    # You may use global variables only at object instantiation time.
    # If you need a global collection here, copy it.

    my_list = global_list.copy()

    def get_data(self, purpose: DataPurpose):
        pass  # TODO


def multithreaded_get_training_block(i):
    block_source = RandomBlockSource()
    return block_source.get_data(DataPurpose.TRAINING)


def multithreaded_get_validation_block(i):
    block_source = RandomBlockSource()
    return block_source.get_data(DataPurpose.VALIDATION)


@threadsafe_generator
def training_generator(batch_size):
    pool = ThreadPool()
    while True:
        inputs, outputs = numpy.zeros((batch_size, INPUT_BLOCK_HEIGHT, INPUT_BLOCK_WIDTH, 1)), \
                          numpy.zeros((batch_size, OUTPUT_BLOCK_HEIGHT, OUTPUT_BLOCK_WIDTH, 1))

        # The bottleneck.
        pairs = pool.map(multithreaded_get_training_block, range(batch_size))

        for i, pair in enumerate(pairs):
            inputs[i][:, :, 0] = pair[0]
            outputs[i][:, :, 0] = pair[1]

        yield inputs, outputs


@threadsafe_generator
def validation_generator(batch_size):
    pool = ThreadPool()
    while True:
        inputs, outputs = numpy.zeros((batch_size, INPUT_BLOCK_HEIGHT, INPUT_BLOCK_WIDTH, 1)), \
                          numpy.zeros((batch_size, OUTPUT_BLOCK_HEIGHT, OUTPUT_BLOCK_WIDTH, 1))

        # The bottleneck.
        pairs = pool.map(multithreaded_get_validation_block, range(batch_size))

        for i, pair in enumerate(pairs):
            inputs[i][:, :, 0] = pair[0]
            outputs[i][:, :, 0] = pair[1]

        yield inputs, outputs

history = model.fit_generator(
    training_generator(16), steps_per_epoch=200, epochs=epochs,
    validation_data=validation_generator(8), validation_steps=4, validation_freq=1,
    workers=8)

The threadsafe_generator decorator is described here:
https://stackoverflow.com/questions/46509007/keras-thread-safe-generator-for-model-fit-generator-with-python-3-6-x
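For reference, a minimal version of such a decorator (one possible implementation, not necessarily identical to the linked answer) serializes the generator's `__next__` behind a lock, so multiple Keras worker threads cannot advance it concurrently:

```python
import threading


class ThreadSafeIterator:
    """Wraps an iterator so that next() calls are serialized by a lock."""

    def __init__(self, iterator):
        self.iterator = iterator
        self.lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        with self.lock:
            return next(self.iterator)


def threadsafe_generator(gen_func):
    """Decorator: make a generator function return a thread-safe iterator."""
    def wrapper(*args, **kwargs):
        return ThreadSafeIterator(gen_func(*args, **kwargs))
    return wrapper


@threadsafe_generator
def counter(n):
    for i in range(n):
        yield i
```

Decorating `training_generator` and `validation_generator` this way is what makes `workers > 1` safe when the generator itself is not re-entrant.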

Also, instead of using complex generators, you may create an index on the data set in a database and then:

select ... order by random() limit batch_size
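A minimal sketch of that idea with sqlite3; the `samples` table and its columns are hypothetical names for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, x REAL, y REAL)")
conn.executemany(
    "INSERT INTO samples (x, y) VALUES (?, ?)",
    [(float(i), 2.0 * float(i)) for i in range(100)],
)

batch_size = 16
# Each call returns a fresh random batch; the database does the shuffling.
batch = conn.execute(
    "SELECT x, y FROM samples ORDER BY RANDOM() LIMIT ?", (batch_size,)
).fetchall()
```

Note that the function is `RANDOM()` in SQLite and `random()` in PostgreSQL, and that this sorts the whole table per batch, which can itself become slow on very large data sets.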

Dear sir, how do I solve this problem? @hainguyenct. I have the same problem.

TensorFlow Serving for Keras may solve this problem.

I can confirm that it does speed up the process, but the bottleneck remains.

