Dear all,
I would like to use 10 CPU cores to run my Keras model.
However, when I run my code, only two or three cores are at 100%; the others are idle.

Does anyone know how to distribute the work across all cores?
Thank you.
If you are using TensorFlow as the backend, you can try the following:

import tensorflow as tf
import keras

config = tf.ConfigProto(device_count={"CPU": 8})
keras.backend.tensorflow_backend.set_session(tf.Session(config=config))
Let me know if that works
You could also try this:
https://stackoverflow.com/questions/41588383/how-to-run-keras-on-multiple-cores
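In TF 1.x, device_count alone may not be enough. A slightly fuller sketch also sets the intra-op and inter-op thread pools (the value 8 is just an assumption here; set it to your core count):

import tensorflow as tf
import keras

config = tf.ConfigProto(
    intra_op_parallelism_threads=8,  # threads used inside a single op (e.g. a matmul)
    inter_op_parallelism_threads=8,  # threads used to run independent ops in parallel
    device_count={"CPU": 8},
)
keras.backend.tensorflow_backend.set_session(tf.Session(config=config))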
"8" means the number of distributed cores?
correct
I tried:

if options.nthread > 0:
    config = tf.ConfigProto(device_count={"CPU": options.nthread})
    kr.backend.tensorflow_backend.set_session(tf.Session(config=config))
    sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
    print(sess)
but the result did not improve.

Any leads on this problem?
Any update? My code on an AWS deep learning machine also hits this problem.
I am having the same issue with different versions of Python and Keras on different machines.
I can't configure it to use all the resources.
Same issue again. Perhaps certain operations are single-core only (e.g. convolution)?
Dear sir, how do you solve this problem? @hainguyenct
I have the same problem.
I use TensorFlow 2.0 RC with the Keras backend.
But I faced the same issue on TensorFlow 1.4 using TensorFlow alone, without Keras.
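If you are on TensorFlow 2.x, here is a minimal sketch of the equivalent threading configuration (assuming 8 cores; these calls must happen before any ops are created):

import tensorflow as tf

# Configure thread pools before the first op is executed.
tf.config.threading.set_intra_op_parallelism_threads(8)  # parallelism within a single op
tf.config.threading.set_inter_op_parallelism_threads(8)  # parallelism across independent ops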
Same problem here
I think I found a solution. Keras actually uses multiple cores out of the box, but you may have a bottleneck in the generators.
The fix is to parallelize the data preparation inside the generators, e.g. with a thread pool.
Example:
from enum import Enum
from multiprocessing.pool import ThreadPool

import numpy


class DataPurpose(Enum):
    TRAINING = 1
    VALIDATION = 2


class RandomBlockSource:
    # You may use global variables only at object instantiation.
    # If you need some global collection here, copy it.
    my_list = global_list.copy()

    def get_data(self, purpose: DataPurpose):
        pass  # TODO


def multithreaded_get_training_block(i):
    block_source = RandomBlockSource()
    return block_source.get_data(DataPurpose.TRAINING)


def multithreaded_get_validation_block(i):
    block_source = RandomBlockSource()
    return block_source.get_data(DataPurpose.VALIDATION)


@threadsafe_generator
def training_generator(batch_size):
    pool = ThreadPool()
    while True:
        inputs, outputs = numpy.zeros((batch_size, INPUT_BLOCK_HEIGHT, INPUT_BLOCK_WIDTH, 1)), \
                          numpy.zeros((batch_size, OUTPUT_BLOCK_HEIGHT, OUTPUT_BLOCK_WIDTH, 1))
        # The bottleneck: build the whole batch in parallel worker threads.
        pairs = pool.map(multithreaded_get_training_block, range(batch_size))
        for i, pair in enumerate(pairs):
            inputs[i][:, :, 0] = pair[0]
            outputs[i][:, :, 0] = pair[1]
        yield inputs, outputs


@threadsafe_generator
def validation_generator(batch_size):
    pool = ThreadPool()
    while True:
        inputs, outputs = numpy.zeros((batch_size, INPUT_BLOCK_HEIGHT, INPUT_BLOCK_WIDTH, 1)), \
                          numpy.zeros((batch_size, OUTPUT_BLOCK_HEIGHT, OUTPUT_BLOCK_WIDTH, 1))
        # The bottleneck: build the whole batch in parallel worker threads.
        pairs = pool.map(multithreaded_get_validation_block, range(batch_size))
        for i, pair in enumerate(pairs):
            inputs[i][:, :, 0] = pair[0]
            outputs[i][:, :, 0] = pair[1]
        yield inputs, outputs


history = model.fit_generator(
    training_generator(16), steps_per_epoch=200, epochs=epochs,
    validation_data=validation_generator(8), validation_steps=4, validation_freq=1,
    workers=8)
The threadsafe_generator decorator is described here:
https://stackoverflow.com/questions/46509007/keras-thread-safe-generator-for-model-fit-generator-with-python-3-6-x
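For completeness, a minimal sketch of such a decorator along the lines of the linked answer (a lock serializes access to the wrapped generator so several workers can pull from it safely):

import threading


class ThreadSafeIterator:
    # Wraps an iterator/generator so that calls to next() are protected by a lock.
    def __init__(self, it):
        self.it = it
        self.lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        with self.lock:
            return next(self.it)


def threadsafe_generator(f):
    # Decorator: make a generator function return a thread-safe iterator.
    def g(*args, **kwargs):
        return ThreadSafeIterator(f(*args, **kwargs))
    return g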
Also, instead of using complex generators, you can index the data set in a database and then:
select ... order by random() limit batch_size
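A minimal sketch of that idea, assuming a hypothetical SQLite table samples(x BLOB, y BLOB) that stores pre-serialized float32 arrays (the table and column names are made up for illustration):

import sqlite3
import numpy

def db_batch_generator(db_path, batch_size):
    conn = sqlite3.connect(db_path)
    while True:
        # Pull a random batch straight from the database.
        rows = conn.execute(
            "SELECT x, y FROM samples ORDER BY random() LIMIT ?", (batch_size,)
        ).fetchall()
        inputs = numpy.stack([numpy.frombuffer(x, dtype=numpy.float32) for x, _ in rows])
        outputs = numpy.stack([numpy.frombuffer(y, dtype=numpy.float32) for _, y in rows])
        yield inputs, outputs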
Dear sir, how do you solve this problem? @hainguyenct I have the same problem.
TensorFlow Serving for Keras may solve this problem.
I can confirm that TensorFlow does speed up the process, but the bottleneck remains.