Hi, when I run 'python3 parallel_model.py', an error happens. Any ideas?
I had the same problem. Have you solved it?
The code is updated. Please look at the pull requests. My problem has been solved.
The problem happened to me when I was using Keras 2.2.2; when I downgraded to Keras 2.1.3, the problem went away.
Thanks, you are right!
I believe this was related to the parallel model's init function.
I was able to get Keras v2.2.2 working by changing the parallel model like so:
# Imports assumed by this snippet (Keras 2.x on top of TF 1.x):
import tensorflow as tf
import keras.backend as K
import keras.layers as KL
import keras.models as KM


def make_parallel(keras_model, gpu_count):
    """Creates a new wrapper model that consists of multiple replicas of
    the original model placed on different GPUs.

    Args:
        keras_model: the input model to replicate on multiple GPUs
        gpu_count: the number of replicas to build

    Returns:
        Multi-GPU model
    """
    # Slice inputs. Slice inputs on the CPU to avoid sending a copy
    # of the full inputs to all GPUs. Saves on bandwidth and memory.
    input_slices = {name: tf.split(x, gpu_count)
                    for name, x in zip(keras_model.input_names,
                                       keras_model.inputs)}

    output_names = keras_model.output_names
    outputs_all = []
    for i in range(len(keras_model.outputs)):
        outputs_all.append([])

    # Run the model call() on each GPU to place the ops there
    for i in range(gpu_count):
        with tf.device('/gpu:%d' % i):
            with tf.name_scope('tower_%d' % i):
                # Run a slice of inputs through this replica
                zipped_inputs = zip(keras_model.input_names,
                                    keras_model.inputs)
                inputs = [
                    KL.Lambda(lambda s: input_slices[name][i],
                              output_shape=lambda s: (None,) + s[1:])(tensor)
                    for name, tensor in zipped_inputs]
                # Create the model replica and get the outputs
                outputs = keras_model(inputs)
                if not isinstance(outputs, list):
                    outputs = [outputs]
                # Save the outputs for merging back together later
                for l, o in enumerate(outputs):
                    outputs_all[l].append(o)

    # Merge outputs on CPU
    with tf.device('/cpu:0'):
        merged = []
        for outputs, name in zip(outputs_all, output_names):
            # Concatenate or average outputs?
            # Outputs usually have a batch dimension and we concatenate
            # across it. If they don't, then the output is likely a loss
            # or a metric value that gets averaged across the batch.
            # Keras expects losses and metrics to be scalars.
            if K.int_shape(outputs[0]) == ():
                # Average
                m = KL.Lambda(lambda o: tf.add_n(o) / len(outputs),
                              name=name)(outputs)
            else:
                # Concatenate
                m = KL.Concatenate(axis=0, name=name)(outputs)
            merged.append(m)
    return merged
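# Aside (added for illustration, not part of the original post): the slicing
# step above relies on tf.split along the batch axis, so e.g. a fixed batch
# of 8 split across gpu_count=2 gives each tower a batch of 4:
#
#   slices = tf.split(tf.zeros((8, 224, 224, 3)), 2)  # axis=0 by default
#   [s.get_shape().as_list() for s in slices]
#   # -> [[4, 224, 224, 3], [4, 224, 224, 3]]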
class ParallelModel(KM.Model):
    """Subclasses the standard Keras Model and adds multi-GPU support.
    It works by creating a copy of the model on each GPU. Then it slices
    the inputs and sends a slice to each copy of the model, and then
    merges the outputs together and applies the loss on the combined
    outputs.
    """

    def __init__(self, keras_model, gpu_count):
        """Class constructor.
        keras_model: The Keras model to parallelize
        gpu_count: Number of GPUs. Must be > 1
        """
        merged_outputs = make_parallel(
            keras_model=keras_model, gpu_count=gpu_count)
        super(ParallelModel, self).__init__(inputs=keras_model.inputs,
                                            outputs=merged_outputs)
        self.inner_model = keras_model

    def __getattribute__(self, attrname):
        """Redirect loading and saving methods to the inner model. That's where
        the weights are stored."""
        if 'load' in attrname or 'save' in attrname:
            return getattr(self.inner_model, attrname)
        return super(ParallelModel, self).__getattribute__(attrname)

    def summary(self, *args, **kwargs):
        """Override summary() to display summaries of both the wrapper
        and inner models."""
        super(ParallelModel, self).summary(*args, **kwargs)
        self.inner_model.summary(*args, **kwargs)
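For completeness, here is a minimal usage sketch (my own, not from the repo; the toy architecture, layer names, and GPU_COUNT=2 are illustrative). Note that the batch size must be divisible by the GPU count, since tf.split produces equal slices:

import numpy as np

GPU_COUNT = 2  # illustrative; requires 2 visible GPUs

# Toy functional model to wrap
inputs = KL.Input(shape=(32,), name='input_features')
hidden = KL.Dense(64, activation='relu')(inputs)
outputs = KL.Dense(10, name='predictions')(hidden)
model = ParallelModel(KM.Model(inputs, outputs), GPU_COUNT)

model.compile(optimizer='sgd', loss='mse')
# batch_size must be divisible by GPU_COUNT (tf.split makes equal slices)
model.fit(np.random.rand(64, 32), np.random.rand(64, 10),
          batch_size=16, epochs=1)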
I am using Keras 2.2.4 and TensorFlow 1.14.0.
@dcyoung there is a new error with this method:
    def __getattribute__(self, attrname):
        """Redirect loading and saving methods to the inner model. That's where
        the weights are stored."""
        if 'load' in attrname or 'save' in attrname:
            return getattr(self.inner_model, attrname)
        return super(ParallelModel, self).__getattribute__(attrname)
It showed this traceback:
Traceback (most recent call last):
  File "lpr_train.py", line 283, in <module>
    layers='all',augmentation=augmentation)
  File "/data/workspace/willy_sung/Mask_RCNN/mrcnn/model.py", line 2369, in train
    use_multiprocessing=True,
  File "/usr/local/lib/python3.6/dist-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1418, in fit_generator
    initial_epoch=initial_epoch)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training_generator.py", line 251, in fit_generator
    callbacks.on_epoch_end(epoch, epoch_logs)
  File "/usr/local/lib/python3.6/dist-packages/keras/callbacks.py", line 79, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "/usr/local/lib/python3.6/dist-packages/keras/callbacks.py", line 444, in on_epoch_end
    self.model.save_weights(filepath, overwrite=True)
  File "/data/workspace/willy_sung/Mask_RCNN/mrcnn/parallel_model.py", line 98, in __getattribute__
    return getattr(self.inner_model, attrname)
  File "/data/workspace/willy_sung/Mask_RCNN/mrcnn/parallel_model.py", line 99, in __getattribute__
    return super(ParallelModel, self).__getattribute__(attrname)
AttributeError: 'ParallelModel' object has no attribute 'inner_model'
It seems that there is no inner_model anymore. How should I modify it?
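Not a verified fix, but one defensive tweak (my own sketch) is to forward to inner_model only when it has actually been set, so calls like save_weights that arrive before (or without) the attribute fall back to the normal Model lookup instead of raising:

    def __getattribute__(self, attrname):
        """Redirect loading and saving methods to the inner model. That's where
        the weights are stored."""
        if 'load' in attrname or 'save' in attrname:
            # self.__dict__ bypasses this 'load'/'save' branch, so checking it
            # here cannot recurse. Forward only if inner_model was attached.
            inner = self.__dict__.get('inner_model')
            if inner is not None:
                return getattr(inner, attrname)
        return super(ParallelModel, self).__getattribute__(attrname)

That only avoids the crash (the wrapper's own save_weights runs instead); why inner_model was never set in your run, e.g. an exception during __init__, is still worth tracking down.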