https://github.com/matterport/Mask_RCNN/blob/957ab32e890f1986d113d2a34755927c4892bd49/model.py#L2164
next is returning just one item. Why aren't we passing the generator?
I tried removing the next, but then I get an error when Keras calls np.average, because some items have multiple dimensions instead of being scalars.
As a quick fix I removed next and added axis=0 to the np.average call inside the Keras code.
@zamponotiropita Can you point me to where you made the change?
@waleedka the change was in the Keras code itself. Just remove the next and follow the traceback to the error in the Keras code. I can't point you to the file since I already made the change and no longer get the error.
File ".../lib/python3.6/site-packages/numpy/lib/function_base.py", line 1124, in average
"Axis must be specified when shapes of a and weights "
TypeError: Axis must be specified when shapes of a and weights differ.
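For context, here is a minimal NumPy reproduction of that error (toy shapes, not the actual Keras internals): evaluate_generator averages per-batch losses weighted by batch sizes via np.average, and if each loss comes back as a (1, 1) array instead of a scalar, the stacked losses and the weights have different shapes.

```python
import numpy as np

# Two batches whose losses are each shaped (1, 1) instead of scalar.
losses = np.array([[[0.5]], [[0.7]]])
batch_sizes = np.array([2, 2])

try:
    # Shapes of `a` and `weights` differ and no axis is given -> TypeError.
    np.average(losses, weights=batch_sizes)
except TypeError as err:
    print(err)  # Axis must be specified when shapes of a and weights differ.

# With scalar losses the shapes line up and the weighted mean works:
print(np.average(np.array([0.5, 0.7]), weights=batch_sizes))  # 0.6
```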
In evaluate_generator in Keras's training.py, the relevant code is:
outs = self.test_on_batch(x, y, sample_weight=sample_weight)
if isinstance(x, list):
batch_size = len(x[0])
elif isinstance(x, dict):
batch_size = len(list(x.values())[0])
else:
batch_size = len(x)
if batch_size == 0:
raise ValueError('Received an empty batch. '
'Batches should at least contain one item.')
outs = [o.flatten()[0] for o in outs]
all_outs.append(outs)
I added the line
outs = [o.flatten()[0] for o in outs]
This is a hack; the issue is that Keras expects outs to be a list of numbers, not a list of NumPy arrays. The model is returning (1, 1) arrays rather than plain numbers. I could not figure out the right place in model.py to fix this.
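As a sanity check on that workaround (a standalone sketch, not the Keras code itself): flattening each (1, 1) array down to a scalar restores the shapes np.average expects.

```python
import numpy as np

# Each per-batch loss arrives as a (1, 1) array...
outs = [np.array([[0.5]]), np.array([[0.7]])]

# ...and the hack collapses each one to a plain scalar before averaging.
outs = [float(o.flatten()[0]) for o in outs]

weighted_mean = np.average(outs, weights=[2, 2])  # no axis needed now
print(weighted_mean)  # 0.6
```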
Hello,
in model.py I modified two functions, mrcnn_class_loss_graph and mrcnn_bbox_loss_graph, and commented out the line
# loss = K.reshape(loss, [1, 1])
It seems to remove the issue without hacking Keras's code.
Why this reshape in the 1st place?
@LamDang's fix works for me. @waleedka is there a real reason for this reshape to be where it is?
This thread got buried quickly so I lost track of it, sorry! The issue is now fixed here.
The fix suggested by @LamDang is correct. Originally, K.reshape(loss, [1, 1]) was needed to allow concatenating loss values when doing multi-GPU training. Later, I handled that case in parallel_model.py but the loss reshaping remained unintentionally. It's not needed anymore and it's removed in the latest commit.
Just adding that if you run with GPU_COUNT > 1, this still throws the very same error for me.
Replacing "validation_data": val_generator with "validation_data": next(val_generator)
works for me.
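On the next(val_generator) workaround: a generator yields batches lazily, and next pulls exactly one of them, which is why the workaround validates on only a single fixed batch. A toy sketch (the generator and its contents are made up for illustration):

```python
# A Keras-style data generator loops forever, yielding one
# (inputs, targets) batch per next() call.
def toy_generator():
    batch = 0
    while True:
        batch += 1
        yield ([f"inputs_{batch}"], [f"targets_{batch}"])

gen = toy_generator()

# Passing the generator itself lets Keras keep calling next() and see
# many batches; passing next(gen) hands Keras a single, fixed batch.
one_batch = next(gen)
print(one_batch[0])  # ['inputs_1']
```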
@spMohanty That's odd! I did test on multi-GPU and it worked. Maybe I missed something. Can you send details about the error you got and I'll track it down. (preferably in a separate issue)
I had a similar error. So I reverted the changes from the recent master commits, specifically the next(val_generator) and the K.reshape removals, and now it works.
File "/usr/local/lib/python3.5/dist-packages/keras/layers/merge.py", line 155, in call
return self._merge_function(inputs)
File "/usr/local/lib/python3.5/dist-packages/keras/layers/merge.py", line 357, in _merge_function
return K.concatenate(inputs, axis=self.axis)
File "/usr/local/lib/python3.5/dist-packages/keras/backend/tensorflow_backend.py", line 1878, in concatenate
return tf.concat([to_dense(x) for x in tensors], axis)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/array_ops.py", line 1099, in concat
return gen_array_ops._concat_v2(values=values, axis=axis, name=name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 706, in _concat_v2
"ConcatV2", values=values, axis=axis, name=name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InvalidArgumentError (see above for traceback): Integer division by zero
[[Node: training/SGD/gradients/mrcnn_mask_loss_1/concat_grad/mod = FloorMod[T=DT_INT32, _class=["loc:@mrcnn_mask_loss_1/concat"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](rpn_class_loss_1/concat/axis, training/SGD/gradients/mrcnn_mask_loss_1/concat_grad/Rank)]]
[[Node: training/SGD/gradients/tower_3/mask_rcnn/roi_align_classifier/concat_grad/Shape/_2457 = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:3", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_4689_training/SGD/gradients/tower_3/mask_rcnn/roi_align_classifier/concat_grad/Shape", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:3"]()]]
I pulled straight from master yesterday and trained on the COCO 2017 dataset with GPU_COUNT=4, and it broke. Maybe there are newer changes that have fixed this bug.
I can verify @waleedka @spMohanty that the code for GPU_COUNT > 1 still doesn't work unless you replace val_generator with next(val_generator). But then obviously you are validating on only one batch... I will dig in more, unless the problem has already been identified?
Thanks. We have an issue to track this now here: https://github.com/matterport/Mask_RCNN/issues/395
The Mask-RCNN I cloned yesterday still has this problem when using multi-GPU: TypeError: Axis must be specified when shapes of a and weights differ. TensorFlow 1.7, Keras 2.1.3.
Changing "validation_data": val_generator to "validation_data": next(val_generator) fixed this.
@YubinXie were you able to solve this?
@waleedka the version of the repository I cloned has these changes, yet I am still getting the error.
warn("Anti-aliasing will be enabled by default in skimage 0.15 to "
Traceback (most recent call last):
File "Skycatch.py", line 502, in
augmentation=augmentation)
File "/home/mrigank/miniconda3/lib/python3.6/site-packages/mask_rcnn-2.1-py3.6.egg/mrcnn/model.py", line 2318, in train
File "/home/mrigank/miniconda3/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "/home/mrigank/miniconda3/lib/python3.6/site-packages/keras/engine/training.py", line 2250, in fit_generator
max_queue_size=max_queue_size)
File "/home/mrigank/miniconda3/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "/home/mrigank/miniconda3/lib/python3.6/site-packages/keras/engine/training.py", line 2431, in evaluate_generator
weights=batch_sizes))
File "/home/mrigank/miniconda3/lib/python3.6/site-packages/numpy/lib/function_base.py", line 1142, in average
"Axis must be specified when shapes of a and weights "
TypeError: Axis must be specified when shapes of a and weights differ.
@mriganktiwari Have you tried changing "validation_data": val_generator to "validation_data": next(val_generator)?
Also, if you used setup.py to install Mask R-CNN, you might need to re-install it. Previously I changed the local files, but my code was still calling the old installed version.
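For anyone hitting this stale-install problem, a sketch of the re-install (the package name mask-rcnn is an assumption based on the mask_rcnn-2.1-py3.6.egg path in the traceback above; run from the repository root):

```shell
# Remove the previously installed copy, then install in editable mode
# so local edits to mrcnn/model.py take effect without reinstalling.
pip uninstall -y mask-rcnn
pip install -e .
```

An editable install (pip install -e .) avoids this class of problem entirely, since the interpreter imports directly from your working copy.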
@YubinXie Thanks a lot, it worked!