https://github.com/matterport/Mask_RCNN/blob/957ab32e890f1986d113d2a34755927c4892bd49/model.py#L2164
next is returning just one item. Why aren't we passing the generator?
I tried removing the next, but then I get an error when Keras calls np.average, because some items have multiple dimensions instead of being scalars.
As a quick fix I removed next and added axis=0 to the np.average call inside the Keras code.
@zamponotiropita Can you point me to where you made the change?
@waleedka the change was in the Keras code itself. Just remove the next and follow the traceback to the error in the Keras code. I can't point you to the file since I already made the change and no longer get the error.
File ".../lib/python3.6/site-packages/numpy/lib/function_base.py", line 1124, in average
"Axis must be specified when shapes of a and weights "
TypeError: Axis must be specified when shapes of a and weights differ.
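For context, here is a minimal NumPy reproduction of that error (toy shapes, not the actual Keras internals): evaluate_generator averages per-batch losses weighted by batch sizes via np.average, and if each loss comes back as a (1, 1) array instead of a scalar, the stacked losses and the weights have different shapes.

```python
import numpy as np

# Two batches whose losses are each shaped (1, 1) instead of scalar.
losses = np.array([[[0.5]], [[0.7]]])
batch_sizes = np.array([2, 2])

try:
    # Shapes of `a` and `weights` differ and no axis is given -> TypeError.
    np.average(losses, weights=batch_sizes)
except TypeError as err:
    print(err)  # Axis must be specified when shapes of a and weights differ.

# With scalar losses the shapes line up and the weighted mean works:
print(np.average(np.array([0.5, 0.7]), weights=batch_sizes))  # 0.6
```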
In evaluate_generator in Keras's training.py, the relevant code is:
outs = self.test_on_batch(x, y, sample_weight=sample_weight)
if isinstance(x, list):
batch_size = len(x[0])
elif isinstance(x, dict):
batch_size = len(list(x.values())[0])
else:
batch_size = len(x)
if batch_size == 0:
raise ValueError('Received an empty batch. '
'Batches should at least contain one item.')
outs = [o.flatten()[0] for o in outs]
all_outs.append(outs)
I added the line
outs = [o.flatten()[0] for o in outs]
This is a hack; the issue is that Keras expects outs to be a list of numbers, not a list of NumPy arrays. The model is returning (1, 1) arrays rather than plain numbers. I could not figure out the right place in model.py to fix this.
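As a sanity check on that workaround (a standalone sketch, not the Keras code itself): flattening each (1, 1) array down to a scalar restores the shapes np.average expects.

```python
import numpy as np

# Each per-batch loss arrives as a (1, 1) array...
outs = [np.array([[0.5]]), np.array([[0.7]])]

# ...and the hack collapses each one to a plain scalar before averaging.
outs = [float(o.flatten()[0]) for o in outs]

weighted_mean = np.average(outs, weights=[2, 2])  # no axis needed now
print(weighted_mean)  # 0.6
```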
Hello,
in model.py I modified two functions, mrcnn_class_loss_graph and mrcnn_bbox_loss_graph, and commented out the line
# loss = K.reshape(loss, [1, 1])
It seems to remove the issue without hacking Keras's code.
Why this reshape in the 1st place?
@LamDang's fix works for me. @waleedka is there a real reason for this reshape to be where it is?
This thread got buried quickly so I lost track of it, sorry! The issue is now fixed here.
The fix suggested by @LamDang is correct. Originally, K.reshape(loss, [1, 1]) was needed to allow concatenating loss values when doing multi-GPU training. Later, I handled that case in parallel_model.py but the loss reshaping remained unintentionally. It's not needed anymore and it's removed in the latest commit.
Just adding that if you run with GPU_COUNT > 1, this still throws the very same error for me.
Replacing "validation_data": val_generator with "validation_data": next(val_generator)
works for me.
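On the next(val_generator) workaround: a generator yields batches lazily, and next pulls exactly one of them, which is why the workaround validates on only a single fixed batch. A toy sketch (the generator and its contents are made up for illustration):

```python
# A Keras-style data generator loops forever, yielding one
# (inputs, targets) batch per next() call.
def toy_generator():
    batch = 0
    while True:
        batch += 1
        yield ([f"inputs_{batch}"], [f"targets_{batch}"])

gen = toy_generator()

# Passing the generator itself lets Keras keep calling next() and see
# many batches; passing next(gen) hands Keras a single, fixed batch.
one_batch = next(gen)
print(one_batch[0])  # ['inputs_1']
```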
@spMohanty That's odd! I did test on multi-GPU and it worked. Maybe I missed something. Can you send details about the error you got and I'll track it down. (preferably in a separate issue)
I had a similar error. So I reverted the changes from the recent master commits, specifically the next(val_generator) and the K.reshape removals, and now it works.
File "/usr/local/lib/python3.5/dist-packages/keras/layers/merge.py", line 155, in call
return self._merge_function(inputs)
File "/usr/local/lib/python3.5/dist-packages/keras/layers/merge.py", line 357, in _merge_function
return K.concatenate(inputs, axis=self.axis)
File "/usr/local/lib/python3.5/dist-packages/keras/backend/tensorflow_backend.py", line 1878, in concatenate
return tf.concat([to_dense(x) for x in tensors], axis)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/array_ops.py", line 1099, in concat
return gen_array_ops._concat_v2(values=values, axis=axis, name=name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 706, in _concat_v2
"ConcatV2", values=values, axis=axis, name=name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InvalidArgumentError (see above for traceback): Integer division by zero
[[Node: training/SGD/gradients/mrcnn_mask_loss_1/concat_grad/mod = FloorMod[T=DT_INT32, _class=["loc:@mrcnn_mask_loss_1/concat"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](rpn_class_loss_1/concat/axis, training/SGD/gradients/mrcnn_mask_loss_1/concat_grad/Rank)]]
[[Node: training/SGD/gradients/tower_3/mask_rcnn/roi_align_classifier/concat_grad/Shape/_2457 = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:3", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_4689_training/SGD/gradients/tower_3/mask_rcnn/roi_align_classifier/concat_grad/Shape", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:3"]()]]
I pulled straight from master yesterday and trained on the COCO 2017 dataset with GPU_COUNT=4, and it broke. Maybe there are newer changes that have fixed this bug.
I can verify @waleedka @spMohanty that the code for GPU_COUNT > 1 still doesn't work unless you replace val_generator with next(val_generator). But then obviously you are validating on only one batch... I will dig in more, unless the problem has already been identified?
Thanks. We have an issue to track this now here: https://github.com/matterport/Mask_RCNN/issues/395
The Mask-RCNN I cloned yesterday still has this problem when using multi-GPU: TypeError: Axis must be specified when shapes of a and weights differ. TensorFlow 1.7, Keras 2.1.3.
Changing "validation_data": val_generator to "validation_data": next(val_generator) fixed this.
@YubinXie were you able to solve this?
@waleedka the version of the repository I cloned has these changes, yet I am still getting the error.
warn("Anti-aliasing will be enabled by default in skimage 0.15 to "
Traceback (most recent call last):
File "Skycatch.py", line 502, in
augmentation=augmentation)
File "/home/mrigank/miniconda3/lib/python3.6/site-packages/mask_rcnn-2.1-py3.6.egg/mrcnn/model.py", line 2318, in train
File "/home/mrigank/miniconda3/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "/home/mrigank/miniconda3/lib/python3.6/site-packages/keras/engine/training.py", line 2250, in fit_generator
max_queue_size=max_queue_size)
File "/home/mrigank/miniconda3/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "/home/mrigank/miniconda3/lib/python3.6/site-packages/keras/engine/training.py", line 2431, in evaluate_generator
weights=batch_sizes))
File "/home/mrigank/miniconda3/lib/python3.6/site-packages/numpy/lib/function_base.py", line 1142, in average
"Axis must be specified when shapes of a and weights "
TypeError: Axis must be specified when shapes of a and weights differ.
@mriganktiwari Have you tried changing "validation_data": val_generator to "validation_data": next(val_generator)?
Also, if you used setup.py to install Mask R-CNN, you might need to re-install it. Previously I changed the local files, but my code was still calling the old installed version.
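For anyone hitting this stale-install problem, a sketch of the re-install (the package name mask-rcnn is an assumption based on the mask_rcnn-2.1-py3.6.egg path in the traceback above; run from the repository root):

```shell
# Remove the previously installed copy, then install in editable mode
# so local edits to mrcnn/model.py take effect without reinstalling.
pip uninstall -y mask-rcnn
pip install -e .
```

An editable install (pip install -e .) avoids this class of problem entirely, since the interpreter imports directly from your working copy.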
@YubinXie Thanks a lot, it worked!