Models: SSD Resnet50 FPN ValueError: Dimensions must be equal

Created on 16 Mar 2018 · 11Comments · Source: tensorflow/models

System information

Have I written custom code (as opposed to using a stock example script provided in TensorFlow): custom .config file, see below
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
TensorFlow installed from (source or binary): source
TensorFlow version (use command below): 1.7-rc0 and also tried on 1.6
Python version: Tried 2.7 and 3.5
Bazel version (if compiling from source): 0.11.1
GCC/Compiler version (if compiling from source): gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
CUDA/cuDNN version: 8.0 / 6.0
GPU model and memory: GeForce GTX 1060 6GB
Exact command to reproduce:

python3 object_detection/train.py \
    --pipeline_config_path=object_detection/samples/ssd_resnet_50_fpn_drone.config \
    --train_dir=object_detection/drone \
    --num_clones=1

When I'm trying to run test FPN config from model_builder_test.py (see attach), I'm getting error:

WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/contrib/learn/python/learn/datasets/base.py:198: retry (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Use the retry module or similar alternatives.
WARNING:tensorflow:From /home/undead/reps/tf_models/object_detection/trainer.py:228: create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.create_global_step
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/common_shapes.py", line 686, in _call_cpp_shape_fn_impl
    input_tensors_as_shapes, status)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Dimensions must be equal, but are 6 and 5 for 'FeatureExtractor/fpn/top_down_features/add' (op: 'Add') with input shapes: [4,6,6,256], [4,5,5,256].

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "object_detection/train.py", line 167, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "object_detection/train.py", line 163, in main
    worker_job_name, is_chief, FLAGS.train_dir)
  File "/home/undead/reps/tf_models/object_detection/trainer.py", line 246, in train
    clones = model_deploy.create_clones(deploy_config, model_fn, [input_queue])
  File "/home/undead/reps/tf_models/slim/deployment/model_deploy.py", line 193, in create_clones
    outputs = model_fn(*args, **kwargs)
  File "/home/undead/reps/tf_models/object_detection/trainer.py", line 179, in _create_losses
    prediction_dict = detection_model.predict(images, true_image_shapes)
  File "/home/undead/reps/tf_models/object_detection/meta_architectures/ssd_meta_arch.py", line 350, in predict
    preprocessed_inputs)
  File "/home/undead/reps/tf_models/object_detection/models/ssd_resnet_v1_fpn_feature_extractor.py", line 164, in extract_features
    scope='top_down_features')
  File "/home/undead/reps/tf_models/object_detection/models/feature_map_generators.py", line 217, in fpn_top_down_feature_maps
    top_down = 0.5 * top_down + 0.5 * residual
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_ops.py", line 971, in binary_op_wrapper
    return func(x, y, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 296, in add
    "Add", x=x, y=y, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3292, in create_op
    compute_device=compute_device)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3332, in _create_op_helper
    set_shapes_for_outputs(op)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 2496, in set_shapes_for_outputs
    return _set_shapes_for_outputs(op)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 2469, in _set_shapes_for_outputs
    shapes = shape_func(op)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 2399, in call_with_requiring
    return call_cpp_shape_fn(op, require_shape_fn=True)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/common_shapes.py", line 627, in call_cpp_shape_fn
    require_shape_fn)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/common_shapes.py", line 691, in _call_cpp_shape_fn_impl
    raise ValueError(err.message)
ValueError: Dimensions must be equal, but are 6 and 5 for 'FeatureExtractor/fpn/top_down_features/add' (op: 'Add') with input shapes: [4,6,6,256], [4,5,5,256].

model_builder_test.py and ssd_resnet_v1_fpn_feature_extractor_test.py passes on Python2.7.
Tried TF 1.6 and 1.7, Python 2.7 and 3.5, tried different input resolutions. All the same. Seems like a bug.

ssd_resnet_50_fpn_drone.config.zip

Source

UndeadBlow

Most helpful comment

same here
ValueError: Dimensions must be equal, but are 26 and 25 for 'FeatureExtractor/MobilenetV1/fpn/top_down/add' (op: 'Add') with input shapes: [64,26,56,256], [64,25,56,256]
where do you set pad_to_multiple?

TomKomar on 23 Jul 2018

👍5

All 11 comments

You should set the correct input shape and padding to get correct feature map dimension. For example, with image size [256, 256] you have to set pad_to_multiple: 1. See the unit test for more details. I noticed that not every input shape works.

jrabary on 17 Mar 2018

👍3

@jrabary
Seems, that you are right my friend. Thanks you.
Some roslutions really doesn't work, but with explicit pad_to_multiple: 1 and 256 it works.

Therefore feature request: add pretrained FPN to model ZOO.

UndeadBlow on 18 Mar 2018

Having the FPN in the model zoo would be great. I'm trying to reach the performance of the retinanet in the detectron model zoo https://github.com/facebookresearch/Detectron/blob/master/MODEL_ZOO.md
They have 0.35 in box mAP. Currently the best I have is 0.21. If someone has an idea of how to set the parameters ?

There is an another issue on this topic here https://github.com/tensorflow/models/issues/3147

jrabary on 19 Mar 2018

@jrabary
I'm working on retinanet too, but don't know how to config the model.
Could you share your config file with me ? Thanks.

hitlk on 18 Apr 2018

👍3

@jrabary @UndeadBlow Did you guys get any success in using this? where should we put pad_to_multiple? I tried following config.

feature_extractor {
type: 'ssd_resnet50_v1_fpn'
min_depth: 16
depth_multiplier: 1.0
use_depthwise: true
conv_hyperparams {
activation: RELU_6,
regularizer {
l2_regularizer {
weight: 0.00004
}
}
initializer {
truncated_normal_initializer {
stddev: 0.03
mean: 0.0
}
}
batch_norm {
train: true,
scale: true,
center: true,
decay: 0.9997,
epsilon: 0.001,
}
}

I got following error.
ValueError: Number of feature maps is expected to equal the length of num_anchors_per_location.

viralbthakar on 15 Jun 2018

👍3

TomKomar on 23 Jul 2018

👍5

Could you please tell where do you set pad_to_multiple?
I have same problem.

MohammadMoradi on 29 Jul 2018

👍2

pad_to_multiple is set in your pipeline config in the feature extractor section.
I just ran training with this feature extractor for a 3840x2160 model (ran out of memory) and a 1080p model (no problem)

feature_extractor {

  type: 'ssd_resnet50_v1_fpn'
  fpn {

    min_level: 3
   max_level: 8
  }
  min_depth: 0
  depth_multiplier: 1.0
  pad_to_multiple: 32
  conv_hyperparams {
    activation: RELU_6,
    regularizer {
      l2_regularizer {
        weight: 0.0004
      }
    }
    initializer {
      truncated_normal_initializer {
        stddev: 0.03
        mean: 0.0
      }
    }
    batch_norm {
      scale: true,
      decay: 0.997,
      epsilon: 0.001,
    }
  }
  override_base_feature_extractor_hyperparams: true
}