hi.
python train_image_classifier.py --train_dir=${l} --dataset_name=reku --dataset_split_name=train --dataset_dir=${d} --model_name=resnet_v1_152 --train_image_size=224
Traceback (most recent call last):
File "train_image_classifier.py", line 585, in
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 30, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "train_image_classifier.py", line 482, in main
clones = model_deploy.create_clones(deploy_config, clone_fn, [batch_queue])
File "/home/rube/github/models/slim/deployment/model_deploy.py", line 195, in create_clones
outputs = model_fn(*args, **kwargs)
File "train_image_classifier.py", line 476, in clone_fn
logits, labels, label_smoothing=FLAGS.label_smoothing, weight=1.0)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/losses/python/losses/loss_ops.py", line 359, in softmax_cross_entropy
logits.get_shape().assert_is_compatible_with(onehot_labels.get_shape())
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/tensor_shape.py", line 750, in assert_is_compatible_with
raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (32, 1, 1, 7) and (32, 7) are incompatible
It looks like the predictions and labels have different shapes. Any ideas?
Regards,
Rube
Thanks for reporting this issue. I'm not sure what ${l} and ${d} are or exactly how to reproduce. We're still working on configuring the build and CI testing for this repository. I'm going to flag this issue as community support for the time being. If you don't get an answer soon, I would recommend trying Stack Overflow.
I also met this problem. I am trying to fix it by adding this:
net = tf.squeeze(net, [1, 2], name='SpatialSqueeze')
under line 199 in resnet_v1.py.
However, I don't know whether this works for you.
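For anyone unsure what that single added line changes, here is a minimal sketch (not part of resnet_v1.py itself, and assuming a TensorFlow 1.x graph like the one slim builds) showing the shape effect of the squeeze:

import tensorflow as tf

# Stand-in for the ResNet head output: logits with two singleton spatial dims,
# matching the (32, 1, 1, 7) shape from the traceback above.
logits = tf.placeholder(tf.float32, shape=[32, 1, 1, 7])

# Dropping the singleton dimensions yields the [batch, num_classes] shape
# that softmax_cross_entropy expects for comparison against one-hot labels.
squeezed = tf.squeeze(logits, [1, 2], name='SpatialSqueeze')
print(squeezed.get_shape())  # prints (32, 7)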
I am facing the same issue. Is it fixed?
@aldarabs Adding net = tf.squeeze(net, [1, 2], name='SpatialSqueeze') from @codegank solved it.
@qiulp ... I tried to add this line of code, but the loss did not decrease. Will it affect the learning? Could you also let me know at which line number I should add it?
@aldarabs Under line 199 in resnet_v1.py. It works for me.
@qiulp .. Thank you for the instant reply. I just want to make sure it won't affect the training. Also, should it be included in checkpoint_exclude_scopes (resnet_v1/SpatialSqueeze)? Please see below:
DATA_DIR=/media/khaldun/drive1/TensorFlow_ME/dataset
DATASET_DIR=/media/khaldun/drive1/TensorFlow_ME/dataset
TRAIN_DIR=/media/khaldun/drive1/TensorFlow_ME/resnet_50
CHECKPOINT_PATH=/media/khaldun/drive1/TensorFlow_ME/checkpoint/resnet_v1_50.ckpt
/usr/bin/python train_image_classifier.py \
--train_dir=${TRAIN_DIR} \
--dataset_name=flowers \
--dataset_split_name=train \
--dataset_dir=${DATASET_DIR} \
--model_name=resnet_v1_50 \
--checkpoint_path=${CHECKPOINT_PATH} \
--trainable_scopes=resnet_v1/Logits, resnet_v1/SpatialSqueeze \
--checkpoint_exclude_scopes=resnet_v1/Logits, resnet_v1/SpatialSqueeze \
--max_number_of_steps=3000 \
--batch_size=8 \
--learning_rate=0.01 \
--save_interval_secs=60 \
--save_summaries_secs=60 \
--log_every_n_steps=100 \
--optimizer=rmsprop \
--weight_decay=0.00004
@aldarabs nope,
if num_classes is not None:
  net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None,
                    normalizer_fn=None, scope='logits')
  net = tf.squeeze(net, [1, 2], name='xxx')
# Convert end_points_collection into a dictionary of end_points.
end_points = slim.utils.convert_collection_to_dict(end_points_collection)
This just changes the net shape from (32, 1, 1, 7) to (32, 7). In my case there is no fine-tuning.
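To illustrate why the squeeze does not need to appear in checkpoint_exclude_scopes, here is a rough check (again assuming TensorFlow 1.x; the variable name is made up) showing that tf.squeeze contributes no trainable variables, and therefore nothing a checkpoint could save or restore:

import tensorflow as tf

with tf.Graph().as_default():
    # Made-up variable standing in for the logits produced by slim.conv2d.
    net = tf.get_variable('logits_like', shape=[32, 1, 1, 7])
    net = tf.squeeze(net, [1, 2], name='SpatialSqueeze')
    # Only the stand-in variable is listed; the squeeze op owns no weights.
    print([v.name for v in tf.trainable_variables()])  # ['logits_like:0']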
Then how can I fine-tune the model using the existing pre-trained ResNet checkpoints?
Hi aldarabs,
Hope everything worked out for you in the end. If not, I wonder if you would have better luck referring to the logits with these slightly different names:
--trainable_scopes=resnet_v1_50/logits
--checkpoint_exclude_scopes=resnet_v1_50/logits
My guess is that your loss is not decreasing because no valid layer name was included for trainable_scopes.
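One way to double-check the exact scope names before filling in --trainable_scopes and --checkpoint_exclude_scopes is to list the variables stored in the checkpoint. This is a sketch assuming a TensorFlow 1.x install where tf.train.list_variables is available, reusing the checkpoint path from the script above:

import tensorflow as tf

CHECKPOINT_PATH = '/media/khaldun/drive1/TensorFlow_ME/checkpoint/resnet_v1_50.ckpt'

# Print every checkpoint variable whose name mentions 'logits'; the names
# printed here are the ones the scope flags have to match.
for name, shape in tf.train.list_variables(CHECKPOINT_PATH):
    if 'logits' in name.lower():
        print(name, shape)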
It should work with the updates from @mawah. I am closing this bug now, but feel free to reopen it if you still have issues.
Traceback (most recent call last):
File "C:\Users\muromachi\Desktop\ROY\BIGtest131203gpu_new131204_1hr2DDQN.py", line 1894, in
mainQN.model.load_weights('217_5weightsold1.h5')
File "C:\Users\muromachi\Anaconda3\envs\PythonGPU\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 181, in load_weights
return super(Model, self).load_weights(filepath, by_name)
File "C:\Users\muromachi\Anaconda3\envs\PythonGPU\lib\site-packages\tensorflow_core\python\keras\engine\network.py", line 1177, in load_weights
saving.load_weights_from_hdf5_group(f, self.layers)
File "C:\Users\muromachi\Anaconda3\envs\PythonGPU\lib\site-packages\tensorflow_core\python\keras\saving\hdf5_format.py", line 699, in load_weights_from_hdf5_group
K.batch_set_value(weight_value_tuples)
File "C:\Users\muromachi\Anaconda3\envs\PythonGPU\lib\site-packages\tensorflow_core\python\keras\backend.py", line 3343, in batch_set_value
x.assign(np.asarray(value, dtype=dtype(x)))
File "C:\Users\muromachi\Anaconda3\envs\PythonGPU\lib\site-packages\tensorflow_core\python\opsresource_variable_ops.py", line 814, in assign
self._shape.assert_is_compatible_with(value_tensor.shape)
File "C:\Users\muromachi\Anaconda3\envs\PythonGPU\lib\site-packages\tensorflow_core\python\framework\tensor_shape.py", line 1115, in assert_is_compatible_with
raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (60, 9) and (60, 8) are incompatible
I got the same error. Please help me fix it.
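This second traceback is a different instance of the same class of error: the Dense layer in the rebuilt model has a different number of units than the one stored in 217_5weightsold1.h5 (9 versus 8). A minimal sketch of that mismatch, with made-up layer sizes and file names, could look like this:

import tensorflow as tf

def build_q_network(num_actions):
    # Tiny stand-in for the DQN model; only the output width matters here.
    return tf.keras.Sequential([
        tf.keras.layers.Dense(60, activation='relu', input_shape=(4,)),
        tf.keras.layers.Dense(num_actions),
    ])

build_q_network(8).save_weights('q_net.h5')  # weights saved with 8 outputs
model = build_q_network(9)                   # model rebuilt with 9 outputs
try:
    model.load_weights('q_net.h5')           # fails: kernel (60, 9) vs (60, 8)
except ValueError as err:
    print(err)

Rebuilding the network with the same number of actions it had when the weights were saved (or retraining after the action space changed) avoids the mismatch.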