Problem Description
I'm trying to fine-tune DeepLab v3+ with a new dataset (with a different number of classes).
I converted the dataset to TFRecords (training and validation) and trained the model without problems using train.py.
Now I want to evaluate the new checkpoint by running the evaluation script eval.py, and I get a shape mismatch error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape mismatch in tuple component 1. Expected [513,513,3], got [2448,2448,3]
[[Node: batch/padding_fifo_queue_enqueue = QueueEnqueueV2[Tcomponents=[DT_INT64, DT_FLOAT, DT_STRING, DT_INT32, DT_UINT8, DT_INT64], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](batch/padding_fifo_queue, Reshape_3/_4659, add_2/_4661, Reshape_1, add_3/_4663, case/cond/Merge/_4665, Reshape_6/_4667)]]
The problem seems to be inside the evaluation loop in eval.py:
slim.evaluation.evaluation_loop(
    master=FLAGS.master,
    checkpoint_dir=FLAGS.checkpoint_dir,
    logdir=FLAGS.eval_logdir,
    num_evals=num_batches,
    eval_op=list(metrics_to_updates.values()),
    max_number_of_evaluations=num_eval_iters,
    eval_interval_secs=FLAGS.eval_interval_secs,
    hooks=[tf_debug.LocalCLIDebugHook()])
I don't understand this error, because the preprocessing seems to be the same (crop and resize).
I also tried using the TensorFlow debugger, without success.
I'm running:
python "${WORK_DIR}"/train.py \
--logtostderr \
--save_summaries_secs=100 \
--train_split="training" \
--model_variant="xception_65" \
--atrous_rates=6 \
--atrous_rates=12 \
--atrous_rates=18 \
--output_stride=16 \
--decoder_output_stride=4 \
--train_crop_size=513 \
--train_crop_size=513 \
--train_batch_size=4 \
--training_number_of_steps="${NUM_ITERATIONS}" \
--dataset="mapillary" \
--fine_tune_batch_norm=false \
--tf_initial_checkpoint="${INIT_FOLDER}/deeplabv3_pascal_train_aug/model.ckpt" \
--initialize_last_layer=false \
--train_logdir="${TRAIN_LOGDIR}" \
--dataset_dir="${NEW_DATASET}"
and
python "${WORK_DIR}"/eval.py \
--logtostderr \
--eval_split="validation" \
--model_variant="xception_65" \
--atrous_rates=6 \
--atrous_rates=12 \
--atrous_rates=18 \
--output_stride=16 \
--decoder_output_stride=4 \
--eval_crop_size=513 \
--eval_crop_size=513 \
--dataset="mapillary" \
--checkpoint_dir="${TRAIN_LOGDIR}" \
--eval_logdir="${EVAL_LOGDIR}" \
--dataset_dir="${NEW_DATASET}" \
--max_number_of_evaluations=1
I don't know if I'm doing something wrong with the data conversion or if there is a problem with the code.
Set eval_crop_size = output_stride * k + 1 for your dataset.
The default value, 513, is set for PASCAL images whose largest image dimension is 512.
We pick k = 32, resulting in eval_crop_size = 16 * 32 + 1 = 513 > 512, since we will do whole-image inference.
We did the same for Cityscapes images, where we set eval_crop_size = 1025x2049.
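For reference, a minimal Python sketch of this rule (not from the original thread; output_stride = 16 matches the settings above):

import math

def eval_crop_size(largest_dim, output_stride=16):
    # Smallest k with output_stride * k + 1 >= largest_dim,
    # so the whole image fits inside the crop.
    k = math.ceil((largest_dim - 1) / output_stride)
    return output_stride * k + 1

print(eval_crop_size(512))                          # 513, the PASCAL default (k = 32)
print(eval_crop_size(1024), eval_crop_size(2048))   # 1025 2049, the Cityscapes case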
Thanks for your answer. I solved this issue.
Good job on solving the problem!
Closing the issue.
@aquariusjay Hi, what does the k mean?
Hi, I also ran into this problem, and I still don't know what k is. Could you help me?
@shipeng-uestc I got it. Suppose your dataset's largest image dimension is 875. Grow k from a small integer until eval_crop_size = 16*k + 1 covers it: at k = 55, eval_crop_size = 16*55 + 1 = 881 > 875.
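That trial-and-error search, written out as a quick sketch (output_stride = 16 assumed):

k = 1
while 16 * k + 1 < 875:  # grow k until the crop is at least the largest image dimension
    k += 1
print(k, 16 * k + 1)     # 55 881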
I have the same problem. Why don't we need to set anything in train.py, while we do need to set eval_crop_size in eval.py?
Both have a crop-size parameter with [513, 513] as the default value (train_crop_size in train.py, eval_crop_size in eval.py), but the default only seems to work in train.py.
Thanks
I still don't understand why we do not need to set train_crop_size but should reset eval_crop_size. I don't see the difference between those two parameters; can anyone explain it?
Hi @aquariusjay, would you mind explaining what "whole-image inference" does and why it affects performance so much? Or could you share a link about this topic? Thanks!
Could you please explain where to change the value?
Suppose the largest image in the dataset is 512x512.
During training, one may use a smaller crop size due to the limited GPU resources at hand. For example, one could set train_crop_size = [321, 321] (note that we always use odd-valued crop_size = k * output_stride + 1, as mentioned in the code repository). We found that using a larger crop size is beneficial for the model, so if memory allows, we set train_crop_size = [513, 513].
During evaluation/inference, we do not perform segmentation within small regions; instead, we segment the whole image. Thus we set eval_crop_size = [513, 513]. If we set eval_crop_size < the largest image resolution, the code will crash since there are regions left unprocessed.
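To make the crash concrete, here is a minimal NumPy sketch (an illustration, not the actual eval code) of padding a whole image up to eval_crop_size:

import numpy as np

def pad_to_crop(image, crop_h, crop_w):
    # Mimics how the eval pipeline pads a whole image up to eval_crop_size.
    h, w, _ = image.shape
    if crop_h < h or crop_w < w:
        # This is the situation behind the shape-mismatch error above.
        raise ValueError("eval_crop_size [%d, %d] is smaller than the image [%d, %d]"
                         % (crop_h, crop_w, h, w))
    return np.pad(image, ((0, crop_h - h), (0, crop_w - w), (0, 0)), mode="constant")

image = np.zeros((2448, 2448, 3), np.uint8)   # a Mapillary-sized image, as in the error
padded = pad_to_crop(image, 2449, 2449)       # 16 * 153 + 1 = 2449 covers it
print(padded.shape)                           # (2449, 2449, 3)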
Thanks so much for your reply.
I rectified this error by keeping only .jpg images.
Thanks @aquariusjay,
I trained with 512x512 and got similar performance to training with 513x513, so I want to dig into why an odd-valued crop size is used.
Would you mind pointing me to the place in the code repository that mentions "use odd-valued crop_size = k * output_stride + 1"? Appreciate it!
I still think it is weird to set crop_size to 513. As you said, the largest height/width is 512, so why do we need a crop_size of 513 rather than 512? I believe 512 is large enough to cover a whole image.
Hey, so my current image dimensions are 1098x1220, and I have set vis_crop_size to [1099, 1221], but I get:
Shape mismatch in tuple component 1. Expected [1099,1221,3], got [1220,1221,3]
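Note that the error says the actual image is 1220x1221, and neither 1099 nor 1221 has the form output_stride * k + 1. Applying the earlier formula (a suggestion, untested on this dataset):

import math

output_stride = 16
for dim in (1220, 1221):                       # actual image height and width from the error
    k = math.ceil((dim - 1) / output_stride)   # smallest k with 16*k + 1 >= dim
    print(dim, "->", output_stride * k + 1)    # both print 1233

So vis_crop_size = [1233, 1233] should cover the whole image.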
In which file is eval_crop_size modified?
Some people say the odd crop size (513 rather than 512) is for center-point alignment; maybe it's useful for the upsampler or something.
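A sketch of that explanation (my reading, not an official statement from the authors): with crop_size = k * output_stride + 1, the downsampled feature grid has exactly k + 1 points whose corners align with the image corners, which is what bilinear resizing with align_corners=True expects:

output_stride = 16
for crop in (512, 513):
    # Feature-map grid size; an integer only when crop = k * output_stride + 1.
    print(crop, (crop - 1) / output_stride + 1)
# 512 -> 32.9375 (grid does not align), 513 -> 33.0 (corners line up exactly)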
It's resolved :b