Problem Description
I'm trying to fine-tune DeepLab v3+ with a new dataset (with a different number of classes).
I converted the dataset to TFRecords (training and validation) and trained the model without problems using train.py.
Now I want to evaluate the new checkpoint by running the evaluation script eval.py, and I get a shape mismatch error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape mismatch in tuple component 1. Expected [513,513,3], got [2448,2448,3]
[[Node: batch/padding_fifo_queue_enqueue = QueueEnqueueV2[Tcomponents=[DT_INT64, DT_FLOAT, DT_STRING, DT_INT32, DT_UINT8, DT_INT64], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](batch/padding_fifo_queue, Reshape_3/_4659, add_2/_4661, Reshape_1, add_3/_4663, case/cond/Merge/_4665, Reshape_6/_4667)]]
The problem seems to be inside the evaluation loop in eval.py:
slim.evaluation.evaluation_loop(
    master=FLAGS.master,
    checkpoint_dir=FLAGS.checkpoint_dir,
    logdir=FLAGS.eval_logdir,
    num_evals=num_batches,
    eval_op=list(metrics_to_updates.values()),
    max_number_of_evaluations=num_eval_iters,
    eval_interval_secs=FLAGS.eval_interval_secs,
    hooks=[tf_debug.LocalCLIDebugHook()])
I don't understand this error, because the preprocessing seems to be the same (crop and resize).
I also tried using the TensorFlow debugger, without success.
I'm running:
python "${WORK_DIR}"/train.py \
--logtostderr \
--save_summaries_secs=100 \
--train_split="training" \
--model_variant="xception_65" \
--atrous_rates=6 \
--atrous_rates=12 \
--atrous_rates=18 \
--output_stride=16 \
--decoder_output_stride=4 \
--train_crop_size=513 \
--train_crop_size=513 \
--train_batch_size=4 \
--training_number_of_steps="${NUM_ITERATIONS}" \
--dataset="mapillary" \
--fine_tune_batch_norm=false \
--tf_initial_checkpoint="${INIT_FOLDER}/deeplabv3_pascal_train_aug/model.ckpt" \
--initialize_last_layer=false \
--train_logdir="${TRAIN_LOGDIR}" \
--dataset_dir="${NEW_DATASET}"
and
python "${WORK_DIR}"/eval.py \
--logtostderr \
--eval_split="validation" \
--model_variant="xception_65" \
--atrous_rates=6 \
--atrous_rates=12 \
--atrous_rates=18 \
--output_stride=16 \
--decoder_output_stride=4 \
--eval_crop_size=513 \
--eval_crop_size=513 \
--dataset="mapillary" \
--checkpoint_dir="${TRAIN_LOGDIR}" \
--eval_logdir="${EVAL_LOGDIR}" \
--dataset_dir="${NEW_DATASET}" \
--max_number_of_evaluations=1
I don't know if I'm doing something wrong with the data conversion or if there is a problem with the code.
Set eval_crop_size = output_stride * k + 1 for your dataset.
The default value, 513, is set for PASCAL images whose largest image dimension is 512.
We pick k = 32, resulting in eval_crop_size = 16 * 32 + 1 = 513 > 512, since we will do whole-image inference.
We did the same for Cityscapes images, where we set eval_crop_size = 1025x2049.
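For reference, a minimal Python sketch of this rule (not from the original thread; output_stride = 16 matches the settings above):

import math

def eval_crop_size(largest_dim, output_stride=16):
    # Smallest k with output_stride * k + 1 >= largest_dim,
    # so the whole image fits inside the crop.
    k = math.ceil((largest_dim - 1) / output_stride)
    return output_stride * k + 1

print(eval_crop_size(512))                          # 513, the PASCAL default (k = 32)
print(eval_crop_size(1024), eval_crop_size(2048))   # 1025 2049, the Cityscapes case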
Thanks for your answer. I solved this issue.
Good job on solving the problem!
Closing the issue.
@aquariusjay Hi, what does the k mean?
Hi, I also ran into this problem, and I still don't know what k is. Could you help me?
@shipeng-uestc I got it. Suppose your dataset's largest image dimension is 875. Grow k from a small integer until eval_crop_size = 16*k + 1 covers it: at k = 55, eval_crop_size = 16*55 + 1 = 881 > 875.
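That trial-and-error search, written out as a quick sketch (output_stride = 16 assumed):

k = 1
while 16 * k + 1 < 875:  # grow k until the crop is at least the largest image dimension
    k += 1
print(k, 16 * k + 1)     # 55 881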
I have the same problem. Why don't we need to set anything in train.py, while we do need to set eval_crop_size in eval.py?
Both have a crop-size parameter with [513, 513] as the default value (train_crop_size in train.py, eval_crop_size in eval.py), but the default only seems to work in train.py.
Thanks
I still don't understand why we do not need to set train_crop_size but should reset eval_crop_size. I don't see the difference between those two parameters; can anyone explain it?
Hi @aquariusjay, would you mind explaining what "whole-image inference" does and why it affects performance so much? Or could you share a link about this topic? Thanks!
Could you please explain where to change the value?
Suppose the largest image in the dataset is 512x512.
During training, one may use a smaller crop size due to the limited GPU resources at hand. For example, one could set train_crop_size = [321, 321] (note that we always use odd-valued crop_size = k * output_stride + 1, as mentioned in the code repository). We found that using a larger crop size is beneficial for the model, so if memory allows, we set train_crop_size = [513, 513].
During evaluation/inference, we do not perform segmentation within small regions; instead, we segment the whole image. Thus we set eval_crop_size = [513, 513]. If we set eval_crop_size < the largest image resolution, the code will crash since there are regions left unprocessed.
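To make the crash concrete, here is a minimal NumPy sketch (an illustration, not the actual eval code) of padding a whole image up to eval_crop_size:

import numpy as np

def pad_to_crop(image, crop_h, crop_w):
    # Mimics how the eval pipeline pads a whole image up to eval_crop_size.
    h, w, _ = image.shape
    if crop_h < h or crop_w < w:
        # This is the situation behind the shape-mismatch error above.
        raise ValueError("eval_crop_size [%d, %d] is smaller than the image [%d, %d]"
                         % (crop_h, crop_w, h, w))
    return np.pad(image, ((0, crop_h - h), (0, crop_w - w), (0, 0)), mode="constant")

image = np.zeros((2448, 2448, 3), np.uint8)   # a Mapillary-sized image, as in the error
padded = pad_to_crop(image, 2449, 2449)       # 16 * 153 + 1 = 2449 covers it
print(padded.shape)                           # (2449, 2449, 3)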
Thanks so much for your reply.
I rectified this error by keeping only .jpg images.
Thanks @aquariusjay,
I trained with 512x512 and got similar performance to training with 513x513, so I want to dig into why an odd-valued crop size is used.
Would you mind pointing me to the place in the code repository that mentions "use odd-valued crop_size = k * output_stride + 1"? Appreciate it!
I still think it is weird to set crop_size to 513. As you said, the largest height/width is 512, so why do we need a crop_size of 513 rather than 512? I believe 512 is large enough to cover a whole image.
Hey, so my current image dimensions are 1098x1220, and I have set vis_crop_size to [1099, 1221], but I get:
Shape mismatch in tuple component 1. Expected [1099,1221,3], got [1220,1221,3]
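Note that the error says the actual image is 1220x1221, and neither 1099 nor 1221 has the form output_stride * k + 1. Applying the earlier formula (a suggestion, untested on this dataset):

import math

output_stride = 16
for dim in (1220, 1221):                       # actual image height and width from the error
    k = math.ceil((dim - 1) / output_stride)   # smallest k with 16*k + 1 >= dim
    print(dim, "->", output_stride * k + 1)    # both print 1233

So vis_crop_size = [1233, 1233] should cover the whole image.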
In which file is eval_crop_size modified?
Some people say the odd crop size (513 rather than 512) is for center-point alignment; maybe it's useful for the upsampler or something.
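A sketch of that explanation (my reading, not an official statement from the authors): with crop_size = k * output_stride + 1, the downsampled feature grid has exactly k + 1 points whose corners align with the image corners, which is what bilinear resizing with align_corners=True expects:

output_stride = 16
for crop in (512, 513):
    # Feature-map grid size; an integer only when crop = k * output_stride + 1.
    print(crop, (crop - 1) / output_stride + 1)
# 512 -> 32.9375 (grid does not align), 513 -> 33.0 (corners line up exactly)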
It's resolved :b