Models: [Deeplab] SemSeg Error at eval/vis when using a pre-trained model for inference on a custom dataset

Created on 1 Apr 2019 · 6 Comments · Source: tensorflow/models

System information

  • What is the top-level directory of the model you are using:
    C:\tensorflow
  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
    Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
    Windows 8.1
  • TensorFlow installed from (source or binary):
    pip install tensorflow (only CPU)
  • TensorFlow version (use command below):
    version = 1.13.1
  • Bazel version (if compiling from source):
  • CUDA/cuDNN version:
    Only CPU
  • GPU model and memory:
  • Exact command to reproduce:
    sh local_test.sh

Describe the problem

I used a pre-trained model (deeplabv3_pascal_train_aug) to perform semantic segmentation on my own dataset (one label + background, num_classes = 2) by retraining only the last layer. Training seems to run without error. However, visualization and evaluation both fail with: Invalid argument: padded_shape[X] = Y is not divisible by block_shape[X] = Z. No matter what I try, I keep getting this error.
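For context, this op is how TF implements atrous (dilated) convolution: the input is padded and rearranged by SpaceToBatchND, which requires each padded spatial dimension to be divisible by the dilation rate (block_shape). A minimal standalone reproduction of that constraint (TF 1.x, unrelated to DeepLab itself):

# SpaceToBatchND requires each padded spatial dim to be divisible by
# block_shape. Feeding a 127x127 input with block_shape 2 and no extra
# padding reproduces the "padded_shape[0]=127 is not divisible by
# block_shape[0]=2" error from the traceback below.
import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, [1, None, None, 1])
y = tf.space_to_batch_nd(x, block_shape=[2, 2], paddings=[[0, 0], [0, 0]])
with tf.Session() as sess:
    sess.run(y, {x: np.zeros([1, 127, 127, 1], np.float32)})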

My dataset consists of images of size [640, 480]. I followed the instructions to build the ground-truth images with labels [background = 0, label_1 = 1], create the tfrecords, etc. I saw in previous issue reports that the crop size during evaluation has to cover the full image. Thus I increased eval_crop_size to 641 (641 = 40 * 16 + 1 > both image dimensions) to ensure that it is larger than either dimension of the image.
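As a quick check, the smallest valid crop size covering a given dimension can be computed directly (a minimal sketch, assuming the crop_size = k * output_stride + 1 convention described above):

# Minimal sketch of the crop-size rule: the smallest value of the form
# k * output_stride + 1 that still covers the image dimension.
def valid_crop_size(image_dim, output_stride=16):
    k = -(-(image_dim - 1) // output_stride)  # ceil((image_dim - 1) / output_stride)
    return k * output_stride + 1

print(valid_crop_size(640))  # 641
print(valid_crop_size(480))  # 481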

Running the original local_test.sh works fine.

Source code / logs

Error for the visualization call:

...
Caused by op 'xception_65/exit_flow/block2/unit_1/xception_module/separable_conv1_depthwise/depthwise/SpaceToBatchND', defined at:
  File "C:/tensorflow/models/research/deeplab/vis.py", line 312, in <module>
    tf.app.run()
  File "C:\ProgramData\Anaconda3\envs\delta\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
    _sys.exit(main(argv))
  File "C:/tensorflow/models/research/deeplab/vis.py", line 230, in main
    image_pyramid=FLAGS.image_pyramid)
  File "C:\tensorflow\models\research\deeplab\model.py", line 183, in predict_labels
    fine_tune_batch_norm=False)
  File "C:\tensorflow\models\research\deeplab\model.py", line 313, in multi_scale_logits
    nas_training_hyper_parameters=nas_training_hyper_parameters)
  File "C:\tensorflow\models\research\deeplab\model.py", line 553, in _get_logits
    nas_training_hyper_parameters=nas_training_hyper_parameters)
  File "C:\tensorflow\models\research\deeplab\model.py", line 395, in extract_features
    use_bounded_activation=model_options.use_bounded_activation)
  File "C:\tensorflow\models\research\deeplab\core\feature_extractor.py", line 341, in extract_features
    scope=name_scope[model_variant])
  File "C:\tensorflow\models\research\deeplab\core\feature_extractor.py", line 408, in network_fn
    *args, **kwargs)
  File "C:\tensorflow\models\research\deeplab\core\xception.py", line 655, in xception_65
    scope=scope)
  File "C:\tensorflow\models\research\deeplab\core\xception.py", line 464, in xception
    net = stack_blocks_dense(net, blocks, output_stride)
  File "C:\ProgramData\Anaconda3\envs\delta\lib\site-packages\tensorflow\contrib\framework\python\ops\arg_scope.py", line 182, in func_with_args
    return func(*args, **current_args)
  File "C:\tensorflow\models\research\deeplab\core\xception.py", line 379, in stack_blocks_dense
    net = block.unit_fn(net, rate=rate, **dict(unit, stride=1))
  File "C:\ProgramData\Anaconda3\envs\delta\lib\site-packages\tensorflow\contrib\framework\python\ops\arg_scope.py", line 182, in func_with_args
    return func(*args, **current_args)
  File "C:\tensorflow\models\research\deeplab\core\xception.py", line 293, in xception_module
    scope='separable_conv' + str(i+1))
  File "C:\tensorflow\models\research\deeplab\core\xception.py", line 284, in _separable_conv
    scope=scope)
  File "C:\ProgramData\Anaconda3\envs\delta\lib\site-packages\tensorflow\contrib\framework\python\ops\arg_scope.py", line 182, in func_with_args
    return func(*args, **current_args)
  File "C:\tensorflow\models\research\deeplab\core\xception.py", line 185, in separable_conv2d_same
    outputs = _split_separable_conv2d(padding='SAME')
  File "C:\tensorflow\models\research\deeplab\core\xception.py", line 175, in _split_separable_conv2d
    **kwargs)
  File "C:\ProgramData\Anaconda3\envs\delta\lib\site-packages\tensorflow\contrib\framework\python\ops\arg_scope.py", line 182, in func_with_args
    return func(*args, **current_args)
  File "C:\ProgramData\Anaconda3\envs\delta\lib\site-packages\tensorflow\contrib\layers\python\layers\layers.py", line 2822, in separable_convolution2d
    data_format=data_format)
  File "C:\ProgramData\Anaconda3\envs\delta\lib\site-packages\tensorflow\python\ops\nn_impl.py", line 522, in depthwise_conv2d
    op=op)
  File "C:\ProgramData\Anaconda3\envs\delta\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 435, in with_space_to_batch
    return new_op(input, None)
  File "C:\ProgramData\Anaconda3\envs\delta\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 591, in __call__
    return self.call(inp, filter)
  File "C:\ProgramData\Anaconda3\envs\delta\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 574, in _with_space_to_batch_call
    input=inp, block_shape=dilation_rate, paddings=paddings)
  File "C:\ProgramData\Anaconda3\envs\delta\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 8648, in space_to_batch_nd
    paddings=paddings, name=name)
  File "C:\ProgramData\Anaconda3\envs\delta\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "C:\ProgramData\Anaconda3\envs\delta\lib\site-packages\tensorflow\python\util\deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\delta\lib\site-packages\tensorflow\python\framework\ops.py", line 3300, in create_op
    op_def=op_def)
  File "C:\ProgramData\Anaconda3\envs\delta\lib\site-packages\tensorflow\python\framework\ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): padded_shape[0]=127 is not divisible by block_shape[0]=2
         [[node xception_65/exit_flow/block2/unit_1/xception_module/separable_conv1_depthwise/depthwise/SpaceToBatchND (defined at C:\tensorflow\models\research\deeplab\core\xception.py:175) ]]

My code:

Changes in data_generator.py:

_MyDataset = DatasetDescriptor(
    splits_to_sizes={
        'train': 125,  # num of samples in images/training
        'val': 125,  # num of samples in images/validation
        'trainval': 250,  # num of samples in images/training + images/validation
    },
    num_classes=2,
    ignore_label=255,
)
_DATASETS_INFORMATION = {
    'cityscapes': _CITYSCAPES_INFORMATION,
    'pascal_voc_seg': _PASCAL_VOC_SEG_INFORMATION,
    'ade20k': _ADE20K_INFORMATION,
    'MyDataset': _MyDataset
}
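To double-check that these counts match the converted data, one can count the records in the generated shards (a hypothetical helper, not part of data_generator.py; assumes the TF 1.x tf.python_io API and the tfrecord path used in the script below):

# Hypothetical helper: count records per split to verify they match
# splits_to_sizes above. Uses the TF 1.x record-iterator API.
import glob
import tensorflow as tf

def count_records(pattern):
    return sum(1 for path in glob.glob(pattern)
               for _ in tf.python_io.tf_record_iterator(path))

print(count_records(r'C:\tensorflow\models\research\deeplab\datasets'
                    r'\MyDataset\tfrecord\train-*'))  # expect 125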

My modified local_test.sh script:

cd ..
# Set up the working environment.
CURRENT_DIR=$(pwd)
WORK_DIR="${CURRENT_DIR}/deeplab"
DATASET_DIR="datasets"

# Set up the working directories.
PASCAL_FOLDER="MyDataset"
EXP_FOLDER="exp/train_on_trainval_set"
PASCAL_DATASET="C:/tensorflow/models/research/deeplab/datasets/MyDataset/tfrecord"

INIT_FOLDER="C:/tensorflow/models/research/deeplab/datasets/pascal_voc_seg/init_models"
EVAL_LOGDIR="${WORK_DIR}/${DATASET_DIR}/${PASCAL_FOLDER}/${EXP_FOLDER}/eval"
TRAIN_LOGDIR="${WORK_DIR}/${DATASET_DIR}/${PASCAL_FOLDER}/${EXP_FOLDER}/train"
VIS_LOGDIR="${WORK_DIR}/${DATASET_DIR}/${PASCAL_FOLDER}/${EXP_FOLDER}/vis"
EXPORT_LOGDIR="${WORK_DIR}/${DATASET_DIR}/${PASCAL_FOLDER}/${EXP_FOLDER}/export"


mkdir -p "${WORK_DIR}/${DATASET_DIR}/${PASCAL_FOLDER}/exp"
mkdir -p "${TRAIN_LOGDIR}"

NUM_ITERATIONS=10
python "${WORK_DIR}"/train.py \
  --logtostderr \
  --train_split="train" \
  --model_variant="xception_65" \
  --atrous_rates=6 \
  --atrous_rates=12 \
  --atrous_rates=18 \
  --output_stride=16 \
  --decoder_output_stride=4 \
  --train_crop_size=321 \
  --train_crop_size=321 \
  --train_batch_size=1 \
  --training_number_of_steps="${NUM_ITERATIONS}" \
  --fine_tune_batch_norm=false \
  --initialize_last_layer=false \
  --last_layers_contain_logits_only=true \
  --dataset="${PASCAL_FOLDER}" \
  --tf_initial_checkpoint="${INIT_FOLDER}/deeplabv3_pascal_train_aug/model.ckpt" \
  --train_logdir="${TRAIN_LOGDIR}" \
  --dataset_dir="${PASCAL_DATASET}"
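Passing --train_crop_size (and the eval/vis equivalents) twice works because these are multi-value flags. A minimal standalone demonstration of that absl pattern (the flag name below is illustrative, not copied from train.py):

# Standalone demo of absl multi-value integer flags, the mechanism that lets
# a flag be repeated on the command line and collected into a list.
from absl import app, flags

flags.DEFINE_multi_integer('crop_size', [513, 513],
                           'Crop size [height, width]; may be repeated.')
FLAGS = flags.FLAGS

def main(argv):
    del argv  # unused
    print(FLAGS.crop_size)  # [321, 321] for --crop_size=321 --crop_size=321

if __name__ == '__main__':
    app.run(main)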


# Run evaluation. 
python "${WORK_DIR}"/eval.py \
  --logtostderr \
  --eval_split="val" \
  --model_variant="xception_65" \
  --atrous_rates=6 \
  --atrous_rates=12 \
  --atrous_rates=18 \
  --output_stride=16 \
  --decoder_output_stride=4 \
  --eval_crop_size=641 \
  --eval_crop_size=641 \
  --checkpoint_dir="${TRAIN_LOGDIR}" \
  --eval_logdir="${EVAL_LOGDIR}" \
  --dataset_dir="${PASCAL_DATASET}" \
  --dataset="${PASCAL_FOLDER}" \
  --max_number_of_evaluations=1 

# Visualize the results.
python "${WORK_DIR}"/vis.py \
  --logtostderr \
  --vis_split="val" \
  --model_variant="xception_65" \
  --atrous_rates=6 \
  --atrous_rates=12 \
  --atrous_rates=18 \
  --output_stride=16 \
  --decoder_output_stride=4 \
  --vis_crop_size=641 \
  --vis_crop_size=641 \
  --checkpoint_dir="${TRAIN_LOGDIR}" \
  --vis_logdir="${VIS_LOGDIR}" \
  --dataset_dir="${PASCAL_DATASET}" \
  --dataset="${PASCAL_FOLDER}" \
  --max_number_of_iterations=1
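A quick way to confirm that retraining actually produced a fresh 2-class logits layer is to inspect the checkpoint (a hypothetical sanity check, not part of the DeepLab scripts; the checkpoint path and step number below are assumed):

# Hypothetical sanity check: list logits variables in the retrained
# checkpoint and their shapes; the last dimension should equal num_classes.
import tensorflow as tf

reader = tf.train.NewCheckpointReader(
    r'C:\tensorflow\models\research\deeplab\datasets\MyDataset'
    r'\exp\train_on_trainval_set\train\model.ckpt-10')
for name, shape in sorted(reader.get_variable_to_shape_map().items()):
    if 'logits' in name:
        print(name, shape)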


All 6 comments

I am getting the same error, but using the ade20k dataset.

InvalidArgumentError (see above for traceback): padded_shape[1]=58 is not divisible by block_shape[1]=6

@codysjackson I have encountered exactly the same error as yours when using the ade20k dataset. Have you fixed it? Thanks!

I have solved this issue thanks to #3939.

I just solved the problem following #6559

What solved my problem was a change in utils/train_utils.py:
I had to explicitly modify the exclude list.
From:

exclude_list = ['global_step']

to:

exclude_list = ['global_step','logits']

I think this should be controlled by these arguments to train.py:

  --fine_tune_batch_norm=false \
  --initialize_last_layer=false \
  --last_layers_contain_logits_only=true \

However, it seems that for some reason they were not parsed correctly, and the exclude list has to be updated manually. I also changed the crop sizes in evaluation and visualization to 481 and 641 (since my images are 480x640).

Hope this helps you too!
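For reference, here is a minimal sketch of how an exclude list like this typically feeds slim's variable restoration (assuming the TF 1.x contrib slim API; the exact surrounding code in train_utils.py may differ):

# Minimal sketch (assumed TF 1.x contrib slim API): variables whose names
# match an entry in exclude_list are skipped when restoring from the initial
# checkpoint, so the excluded layers keep their re-initialized weights.
from tensorflow.contrib import slim

exclude_list = ['global_step', 'logits']
variables_to_restore = slim.get_variables_to_restore(exclude=exclude_list)
init_fn = slim.assign_from_checkpoint_fn(
    'deeplabv3_pascal_train_aug/model.ckpt', variables_to_restore,
    ignore_missing_vars=True)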

I couldn't solve the problem with @Argantonio65's approach alone. After implementing every step as he did, I still had to apply #3695 as well. Only then was the fix complete.

Closing this issue since it's resolved. Thanks all!
