Models: deeplab eval.py fails with assertion failed: [`labels` out of bound] [Condition x < y did not hold element-wise:]

Created on 14 Feb 2020 · 4 Comments · Source: tensorflow/models

Hello TensorFlow team,

Here is the system information for the issue described below:

System information

What is the top-level directory of the model you are using:
models-master/research/deeplab
I have the latest commit (6fb5646f18d7173dd08a576a819a8dc9b36a6d8b) of the default branch (master).
Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
No
OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
Red Hat Enterprise Linux Server 7.5 (Maipo)
TensorFlow installed from (source or binary):
binary
TensorFlow version (use command below):
1.15.2
Bazel version (if compiling from source): No
CUDA/cuDNN version: No
GPU model and memory: No
Exact command to reproduce:

The exact command that fails:
python eval.py \
--eval_crop_size='513,513' \
--logtostderr \
--eval_split="val" \
--model_variant="xception_65" \
--atrous_rates=6 \
--atrous_rates=12 \
--atrous_rates=18 \
--output_stride=16 \
--decoder_output_stride=4 \
--dataset="cells" \
--checkpoint_dir=/mnt/lustre/LOGDIR \
--eval_logdir=/mnt/lustre/LOGDIREVAL \
--dataset_dir=/mnt/lustre/tfrecord

Describe the problem

The eval.py script fails when I run it as above. The error I get is:

tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [`labels` out of bound] [Condition x < y did not hold element-wise:] [x (mean_iou/confusion_matrix/control_dependency:0) = ] [3 3 3...] [y (mean_iou/Cast_1:0) = ] [3] [[node mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert (defined at /jorgeenv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
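
For context, the node names in the message (mean_iou/confusion_matrix/assert_less) suggest that the check runs while the confusion matrix behind mean IoU is being built: every label x must be strictly less than y, which here is 3. The pure-Python sketch below is not TensorFlow's implementation; build_confusion_matrix is a hypothetical helper that only illustrates why a label value of 3 cannot be placed in a 3x3 matrix.

```python
def build_confusion_matrix(labels, predictions, num_classes):
    """Count (label, prediction) pairs in a num_classes x num_classes grid.

    Hypothetical helper for illustration only; it mimics the bound check
    reported in the error message, not TensorFlow's actual code.
    """
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for label, prediction in zip(labels, predictions):
        # Same condition as the failing assertion: x < y with y = num_classes.
        if not 0 <= label < num_classes:
            raise ValueError(
                "label %d out of bound for num_classes=%d" % (label, num_classes))
        matrix[label][prediction] += 1
    return matrix


# Labels 0..2 fit a 3-class matrix.
ok = build_confusion_matrix([0, 1, 2, 2], [0, 1, 1, 2], num_classes=3)

# A label equal to num_classes (3, as in the error above) does not.
try:
    build_confusion_matrix([3, 3, 3], [0, 1, 2], num_classes=3)
    failed = False
except ValueError:
    failed = True
```

In other words, the message indicates that at least one label value reaching the metric is 3, while only 0, 1 and 2 are accepted.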

Prior to that I have:
1. Created the files in the tfrecord folder using build_voc2012_data.py. This folder is passed as the --dataset_dir argument to both train.py and eval.py.
My original images are 500x333 PNG files. The corresponding masks are 500x333 indexed PNG files with three indexes, 0, 1 and 2, where 0 is the background. For testing purposes I have two images, one for training and one for validation. I have uploaded an example. Accordingly, I added the following to the datasets/data_generator.py script:
_CELLS_INFORMATION = DatasetDescriptor(
    splits_to_sizes={
        'train': 1,
        'trainval': 2,
        'val': 1,
    },
    num_classes=3,
    ignore_label=0
)

_DATASETS_INFORMATION = {
    'cityscapes': _CITYSCAPES_INFORMATION,
    'pascal_voc_seg': _PASCAL_VOC_SEG_INFORMATION,
    'ade20k': _ADE20K_INFORMATION,
    'cells': _CELLS_INFORMATION,
}

2. Successfully ran the train.py script like this:
python train.py \
--initialize_last_layer=False \
--last_layers_contain_logits_only=False \
--logtostderr \
--dataset="cells" \
--training_number_of_steps=1 \
--train_split="train" \
--model_variant="xception_65" \
--atrous_rates=6 \
--atrous_rates=12 \
--atrous_rates=18 \
--output_stride=16 \
--decoder_output_stride=4 \
--train_crop_size="513,513" \
--train_batch_size=1 \
--tf_initial_checkpoint=/mnt/lustre/xception/model.ckpt \
--train_logdir=/mnt/lustre/LOGDIR \
--dataset_dir=/mnt/lustre/tfrecord

3. Ran the eval.py script like this, which produces the error:
python eval.py \
--eval_crop_size='513,513' \
--logtostderr \
--eval_split="val" \
--model_variant="xception_65" \
--atrous_rates=6 \
--atrous_rates=12 \
--atrous_rates=18 \
--output_stride=16 \
--decoder_output_stride=4 \
--dataset="cells" \
--checkpoint_dir=/mnt/lustre/LOGDIR \
--eval_logdir=/mnt/lustre/LOGDIREVAL \
--dataset_dir=/mnt/lustre/tfrecord

The error again is:
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [`labels` out of bound] [Condition x < y did not hold element-wise:] [x (mean_iou/confusion_matrix/control_dependency:0) = ] [3 3 3...] [y (mean_iou/Cast_1:0) = ] [3] [[node mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert (defined at /jorgeenv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
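
One way to narrow this down might be to list the pixel values actually stored in a mask and compare them with the num_classes and ignore_label values given in the DatasetDescriptor above. The sketch below is a minimal example under stated assumptions: summarize_mask_values is a hypothetical helper, not part of deeplab, and it assumes the mask has already been read into a flat sequence of integers.

```python
from collections import Counter


def summarize_mask_values(pixels, num_classes, ignore_label):
    """Return (counts, offending) for a flat sequence of mask pixel values.

    Hypothetical helper, not part of deeplab. `offending` lists values that
    are neither a valid class index (0 .. num_classes - 1) nor ignore_label.
    """
    counts = Counter(pixels)
    offending = sorted(
        value for value in counts
        if value != ignore_label and not 0 <= value < num_classes)
    return counts, offending


# Toy mask flattened to a list; the value 3 is deliberately out of range.
toy_pixels = [0, 0, 1, 2, 3, 3]
counts, offending = summarize_mask_values(
    toy_pixels, num_classes=3, ignore_label=0)
print(dict(counts))
print(offending)
```

Assuming Pillow and NumPy are installed, the pixels of an indexed PNG could be obtained with numpy.array(Image.open(path)).ravel().tolist(), where path is a placeholder for one of the mask files. If offending is non-empty for a real mask, the stored values are not limited to the 0, 1 and 2 indexes described in step 1.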

Many Thanks
Jorge
Screenshot 2020-02-14 at 10 25 42

research support


jazberna1


All 4 comments

I am facing the same issue too and am waiting for help.

Thanks in advance.

shubashri123 on 18 Feb 2020


Did you solve the error? I have the same issue.

Etheryramirezrs on 8 Sep 2020

Hi,

I am afraid I didn't solve it. I ended up using Mask R-CNN instead:
https://github.com/matterport/Mask_RCNN

Jorge

jazberna1 on 11 Sep 2020

I got the same error.

python "${WORK_DIR}"/train.py \
--logtostderr \
--train_split="train" \
--model_variant="mobilenet_v2" \
--output_stride=16 \
--train_crop_size="513,513" \
--train_batch_size=4 \
--training_number_of_steps="${NUM_ITERATIONS}" \
--fine_tune_batch_norm=False \
--train_logdir="${TRAIN_LOGDIR}" \
--dataset_dir="${PASCAL_DATASET}" \
--dataset="rare_plane"

python "${WORK_DIR}"/eval.py \
--logtostderr \
--eval_split="trainval" \
--model_variant="mobilenet_v2" \
--output_stride=16 \
--eval_crop_size="513,513" \
--eval_batch_size=4 \
--checkpoint_dir="${TRAIN_LOGDIR}" \
--eval_logdir="${EVAL_LOGDIR}" \
--dataset_dir="${PASCAL_DATASET}" \
--dataset="rare_plane" \
--max_number_of_evaluations=1

Please respond if anyone solves this problem. Thanks.