Models: assertion failed: [`predictions` out of bound] in Deeplab eval.py with ADE20K

Created on 8 May 2018 · 12 Comments · Source: tensorflow/models

What is the top-level directory of the model you are using:
/content
Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
No
OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
Ubuntu 17.10 in Google Colab (env: Python2 with GPU)
TensorFlow installed from (source or binary):
standard Tensorflow in Google Colab
TensorFlow version (use command below):
('unknown', '1.7.0')
Bazel version (if compiling from source):
N/A
CUDA/cuDNN version:
Cuda 8.0
GPU model and memory:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.111                Driver Version: 384.111                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   29C    P8    26W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
Exact command to reproduce:
In Google Colab:

Cell1:

%cd
!git clone https://github.com/tensorflow/models.git /content/models

Cell2:

%cd models/research/deeplab/datasets
!sh ./download_and_convert_ade20k.sh

Cell3:

%cd /content/models/research
%env PYTHONPATH=/env/python/:/content/models/research/:/content/models/research/slim
%env WORK_DIR=/content/models/research/deeplab

# Set up the working directories.
%env INIT_FOLDER=/content/models/research/deeplab/datasets/ADE20K/init_models
%env TRAIN_LOGDIR=/content/models/research/deeplab/datasets/ADE20K/exp/train_on_trainval_set/train
%env EVAL_LOGDIR=/content/models/research/deeplab/datasets/ADE20K/exp/train_on_trainval_set/eval
%env EXPORT_DIR=/content/models/research/deeplab/datasets/ADE20K/exp/train_on_trainval_set/export
!mkdir -p "${INIT_FOLDER}"
!mkdir -p "${TRAIN_LOGDIR}"
!mkdir -p "${EVAL_LOGDIR}"
!mkdir -p "${EXPORT_DIR}"

# Copy locally the trained checkpoint as the initial checkpoint.
%env TF_INIT_ROOT=http://download.tensorflow.org/models
%env TF_INIT_CKPT=deeplabv3_mnv2_pascal_train_aug_2018_01_29.tar.gz
%cd /content/models/research/deeplab/datasets/ADE20K/init_models
!wget -nd -c "${TF_INIT_ROOT}/${TF_INIT_CKPT}"
!tar -xf "${TF_INIT_CKPT}"
%cd "/content/models/research/"

%env ADE20K_DATASET=/content/models/research/deeplab/datasets/ADE20K/tfrecord

print('START train.py')
%env NUM_ITERATIONS=1000
!python "${WORK_DIR}"/train.py \
  --logtostderr \
  --training_number_of_steps="${NUM_ITERATIONS}" \
  --train_split="train" \
  --model_variant="mobilenet_v2" \
  --train_crop_size=513 \
  --train_crop_size=513 \
  --train_batch_size=4 \
  --min_resize_value=350 \
  --max_resize_value=500 \
  --resize_factor=16 \
  --fine_tune_batch_norm=False \
  --dataset="ade20k" \
  --initialize_last_layer=False \
  --last_layers_contain_logits_only=True \
  --tf_initial_checkpoint="${INIT_FOLDER}/deeplabv3_mnv2_pascal_train_aug/model.ckpt-30000" \
  --train_logdir="${TRAIN_LOGDIR}" \
  --dataset_dir="${ADE20K_DATASET}"


print('START eval.py')
!python "${WORK_DIR}"/eval.py \
    --logtostderr \
    --eval_split="val" \
    --model_variant="mobilenet_v2" \
    --eval_crop_size=2113 \
    --eval_crop_size=2113 \
    --dataset="ade20k" \
    --checkpoint_dir=${TRAIN_LOGDIR} \
    --eval_logdir=${EVAL_LOGDIR} \
    --dataset_dir=${ADE20K_DATASET}

Describe the problem

I am trying to train and evaluate a DeepLab model on the ADE20K dataset in Google Colab.
I use mobilenetv2_coco_voc_trainaug as the initial checkpoint, but I get the same error with xception_coco_voc_trainaug.
Others have reported the same problem in #3730.
Can you help, please?

Source code / logs

I get the following error in the evaluation step:

InvalidArgumentError (see above for traceback): assertion failed: [`predictions` out of bound] [Condition x < y did not hold element-wise:] [x (mean_iou/confusion_matrix/control_dependency_1:0) = ] [0 3 3...] [y (mean_iou/ToInt64_2:0) = ] [150]

Source

RomRoc
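
For readers unfamiliar with the assertion in the log above: tf.metrics.mean_iou builds a confusion matrix and requires every prediction and label to lie in the range [0, num_classes). The plain-Python sketch below is not TensorFlow code, and `find_out_of_bound` is a hypothetical helper introduced only for illustration. It mimics that range check on a toy list, which may help locate offending values before the metric runs.

```python
def find_out_of_bound(values, num_classes):
    """Return (index, value) pairs that fall outside [0, num_classes)."""
    return [(i, v) for i, v in enumerate(values)
            if v < 0 or v >= num_classes]


# Toy flattened predictions (made-up data). With num_classes = 150 the
# valid class ids run from 0 to 149, so the value 150 is out of bounds.
predictions = [0, 3, 3, 150, 149]
print(find_out_of_bound(predictions, num_classes=150))
```

An empty result would mean every value passes the same bound the assertion enforces.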

Most helpful comment

Hi, I ran into the same problem and found a solution here. I changed the code at line 145 in eval.py from:

    metric_map = {}

    metric_map[predictions_tag] = tf.metrics.mean_iou(
        predictions, labels, dataset.num_classes, weights=weights)

to:

    metric_map = {}

    # insert by trobr
    indices = tf.squeeze(tf.where(tf.less_equal(
        labels, dataset.num_classes - 1)), 1)
    labels = tf.cast(tf.gather(labels, indices), tf.int32)
    predictions = tf.gather(predictions, indices)
    # end of insert

    metric_map[predictions_tag] = tf.metrics.mean_iou(
        predictions, labels, dataset.num_classes, weights=weights)

After that I got the expected results.

trobr on 26 Jul 2018

👍23
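
The change in the comment above keeps only the pixels whose label is a valid class id. A plain-Python sketch of the same idea follows, assuming flattened per-pixel lists. `keep_valid_pixels` is a hypothetical name. Filtering `weights` with the same mask and rejecting negative labels are additions of this sketch, not part of the original comment. They are included so that all three sequences stay the same length.

```python
def keep_valid_pixels(labels, predictions, weights, num_classes):
    """Drop every pixel whose label is not in [0, num_classes)."""
    keep = [i for i, label in enumerate(labels) if 0 <= label < num_classes]
    return ([labels[i] for i in keep],
            [predictions[i] for i in keep],
            [weights[i] for i in keep])


# Toy data: the third pixel carries label 150, which is invalid for 150 classes.
labels = [0, 2, 150, 1]
predictions = [0, 2, 5, 1]
weights = [1.0, 1.0, 1.0, 1.0]
print(keep_valid_pixels(labels, predictions, weights, num_classes=150))
```

Note that this filters on labels only. A prediction outside the valid range at a pixel with a valid label would still trip the range check.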

All 12 comments

Thank you for your post. We noticed you have not filled out the following fields in the issue template. Could you update them if they are relevant in your case, or leave them as N/A? Thanks.
What is the top-level directory of the model you are using
Have I written custom code
OS Platform and Distribution
TensorFlow installed from
TensorFlow version
Bazel version
CUDA/cuDNN version
GPU model and memory
Exact command to reproduce

tensorflowbutler on 10 May 2018

Yes, sure, I just did it.
Bye

RomRoc on 10 May 2018

We will update the tutorial and provide a checkpoint for ADE20K soon. Please stay tuned.

aquariusjay on 10 May 2018

👍3

@trobr I had the same problem and applied your change, but now a new error appears:
InvalidArgumentError (see above for traceback): indices[1940480] = 4319229436281876843 is not in [0, 2100225)
[[{{node GatherV2}} = GatherV2[Taxis=DT_INT32, Tindices=DT_INT64, Tparams=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Select/_4713, Squeeze/_4715, GatherV2/axis)]]
[[{{node GatherV2/_4717}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_780_GatherV2", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

How can I solve this problem? Thanks in advance.

DawnWalker on 14 May 2019

@trobr
I tried your fix, but it gives the following error:

AttributeError: 'Dataset' object has no attribute 'num_classes'

RajatGarg45 on 6 Jun 2019

@RajatGarg45, rename 'num_classes' to 'num_of_classes'

wjspoel on 14 Jun 2019
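
The two comments above suggest that the attribute name differs between versions of the code (`num_classes` in the original fix, `num_of_classes` in the rename). A minimal sketch of a lookup that tolerates either name is shown below. `get_num_classes` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for the real dataset object, not the DeepLab classes.

```python
from types import SimpleNamespace


def get_num_classes(dataset):
    """Return the class count under whichever attribute name exists."""
    for name in ("num_of_classes", "num_classes"):
        if hasattr(dataset, name):
            return getattr(dataset, name)
    raise AttributeError("dataset has neither num_of_classes nor num_classes")


# Stand-in objects for illustration only.
old_style = SimpleNamespace(num_classes=151)
new_style = SimpleNamespace(num_of_classes=151)
print(get_num_classes(old_style), get_num_classes(new_style))
```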

Hi There,
We are checking to see if you still need help on this, as this seems to be a considerably old issue. Please update this issue with the latest information, a code snippet to reproduce your issue, and the error you are seeing.
If we don't hear from you in the next 7 days, this issue will be closed automatically. If you don't need help on this issue any more, please consider closing this.

tensorflowbutler on 30 Jan 2020

I had almost the same problem here:
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [labels out of bound] [Condition x < y did not hold element-wise:] [x (mean_iou/confusion_matrix/control_dependency:0) = ] [255 255 255...] [y (mean_iou/Cast_1:0) = ] [6]
[[{{node mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert}}]]

ouzzane on 24 Aug 2020

👍1
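
The trace in the comment above shows label values of 255 with only 6 classes. If 255 is the dataset's ignore value (an assumption; the comment does not say), one common approach is to give those pixels zero weight and remap them to a valid class id so the metric's range check passes. The sketch below is plain Python over a flattened label list. `mask_ignored` and `IGNORE_LABEL` are hypothetical names.

```python
IGNORE_LABEL = 255  # assumed ignore value; check your dataset definition


def mask_ignored(labels, ignore_label=IGNORE_LABEL):
    """Return (remapped_labels, weights) with ignored pixels neutralised."""
    # Ignored pixels get weight 0.0 so they do not contribute to the metric.
    weights = [0.0 if label == ignore_label else 1.0 for label in labels]
    # Remap ignored pixels to class 0 so every label is a valid class id.
    remapped = [0 if label == ignore_label else label for label in labels]
    return remapped, weights


labels = [255, 3, 255, 5]  # toy data
print(mask_ignored(labels))
```

The remapped labels and the weights would then be passed to the metric together, so the remapped pixels are counted with weight zero.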

I also had the same error.

tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: assertion failed: [labels out of bound] [Condition x < y did not hold element-wise:] [x (mean_iou/confusion_matrix/control_dependency:0) = ] [0 0 0...] [y (mean_iou/Cast_1:0) = ] [2]
[[{{node mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert}}]]
[[mean_iou/confusion_matrix/stack_1/_1731]]
(1) Invalid argument: assertion failed: [labels out of bound] [Condition x < y did not hold element-wise:] [x (mean_iou/confusion_matrix/control_dependency:0) = ] [0 0 0...] [y (mean_iou/Cast_1:0) = ] [2]
[[{{node mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert}}]]

trongan93 on 1 Dec 2020

tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: assertion failed: [labels out of bound] [Condition x < y did not hold element-wise:] [x (mean_iou/confusion_matrix/control_dependency:0) = ] [0 0 0...] [y (mean_iou/Cast_1:0) = ] [2]
[[{{node mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert}}]]
[[ConstantFoldingCtrl/mean_iou/confusion_matrix/assert_non_negative_1/assert_less_equal/Assert/AssertGuard/Switch_0/_4462]]
(1) Invalid argument: assertion failed: [labels out of bound] [Condition x < y did not hold element-wise:] [x (mean_iou/confusion_matrix/control_dependency:0) = ] [0 0 0...] [y (mean_iou/Cast_1:0) = ] [2]
[[{{node mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert}}]]