What is the top-level directory of the model you are using:
/content
Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
No
OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
Ubuntu 17.10 in Google Colab (env: Python2 with GPU)
TensorFlow installed from (source or binary):
standard Tensorflow in Google Colab
TensorFlow version (use command below):
('unknown', '1.7.0')
Bazel version (if compiling from source):
N/A
CUDA/cuDNN version:
Cuda 8.0
GPU model and memory:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.111                 Driver Version: 384.111                  |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   29C    P8    26W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
Exact command to reproduce:
In Google Colab:
Cell1:
%cd
!git clone https://github.com/tensorflow/models.git /content/models
Cell2:
%cd models/research/deeplab/datasets
!sh ./download_and_convert_ade20k.sh
Cell3:
%cd /content/models/research
%env PYTHONPATH=/env/python/:/content/models/research/:/content/models/research/slim
%env WORK_DIR=/content/models/research/deeplab
# Set up the working directories.
%env INIT_FOLDER=/content/models/research/deeplab/datasets/ADE20K/init_models
%env TRAIN_LOGDIR=/content/models/research/deeplab/datasets/ADE20K/exp/train_on_trainval_set/train
%env EVAL_LOGDIR=/content/models/research/deeplab/datasets/ADE20K/exp/train_on_trainval_set/eval
%env EXPORT_DIR=/content/models/research/deeplab/datasets/ADE20K/exp/train_on_trainval_set/export
!mkdir -p "${INIT_FOLDER}"
!mkdir -p "${TRAIN_LOGDIR}"
!mkdir -p "${EVAL_LOGDIR}"
!mkdir -p "${EXPORT_DIR}"
# Copy locally the trained checkpoint as the initial checkpoint.
%env TF_INIT_ROOT=http://download.tensorflow.org/models
%env TF_INIT_CKPT=deeplabv3_mnv2_pascal_train_aug_2018_01_29.tar.gz
%cd /content/models/research/deeplab/datasets/ADE20K/init_models
!wget -nd -c "${TF_INIT_ROOT}/${TF_INIT_CKPT}"
!tar -xf "${TF_INIT_CKPT}"
%cd "/content/models/research/"
%env ADE20K_DATASET=/content/models/research/deeplab/datasets/ADE20K/tfrecord
print('START train.py')
%env NUM_ITERATIONS=1000
!python "${WORK_DIR}"/train.py \
--logtostderr \
--training_number_of_steps="${NUM_ITERATIONS}" \
--train_split="train" \
--model_variant="mobilenet_v2" \
--train_crop_size=513 \
--train_crop_size=513 \
--train_batch_size=4 \
--min_resize_value=350 \
--max_resize_value=500 \
--resize_factor=16 \
--fine_tune_batch_norm=False \
--dataset="ade20k" \
--initialize_last_layer=False \
--last_layers_contain_logits_only=True \
--tf_initial_checkpoint="${INIT_FOLDER}/deeplabv3_mnv2_pascal_train_aug/model.ckpt-30000" \
--train_logdir="${TRAIN_LOGDIR}" \
--dataset_dir="${ADE20K_DATASET}"
print('START eval.py')
!python "${WORK_DIR}"/eval.py \
--logtostderr \
--eval_split="val" \
--model_variant="mobilenet_v2" \
--eval_crop_size=2113 \
--eval_crop_size=2113 \
--dataset="ade20k" \
--checkpoint_dir=${TRAIN_LOGDIR} \
--eval_logdir=${EVAL_LOGDIR} \
--dataset_dir=${ADE20K_DATASET}
I am trying to train and evaluate the DeepLab model on the ADE20K dataset in Google Colab.
I use mobilenetv2_coco_voc_trainaug as the initial checkpoint, but I get the same error with xception_coco_voc_trainaug.
Others have reported the same problem in #3730.
Can you help, please?
The error occurs in the evaluation step:
InvalidArgumentError (see above for traceback): assertion failed: [`predictions` out of bound] [Condition x < y did not hold element-wise:] [x (mean_iou/confusion_matrix/control_dependency_1:0) = ] [0 3 3...] [y (mean_iou/ToInt64_2:0) = ] [150]
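Judging by the node names in the message, the assertion comes from the confusion-matrix check inside `tf.metrics.mean_iou`, which requires every prediction and label to lie in `[0, num_classes)`. One quick way to see which values break that range is to inspect the flattened arrays before they reach the metric. The sketch below is plain Python on toy data, not part of the DeepLab code; `out_of_range_values` is a hypothetical helper name.

```python
from collections import Counter


def out_of_range_values(values, num_classes):
    """Count the values that fall outside [0, num_classes)."""
    return Counter(v for v in values if not 0 <= v < num_classes)


# Toy flattened label map: 150 is one past the last valid id for 150 classes.
labels = [0, 3, 3, 149, 150, 150, 255]
bad = out_of_range_values(labels, num_classes=150)
print(sorted(bad.items()))  # [(150, 2), (255, 1)]
```

An empty result would mean the range check should pass for that array.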
Thank you for your post. We noticed you have not filled out the following fields in the issue template. Could you update them if they are relevant in your case, or leave them as N/A? Thanks.
What is the top-level directory of the model you are using
Have I written custom code
OS Platform and Distribution
TensorFlow installed from
TensorFlow version
Bazel version
CUDA/cuDNN version
GPU model and memory
Exact command to reproduce
Yes, sure, I just did it.
Bye
We will update the tutorial and provide a checkpoint for ADE20K soon. Please stay tuned.
Hi, I ran into the same problem and found a solution here. I modified the code at line 145 in eval.py from:
metric_map = {}
metric_map[predictions_tag] = tf.metrics.mean_iou(
    predictions, labels, dataset.num_classes, weights=weights)
to
metric_map = {}
# insert by trobr
indices = tf.squeeze(tf.where(tf.less_equal(
    labels, dataset.num_classes - 1)), 1)
labels = tf.cast(tf.gather(labels, indices), tf.int32)
predictions = tf.gather(predictions, indices)
# end of insert
metric_map[predictions_tag] = tf.metrics.mean_iou(
    predictions, labels, dataset.num_classes, weights=weights)
After that, I got the expected results.
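The idea behind the change above is to drop every pixel whose label is not a valid class id before the metric sees it. The following pure-Python sketch illustrates that idea on toy data; it is not the DeepLab implementation, and `keep_valid` and `mean_iou` are hypothetical names. It also applies the same mask to `weights`, on the assumption that the weights hold one entry per pixel and must stay aligned with the filtered labels.

```python
def keep_valid(labels, predictions, weights, num_classes):
    """Keep only positions whose label lies in [0, num_classes)."""
    kept = [(l, p, w) for l, p, w in zip(labels, predictions, weights)
            if 0 <= l < num_classes]
    if not kept:
        return [], [], []
    l, p, w = zip(*kept)
    return list(l), list(p), list(w)


def mean_iou(labels, predictions, num_classes):
    """Unweighted mean IoU over the classes that occur at least once."""
    ious = []
    for c in range(num_classes):
        pairs = list(zip(labels, predictions))
        inter = sum(1 for l, p in pairs if l == c and p == c)
        union = sum(1 for l, p in pairs if l == c or p == c)
        if union:
            ious.append(float(inter) / union)
    return sum(ious) / len(ious) if ious else 0.0


labels = [0, 1, 1, 2, 255]       # 255 is out of range for 3 classes
predictions = [0, 1, 2, 2, 1]
weights = [1.0] * len(labels)

labels, predictions, weights = keep_valid(labels, predictions, weights, 3)
print(labels)       # [0, 1, 1, 2]
print(predictions)  # [0, 1, 2, 2]
print(round(mean_iou(labels, predictions, 3), 4))  # 0.6667
```

Note that this only filters on the labels, as the change above does; a prediction outside the valid range would still need separate handling.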
@trobr I had the same problem and applied your change, but now a new error appears:
InvalidArgumentError (see above for traceback): indices[1940480] = 4319229436281876843 is not in [0, 2100225)
[[{{node GatherV2}} = GatherV2[Taxis=DT_INT32, Tindices=DT_INT64, Tparams=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Select/_4713, Squeeze/_4715, GatherV2/axis)]]
[[{{node GatherV2/_4717}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_780_GatherV2", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
How can I solve this problem? Thanks in advance.
@trobr
I tried your fix, but it gives the following error:
AttributeError: 'Dataset' object has no attribute 'num_classes'
@RajatGarg45, rename 'num_classes' to 'num_of_classes'
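Which attribute name exists seems to depend on the version of the repository that is checked out. As a hedged sketch for code that has to work with either name, a small lookup helper could try both; `get_num_classes` and the stand-in `FakeDataset` class below are hypothetical and only illustrate the pattern.

```python
def get_num_classes(dataset):
    """Return the class count from either attribute name seen in this thread."""
    for name in ("num_of_classes", "num_classes"):
        if hasattr(dataset, name):
            return getattr(dataset, name)
    raise AttributeError("dataset has neither num_of_classes nor num_classes")


class FakeDataset(object):
    """Stand-in for the real dataset object, for illustration only."""
    num_of_classes = 3


print(get_num_classes(FakeDataset()))  # 3
```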
Hi There,
We are checking to see if you still need help on this, as this seems to be a considerably old issue. Please update this issue with the latest information, a code snippet to reproduce your issue, and the error you are seeing.
If we don't hear from you in the next 7 days, this issue will be closed automatically. If you don't need help on this issue any more, please consider closing this.
I had almost the same problem here:
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [labels out of bound] [Condition x < y did not hold element-wise:] [x (mean_iou/confusion_matrix/control_dependency:0) = ] [255 255 255...] [y (mean_iou/Cast_1:0) = ] [6]
[[{{node mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert}}]]
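In this trace the offending label value is 255 while the upper bound is 6. A value of 255 is commonly used as an "ignore" marker in segmentation masks, so one possible cause (an assumption, since the dataset is not shown) is that the ignore label is not being masked out before the metric runs. The sketch below shows a common pattern in plain Python: give ignored pixels a weight of 0 and remap their label to a valid id so the range check passes. `mask_ignore_label` is a hypothetical name, and the default of 255 is likewise an assumption.

```python
def mask_ignore_label(labels, ignore_label=255):
    """Return (labels, weights): ignored pixels are remapped to 0, weighted 0."""
    weights = [0.0 if l == ignore_label else 1.0 for l in labels]
    remapped = [0 if l == ignore_label else l for l in labels]
    return remapped, weights


labels = [255, 255, 255, 0, 5, 3]
remapped, weights = mask_ignore_label(labels)
print(remapped)  # [0, 0, 0, 0, 5, 3]
print(weights)   # [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
```

With a weight of 0, the remapped pixels do not contribute to a weighted confusion matrix, so they do not distort class 0.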
I also had the same error.
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: assertion failed: [labels out of bound] [Condition x < y did not hold element-wise:] [x (mean_iou/confusion_matrix/control_dependency:0) = ] [0 0 0...] [y (mean_iou/Cast_1:0) = ] [2]
[[{{node mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert}}]]
[[mean_iou/confusion_matrix/stack_1/_1731]]
(1) Invalid argument: assertion failed: [labels out of bound] [Condition x < y did not hold element-wise:] [x (mean_iou/confusion_matrix/control_dependency:0) = ] [0 0 0...] [y (mean_iou/Cast_1:0) = ] [2]
[[{{node mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert}}]]
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: assertion failed: [labels out of bound] [Condition x < y did not hold element-wise:] [x (mean_iou/confusion_matrix/control_dependency:0) = ] [0 0 0...] [y (mean_iou/Cast_1:0) = ] [2]
[[{{node mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert}}]]
[[ConstantFoldingCtrl/mean_iou/confusion_matrix/assert_non_negative_1/assert_less_equal/Assert/AssertGuard/Switch_0/_4462]]
(1) Invalid argument: assertion failed: [labels out of bound] [Condition x < y did not hold element-wise:] [x (mean_iou/confusion_matrix/control_dependency:0) = ] [0 0 0...] [y (mean_iou/Cast_1:0) = ] [2]
[[{{node mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert}}]]
same