Models: [Deeplab] Quantization-aware training failed

Created on 10 May 2019 · 3 comments · Source: tensorflow/models

System information

What is the top-level directory of the model you are using:
Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
TensorFlow installed from (source or binary): built TensorFlow 1.12 from source
TensorFlow version (use command below): 1.12.0
Bazel version (if compiling from source):
CUDA/cuDNN version: 7.2
GPU model and memory: MobilenetV2-Deeplabv3+
Exact command to reproduce: the quantization-aware training command from quantize.md:

# From tensorflow/models/research/
python deeplab/train.py \
    --logtostderr \
    --training_number_of_steps=3000 \
    --train_split="train" \
    --model_variant="mobilenet_v2" \
    --output_stride=16 \
    --train_crop_size="513,513" \
    --train_batch_size=8 \
    --base_learning_rate=3e-5 \
    --dataset="pascal_voc_seg" \
    --initialize_last_layer \
    --quantize_delay_step=0 \
    --tf_initial_checkpoint=${PATH_TO_TRAINED_FLOAT_MODEL} \
    --train_logdir=${PATH_TO_TRAIN_DIR} \
    --dataset_dir=${PATH_TO_DATASET}

You can obtain the TensorFlow version with

python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"
('v1.12.0-5307-ge0aa938', '1.12.0')

Describe the problem

DeepLab quantization-aware training fails when using the script from quantize.md for PASCAL VOC with the MobileNetV2 backbone. The error is:

Key MobilenetV2/Conv/act_quant/MobilenetV2/Conv/act_quant/max/biased not found in checkpoint
     [[node save/RestoreV2 (defined at /dl_framework/models/research/deeplab/train.py:505) ]]
     [[save/RestoreV2_1/_499]]

@YknZhu

Source

austingg

All 3 comments

Hmm, it seems I cannot reproduce your error by running the same command line. The error means the quantization variables are not found in the checkpoint, but those variables are initialized at global step 0 anyway. Could you check that ${PATH_TO_TRAIN_DIR} is empty?

YknZhu on 10 May 2019
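A minimal, stdlib-only sketch of the two restore paths may help here. It is a simplified model, not DeepLab or TensorFlow code: it assumes that resuming from a checkpoint already present in train_logdir requires every graph variable to be in that checkpoint, while a fresh start from tf_initial_checkpoint restores only the variables it finds and initializes the rest (such as the act_quant ranges). The function `restore_plan` and the variable names are illustrative.

```python
from typing import Dict, Set

# Illustrative names; QUANT_KEY is the key reported in the error log.
QUANT_KEY = "MobilenetV2/Conv/act_quant/MobilenetV2/Conv/act_quant/max/biased"
float_ckpt = {"MobilenetV2/Conv/weights"}   # keys stored in the float checkpoint
quant_graph = float_ckpt | {QUANT_KEY}      # variables in the quantization-aware graph


def restore_plan(graph_vars: Set[str], ckpt_keys: Set[str], resuming: bool) -> Dict[str, str]:
    """Hypothetical model of how each graph variable gets its starting value."""
    missing = sorted(graph_vars - ckpt_keys)
    if resuming and missing:
        # Resuming needs a full restore, so any missing key is fatal,
        # like the "Key ... not found in checkpoint" error in the report.
        raise LookupError(f"Key {missing[0]} not found in checkpoint")
    # Warm start: restore what the checkpoint has, initialize everything else.
    return {v: ("restore" if v in ckpt_keys else "initialize") for v in sorted(graph_vars)}


print(restore_plan(quant_graph, float_ckpt, resuming=False))
try:
    restore_plan(quant_graph, float_ckpt, resuming=True)
except LookupError as err:
    print("resume failed:", err)
```

In this model an empty train_logdir corresponds to the first call, where the act_quant variable is simply initialized; a train_logdir that already holds the float checkpoint corresponds to the second call, which fails on the missing key.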

@YknZhu, is the name of the missing variable correct? The MobilenetV2/Conv/act_quant prefix appears twice in the variable's name. I have checked the tf_initial_checkpoint.

austingg on 11 May 2019

@YknZhu, thanks for your patience. I have found the problem: the train_logdir must be empty, and it must not be the same directory as the one containing the trained float model (TRAIN_FLOAT_MODEL).

austingg on 11 May 2019
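Following that resolution, a quick pre-flight check before launching train.py could catch the mistake. This is a hypothetical helper (`check_train_logdir` is not part of DeepLab), intended for a fresh quantization-aware run; resuming an interrupted run intentionally reuses a non-empty train_logdir, so the check does not apply there. It assumes tf_initial_checkpoint is a checkpoint prefix such as .../model.ckpt-30000, whose parent is the float model's directory.

```python
import sys
from pathlib import Path
from typing import List


def check_train_logdir(train_logdir: str, initial_checkpoint: str) -> List[str]:
    """Return problems that could make training restore from an old checkpoint."""
    problems = []
    logdir = Path(train_logdir).resolve()
    # A checkpoint prefix like ".../model.ckpt-30000" lives in its parent directory.
    ckpt_dir = Path(initial_checkpoint).resolve().parent
    if logdir == ckpt_dir:
        problems.append(f"train_logdir {logdir} is the float checkpoint's directory")
    if logdir.is_dir() and any(logdir.iterdir()):
        problems.append(f"train_logdir {logdir} is not empty")
    return problems


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_logdir.py ${PATH_TO_TRAIN_DIR} ${PATH_TO_TRAINED_FLOAT_MODEL}
    issues = check_train_logdir(sys.argv[1], sys.argv[2])
    for issue in issues:
        print("ERROR:", issue)
    sys.exit(1 if issues else 0)
```

A non-zero exit status lets a wrapper script stop before train.py starts.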