Models: Problem with image decoding in Tensorflow 2

Created on 16 Aug 2020 · 9 comments · Source: tensorflow/models

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • [Y ] I am using the latest TensorFlow Model Garden release and TensorFlow 2.
  • [Y ] I am reporting the issue to the correct repository. (Model Garden official or research directory)
  • [Y ] I checked to make sure that this issue has not already been filed.

1. The entire URL of the file you are using

https://github.com/tensorflow/models/tree/master/research/object_detection

2. Describe the bug

I am trying to train a model using TensorFlow 2. The pipeline seems to work (training starts and the training process appears to be running), but I noticed a disturbing symptom: in TensorBoard the image previews are incorrect. They look badly decoded, with color values truncated to 0 and 1 (example below).
image

From what I remember, when I was using the Object Detection API with TF1, the preview displayed "normal" images.
I am not sure whether this is a bug in the TensorBoard visualization only, or whether the training pipeline is not working as it should and the images are loaded incorrectly. Or maybe I am making a configuration mistake?

3. Steps to reproduce

  • Generate tfrecords for the PASCAL VOC dataset according to the instructions in https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/preparing_inputs.md (to make sure that the data is not the problem).
  • Create a config based on any of the available configs, e.g. faster_rcnn_resnet50_v1_fpn_640x640_coco17_tpu-8.config (change num_classes, use_bfloat16, fine_tune_checkpoint, and the paths to the generated tfrecords and the appropriate label_map in the input_readers)
  • Start local training
# From the tensorflow/models/research/ directory
PIPELINE_CONFIG_PATH={path to pipeline config file}
MODEL_DIR={path to model directory}
python object_detection/model_main_tf2.py \
    --pipeline_config_path=${PIPELINE_CONFIG_PATH} \
    --model_dir=${MODEL_DIR} \
    --alsologtostderr
  • Launch Tensorboard
tensorboard --logdir=${MODEL_DIR}

4. Expected behavior

I expect to see properly decoded images.

5. Additional context

Suspect part of the training log (but I'm not sure if it has anything to do with the issue):

WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
W0816 20:24:48.906076 140309352765248 dataset_builder.py:83] num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical.
W0816 20:24:48.957060 140309352765248 image_ops_impl.py:2018] The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical.
WARNING:tensorflow:The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical.
W0816 20:24:48.960490 140309352765248 image_ops_impl.py:2018] The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical.
WARNING:tensorflow:The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical.
W0816 20:24:48.966012 140309352765248 image_ops_impl.py:2018] The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical.
WARNING:tensorflow:The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical.
W0816 20:24:48.971490 140309352765248 image_ops_impl.py:2018] The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical.

6. System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04.4 (Nvidia NGC Tensorflow 20.06 docker image)
  • Mobile device name if the issue happens on a mobile device: NA
  • TensorFlow installed from (source or binary): source (Nvidia NGC Tensorflow 20.06 docker image)
  • TensorFlow version (use command below): 2.2.0
  • Python version: 3.6
  • Bazel version (if compiling from source): NA (Nvidia NGC Tensorflow 20.06 docker image)
  • GCC/Compiler version (if compiling from source): NA (Nvidia NGC Tensorflow 20.06 docker image)
  • CUDA/cuDNN version: CUDA 11.0 / cuDNN 8.0.1.13
  • GPU model and memory: GeForce GTX1080 8GB
research bug

Most helpful comment

This seems to be a problem of visualization and not a training issue. I had a similar problem due to a different image data scaling: range (-1, 1) instead of (0, 1) as required by tf.summary.image.

I suggested a possible solution in #9019.

All 9 comments

this is happening with my data as well
image

I had been successfully training on the TF1 pipeline for the last 8 months with these tfrecords.
When I evaluated the TF2 pipeline to see if it was ready, I get extremely low loss in my training session, but terrible performance.

This seems to be a problem of visualization and not a training issue. I had a similar problem due to a different image data scaling: range (-1, 1) instead of (0, 1) as required by tf.summary.image.

I suggested a possible solution in #9019.
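A minimal sketch of that workaround, assuming (as the comment describes) that the images arrive normalized to the range (-1, 1): shift and scale them back into [0, 1] before logging, since `tf.summary.image` expects floats in that range and out-of-range values end up clipped, which produces exactly the "truncated to 0 and 1" look. The helper name and log directory are illustrative, not from the original thread.

```python
import tensorflow as tf


def to_unit_range(images):
    """Rescale images from [-1, 1] to [0, 1] for tf.summary.image.

    tf.summary.image expects floats in [0, 1]; values outside that
    range are clipped, which makes (-1, 1)-normalized images look
    binarized in TensorBoard.
    """
    return (images + 1.0) / 2.0


# Example: a batch of two fake "images" normalized to [-1, 1].
batch = tf.random.uniform([2, 64, 64, 3], minval=-1.0, maxval=1.0)

writer = tf.summary.create_file_writer("/tmp/demo_logs")
with writer.as_default():
    tf.summary.image("preview", to_unit_range(batch), step=0)
```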

Hi, can you guys help me with the steps to train DeepLab on a custom dataset using TF2, just the basic outline? I'm still a beginner and I don't know where to start!
Sorry, my comment doesn't answer the question, but I would really appreciate the help.
@Jconn @andrusza2

Like @Moritz-Weisenboehler already mentioned, this is due to the fact that, for TensorBoard visualization, the image needs to be scaled to values between 0 and 1. This should solve your problem.

@Moritz-Weisenboehler @CptK1ng sorry to bother... when do you scale your images to values in [0, 1]? I assume you are writing images into TFRecord files as jpg byte streams?

Are you using the NormalizeImage data augmentation setting? Or something a bit more explicit?

Thanks in advance :)
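For reference, a hedged sketch of what the NormalizeImage data augmentation mentioned above looks like in an Object Detection API pipeline config (the field values here are illustrative; see object_detection/protos/preprocessor.proto for the authoritative definition):

```
data_augmentation_options {
  normalize_image {
    original_minval: 0.0
    original_maxval: 255.0
    target_minval: 0.0
    target_maxval: 1.0
  }
}
```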

@mackdelany There are two possible places in your program where normalization could be applied:

  1. When generating and preparing your train and val data. For example, if you read images from disk into numpy arrays, you can apply the normalization directly.

  2. In your training pipeline. If you use tf.keras, you can normalize your inputs right after the "Input" declaration.

A simple yet effective way of scaling images into [0, 1] is to rescale the RGB values by a factor of 1/255 with https://www.tensorflow.org/api_docs/python/tf/keras/layers/experimental/preprocessing/Rescaling?hl=de

rescaling = Rescaling(scale=1.0 / 255)(inputs)
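As a self-contained sketch of option 2 above (the model architecture is illustrative, not from the original comment; in TF 2.2 the layer lives under `experimental.preprocessing`, while newer releases also expose it directly as `tf.keras.layers.Rescaling`):

```python
import tensorflow as tf

# Locate the Rescaling layer across TF versions.
try:
    Rescaling = tf.keras.layers.Rescaling
except AttributeError:
    Rescaling = tf.keras.layers.experimental.preprocessing.Rescaling

inputs = tf.keras.Input(shape=(64, 64, 3))        # raw RGB values in [0, 255]
rescaled = Rescaling(scale=1.0 / 255)(inputs)     # now scaled into [0, 1]
outputs = tf.keras.layers.Conv2D(8, 3, activation="relu")(rescaled)
model = tf.keras.Model(inputs, outputs)
```

Because the rescaling is part of the model graph, the tfrecords themselves can stay in the original [0, 255] range.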

@CptK1ng is this requirement new to TensorFlow 2? I have had no problems visualizing my tfrecord files in TensorBoard with TF 1.14 and 1.15.

Edit: My image channels are already scaled to [0, 1] before they're written into the tfrecord, so there is another issue here.

@Jconn To the best of my knowledge, this restriction was already present in tf 1.x. I see no issue with your scaling.

I'm facing the same problem. I don't think it is a training-data scaling issue, because the eval images display and evaluate fine, and my tfrecord images aren't scaled. I used the same tfrecord with the TF1 version too, but I can't say whether the problem was the same there, because in the TF1 version my training images weren't shown at all.
