I fine-tuned the faster_rcnn_resnet101 model available in the model zoo. I used the train and evaluation datasets during training, and on TensorBoard I monitored the model's performance via the mAP and AR metrics. Now that I have the fine-tuned model, I want to evaluate its performance on a test dataset that the model has not seen.
I followed this documentation on offline evaluation, adapted to my own dataset: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/oid_inference_and_evaluation.md. Here are the steps I followed:
Step 1: run inference on the test TFRecord to produce a detections record:

```
python /home/ubuntu/data/tensorflow/models/research/object_detection/inference/infer_detections.py \
  --input_tfrecord_paths=$TF_RECORD_FILES \
  --output_tfrecord_path='/home/ubuntu/data/tensorflow/my_workspace/training_demo/Predictions/train.record' \
  --inference_graph=$OUTPUT_INFERENCE_GRAPH \
  --discard_image_pixels
```

Step 2: create the evaluation config:

```
echo "
metrics_set: 'coco_detection_metrics'
" > /home/ubuntu/data/tensorflow/my_workspace/training_demo/test_eval_metrics/test_eval_config.pbtxt
```
Step 3: run the offline evaluation:

```
python /home/ubuntu/data/tensorflow/models/research/object_detection/metrics/offline_eval_map_corloc.py \
  --eval_dir='/home/ubuntu/data/tensorflow/my_workspace/training_demo/test_eval_metrics' \
  --eval_config_path='/home/ubuntu/data/tensorflow/my_workspace/training_demo/test_eval_metrics/test_eval_config.pbtxt' \
  --input_config_path='/home/ubuntu/data/tensorflow/my_workspace/training_demo/test_eval_metrics/test_input_config.pbtxt'
```
This whole thing (steps 1, 2 and 3) works perfectly; however, I see negative values (-1.0) for some of the mAP and AR metrics.
Here is the output of the evaluation (on the train, eval and test datasets) that I ran using the commands above:
```
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.459
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.601
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.543
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.459
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.543
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.627
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.628
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.628
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000

Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.371
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.521
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.428
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.371
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.458
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.537
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.539
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.539
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000

Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.525
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.677
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.619
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.525
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.521
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.614
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.615
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.615
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
```
I am not sure why I see -1.0 for AP and AR when I do have the correct label map, and bounding boxes of small, medium and large sizes are present in my dataset.
A couple of unit tests and some investigation point to the use of a wrong category mapping (label map) in the data. For example, if the label map does not contain a class 4, but due to an error in the data a class 4 appears in the ground truth, the metric values will be -1.0.
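For anyone wanting a quick sanity check, here is a rough sketch (TF 1.x style, with placeholder paths) that lists any ground-truth class ids that are absent from the label map:

```python
import tensorflow as tf
from object_detection.utils import label_map_util

# Placeholder paths -- substitute your own label map and TFRecord.
label_map_path = 'label_map.pbtxt'
record_path = 'train.record'

# Class ids that the label map actually defines.
category_index = label_map_util.create_category_index_from_labelmap(label_map_path)
valid_ids = set(category_index.keys())

unknown_ids = set()
for record in tf.python_io.tf_record_iterator(record_path):
    example = tf.train.Example()
    example.ParseFromString(record)
    labels = example.features.feature['image/object/class/label'].int64_list.value
    unknown_ids.update(l for l in labels if l not in valid_ids)

# Any id listed here exists in the ground truth but not in the label map,
# which is exactly the situation that produces -1.0 metrics.
print('class ids missing from label map:', sorted(unknown_ids))
```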
@Manish-rai21bit could you elaborate on your fix? For me the area=small values are negative. My label map file looks like this:

```
item {
  id: 1
  name: 'Pig'
}
```
@Manish-rai21bit @snphnolt Did you figure out what the issue was? During training I see pretty good accuracy for the different sizes (large, medium and small), but when evaluating on a test dataset I only get accuracy values for the small size; medium and large are both -1.000.
I have almost the same problem, only worse: my metrics are always -1.
```
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
```
Hi @typical-byte-world,
-1.00 is the default value reported when the ground truth is missing for that particular bucket. Since you are getting all -1.00, I would recommend checking the ground truth data. It could be an error with the box areas, or the labels.
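If it helps, here is a rough sketch for counting ground-truth boxes per COCO size bucket (small < 32^2 px, medium < 96^2 px, large otherwise). It assumes the standard TF Object Detection TFRecord keys and normalized box coordinates; the path is a placeholder:

```python
import tensorflow as tf

record_path = 'train.record'  # placeholder path
counts = {'small': 0, 'medium': 0, 'large': 0}

for record in tf.python_io.tf_record_iterator(record_path):
    example = tf.train.Example()
    example.ParseFromString(record)
    feat = example.features.feature
    height = feat['image/height'].int64_list.value[0]
    width = feat['image/width'].int64_list.value[0]
    boxes = zip(feat['image/object/bbox/xmin'].float_list.value,
                feat['image/object/bbox/xmax'].float_list.value,
                feat['image/object/bbox/ymin'].float_list.value,
                feat['image/object/bbox/ymax'].float_list.value)
    for xmin, xmax, ymin, ymax in boxes:
        # Normalized coordinates -> box area in pixels.
        area = (xmax - xmin) * width * (ymax - ymin) * height
        if area < 32 ** 2:
            counts['small'] += 1
        elif area < 96 ** 2:
            counts['medium'] += 1
        else:
            counts['large'] += 1

# A bucket with zero ground-truth boxes is reported as -1.0 by the COCO metrics.
print(counts)
```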
@Manish-rai21bit thanks for your answer. Can you help me? For example, my coordinates are 0.394 0.388 0.413 0.087. Should I multiply them by 100?
I am assuming that "0.394 0.388 0.413 0.087" are normalised coordinates. In that case, you should multiply the normalised x and y coordinates of the box by the image width and height respectively.
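A minimal worked example, assuming the four values are normalized [ymin, xmin, ymax, xmax] and a hypothetical 640x480 image (use your real image size and coordinate order):

```python
# Hypothetical example -- image size and coordinate order are assumptions.
width, height = 640, 480
ymin, xmin, ymax, xmax = 0.394, 0.388, 0.413, 0.087

# y coordinates scale by the image height, x coordinates by the width.
box_px = (ymin * height, xmin * width, ymax * height, xmax * width)
print(box_px)  # approx. (189.1, 248.3, 198.2, 55.7)
```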
@Manish-rai21bit When I train I get -1, but when I convert the model and run prediction manually, it predicts 20-30 boxes scattered across the whole picture. The configuration file has a parameter to resize the image; does it automatically recalculate the coordinates? My images have different sizes: some are 320x240, others 3000x2000. Does that matter? Thanks.
@typical-byte-world Coordinates should be in pixels, not in percent. Just multiply your values by w and h for each image separately, since you have different sizes.
I'm resizing images prior to training to reduce GPU memory usage and increase the batch size. But if you do so, you need to do the data augmentation before training.
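A rough sketch of what that per-image rescaling looks like when resizing before training (PIL-based; the function name and box order are just illustrative). Note that normalized coordinates are unaffected by resizing; only pixel coordinates need rescaling:

```python
from PIL import Image

def resize_with_boxes(image_path, boxes_px, new_w, new_h):
    """Resize an image and rescale its pixel-coordinate boxes to match.

    boxes_px: list of (xmin, ymin, xmax, ymax) tuples in original-image pixels.
    """
    img = Image.open(image_path)
    orig_w, orig_h = img.size
    # Per-image scale factors, since every image can have a different size.
    sx, sy = float(new_w) / orig_w, float(new_h) / orig_h
    resized = img.resize((new_w, new_h))
    scaled = [(xmin * sx, ymin * sy, xmax * sx, ymax * sy)
              for xmin, ymin, xmax, ymax in boxes_px]
    return resized, scaled
```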