Hi,
I think ann_id should be initialized to 1, not 0, in convert_to_coco_api.
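Concretely, the change I have in mind is a one-liner in references/detection/coco_utils.py (a sketch against the 0.4.0 reference code, not an exact diff):

# inside convert_to_coco_api (sketch)
ann_id = 1  # instead of ann_id = 0, so the first ground-truth annotation never gets id 0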
That's because pycocotools effectively assumes annotation IDs are at least 1 (i.e. nonzero) when evaluating. More precisely, it builds an array of matches per detection box and stores the matched ground-truth IDs as the entries of that array.
It then treats that array as boolean values, so an ID of 0 is interpreted as "no match", i.e. as a negative. A minimal sketch of this coercion follows.
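Here is a minimal numpy sketch of that coercion, assuming a single detection matched to a single ground-truth box; this is not pycocotools' actual code, just the pattern around the tps line quoted further down:

import numpy as np

# dtm mimics pycocotools' detection-match array: each entry holds the id of the
# matched ground-truth annotation (and 0 also stands for "unmatched")
dtm_id0 = np.array([[0.]])   # the only gt box was given annotation id 0
dtm_id1 = np.array([[1.]])   # the exact same match, but the gt box has id 1
dtIg = np.array([[False]])   # the detection is not ignored

# mirrors tps = np.logical_and(dtm, np.logical_not(dtIg)): the ids become booleans
print(np.logical_and(dtm_id0, np.logical_not(dtIg)))  # [[False]] -> the match is dropped
print(np.logical_and(dtm_id1, np.logical_not(dtIg)))  # [[ True]] -> counted as a true positive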
Also, as a separate argument, pycocotools itself initializes all ann['id'] values starting from 1 in its own code, e.g. when it builds result annotations in loadRes (roughly sketched below).
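For reference, I believe the relevant pattern in pycocotools' COCO.loadRes looks roughly like this (paraphrased from memory, not an exact quote):

# paraphrased from pycocotools' coco.py (COCO.loadRes): result annotation ids start at 1
anns = [{"image_id": 987, "bbox": [5., 5., 95., 95.], "score": 0.99, "category_id": 1234}]
for id, ann in enumerate(anns):
    ann["id"] = id + 1
print(anns[0]["id"])  # 1, never 0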
The consequence is almost negligible for a large dataset (you may get one extra false negative), but for unit tests or very small datasets (such as our own use case) it can matter.
Looking into this a bit more, I realized that loadRes() is called later on and changes those IDs. I'm not quite sure how the IDs still ended up starting at 0 in our case (since we were also calling loadRes through update), but I'll close the issue for now and reopen it / open another one if I can produce a piece of code that gives an actual error (without our full codebase).
Sorry for the false alert.
Here's a very minimal script that illustrates the behavior on a one-image one-box mock dataset, mimicking what happens in detection/engine.py:
import torch
from torchvision_references.detection.coco_utils import convert_to_coco_api
from torchvision_references.detection.coco_eval import CocoEvaluator
# create dataset with a single image
im1 = torch.rand(3, 600, 600)
boxes1 = torch.tensor([[5., 5., 100., 100.]])
labels1 = torch.tensor([1234])
img_id1 = 987
# mock ground truth bbox
gt1 = {"boxes": boxes1.clone(),
       "labels": labels1.clone(),
       "area": torch.tensor([(100. - 5.) * (100. - 5.)]),
       "image_id": torch.tensor(img_id1),
       "iscrowd": torch.tensor([0])}
# mock model output with exact same box
detections1 = {"boxes": boxes1.clone(),
               "labels": labels1.clone(),
               "scores": torch.tensor([0.99])}
detections = {img_id1: detections1}
ground_truth_dataset = [(im1, gt1)]
coco_gt = convert_to_coco_api(ground_truth_dataset)
coco_eval = CocoEvaluator(coco_gt, iou_types=["bbox"])
coco_eval.update(detections)
coco_eval.synchronize_between_processes()
coco_eval.accumulate()
coco_eval.summarize()
I ran this with torchvision 0.4.0. The torchvision reference scripts are made importable by adding __init__.py files in the relevant directories (the layout I used is sketched below).
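For completeness, the local layout I assume for the imports above looks something like this (torchvision_references is just the package name I chose; the modules are copies of torchvision's references/detection scripts):

torchvision_references/
    __init__.py
    detection/
        __init__.py
        coco_utils.py
        coco_eval.py
        ...  # plus any sibling modules these two import (e.g. utils.py)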
If I run this with initialization ann_id = 0 in convert_to_coco_api, I get the following output:
creating index...
index created!
Accumulating evaluation results...
DONE (t=0.00s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
And if I run it with other initializations, e.g. ann_id = 1 or ann_id = 123, I get this:
creating index...
index created!
Accumulating evaluation results...
DONE (t=0.01s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 1.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 1.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
If I put a breakpoint right after line 295 of pycocotools' cocoeval.py (tps = np.logical_and( dtm, np.logical_not(dtIg) )), then I get the following matrix with initialization ann_id = 123:
(Pdb) tps
array([[ True],
[ True],
[ True],
[ True],
[ True],
[ True],
[ True],
[ True],
[ True],
[ True]])
but I get this with initialization ann_id = 0 (the only ground-truth annotation then has id 0, which np.logical_and coerces to False):
(Pdb) tps
array([[False],
[False],
[False],
[False],
[False],
[False],
[False],
[False],
[False],
[False]])
About my second message: I think the ID assignment in loadRes is applied to the detection annotations, not to the ground truth, so it's not relevant after all.
I believe the problem only concerns the very first box, so it's very minor for most use cases, as I said. Still, the fix is simple.
Hi,
Thanks for the bug report and nice catch!
There has recently been a discussion about this in https://github.com/cocodataset/cocoapi/pull/332. I think we should fix it in torchvision by making ann_id start with 1 as you proposed, but we should keep in mind that this behavior might be changed in a future version of pycocotools.
Can you send a PR fixing it?
Sure, here you go:
https://github.com/pytorch/vision/pull/1531
I haven't run the test suite myself, but I believe it runs automatically through CircleCI.
The PR you linked to concerns cocoapi itself. It would certainly be best if they avoided interpreting IDs as booleans, but until they do, I think this fix is simple enough.
Thanks for the prompt follow-up!