Detectron2: Simple detection evaluator

Created on 16 Oct 2019  ·  11 Comments  ·  Source: facebookresearch/detectron2

โ“ Questions and Help

General questions about detectron2.

Thanks for all the great work!
I have my own custom detection dataset(s), split into train and validation sets. I would like to run periodic evaluation during training.

I set:

cfg.DATASETS.TEST = ("car_parts/valid",)
cfg.TEST.EVAL_PERIOD = 2000

If I understand correctly, I need to set MetadataCatalog.get(dataset_name).evaluator_type, but I'm not sure what to use as the evaluator. I have my own get_json() method since my data is not in any usual format.
Is there a 'Simple detection evaluator'?

enhancement

All 11 comments

As https://detectron2.readthedocs.io/tutorials/datasets.html#metadata-for-datasets says, evaluator_type is used by builtin datasets; for your own dataset, you should specify the evaluator to be used in your training script.

There is currently no "simple" evaluator. So if your dataset is not in a standard format, we currently cannot evaluate it. It would be very nice to have one.

Thanks, will try to implement one if time permits.

There is currently no "simple" evaluator... It would be very nice to have one.

@ppwwyyxx
Can you check out maskrcnn-benchmark:#1104 and maskrcnn-benchmark:#1096?

I would be happy to implement something similar here.

And yeah, thanks for the supercool repo :)

https://github.com/facebookresearch/maskrcnn-benchmark/pull/1096 seems similar to what we want to do here. You're welcome to contribute something like it here!

I think it's possible to make the existing COCOEvaluator support a new dataset in its __init__ directly: if the dataset is not originally in coco format, create the COCO object by getting the data from DatasetRegistry and converting the dataset dicts to COCO-format json. Then, add the json file path to the associated metadata so this conversion only happens once.
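The conversion described above could look roughly like the sketch below. It is only an illustration, not the code that ended up in detectron2: the function names dataset_dicts_to_coco and load_or_create_coco_json are hypothetical, the input is assumed to follow the detectron2 dataset-dict layout (file_name, height, width, image_id, annotations with bbox and category_id), and boxes are assumed to be absolute XYXY unless boxes_are_xyxy=False is passed. Category ids are kept as they are in the dataset dicts.

```python
import json
import os


def dataset_dicts_to_coco(dataset_dicts, class_names, boxes_are_xyxy=True):
    """Convert detectron2-style dataset dicts to a COCO-format dict (hypothetical helper)."""
    images, annotations = [], []
    for index, record in enumerate(dataset_dicts):
        # Fall back to the list index when the record carries no image_id.
        image_id = record.get("image_id", index)
        images.append({
            "id": image_id,
            "file_name": record["file_name"],
            "height": int(record["height"]),
            "width": int(record["width"]),
        })
        for obj in record.get("annotations", []):
            # float() also turns NumPy scalars into plain Python numbers.
            x0, y0, a, b = (float(v) for v in obj["bbox"])
            # COCO stores boxes as [x, y, width, height].
            w, h = (a - x0, b - y0) if boxes_are_xyxy else (a, b)
            annotations.append({
                "id": len(annotations) + 1,
                "image_id": image_id,
                "category_id": int(obj["category_id"]),
                "bbox": [x0, y0, w, h],
                "area": w * h,
                "iscrowd": int(obj.get("iscrowd", 0)),
            })
    categories = [{"id": i, "name": name} for i, name in enumerate(class_names)]
    return {"images": images, "annotations": annotations, "categories": categories}


def load_or_create_coco_json(json_path, dataset_dicts, class_names):
    """Write the COCO json once and reuse the cached file on later calls."""
    if not os.path.exists(json_path):
        with open(json_path, "w") as f:
            json.dump(dataset_dicts_to_coco(dataset_dicts, class_names), f)
    return json_path
```

The returned path is what one would then attach to the dataset's metadata so the conversion is not repeated; how exactly the evaluator picks it up is left open here.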

@ppwwyyxx, if I understood correctly, the thing that is missing from a general "simple" evaluator is a DatasetRegistry-to-COCO-json converter. Is that correct?

In the meantime I have started to modularize the mAP evaluation process that works with COCO-format json. My plan is to reproduce results from VOC, COCO and CityScapes with the same toolkit: you can find it here. So far, the precision-recall curve computation is ready.

As soon as I can validate the COCO scores with it, I'll make a PR.
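For readers unfamiliar with what "precision and recall curve computing" involves, here is a minimal sketch for a single class at a single IoU threshold. It is not the toolkit mentioned above and it does not reproduce cocoapi's procedure (no crowd handling, area ranges, detection caps, or averaging over IoU thresholds). The function names and the input layout (detections as (image_id, score, box) tuples, ground truth as a dict from image_id to boxes, all boxes in [x, y, width, height]) are assumptions made for the example.

```python
def iou_xywh(box_a, box_b):
    """Intersection over union of two [x, y, width, height] boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def precision_recall(detections, ground_truth, iou_thr=0.5):
    """Return (precision, recall) lists, one entry per detection in score order."""
    n_gt = sum(len(boxes) for boxes in ground_truth.values())
    # Each ground-truth box may be matched by at most one detection.
    matched = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
    true_pos = false_pos = 0
    precision, recall = [], []
    # Walk through detections from the most to the least confident.
    for image_id, _score, box in sorted(detections, key=lambda d: -d[1]):
        best_iou, best_idx = 0.0, -1
        for idx, gt_box in enumerate(ground_truth.get(image_id, [])):
            if matched[image_id][idx]:
                continue
            overlap = iou_xywh(box, gt_box)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx >= 0 and best_iou >= iou_thr:
            matched[image_id][best_idx] = True
            true_pos += 1
        else:
            false_pos += 1
        precision.append(true_pos / (true_pos + false_pos))
        recall.append(true_pos / n_gt if n_gt else 0.0)
    return precision, recall
```

Each detection is greedily matched to the unmatched ground-truth box it overlaps most, which is one common convention; VOC, COCO and CityScapes differ in exactly these matching details.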

If you have converted all results to COCO-json, why not just use cocoapi to evaluate it?

There are multiple reasons for having a generic set of utility functions for evaluating the same metric:

  • transparency: it is extremely hard to reverse engineer what happens in cocoapi when trying to understand the evaluation process. Although this retrospective summary and this Medium post are super helpful in this matter, I think it would be beneficial to have a straightforward implementation of the metric that is easy to see through, modify and improve.
  • flexibility: currently the end goal of the majority of SOTA methods is to improve the mAP without acknowledging any trade-off from the viewpoint of other metrics. This may hide shortcomings of the algorithms, when applied to custom datasets, that mAP does not capture. Allowing people to easily compute multiple metrics in one pass, without reorganizing the data structure for the 100th time, could help set a new trend.
  • canonical: it is extremely frustrating that reported mAP scores can refer to VOC, COCO, or CityScapes, and that many papers (even useful / popular ones) get published without explicitly naming the reference implementation for that metric. In the very rare case when the source is published with trained baselines, it can still be painful to format the I/O pairs for each metric to find out which one was used exactly. OK, one might ask the authors as well, but... meh.

Just wanted to mention that if you follow the Colab example in this repo and implement your own get_balloon_dicts(), you have to cast your bbox array to Python ints instead of np.uint64, because json.dump cannot serialize numpy ints (numpy version 1.17.2).

What I did was generate the dict for my dataset and go through all the annotations and cast the bbox parameters to Python ints, and then I was able to dump the whole dictionary.

Makes sense. I think we should convert them to float before dumping the json.
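Until such a conversion is in place, the workaround can also be expressed as a json "default" hook instead of a manual pass over all annotations. This is only a sketch: dumps_with_numpy and _json_default are hypothetical names, and it relies on NumPy arrays and scalars exposing a tolist() method that returns plain Python values. Unlike the float conversion suggested above, it keeps integers as integers.

```python
import json


def _json_default(obj):
    """Called by json only for objects it cannot serialize on its own."""
    # NumPy arrays and scalars provide tolist(), which yields built-in types.
    if hasattr(obj, "tolist"):
        return obj.tolist()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def dumps_with_numpy(data, **kwargs):
    """json.dumps that tolerates NumPy values nested inside dicts and lists."""
    return json.dumps(data, default=_json_default, **kwargs)
```

For example, dumps_with_numpy(dataset_dicts) could replace a plain json.dumps call when the bbox entries are np.uint64 values.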

done by #175

@ppwwyyxx, if I understood correctly, the thing that is missing from a general "simple" evaluator is a DatasetRegistry-to-COCO-json converter. Is that correct?

In the meantime I have started to modularize the mAP evaluation process that works with COCO-format json. My plan is to reproduce results from VOC, COCO and CityScapes with the same toolkit: you can find it here. So far, the precision-recall curve computation is ready.

As soon as I can validate the COCO scores with it, I'll make a PR.

I was hoping to use your evaluator to get a recall curve with my COCO dataset. Just noticed though that it says "AUC evaluation done, precision recall computation is wrong", so I was a bit hesitant to try to use it. Any updates here? Thanks!
