Hi!
I was going through the code and found that the validate argument of the _non_dist_train function in mmdet/apis/train.py is not being used. I guess it needs to be incorporated into the function. Please let me know if that's the case.
Thanks!
You are right. Evaluating mAP after training epochs is currently not supported for non-distributed training.
Hi,
At my company we would be interested in this feature and we are willing to develop it.
Any reason why this has not been included? Any pointers in case we want to add it? Attaching the DistEvalHook in the same way it is done for distributed training doesn't work; that's the first thing we tried (yesterday).
We want to try a less naive approach today. Thanks!
I use the distributed training but with a single machine to get validation.
./dist_train.sh config_file.py 1 --validate
Would that not be sufficient?
@ferranrigual, I worked around it by simply commenting out dist.barrier() in the DistEvalHook(Hook) class in mmdet/core/evaluation/eval_hooks.py. For a cleaner implementation, I would suggest adding single_gpu as a boolean argument to the __init__ function and then using it in the after_train_epoch function. I did the same thing in the collect_results function in test.py. If anyone finds a better way, please let us know.
Hope that helps!
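To illustrate the idea above, here is a minimal, self-contained sketch of what a single_gpu flag on an eval hook could look like. This is not the actual mmdet API; the class and method names mirror the hook pattern described in the comment, and the barrier is replaced by a counter so the sketch runs anywhere:

```python
class EvalHookSketch:
    """Illustrative sketch (not the real DistEvalHook) of adding a
    `single_gpu` flag so dist.barrier() is skipped on one GPU."""

    def __init__(self, dataloader, single_gpu=False):
        self.dataloader = dataloader
        self.single_gpu = single_gpu  # new flag: skip the barrier when True
        self.barrier_calls = 0        # stand-in for dist.barrier() side effects

    def _barrier(self):
        # In the real hook this would call torch.distributed.barrier();
        # here we only count invocations so the sketch stays self-contained.
        self.barrier_calls += 1

    def after_train_epoch(self, model):
        # Run inference over the validation set.
        results = [model(batch) for batch in self.dataloader]
        if not self.single_gpu:
            self._barrier()  # only synchronize ranks when truly distributed
        return results


# Usage: with single_gpu=True the barrier is never reached.
hook = EvalHookSketch([1, 2, 3], single_gpu=True)
preds = hook.after_train_epoch(lambda x: x * 2)
```

The same guard would apply in collect_results: gather partial results from all ranks only when more than one process is running.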
Hey! I was wondering if I could contribute to the code by making a few tweaks so that validation works for both distributed and non-distributed training, plus something similar for the tools/test.py file. I'm not sure whether that's the best way, so please let me know if I should go ahead and submit a pull request. Thanks!
Just in case it helps anyone.
In mmdet/core/evaluation/eval_hooks.py, not only did I comment out the dist.barrier() lines, I also replaced:
result = runner.model(return_loss=False, rescale=True, **data_gpu)
with:
result = runner.model.module(return_loss=False, rescale=True, **data_gpu)
And then I was able to train with 4 GPUs and validate. Best,
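For context on why the .module change above helps: DataParallel-style wrappers keep the real network in a .module attribute and add scatter/gather logic around it, so calling the wrapper dispatches across replicas while calling .module runs the bare model directly. A minimal sketch of that wrapper pattern (illustrative stand-in classes, not torch code):

```python
class BareModel:
    """Stand-in for the underlying detector."""

    def __call__(self, x, return_loss=True, rescale=False):
        return x + 1  # placeholder for a forward pass


class DataParallelSketch:
    """Mimics how DataParallel-style wrappers expose the wrapped model."""

    def __init__(self, module):
        # The real model lives here, as in torch.nn.DataParallel(model).module
        self.module = module

    def __call__(self, x, return_loss=True, rescale=False):
        # A real DataParallel wrapper would scatter x across GPUs here,
        # which is what trips up single-batch validation inside the hook.
        return self.module(x, return_loss=return_loss, rescale=rescale)


# Usage: runner.model corresponds to the wrapper, runner.model.module
# to the bare model that the validation hook can call directly.
wrapped = DataParallelSketch(BareModel())
out = wrapped.module(1, return_loss=False, rescale=True)
```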