Hi!
I was going through the code and found that the validate argument of the _non_dist_train function in mmdet/apis/train.py is not being used. I guess it needs to be incorporated into the function. Please let me know if that's the case.
Thanks!
You are right. Evaluating mAP after training epochs is currently not supported for non-distributed training.
Hi,
At my company we would be interested in this feature and we are willing to develop it.
Any reason why this has not been included? Any pointers in case we want to add it? Attaching the DistEvalHook in the same way it is done for distributed training doesn't work; that's the first thing we tried (yesterday).
We want to try a less naive approach today. Thanks!
I use the distributed training but with a single machine to get validation.
./dist_train.sh config_file.py 1 --validate
Would that not be sufficient?
@ferranrigual, I worked around it by simply commenting out dist.barrier() in the DistEvalHook(Hook) class in mmdet/core/evaluation/eval_hooks.py. For a cleaner implementation, I would suggest adding single_gpu as a boolean argument to the __init__ function and then using it in the after_train_epoch function. I did the same thing in the collect_results function in test.py. If anyone finds a better way, please let us know.
Hope that helps!
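To illustrate the idea above, here is a minimal, self-contained sketch of what a single_gpu flag on an eval hook could look like. This is not the actual mmdet API; the class and method names mirror the hook pattern described in the comment, and the barrier is replaced by a counter so the sketch runs anywhere:

```python
class EvalHookSketch:
    """Illustrative sketch (not the real DistEvalHook) of adding a
    `single_gpu` flag so dist.barrier() is skipped on one GPU."""

    def __init__(self, dataloader, single_gpu=False):
        self.dataloader = dataloader
        self.single_gpu = single_gpu  # new flag: skip the barrier when True
        self.barrier_calls = 0        # stand-in for dist.barrier() side effects

    def _barrier(self):
        # In the real hook this would call torch.distributed.barrier();
        # here we only count invocations so the sketch stays self-contained.
        self.barrier_calls += 1

    def after_train_epoch(self, model):
        # Run inference over the validation set.
        results = [model(batch) for batch in self.dataloader]
        if not self.single_gpu:
            self._barrier()  # only synchronize ranks when truly distributed
        return results


# Usage: with single_gpu=True the barrier is never reached.
hook = EvalHookSketch([1, 2, 3], single_gpu=True)
preds = hook.after_train_epoch(lambda x: x * 2)
```

The same guard would apply in collect_results: gather partial results from all ranks only when more than one process is running.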
Hey! I was wondering if I could contribute to the code by making a few tweaks so that validation works for both distributed and non-distributed training, plus something similar for the tools/test.py file. I'm not sure whether that's the best way, so please let me know if I should go ahead and submit a pull request. Thanks!
Just in case it helps anyone.
In mmdet/core/evaluation/eval_hooks.py, not only did I comment out the dist.barrier() lines, I also replaced:
result = runner.model(return_loss=False, rescale=True, **data_gpu)
with:
result = runner.model.module(return_loss=False, rescale=True, **data_gpu)
And then I was able to train with 4 GPUs and validate. Best,
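For context on why the .module change above helps: DataParallel-style wrappers keep the real network in a .module attribute and add scatter/gather logic around it, so calling the wrapper dispatches across replicas while calling .module runs the bare model directly. A minimal sketch of that wrapper pattern (illustrative stand-in classes, not torch code):

```python
class BareModel:
    """Stand-in for the underlying detector."""

    def __call__(self, x, return_loss=True, rescale=False):
        return x + 1  # placeholder for a forward pass


class DataParallelSketch:
    """Mimics how DataParallel-style wrappers expose the wrapped model."""

    def __init__(self, module):
        # The real model lives here, as in torch.nn.DataParallel(model).module
        self.module = module

    def __call__(self, x, return_loss=True, rescale=False):
        # A real DataParallel wrapper would scatter x across GPUs here,
        # which is what trips up single-batch validation inside the hook.
        return self.module(x, return_loss=return_loss, rescale=rescale)


# Usage: runner.model corresponds to the wrapper, runner.model.module
# to the bare model that the validation hook can call directly.
wrapped = DataParallelSketch(BareModel())
out = wrapped.module(1, return_loss=False, rescale=True)
```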