I went through the code. Does the code support only training for one epoch? Meaning in the defaults.py file MAX_ITER=40000 which is less than one epoch for coco dataset as it has more than 250k annotated images. I could, of course, change the max_iter value to something else for multiple epochs. But just wanted to check if I am understanding this correctly.
The code currently trains for around 12 epochs of COCO with the default values.
The reason is that the number of iterations also takes into account the batch size and the number of GPUs.
So for 90k iterations with a batch size of 2 images per GPU and 8 GPUs, we have 90k x 2 x 8 = 1.44M images seen.
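The arithmetic above can be checked quickly. A minimal sketch (the COCO train2017 size of ~118k images is the only assumption beyond the numbers in the thread):

```python
# Back-of-envelope: how many COCO epochs do the default settings cover?
images_per_gpu = 2
num_gpus = 8
max_iter = 90_000
coco_train_size = 118_287  # approximate number of images in COCO train2017

images_seen = max_iter * images_per_gpu * num_gpus  # total images processed
epochs = images_seen / coco_train_size

print(images_seen)        # 1440000
print(round(epochs, 1))   # 12.2
```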
Thanks. I think it would be better if we refactored the config file to use a number of epochs rather than a number of iterations, since it is more natural to think in terms of epochs. Let me know what you think; I can help with the refactoring if you want.
I agree with you that having a number of epochs would be more natural, and that was indeed what I initially had in my early versions of the codebase.
But in order to keep compatibility with Detectron (which uses a number of iterations), I changed it to follow the Detectron (and Caffe) style, so I think we won't be changing it before a potential next release.
To follow up on this. What is the recommended way to train on multiple epochs on your own dataset (Mine has ~44k images)?
@Nacho114 I'd just change the number of iterations to reflect a number of epochs that you want to go through.
For example, if you have 44k images and with a global batch size of bs, here is how many iterations I'd have for num_epochs epochs
one_epoch = 44000 / bs
max_iter = one_epoch * num_epochs
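The conversion above can be wrapped in a small helper. This is a sketch, not code from the repo; the function name and the example global batch size of 16 are my own choices:

```python
def max_iter_for_epochs(dataset_size, global_batch_size, num_epochs):
    """Convert a desired number of epochs into a SOLVER.MAX_ITER value."""
    one_epoch = dataset_size / global_batch_size  # iterations per epoch
    return int(one_epoch * num_epochs)

# Example: 44k images, global batch size of 16, 12 epochs
print(max_iter_for_epochs(44_000, 16, 12))  # 33000
```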
Some follow up questions:
Here bs refers to SOLVER.IMS_PER_BATCH? Or to the batch_size argument for the dataloader? (This part was not very clear to me when going over the code/doc).
I tried to increase the MAX_ITER value in the cfg, but that did not seem to increase the number of iterations (past my dataset size). Looking at do_train, it seems that it will simply go over the whole dataset once (starting from start_iter). I feel like I am either looking in the wrong place or misreading something, since I do not see where max_iter would allow going over the dataset more than once.
1 - bs is global batch size, and thus SOLVER.IMS_PER_BATCH
2 - we have a custom batch_sampler that iterates over the data as many times as we specify, see this file for more information
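To illustrate the idea, here is a minimal sketch (not the library's actual implementation) of a batch sampler that keeps re-iterating over an underlying batch sampler until a fixed number of iterations is reached, which is why max_iter can exceed one pass over the dataset:

```python
# Sketch of an iteration-based batch sampler: wraps another batch
# sampler and restarts it until num_iterations batches have been yielded.
class IterationBasedBatchSampler:
    def __init__(self, batch_sampler, num_iterations):
        self.batch_sampler = batch_sampler
        self.num_iterations = num_iterations

    def __iter__(self):
        iteration = 0
        while iteration < self.num_iterations:
            for batch in self.batch_sampler:  # one full pass over the data
                yield batch
                iteration += 1
                if iteration >= self.num_iterations:
                    return

    def __len__(self):
        return self.num_iterations

# Toy example: 3 batches per epoch, but we ask for 7 iterations,
# i.e. a bit more than two full passes over the data.
base = [[0, 1], [2, 3], [4, 5]]
batches = list(IterationBasedBatchSampler(base, 7))
print(len(batches))  # 7
```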
Ah, `SOLVER.IMS_PER_BATCH`, and the custom batch_sampler, that was the missing link, thanks!
Sorry, I didn't finish my sentence earlier: yes, `SOLVER.IMS_PER_BATCH` is what I meant by bs.