maskrcnn-benchmark: Does this train for only one epoch?

Created on 20 Nov 2018 · 9 Comments · Source: facebookresearch/maskrcnn-benchmark

โ“ Questions and Help

I went through the code. Does the code support training for only one epoch? In defaults.py, MAX_ITER = 40000, which is less than one epoch for the COCO dataset, since it has more than 250k annotated images. I could, of course, change the MAX_ITER value for multiple epochs, but I just wanted to check whether I am understanding this correctly.

All 9 comments

The code currently trains for around 12 epochs of COCO with the default values.
The reason is that the number of iterations also takes into account the batch size and the number of GPUs.

So for 90k iterations with a batch size of 2 per GPU and 8 GPUs, we have 90k x 2 x 8 = 1.44M images seen.
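The arithmetic above can be sanity-checked with a small helper (my own illustration, not part of maskrcnn-benchmark; the ~118k figure assumes COCO train2017):

```python
def images_seen(max_iter, ims_per_gpu, num_gpus):
    """Total images processed = iterations x global batch size."""
    return max_iter * ims_per_gpu * num_gpus


total = images_seen(90_000, 2, 8)       # 90k iterations, batch 2 per GPU, 8 GPUs
epochs = total / 118_287                # COCO train2017 has ~118k images
print(total)                            # -> 1440000
print(round(epochs, 1))                 # -> 12.2, i.e. roughly 12 epochs
```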

Thanks. I think it would be better if we refactored the config file to use a number of epochs rather than a number of iterations, since it is more natural to think in terms of epochs. Let me know what you think; I can help with the refactoring if you want.

I agree with you that having a number of epochs would be more natural, and that was indeed what I initially had in early versions of the codebase.
But in order to keep compatibility with Detectron (which uses a number of iterations), I changed it to follow the Detectron (and Caffe) style, so I don't think we will be changing it before a potential next release.

To follow up on this: what is the recommended way to train for multiple epochs on your own dataset (mine has ~44k images)?

  • Wrap do_train in a loop and iterate it the number of epochs times?
  • Modify trainer.py?

@Nacho114 I'd just change the number of iterations to reflect a number of epochs that you want to go through.

For example, if you have 44k images and a global batch size of bs, here is how many iterations I'd use for num_epochs epochs:

one_epoch = 44000 / bs
max_iter = one_epoch * num_epochs
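The formula above can be turned into a small worked example (the function name is mine; ceiling division is used so that every image is seen each epoch even when the dataset size is not a multiple of the batch size):

```python
def max_iter_for_epochs(dataset_size, global_batch_size, num_epochs):
    """Convert an epoch count into a MAX_ITER value for the solver config."""
    # Ceiling division: iterations needed to cover the whole dataset once.
    one_epoch = -(-dataset_size // global_batch_size)
    return one_epoch * num_epochs


# 44k images, global batch size 16, 12 epochs:
print(max_iter_for_epochs(44_000, 16, 12))  # -> 33000
```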

Some follow up questions:

  1. Here, does bs refer to SOLVER.IMS_PER_BATCH, or to the batch_size argument of the dataloader? (This part was not very clear to me when going over the code/docs.)

  2. I tried increasing the MAX_ITER value in the cfg, but that did not seem to increase the number of iterations (past my dataset size). Looking at do_train, it seems that it will simply go over the whole dataset once (starting from start_iter). I feel like I am either looking in the wrong place or misreading something, since I do not see where max_iter would allow going over the dataset more than once.

1 - bs is the global batch size, and thus SOLVER.IMS_PER_BATCH.
2 - We have a custom batch_sampler that iterates over the data as many times as we specify; see this file for more information.
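The idea behind that custom sampler can be sketched as follows. This is a simplified illustration in the spirit of maskrcnn-benchmark's iteration-based batch sampler, not the actual implementation; it wraps any batch sampler and replays it until the requested number of iterations has been yielded:

```python
class IterationBasedBatchSampler:
    """Re-iterate over a wrapped batch sampler until num_iterations
    batches have been produced (simplified sketch)."""

    def __init__(self, batch_sampler, num_iterations, start_iter=0):
        self.batch_sampler = batch_sampler
        self.num_iterations = num_iterations
        self.start_iter = start_iter

    def __iter__(self):
        iteration = self.start_iter
        while iteration < self.num_iterations:
            # Restart the underlying sampler: this is what lets MAX_ITER
            # exceed the dataset size (i.e. run for multiple epochs).
            for batch in self.batch_sampler:
                if iteration >= self.num_iterations:
                    break
                iteration += 1
                yield batch

    def __len__(self):
        return self.num_iterations


# Toy demo: a "dataset" of 4 batches replayed for 10 iterations.
batches = [[0, 1], [2, 3], [4, 5], [6, 7]]
sampler = IterationBasedBatchSampler(batches, num_iterations=10)
print(len(list(sampler)))  # -> 10 (the 4 batches are cycled through)
```

This is why increasing MAX_ITER alone is enough: the sampler, not the dataset length, decides when training stops.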

  1. "and thus `SOLVER.IMS_PER_BATCH`", and thus?
  2. I see, I'll check out the batch_sampler; that was the missing link.

Sorry, I didn't finish my sentence. Yes, SOLVER.IMS_PER_BATCH is what I meant by bs.

