I went through the code. Does the code support only training for one epoch? Meaning in the defaults.py file MAX_ITER=40000 which is less than one epoch for coco dataset as it has more than 250k annotated images. I could, of course, change the max_iter value to something else for multiple epochs. But just wanted to check if I am understanding this correctly.
The code currently trains for around 12 epochs of COCO with the default values.
The reason is that the number of iterations also takes into account the batch size and the number of GPUs.
So for 90k iterations with a batch size of 2 images per GPU and 8 GPUs, we have 90k x 2 x 8 = 1.44M images seen.
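The arithmetic above can be checked quickly. A minimal sketch (the COCO train2017 size of ~118k images is the only assumption beyond the numbers in the thread):

```python
# Back-of-envelope: how many COCO epochs do the default settings cover?
images_per_gpu = 2
num_gpus = 8
max_iter = 90_000
coco_train_size = 118_287  # approximate number of images in COCO train2017

images_seen = max_iter * images_per_gpu * num_gpus  # total images processed
epochs = images_seen / coco_train_size

print(images_seen)        # 1440000
print(round(epochs, 1))   # 12.2
```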
Thanks. I think it would be better if we refactored the config file to use a number of epochs rather than a number of iterations, since it is more natural to think in terms of epochs. Let me know what you think; I can help with the refactoring if you want.
I agree with you that having a number of epochs would be more natural, and that was indeed what I initially had in my early versions of the codebase.
But in order to keep compatibility with Detectron (which uses a number of iterations), I changed it to follow the Detectron (and Caffe) style, so I think we won't be changing it before a potential next release.
To follow up on this. What is the recommended way to train on multiple epochs on your own dataset (Mine has ~44k images)?
@Nacho114 I'd just change the number of iterations to reflect a number of epochs that you want to go through.
For example, if you have 44k images and with a global batch size of bs, here is how many iterations I'd have for num_epochs epochs
one_epoch = 44000 / bs
max_iter = one_epoch * num_epochs
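The conversion above can be wrapped in a small helper. This is a sketch, not code from the repo; the function name and the example global batch size of 16 are my own choices:

```python
def max_iter_for_epochs(dataset_size, global_batch_size, num_epochs):
    """Convert a desired number of epochs into a SOLVER.MAX_ITER value."""
    one_epoch = dataset_size / global_batch_size  # iterations per epoch
    return int(one_epoch * num_epochs)

# Example: 44k images, global batch size of 16, 12 epochs
print(max_iter_for_epochs(44_000, 16, 12))  # 33000
```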
Some follow up questions:
Here bs refers to SOLVER.IMS_PER_BATCH? Or to the batch_size argument for the dataloader? (This part was not very clear to me when going over the code/doc).
I tried to increase the MAX_ITER value in the cfg, but that did not seem to increase the number of iterations (past my dataset size). Looking at do_train, it seems that it will simply go over the whole dataset once (starting from start_iter). I feel like I am either looking in the wrong place or misreading something, since I do not see where max_iter would allow going over the dataset more than once.
1 - bs is global batch size, and thus SOLVER.IMS_PER_BATCH
2 - we have a custom batch_sampler that iterates over the data as many times as we specify, see this file for more information
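To illustrate the idea, here is a minimal sketch (not the library's actual implementation) of a batch sampler that keeps re-iterating over an underlying batch sampler until a fixed number of iterations is reached, which is why max_iter can exceed one pass over the dataset:

```python
# Sketch of an iteration-based batch sampler: wraps another batch
# sampler and restarts it until num_iterations batches have been yielded.
class IterationBasedBatchSampler:
    def __init__(self, batch_sampler, num_iterations):
        self.batch_sampler = batch_sampler
        self.num_iterations = num_iterations

    def __iter__(self):
        iteration = 0
        while iteration < self.num_iterations:
            for batch in self.batch_sampler:  # one full pass over the data
                yield batch
                iteration += 1
                if iteration >= self.num_iterations:
                    return

    def __len__(self):
        return self.num_iterations

# Toy example: 3 batches per epoch, but we ask for 7 iterations,
# i.e. a bit more than two full passes over the data.
base = [[0, 1], [2, 3], [4, 5]]
batches = list(IterationBasedBatchSampler(base, 7))
print(len(batches))  # 7
```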
Ah, `SOLVER.IMS_PER_BATCH`, and the custom batch_sampler, that was the missing link, thanks!
Sorry, I didn't finish my sentence earlier: yes, `SOLVER.IMS_PER_BATCH` is what I meant by bs.