Hi,
In train.py, the default number of steps is 10,000 and the default number of epochs is 50.
However, if we are training on COCO, the dataset should contain roughly 120k images, right?
In that case, shouldn't the number of steps be set to 120,000?
Which values for steps and epochs did you use to reach the mAP mentioned in the readme?
@hgaiser isn't exactly sure anymore what the training parameters were (see also #213). If someone can reproduce the results and remembers the training parameters, we can update the readme.
We have other priorities ourselves now though, so I don't think one of us will retrain on COCO just to add these numbers. We're adding evaluation / visualization tools, and we want to make the anchors more tweakable. But PRs are always welcome.
In addition to what @de-vri-es said:
In my opinion, the term epoch is useless. Yeah, you could set the number of steps to 120k for COCO, and then an epoch would mean the network has seen all images exactly once, but what does that mean exactly? How does that help? The only thing it is useful for is telling other people how long the network has been training, but then you might as well use the total number of images (or iterations) it has seen instead. That is, in my opinion, a much clearer metric than epochs.
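To make the "total images seen" metric concrete, here is a minimal sketch of the arithmetic. The function name total_images_seen is hypothetical, and the batch size of 1 is an assumption for illustration only; the steps and epochs are the defaults quoted in the question, and 120k is the rough COCO size mentioned there.

```python
def total_images_seen(steps: int, epochs: int, batch_size: int = 1) -> int:
    """Images processed over a whole run: steps per epoch * epochs * batch size."""
    return steps * epochs * batch_size


# Defaults quoted in this thread: 10,000 steps per epoch and 50 epochs.
# batch_size=1 is an assumption, not a confirmed setting.
seen = total_images_seen(steps=10_000, epochs=50, batch_size=1)

# Rough COCO training set size, as estimated in the question.
coco_images = 120_000

print(seen)                # total images processed during the run
print(seen / coco_images)  # roughly 4.2 full passes over the dataset
```

Under these assumptions, the default settings would correspond to about four passes over COCO, regardless of how the run is divided into "epochs".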
Hi @hgaiser, thank you for pointing it out. I think it is an interesting question to discuss.
I think it matters in two ways:
In terms of convergence, here's a reference on Stack Exchange that offers good insight into why that is the case.
As I understand it: steps * batch size = number of images seen in one epoch.
The number of epochs is a parameter independent of the two above.
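The relation between steps, batch size, and number of images can be sketched as follows. This is only an illustration: steps_for_full_pass is a hypothetical helper, not something in train.py, and the batch sizes are assumed values.

```python
import math


def steps_for_full_pass(num_images: int, batch_size: int) -> int:
    """Steps needed so that steps * batch_size covers every image once.

    Rounds up so a final partial batch still counts as a step.
    """
    return math.ceil(num_images / batch_size)


# Rough COCO size from the question; batch sizes are assumptions.
print(steps_for_full_pass(120_000, batch_size=1))  # 120000
print(steps_for_full_pass(120_000, batch_size=8))  # 15000
```

With a batch size of 1, the 120,000 steps suggested in the question give exactly one pass over the data per epoch; the number of epochs then only scales how many such passes are made.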