Keras-retinanet: steps and epochs

Created on 15 Jan 2018 · 4 Comments · Source: fizyr/keras-retinanet

Hi,

In train.py, the default number of steps is 10,000 and the default number of epochs is 50.
However, if we are training on COCO, the dataset contains roughly 120k images, right?
So the number of steps should then be set to 120,000.

Which values for steps and epochs did you use to reach the mAP mentioned in the README?

All 4 comments

@hgaiser isn't exactly sure anymore what the training parameters were (see also #213). If someone can reproduce the results and remembers the training parameters, we can update the README.

We have other priorities ourselves now though, so I don't think one of us will retrain on COCO just to add these numbers. We're adding evaluation / visualization tools, and we want to make the anchors more tweakable. But PRs are always welcome.

In addition to what @de-vri-es said:

In my opinion, the term epoch is useless. Sure, you could set the number of steps to 120k for COCO, and then an epoch would mean the network has seen every image exactly once, but what does that mean exactly? How does that help? The only thing it is useful for is conveying to other people how long the network has been training, but then you might as well report the total number of images (or iterations) it has seen instead. That is, in my opinion, a much clearer metric than epochs.

Hi @hgaiser, thank you for pointing it out. I think it is an interesting question to discuss.
I think it matters in two ways:

  1. Saying that we trained our network for x epochs directly tells the reader how many times we ran through the entire dataset, whose size may differ from reader to reader.
  2. It matters for convergence. Random reshuffling (shuffle once, then sample sequentially, as is done within an epoch) outperforms random sampling with replacement.

In terms of convergence, here's a reference on stack exchange that has a good insight why that is the case.
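To make the two sampling schemes in point 2 concrete, here is a minimal sketch in plain Python. It does not demonstrate the convergence claim itself; it only shows the difference in how much of the dataset each scheme touches in one pass of `n` draws. The function names are made up for this illustration and are not part of keras-retinanet.

```python
import random


def reshuffled_epoch(n, rng):
    """Random reshuffling: shuffle all indices once, then walk through them in order."""
    indices = list(range(n))
    rng.shuffle(indices)
    return indices


def with_replacement_epoch(n, rng):
    """Random sampling with replacement: draw n indices independently."""
    return [rng.randrange(n) for _ in range(n)]


def coverage(indices, n):
    """Fraction of the n samples that appear at least once in `indices`."""
    return len(set(indices)) / n


if __name__ == "__main__":
    n = 10_000
    rng = random.Random(0)
    print("reshuffling coverage:     ", coverage(reshuffled_epoch(n, rng), n))
    print("with-replacement coverage:", coverage(with_replacement_epoch(n, rng), n))
```

With reshuffling, every sample is seen exactly once per pass. With replacement, the expected fraction of distinct samples after `n` draws approaches 1 - 1/e, roughly 63%, so some images are repeated while others are skipped in any given pass.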

As I understand it: steps * batch size = number of images seen per epoch.
The number of epochs is a separate parameter, independent of those two.
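The arithmetic discussed in this thread can be sketched as follows. The steps (10,000), epochs (50) and dataset size (roughly 120k) come from the original question; the batch size of 1 is an assumption made for this example, so substitute whatever value you actually train with. The helper names are hypothetical.

```python
import math


def images_seen(steps_per_epoch, epochs, batch_size):
    """Total number of images processed over the whole training run."""
    return steps_per_epoch * epochs * batch_size


def dataset_passes(steps_per_epoch, epochs, batch_size, dataset_size):
    """How many times the run goes through a dataset of the given size."""
    return images_seen(steps_per_epoch, epochs, batch_size) / dataset_size


def steps_for_full_pass(dataset_size, batch_size):
    """Steps needed for one 'epoch' to cover the dataset once."""
    return math.ceil(dataset_size / batch_size)


if __name__ == "__main__":
    batch_size = 1  # assumption for this example, not confirmed in the thread
    print(images_seen(10_000, 50, batch_size))               # total images seen
    print(dataset_passes(10_000, 50, batch_size, 120_000))   # passes over ~120k images
    print(steps_for_full_pass(120_000, batch_size))          # steps for one full pass
```

Under these assumptions, the defaults process 500,000 images in total, which is a little over four passes through a 120k-image dataset, regardless of how the run is divided into "epochs".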
