Ignite: Why is epoch_length required by iterable dataset?

Created on 26 Mar 2020  ·  9 Comments  ·  Source: pytorch/ignite

I noticed that in version 0.4.0, epoch_length is mandatory for iterable datasets. I am curious about the rationale behind it, since very often we don't know the length of an iterable dataset beforehand, and that is exactly why we use them.

question

All 9 comments

@snie2012 let me try to explain the vision behind it.
First of all, epoch_length is just a value that defines after which iteration the epoch counter is updated. For a PyTorch DataLoader, the epoch length is naturally defined by its length, but the user can still set it freely to any value.
Thus, a training run is a number of training steps defined by epoch_length * max_epochs (the total number of iterations).
For infinite or unknown-length iterators, we need to set epoch_length to define the number of training steps in a single epoch, and thus the total number of iterations to run, as before, by epoch_length * max_epochs.
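
As a rough illustration of the relationship described above, here is a minimal plain-Python sketch. It only models the counting and is not ignite's internal code; the helper name epoch_of is made up for this example.

```python
def epoch_of(iteration, epoch_length):
    """Return the 1-based epoch that a 1-based iteration falls in."""
    return (iteration - 1) // epoch_length + 1


epoch_length = 5
max_epochs = 4

# Total number of training steps in the run.
total_iterations = epoch_length * max_epochs

# (epoch, iteration) pairs for the whole run.
schedule = [(epoch_of(i, epoch_length), i) for i in range(1, total_iterations + 1)]

print(total_iterations)
print(schedule[:6])
```

The epoch counter changes only when the iteration count crosses a multiple of epoch_length, regardless of where the underlying data happens to end.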

What do you think ?

@vfdev-5 I totally agree with the motivation, and I wish there were a solid way to get the length of a dataset before training. But in many situations we don't know the length of the dataset beforehand, and the responsibility is on the data iterator to return an iteration-finished signal. If we manually set an epoch_length without knowing the actual length, we will be under-using or over-using the data.

But in many situations we don't know the length of the dataset beforehand and the responsibility is on the data iterator to return an iteration-finished signal.

@snie2012 could you please give some concrete examples of this, and of how you would like to work in such a situation?

From my point of view, any finite-unknown-length data iterator can be transformed into an infinite data iterator on the user side (possibly by catching StopIteration), and thus the user can freely define any epoch length and work with it. What do you think?

I would like to understand these cases and see if we should do something about them...
Thanks

@vfdev-5 I don't quite follow, maybe I am missing something.

To elaborate on what I said, let's assume we have an iterable dataset with 5 batches of data, but we don't know the number of batches beforehand. In this situation, if we pass epoch_length=3, we'll only see 3 batches of data in one epoch, fewer than we should have seen; meanwhile, if we pass epoch_length=8, we'll see 8 batches of data, more than we should have seen. Without knowing how long the data is, we can under-use or over-use the data in one epoch. Does this make sense, or do I misunderstand the current implementation?
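
The 5-batch scenario above can be sketched in plain Python. This assumes the iterator is restarted whenever it runs out (modelled here with itertools.cycle); the helper name batches_seen is hypothetical and is not part of ignite.

```python
from collections import Counter
from itertools import cycle, islice


def batches_seen(num_batches, epoch_length):
    """Count how often each batch index is drawn during one epoch
    when the underlying data restarts after it is exhausted."""
    restarting = cycle(range(num_batches))
    return Counter(islice(restarting, epoch_length))


short = batches_seen(5, 3)  # epoch shorter than the data
long = batches_seen(5, 8)   # epoch longer than the data

print(sorted(short.items()))
print(sorted(long.items()))
```

With epoch_length=3 the last two batches are never drawn in that epoch; with epoch_length=8 the first three batches are drawn twice.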

@snie2012 well, in any case I think we should have an idea of the approximate size of the iterator.

I see two situations here:
1) finite-unknown-length data iterator is sufficiently large => complete training is to run over the iterator once.
2) finite-unknown-length data iterator is NOT sufficiently large => complete training is to run over the data iterator N times.

In both cases, we should control how many iterations the training runs for. To work with a finite-unknown-length iterator, we can transform it into an infinite data iterator and proceed as usual:

from ignite.engine import Engine, Events

# A finite iterator whose length (11 items) is treated as unknown by the caller.
def finite_unk_size_data_iter():
    for i in range(11):
        yield i

# Wrap an iterator factory into an infinite iterator: whenever the
# underlying iterator is exhausted, create a fresh one and keep going.
def make_inf_data_iter(data_iter_creator):
    data_iter = data_iter_creator()
    while True:
        try:
            yield next(data_iter)
        except StopIteration:
            data_iter = data_iter_creator()

# The process function only prints "epoch:iteration - batch".
engine = Engine(lambda e, b: print("{}:{} - {}".format(e.state.epoch, e.state.iteration, b)))

@engine.on(Events.GET_BATCH_STARTED)
def log_if_called1(e):
    print("{}:{} - GET_BATCH_STARTED".format(e.state.epoch, e.state.iteration))

@engine.on(Events.GET_BATCH_COMPLETED)
def log_if_called2(e):
    print("{}:{} - GET_BATCH_COMPLETED".format(e.state.epoch, e.state.iteration))

data_iter = make_inf_data_iter(finite_unk_size_data_iter)
# 4 epochs of 5 iterations each, i.e. 20 iterations in total.
engine.run(data_iter, epoch_length=5, max_epochs=4)

In lack of the knowledge of how long the data is, we can be under-using or over-using the data in one epoch.

IMO, this is not a problem.

In case 2), suppose the data is a small video, i.e. a sequence of image frames, e.g. 12345 frames, but this number is unknown. We construct an infinite iterator from the data. If I train a model for 50 epochs with an epoch length of 10000, the model will see all frames of the video 40 times, plus about half of them once more in the final pass. If the model has not yet fully converged, 50 epochs are not sufficient for the training and we should add more.

In case 1), suppose the data consists of an unknown number of samples (e.g. 1234567). We construct an infinite iterator. If I train a model for 50 epochs with an epoch length of 100000, the model will see all samples several times and some of them once more (which is not a big deal). Again, if the model has not yet fully converged, 50 epochs are not sufficient for the training and we should add more.
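
The arithmetic behind both cases can be checked with a short sketch. It assumes one sample (or frame) is consumed per iteration, which the comment above does not state explicitly, and the helper name passes_over_data is made up for this illustration.

```python
def passes_over_data(num_samples, epoch_length, max_epochs):
    """Return (full passes over the data, leftover samples in a partial pass),
    assuming one sample is consumed per iteration."""
    total_iterations = epoch_length * max_epochs
    return divmod(total_iterations, num_samples)


video = passes_over_data(12345, 10000, 50)
large = passes_over_data(1234567, 100000, 50)

print(video)  # (40, 6200)
print(large)  # (4, 61732)
```

For the video, 6200 leftover frames out of 12345 is roughly half of one more pass; for the larger dataset, every sample is seen 4 times and a small fraction is seen a fifth time.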

HTH

The concept of "epoch" can usually be applied only to the training phase. Literally one hour ago I ran into a situation where my validation loader was a finite iterator and I wanted to run validation over this loader just "until it is exhausted". In fact, I had to calculate the exact number of batches (and I can think of situations where I couldn't do it; what would I do then?) just to make the run method happy. This was inconvenient. However, I don't have a good suggestion for how to fix this. Perhaps one could support something like epoch_length='inf'.

@WeirdKeksButtonK yes, this makes sense, thanks!
In a draft implementation of the engine with epoch_length, we had such an option: an unknown epoch_length that was determined automatically on the first StopIteration. This added a lot of other checks and complicated the behaviour of run. Well, let me think about what could be done in your case...
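
To make the idea concrete, here is a rough plain-Python sketch of learning the epoch length from the first exhaustion of the data. It is not the draft implementation mentioned above and does not use ignite; run_epochs, make_iter and process are hypothetical names chosen for this example.

```python
def run_epochs(make_iter, process, max_epochs):
    """Run max_epochs passes over a re-creatable finite iterator and
    return the epoch length discovered during the first pass."""
    epoch_length = None
    for epoch in range(1, max_epochs + 1):
        seen = 0
        # The for loop ends when the iterator raises StopIteration.
        for batch in make_iter():
            process(epoch, batch)
            seen += 1
        if epoch_length is None:
            # Length becomes known only after the first full pass.
            epoch_length = seen
    return epoch_length


log = []
discovered = run_epochs(
    lambda: iter(range(11)),
    lambda epoch, batch: log.append((epoch, batch)),
    max_epochs=2,
)
print(discovered)
```

Even in this simplified form, the length is unknown for the whole first epoch, so anything that depends on it (progress bars, schedulers, iteration-based events) would need special handling until then.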

Essentially, the current implementation loses the information/control of how many times the data is actually iterated over. IMO, as @vfdev-5 mentioned, getting epoch_length automatically on the first StopIteration would be an ideal solution, since epoch_length can be helpful for purposes like visualization, but getting that length without iterating through it is challenging. Hope ignite can have that function!

I am closing this issue in favor of #871.
Feel free to comment there, or reopen this one if you need more support on the subject.
