Keras: fit_generator not calling __len__ on new epoch, and not calling on_batch_begin or on_batch_end. Update: how to have a varying steps_per_epoch for fit_generator

Created on 6 Jul 2018 · 10 comments · Source: keras-team/keras

Hi, I'm trying to change the data in my fit_generator at each epoch.
The problem I'm having is that __len__, which returns the number of batches per epoch, is only called once at the beginning, even though the documentation states that the dataset can be modified at each epoch end.
Nevertheless, I've tried to hard-code the data change by implementing checks in on_batch_begin and on_batch_end, but those are simply never called.
Here's the generator, with only printing and dummy inputs, to show the issue:

import numpy as np
import keras

class DataGenerator(keras.utils.Sequence):
    def __init__(self, inputs_all):
        print('init')
        self.inputs_all = inputs_all
        self.inputs_buck = []

    def __len__(self):
        # number of batches per epoch
        print('len')
        return 1

    def on_train_end(self):
        print('train end')

    def __getitem__(self, index):
        # returns one complete batch plus the dummy CTC target
        print('getting item')
        self.inputs_buck = {'the_input': self.inputs_all['the_input'][0],
                            'the_labels': self.inputs_all['the_labels'][0],
                            'input_length': self.inputs_all['input_length'][0],
                            'label_length': self.inputs_all['label_length'][0]}
        batch_ = {'the_input': self.inputs_buck['the_input'],
                  'the_labels': self.inputs_buck['the_labels'],
                  'input_length': self.inputs_buck['input_length'],
                  'label_length': self.inputs_buck['label_length']}
        y = {'ctc': np.zeros([len(batch_['the_input'])])}
        return batch_, y

    def on_batch_begin(self):
        print('batch start')

    def on_batch_end(self):
        print('batch end')

    def on_epoch_end(self):
        print('epoch end')

Which has the output:

input length: Tensor("ctc/strided_slice:0", shape=(1,), dtype=int64)  label length: Tensor("ctc/strided_slice_1:0", shape=(1,), dtype=int64)
init
len
len
Epoch 1/10
getting item
epoch end
getting item

As you can see, on_batch_begin and on_batch_end are not called at all, and __len__ is only called at the very beginning.
In my full code I can change the data in on_epoch_end, since that is called, but because __len__ isn't called again I'm stuck with the number of batches the very first data slice had.
The documentation is very unclear on this, but hints that it should work:

Every Sequence must implement the __getitem__ and the __len__ methods. If you want to modify your dataset between epochs you may implement on_epoch_end. The method __getitem__ should return a complete batch.

The only way this would work is if __len__ were called every epoch. I tried calling it manually in on_epoch_end, but it didn't change the number of batches.
And it's not just asynchronicity between GPU and CPU output: I've run multiple epochs with my full code and it always sticks to the first number of batches calculated.

Any clue as to what's happening?

Thanks for the help,
Nic

I'm on Keras 2.2.0, Tensorflow-gpu 1.8.0

All 10 comments

Sequences are not Callbacks. Only on_epoch_end is called. This is used primarily to shuffle your data between epochs. The data itself should not change.
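For illustration, a minimal sketch of that intended usage, shuffling indices between epochs while keeping the dataset fixed (the names x_set and y_set are hypothetical, assumed to be NumPy arrays):

import numpy as np
import keras

class ShufflingSequence(keras.utils.Sequence):
    """The number of batches stays fixed; only the sample ordering changes per epoch."""
    def __init__(self, x_set, y_set, batch_size):
        self.x, self.y = x_set, y_set
        self.batch_size = batch_size
        self.indices = np.arange(len(self.x))

    def __len__(self):
        # evaluated by fit_generator to fix steps_per_epoch
        return int(np.ceil(len(self.x) / self.batch_size))

    def __getitem__(self, idx):
        ids = self.indices[idx * self.batch_size:(idx + 1) * self.batch_size]
        return self.x[ids], self.y[ids]

    def on_epoch_end(self):
        # reshuffle, but do not change the dataset length
        np.random.shuffle(self.indices)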

Thanks Dref :) Is there a generator class or something I can use instead? I'm writing my own Python generator, but if there's an 'official' Keras implementation I can use, I'd go with that.
Or do I write a Callback class? I didn't think those had access to the data.
Thanks

Generators are unsafe (not multiprocessing-safe), but you can use a plain Python generator.

I would propose that you find a way such that the number of batches doesn't change between epochs. This is highly unusual.

Yeah, now I'm having the same problem: even with a Python generator I still have to define steps_per_epoch :p
What I have is a dataset of images of different lengths. I set up the model with a None horizontal dimension, so it's fine with training on them, but since they are NumPy arrays they can't have varying lengths within a batch without padding.
I've implemented a bucketing algorithm that groups images of the same length, so I can feed the model one bucket at a time without having to pad the images (roughly as sketched below).
The buckets obviously have different sizes, which means a different number of batches per bucket.
I could compile the model for every bucket and set the number of batches accordingly, but this is slow, so I was hoping to somehow access and modify the number of batches per epoch.
Could I queue the compiling of the model? Or would that just break TensorFlow?
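A minimal sketch of that kind of bucketing, assuming images is a list of arrays of shape (height, width, channels) with a shared height; the function name and data layout are hypothetical:

import numpy as np
from collections import defaultdict

def bucket_by_width(images, labels):
    """Group samples by image width so each bucket can be batched without padding."""
    buckets = defaultdict(list)
    for img, lab in zip(images, labels):
        buckets[img.shape[1]].append((img, lab))
    return buckets  # width -> list of (image, label) pairs

Each bucket then gets its own batch count, e.g. int(np.floor(len(bucket) / batch_size)), which is exactly why the number of batches per epoch would need to vary.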

Well, I've managed to trick it into working :P

# bucket_start / bucket_end mark the range of buckets I want to use
for count, steps in enumerate(int(np.floor(len(x) / batch_size))
                              for x in inputs[bucket_start:bucket_end]):
    generator_data = generate_data(inputs, bucket_start + count, bucket_start + count + 1,
                                   batch_size, epochs)
    history = model.fit_generator(generator=generator_data, steps_per_epoch=steps,
                                  epochs=5, verbose=1)

I've added another for ep in range(epochs) outer loop inside my generator (roughly as sketched below).
Now the step count is calculated in the steps variable, and the generator yields epochs x total_batches batches in total.
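A rough sketch of what such a generator might look like; generate_data, its arguments, and the dictionary keys follow the snippets above, but the exact structure of inputs (assumed here to be a list of per-bucket dictionaries) is an assumption:

import numpy as np

def generate_data(inputs, bucket_start, bucket_end, batch_size, epochs):
    """Yield batches from the selected buckets, repeated once per epoch."""
    for ep in range(epochs):              # outer loop so the generator doesn't run dry
        for bucket in inputs[bucket_start:bucket_end]:
            n_batches = int(np.floor(len(bucket['the_input']) / batch_size))
            for b in range(n_batches):
                sl = slice(b * batch_size, (b + 1) * batch_size)
                x = {k: bucket[k][sl] for k in
                     ('the_input', 'the_labels', 'input_length', 'label_length')}
                y = {'ctc': np.zeros(batch_size)}
                yield x, y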

I would also like to do this with a Sequence.
As my accuracy increases, I would like to predict on a noisy set and bring more samples from the noisy set into the training set.
I tried to modify self.params['steps'] = self.train_gen.__len__() in a custom callback (roughly as sketched below). It only trains for the initial number of steps, even though the log shows the step count increased; the log shows something like 120/200 and then moves on to the next epoch.
The only workaround I could figure out is to alter the batch size and keep the number of steps the same as the training set gets larger.
It would be great to be able to keep the batch size the same.
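For reference, roughly what that attempt could look like; train_gen is the Sequence in question, and, as described above, fit_generator ignores the updated value after the first epoch:

from keras.callbacks import Callback

class UpdateStepsCallback(Callback):
    """Try to refresh the step count from a growing Sequence after each epoch."""
    def __init__(self, train_gen):
        super(UpdateStepsCallback, self).__init__()
        self.train_gen = train_gen

    def on_epoch_end(self, epoch, logs=None):
        # updates the logged value, but fit_generator keeps using
        # the steps_per_epoch it computed before epoch 1
        self.params['steps'] = len(self.train_gen)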

@jsl303 Well, the batch size is related to the number of steps, at least as far as I remember: it's the number of samples in your dataset divided by the number of steps. So if you want both the number of steps and the batch size to stay the same, you have to keep your dataset the same length, or implement some sort of early stopping for the steps. But it's probably easier to replace normal samples with noisy ones as you go along.
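In other words (a quick hypothetical calculation):

import math

num_samples = 10000          # hypothetical dataset size
batch_size = 32
steps_per_epoch = math.ceil(num_samples / batch_size)   # 313
# growing num_samples with batch_size fixed changes steps_per_epoch;
# keeping steps_per_epoch fixed forces the batch size to grow instead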

Yes, but if fit_generator called __len__ before each epoch (not just before epoch 1) and updated the step count, that would solve the problem.

Then we could modify the number of steps as training progresses, as the OP was asking.

@jsl303 correct, I thought you wanted the step count and batch size to remain the same :P

In Keras 2.2.5, in the training_generator.py file, add the following at line 260 (the end of the while epoch < epochs: loop):

            steps_per_epoch = len(generator)
            callbacks.set_params({
                'epochs': epochs,
                'steps': steps_per_epoch,
                'verbose': verbose,
                'do_validation': do_validation,
                'metrics': callback_metrics,
            })

__len__ will then get called correctly at each epoch end. (Worked like a charm for me.)
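If patching Keras isn't an option, a possible alternative along the same lines is to drive the epochs yourself so steps_per_epoch is recomputed from the Sequence each time. A sketch, assuming train_gen is the Sequence and update_dataset is a hypothetical method that grows or changes its data:

num_epochs = 10
for epoch in range(num_epochs):
    # len(train_gen) is re-evaluated on every call, so a changed dataset
    # is picked up on the next pass
    model.fit_generator(generator=train_gen,
                        steps_per_epoch=len(train_gen),
                        epochs=1, verbose=1)
    train_gen.update_dataset()   # hypothetical: modify the data between rounds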

