Keras: Unexpected behaviour with steps_per_epoch leads to Out of Memory when using Sequence with workers>0 in fit_generator

Created on 19 Aug 2019 · 2 comments · Source: keras-team/keras

Please make sure that this is a Bug or a Feature Request and provide all applicable information asked by the template.
If your issue is an implementation question, please ask your question on StackOverflow or on the Keras Slack channel instead of opening a GitHub issue.

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS High Sierra 10.13.6 and Ubuntu 16.04.1
  • TensorFlow backend (yes / no): yes
  • TensorFlow version: 1.13.1
  • Keras version: 2.1.5
  • Python version: 3.6.8

You can obtain the TensorFlow version with:
python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"
You can obtain the Keras version with:
python -c 'import keras as k; print(k.__version__)'

Describe the current behavior

The __len__ method of keras.utils.Sequence is used by model.fit_generator when it is called with workers>0 (the default is workers=1). This causes a memory issue when the queue is built for a Sequence with a very large __len__.

This is caused by the queue initialization in keras.utils.data_utils:549, where the queue is initialized with as many items as the sequence can return.
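For context, the enqueuer essentially builds one index per batch the Sequence claims to have, before anything is consumed. A rough sketch of that scaling (not the actual Keras source, just an illustration with a hypothetical BigSequence):

# Rough sketch of the behaviour described above, not the real Keras code.
class BigSequence:  # stand-in for a keras.utils.Sequence with a huge __len__
    def __len__(self):
        return 1000000

# One entry per advertised batch is created up front, regardless of
# steps_per_epoch, so memory scales with len(sequence).
indices = list(range(len(BigSequence())))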

Describe the expected behavior

model.fit_generator also takes a steps_per_epoch argument, so I would expect the effective length of the Sequence (as used in keras.utils.data_utils:549 and elsewhere) to be min(len(self.sequence), steps_per_epoch).
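Expressed as a sketch (hypothetical; this is not what Keras does today):

def effective_len(sequence, steps_per_epoch=None):
    # Hypothetical helper, not part of Keras: cap the advertised length
    # at steps_per_epoch when it is given.
    return len(sequence) if steps_per_epoch is None else min(len(sequence), steps_per_epoch)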

Code to reproduce the issue
Provide a reproducible test case that is the bare minimum necessary to generate the problem.

MWE:

from keras.layers import Dense, GlobalMaxPooling2D, Input
from keras.models import Model
from keras.utils import Sequence

# %% Test with MWE
x = Input((224, 224, 3))
y = GlobalMaxPooling2D()(x)
y = Dense(1, activation='sigmoid')(y)
model = Model(x, y)


class CustomSequence(Sequence):
    # Minimal Sequence that advertises a very large number of batches;
    # __getitem__ is a stub because the problem shows up while the queue
    # is being built, before any batch is requested.
    def __len__(self):
        return 1000000

    def __getitem__(self, index):
        return


train_sequence = CustomSequence()
model.compile('adam', 'mse')

# this gets stuck: workers defaults to 1, so the full-length queue is built
model.fit_generator(train_sequence, steps_per_epoch=100)

# this will work
model.fit_generator(train_sequence, steps_per_epoch=100, workers=0)
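A workaround that avoids touching Keras internals is to wrap the Sequence so its __len__ is capped. TruncatedSequence below is a hypothetical helper, not part of Keras, and it only ever exposes the first max_len batches each epoch:

class TruncatedSequence(Sequence):
    """Hypothetical wrapper: expose at most max_len items of another Sequence."""
    def __init__(self, sequence, max_len):
        self.sequence = sequence
        self.max_len = max_len

    def __len__(self):
        return min(len(self.sequence), self.max_len)

    def __getitem__(self, index):
        return self.sequence[index]

# model.fit_generator(TruncatedSequence(train_sequence, 100), steps_per_epoch=100)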

Other info / logs
Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

All 2 comments

Note: this is still the case with TensorFlow 2.0 beta, see keras.utils.data_utils:742: sequence = list(range(len(self.sequence)))

Hi! Do you guys plan to work on a proper fix for this issue? For now I have been able to patch the fit_generator method in my scripts with the following utility:

from functools import wraps
from unittest.mock import patch


def patch_len(fit_generator):
    """
    Patch the __len__ method of the generator to return the steps_per_epoch
    argument of keras fit_generator instead of the actual length. This prevents
    queues from being initialized with way too many items.
    """
    @wraps(fit_generator)
    def fit_generator_patch_len(*args, **kwargs):
        generator = args[1]
        steps_per_epoch = kwargs.get('steps_per_epoch', len(generator))
        patch_train_sequence_len = patch.object(generator.__class__, '__len__', return_value=steps_per_epoch)

        validation_data = kwargs.get('validation_data', [])
        validation_steps = kwargs.get('validation_steps', len(validation_data))
        patch_val_sequence_len = patch.object(validation_data.__class__, '__len__', return_value=validation_steps)

        patch_train_sequence_len.start()
        if validation_steps:
            patch_val_sequence_len.start()

        history = fit_generator(*args, **kwargs)

        patch_train_sequence_len.stop()
        if validation_steps:
            patch_val_sequence_len.stop()

        return history

    return fit_generator_patch_len

so that in my scripts I have


from tensorflow.keras import Model  # needed for the unbound fit_generator reference

patch_fit_generator = patch(
    'tensorflow.keras.Model.fit_generator',
    side_effect=patch_len(Model.fit_generator),
)
patch_fit_generator.start()

Model.fit_generator(  # instead of model.fit_generator to use patched fit_generator
    model,
    train_sequence,
    validation_data=val_sequence,
    callbacks=callbacks,
    epochs=100,
    steps_per_epoch=1000,
    validation_steps=200,
    use_multiprocessing=True,
)
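(The patch can presumably be stopped afterwards with patch_fit_generator.stop(), as with any unittest.mock patcher.)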