Keras: Callback self.validation_data is None, when fit_generator is used

Created on 19 Jun 2018 · 33 comments · Source: keras-team/keras

Related: https://github.com/keras-team/keras/issues/2702

  • [x] Check that you are up-to-date with the master branch of Keras. You can update with:
    pip install git+git://github.com/keras-team/keras.git --upgrade --no-deps

  • [x] If running on TensorFlow, check that you are up-to-date with the latest version. The installation instructions can be found here.

  • [x] If running on Theano, check that you are up-to-date with the master branch of Theano. You can update with:
    pip install git+git://github.com/Theano/Theano.git --upgrade --no-deps

  • [x] Provide a link to a GitHub Gist of a Python script that can reproduce your issue (or just copy the script here if it is short).

At the end of every epoch I like to run this Callback, to see how my model is performing:

from keras.callbacks import Callback
from sklearn.metrics import confusion_matrix
import numpy as np

class SensitivitySpecificityCallback(Callback):
    def on_epoch_end(self, epoch, logs=None):
        if epoch:
            print('SensitivitySpecificityCallback::validation_data:', self.validation_data)
            x_test = self.validation_data[0]
            y_test = self.validation_data[1]
            predictions = self.model.predict(x_test)
            output_sensitivity_specificity(epoch, predictions, y_test)

def output_sensitivity_specificity(epoch, predictions, y_test):
    y_test = np.argmax(y_test, axis=-1)
    predictions = np.argmax(predictions, axis=-1)
    c = confusion_matrix(y_test, predictions)
    print('Confusion matrix:\n', c)
    print('[{:03d}] sensitivity'.format(epoch), c[0, 0] / (c[0, 1] + c[0, 0]))
    print('[{:03d}] specificity'.format(epoch), c[1, 1] / (c[1, 1] + c[1, 0]))

Relevant parts of my new code:

idg = ImageDataGenerator(horizontal_flip=True)
train_seq = idg.flow_from_directory(train_dir, target_size=(pixels, pixels), shuffle=True)
valid_seq = idg.flow_from_directory(valid_dir, target_size=(pixels, pixels), shuffle=True)
test_seq  = idg.flow_from_directory(test_dir, target_size=(pixels, pixels), shuffle=True)

model = Sequential()
# ...
model.compile(...)
model.fit_generator(train_seq, validation_data=valid_seq, verbose=2,
                    epochs=epochs, callbacks=[SensitivitySpecificityCallback()])
score = model.evaluate_generator(test_seq, verbose=0)

Unfortunately, since moving from fit to flow_from_directory and fit_generator, this callback fails because self.validation_data is None.

Most helpful comment

I am facing the same issue too.
But when I look into the Keras source code, I find that if we use fit_generator and the validation_data argument is a generator, the validation_data attribute of our custom callback object won't be set; the generator is only used later for validation.
So I use this code:

class Metrics(Callback):

    def __init__(self, val_data, batch_size = 20):
        super().__init__()
        self.validation_data = val_data
        self.batch_size = batch_size

    def on_train_begin(self, logs={}):
        print(self.validation_data)
        self.val_f1s = []
        self.val_recalls = []
        self.val_precisions = []

    def on_epoch_end(self, epoch, logs={}):
        batches = len(self.validation_data)
        total = batches * self.batch_size

        val_pred = np.zeros((total,1))
        val_true = np.zeros((total))

        for batch in range(batches):
            xVal, yVal = next(self.validation_data)
            val_pred[batch * self.batch_size : (batch+1) * self.batch_size] = np.asarray(self.model.predict(xVal)).round()
            val_true[batch * self.batch_size : (batch+1) * self.batch_size] = yVal

        val_pred = np.squeeze(val_pred)
        _val_f1 = f1_score(val_true, val_pred)
        _val_precision = precision_score(val_true, val_pred)
        _val_recall = recall_score(val_true, val_pred)

        self.val_f1s.append(_val_f1)
        self.val_recalls.append(_val_recall)
        self.val_precisions.append(_val_precision)

        return

In the __init__ function I set self.validation_data = val_data, where val_data is just the validation generator described above.
The batch_size should be the same as the one set for the validation generator; here it is 20.
By coding it like this, I can use the validation generator to get some metrics while using fit_generator.
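
For reference, a minimal usage sketch of this callback with the generators from the original post (valid_seq, train_seq and epochs come from the code above; the batch size of 32 is an assumption matching flow_from_directory's default). Note that the callback as written expects binary labels, while flow_from_directory with the default class_mode yields one-hot vectors, so the metric computation may need adjusting:

    metrics_cb = Metrics(val_data=valid_seq, batch_size=32)  # batch_size must match the validation generator
    model.fit_generator(train_seq, validation_data=valid_seq, epochs=epochs,
                        callbacks=[metrics_cb])
    print(metrics_cb.val_f1s)  # per-epoch F1 scores collected by the callback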

All 33 comments

I have the same problem: I need to access the validation data within a custom callback when using fit_generator, in order to compute an auxiliary metric. Is there any fix for this issue for now?

I am running into the same issue. I have some tf.summary.image() calls in my model, but they are never evaluated and displayed in TensorBoard because self.validation_data is always None.

I am confused because according to this line https://github.com/keras-team/keras/blob/5fcd832b5c5025b164c99f0bd46cb94d707b93d3/keras/engine/training_generator.py#L124, validation_data should be set.

Maybe someone with a deeper understanding of the API could comment?

My current solution is to unwrap the generator:

x, y = izip(*(valid_seq[i] for i in xrange(len(valid_seq))))
x_val = np.vstack(x)
y_val = np.vstack(imap(to_categorical, y) if class_mode == 'binary' else y)
model = Sequential()
model.fit_generator(train_seq, validation_data=(x_val, y_val), epochs=epochs,
                    callbacks=callbacks, verbose=1)

With imports:

from platform import python_version_tuple

if python_version_tuple()[0] == '3':
    xrange = range
    izip = zip
    imap = map
else:
    from itertools import izip, imap

from keras.models import Sequential
from keras.utils import to_categorical
import numpy as np
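
On Python 3 with a keras.utils.Sequence, the same unwrapping can be written more directly (a sketch, assuming valid_seq yields (x_batch, y_batch) tuples with labels already one-hot encoded):

    import numpy as np

    x_val = np.concatenate([valid_seq[i][0] for i in range(len(valid_seq))])
    y_val = np.concatenate([valid_seq[i][1] for i in range(len(valid_seq))])
    model.fit_generator(train_seq, validation_data=(x_val, y_val), epochs=epochs,
                        callbacks=callbacks, verbose=1)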

I'm not sure we would want this.
If the val_data is a generator, this has some side effects.
If it's a Sequence, I guess it's fine, but we could get into some weird things like:

# Thread 1
callbacks.on_epoch_end()
    # In callback
    for x,y in self.val_data:
        action(x,y)
# Thread 2
sequence.on_epoch_end() # this shuffles the Sequence, for example
# Thread 1
    # self.val_data is corrupted

Furthermore, the use case for using a generator/Sequence is that your data is big and doesn't fit in your RAM.

For your SensitivitySpecificityCallback, you could look into stateful metrics.

I am using keras.utils.Sequence as a data generator. Are you sure there is no workaround for this? Perhaps a special method added to keras.utils.Sequence that would load a thread-safe piece of data?

It's a pretty big limitation to not be able to use tf.summary.image or similar to check how a model performs during training. Or maybe there are alternative ways to use TF summaries that I can't think of?

I am confused because according to this line in keras/engine/training_generator.py (line 124 in 5fcd832: cbk.validation_data = val_data), validation_data should be set.

Maybe someone with a deeper understanding of the API could comment?

Thanks for pointing this out. This actually means that the validation data is assigned to a validation_data field of the callback object and not of the model. I'm not sure if this is where it's supposed to go (because with fit it doesn't), but in the meantime, for things like computing auxiliary metrics, it's possible to read the data from self.validation_data instead. (I might be reiterating the obvious, but this information was rather hard to find.)

Do you have any other alternatives for this? Or do we have to give up fit_generator?

Hi,

I am facing the same issue. validation_data is empty when using fit_generator.

Any updates yet?

Hello,

I am facing the same issue too. I would like to compute custom metrics in on_epoch_end. I implemented my own data generator with keras.utils.Sequence, as mentioned in the Keras documentation. But now I am facing the same problem retrieving validation_data to use in my custom metrics.
Any updates?

This one works for me (not sure about performance, though).
In my case valid_data is a keras.utils.Sequence:

import keras

class MetricsCallback(keras.callbacks.Callback):
    def __init__(self, valid_data):
        super(MetricsCallback, self).__init__()
        self.valid_data = valid_data

    def on_epoch_end(self, epoch, logs=None):
        if epoch:
            for i in range(len(self.valid_data)):
                x_test_batch, y_test_batch = self.valid_data.__getitem__(i)
                ### do what you need ###
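
As an illustration of the "### do what you need ###" part, on_epoch_end could compute the sensitivity/specificity from the original post (a sketch, assuming the Sequence yields one-hot labels and that output_sensitivity_specificity from the issue description is in scope):

    import numpy as np

    def on_epoch_end(self, epoch, logs=None):
        y_true, y_pred = [], []
        for i in range(len(self.valid_data)):
            x_batch, y_batch = self.valid_data[i]
            y_pred.append(self.model.predict(x_batch))
            y_true.append(y_batch)
        # stack the per-batch arrays and reuse the helper from the issue description
        output_sensitivity_specificity(epoch, np.concatenate(y_pred), np.concatenate(y_true))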

I am facing the same issue too.

I am facing the same issue too.
But when I look into the Keras source code, I find that if we use fit_generator and the validation_data argument is a generator, the validation_data attribute of our custom callback object won't be set; the generator is only used later for validation.
So I use this code:

class Metrics(Callback):

    def __init__(self, val_data, batch_size = 20):
        super().__init__()
        self.validation_data = val_data
        self.batch_size = batch_size

    def on_train_begin(self, logs={}):
        print(self.validation_data)
        self.val_f1s = []
        self.val_recalls = []
        self.val_precisions = []

    def on_epoch_end(self, epoch, logs={}):
        batches = len(self.validation_data)
        total = batches * self.batch_size

        val_pred = np.zeros((total,1))
        val_true = np.zeros((total))

        for batch in range(batches):
            xVal, yVal = next(self.validation_data)
            val_pred[batch * self.batch_size : (batch+1) * self.batch_size] = np.asarray(self.model.predict(xVal)).round()
            val_true[batch * self.batch_size : (batch+1) * self.batch_size] = yVal

        val_pred = np.squeeze(val_pred)
        _val_f1 = f1_score(val_true, val_pred)
        _val_precision = precision_score(val_true, val_pred)
        _val_recall = recall_score(val_true, val_pred)

        self.val_f1s.append(_val_f1)
        self.val_recalls.append(_val_recall)
        self.val_precisions.append(_val_precision)

        return

In the __init__ function I set self.validation_data = val_data, where val_data is just the validation generator described above.
The batch_size should be the same as the one set for the validation generator; here it is 20.
By coding it like this, I can use the validation generator to get some metrics while using fit_generator.

(quoting @EzioA's Metrics callback solution above)

This method works for me.
Thanks

@EzioA what did you mean exactly when you said "the val_data is just the validation_generator above"?

If I copy your code directly I get the error below. Do I have to set val_data equal to my validation generator function?

    TypeError                                 Traceback (most recent call last)
    in ()
         38
         39
    ---> 40 my_metrics = Metrics()
         41
         42 ########################################

    TypeError: __init__() missing 1 required positional argument: 'val_data'

@MikeDoho Yes. This val_data is the very validation generator.

I'm facing the same problem and I tried to implement the same solution as you, but when I set my validation generator to val_data I got the following error:
"object of type 'generator' has no len() "

thanks

@marmoi

"object of type 'generator' has no len() "

That means that your validation generator does not implement the __len__() function. The "length" is needed to know how many batches the generator provides in this case. What kind of generator are you using? Something self-implemented, by any chance? You can easily fix this problem by implementing said function, or by changing the provided code from @EzioA to only loop over a certain number of samples.

Yes, I have implemented my own generator, but not as a class, just a function like:

    def generator(features, labels, batch_size):
        # while True: get random indexes and create my batches

Should I implement a proper data generator in Keras following the steps from https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly ?

@marmoi You could do that, but you don't have to. You can also alter EzioA's code, more precisely the on_epoch_end function:

def on_epoch_end(self, epoch, logs={}):
        batches = len(self.validation_data)  # 1. Get the amount of batches the generator provides
        # .. other code
        for batch in range(batches): # 2. Iterate over the amount of batches
            xVal, yVal = next(self.validation_data) # 3. Retrieve a batch

For example, instead of doing step 1 you could pass the number of batches to __init__() as another parameter, so you don't have to get it from the generator itself (see the sketch below). Some other generators also don't implement the __len__() function, as they generate as many batches as you want and do some on-the-fly augmentation. So sometimes it is necessary to specify the number of batches or steps that should be validated on. It really depends on your use case and how you want to validate.
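
Here is a sketch of that variant, with the number of validation batches passed in explicitly (validation_steps is a hypothetical parameter name; binary labels are assumed, as in @EzioA's code):

    import numpy as np
    from keras.callbacks import Callback
    from sklearn.metrics import f1_score

    class MetricsWithSteps(Callback):
        def __init__(self, val_data, validation_steps):
            super().__init__()
            self.val_data = val_data                  # any generator yielding (x, y) batches
            self.validation_steps = validation_steps  # how many batches to draw per epoch

        def on_epoch_end(self, epoch, logs=None):
            val_true, val_pred = [], []
            for _ in range(self.validation_steps):
                x_batch, y_batch = next(self.val_data)
                val_pred.append(np.asarray(self.model.predict(x_batch)).round())
                val_true.append(y_batch)
            val_pred = np.concatenate(val_pred).squeeze()
            val_true = np.concatenate(val_true)
            print('[{}] val_f1: {:.4f}'.format(epoch, f1_score(val_true, val_pred)))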

(quoting @EzioA's Metrics callback solution and the follow-up replies above)

Hey, thank you for your method.
I want to get the val_f1s, val_recalls and val_precisions results after training,
and I found that the ModelCheckpoint() callback can automatically save the best weights by monitoring val_f1,
but it raised an issue with the callbacks' logs and ModelCheckpoint, because there is no val_f1 in logs.
Finally, I set logs['f1'] = _val_f1, and it works.

But why do you reset val_f1s = [], val_recalls = [] and val_precisions = [] at the start of training,
and how can I get them after training?
Thanks!
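
For what it's worth: the lists are reset in on_train_begin simply so that each training run starts fresh, and after training they can be read directly off the callback instance. Writing the per-epoch value into logs (as described above) also lets other callbacks such as ModelCheckpoint monitor it. A sketch, assuming on_epoch_end also does logs['val_f1'] = _val_f1 and that valid_seq, train_seq and epochs are the objects from the original post:

    from keras.callbacks import ModelCheckpoint

    metrics_cb = Metrics(val_data=valid_seq, batch_size=20)
    checkpoint = ModelCheckpoint('best_model.h5', monitor='val_f1', mode='max',
                                 save_best_only=True)

    # Metrics is listed first so logs['val_f1'] is set before ModelCheckpoint reads it
    model.fit_generator(train_seq, validation_data=valid_seq, epochs=epochs,
                        callbacks=[metrics_cb, checkpoint])

    # the per-epoch lists remain on the callback instance after training
    print(metrics_cb.val_f1s, metrics_cb.val_recalls, metrics_cb.val_precisions)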

(quoting the earlier question about training_generator.py line 124)

@hadim I think the line cbk.validation_data = val_data will work when validation_data is not a generator.

https://github.com/keras-team/keras/blob/5fcd832b5c5025b164c99f0bd46cb94d707b93d3/keras/engine/training_generator.py#L60-L62

https://github.com/keras-team/keras/blob/5fcd832b5c5025b164c99f0bd46cb94d707b93d3/keras/engine/training_generator.py#L105

(quoting the previous comment)

It works when validation_data is a generator based on keras.utils.Sequence, and I can use it like this:

    for batch in range(batches):
        xVal, yVal = self.validation_data.__getitem__(batch)

The __getitem__() method is defined in my dataGenerator(), which extends keras.utils.Sequence.
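
For completeness, a minimal sketch of such a keras.utils.Sequence subclass (the class and argument names are hypothetical):

    import numpy as np
    from keras.utils import Sequence

    class DataGenerator(Sequence):
        def __init__(self, x_set, y_set, batch_size):
            self.x, self.y = x_set, y_set
            self.batch_size = batch_size

        def __len__(self):
            # number of batches per epoch
            return int(np.ceil(len(self.x) / float(self.batch_size)))

        def __getitem__(self, idx):
            # return one batch of (inputs, targets)
            lo, hi = idx * self.batch_size, (idx + 1) * self.batch_size
            return self.x[lo:hi], self.y[lo:hi]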

(quoting @EzioA's Metrics callback solution above)

Got past where I was stuck by using this, thanks!

@Dref360 Would it not be safe to use separate copies of the generator on different threads?

@EzioA Thanks for your code, but my model has two input layers. How can I modify the code in that case?
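
In case it helps: for a model with two input layers, the main change needed in the callbacks above is that each batch's inputs become a list of arrays, which model.predict accepts directly. A sketch of the on_epoch_end loop, assuming the generator yields ([input_a, input_b], labels) batches:

    def on_epoch_end(self, epoch, logs=None):
        y_true, y_pred = [], []
        for i in range(len(self.valid_data)):
            [x1_batch, x2_batch], y_batch = self.valid_data[i]   # two inputs per batch
            y_pred.append(self.model.predict([x1_batch, x2_batch]))
            y_true.append(y_batch)
        # ... compute the metrics from y_pred / y_true as before ...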

(quoting @EzioA's Metrics callback solution above)

What if the total number of samples in validation_generator is not an exact multiple of the batch_size?

@rutujagurav That's why I set the batch_size to a value that divides the validation set evenly. I suggest adjusting the total size of your validation set, or you can modify this code segment with a left-over mechanism: feed the remaining data to your model in the final batch.

import itertools

import numpy as np
from keras.callbacks import Callback
from sklearn.metrics import f1_score, precision_score, recall_score


class F1_Metric(Callback):
    def __init__(self, val_data, batch_size):
        super().__init__()
        self.validation_data = val_data
        self.batch_size = batch_size

    def on_train_begin(self, logs={}):
        self.val_f1s = []
        self.val_recalls = []
        self.val_precisions = []

    def on_epoch_end(self, epoch, logs={}):
        batches = len(self.validation_data)

        val_pred = []
        val_true = []
        for batch in range(batches):
            xVal, yVal = next(self.validation_data)

            # in case of a binary classification using a sigmoid activation in the
            # output layer and binary_crossentropy loss, use these two lines instead:
            # val_pred_batch = np.asarray(self.model.predict(xVal)).round()
            # val_true_batch = yVal

            # in case of a binary or multiclass classification using a softmax
            # activation in the output layer and categorical_crossentropy loss:
            val_pred_batch = np.argmax(np.asarray(self.model.predict(xVal)), axis=1)
            val_true_batch = np.argmax(yVal, axis=1)

            val_pred.append(val_pred_batch)
            val_true.append(val_true_batch)

        val_pred = np.asarray(list(itertools.chain.from_iterable(val_pred)))
        val_true = np.asarray(list(itertools.chain.from_iterable(val_true)))

        _val_f1 = f1_score(val_true, val_pred)
        _val_precision = precision_score(val_true, val_pred)
        _val_recall = recall_score(val_true, val_pred)

        self.val_f1s.append(_val_f1)
        self.val_recalls.append(_val_recall)
        self.val_precisions.append(_val_precision)

        return

@EzioA Yeah, in a more general case of some arbitrary validation set size, the above modification could work. I am sure there is a more elegant way of doing this but this is what I hacked together quickly.

Quick question, would the val_f1s, val_recalls and val_precisions be stored in the log directory provided in the 'logs' argument passed to the function?

@rutujagurav Maybe the metrics can be stored in this way, but I'm not sure.

I am executing the code on Google Colab and the TF version is 2.2.0-rc2. I have built a custom callback for calculating the F1 score and AUC score.
I am getting the same error with fit. I am not using fit_generator, as data augmentation is not necessary in my code.

import numpy as np
import tensorflow as tf
from sklearn.metrics import f1_score, roc_auc_score
from tensorflow.python.util.tf_export import keras_export

@keras_export('keras.callbacks.Callback')
class MyCustomCallback(tf.keras.callbacks.Callback):
    def on_train_begin(self, logs={}):
        self._data = []

    def on_epoch_end(self, epoch, logs={}):
        print(type(self.validation_data[0]))
        print(type(self.validation_data[1]))
        X_val, y_val = self.validation_data[0], self.validation_data[1]
        y_predict = np.asarray(model.predict(X_val))

        y_val = np.argmax(y_val, axis=1)
        y_predict = np.argmax(y_predict, axis=1)

        self._data.append({
            'val_microF1Score': f1_score(y_val, y_predict, average='micro'),
            'val_rocauc': roc_auc_score(y_val, y_predict),
        })

        print('The f1 score {:7.2f} and auc {:7.2f} for epoch {}.'.format(
            logs['val_microF1Score'], logs['val_rocauc'], epoch))
        return

    def get_data(self):
        return self._data

mycallback = MyCustomCallback()

Invoking it after compiling the model and calling fit:

    history = model.fit(X_train, y_train, epochs=nb_epoch, batch_size=batch_size,
                        validation_data=(X_cv, y_cv), callbacks=[mycallback])

but I get an error, as shown in the attached screenshot (not reproduced here).

It would be very helpful to me if you could give any temporary workaround, as I am held up in my work.

Thank you
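
For a temporary workaround in tf.keras 2.x, the same pattern discussed in this thread applies: since self.validation_data is not reliably populated for custom callbacks there, pass the validation arrays to the callback yourself. A sketch reusing the X_cv, y_cv, X_train, y_train, nb_epoch and batch_size variables from the code above (the class name is hypothetical):

    import numpy as np
    import tensorflow as tf
    from sklearn.metrics import f1_score

    class ValMetricsCallback(tf.keras.callbacks.Callback):
        def __init__(self, val_data):
            super().__init__()
            self.val_data = val_data  # (X_val, y_val) tuple passed in explicitly

        def on_epoch_end(self, epoch, logs=None):
            X_val, y_val = self.val_data
            y_predict = np.argmax(self.model.predict(X_val), axis=1)
            y_true = np.argmax(y_val, axis=1)
            print('epoch {}: micro F1 {:.4f}'.format(
                epoch, f1_score(y_true, y_predict, average='micro')))

    mycallback = ValMetricsCallback(val_data=(X_cv, y_cv))
    history = model.fit(X_train, y_train, epochs=nb_epoch, batch_size=batch_size,
                        validation_data=(X_cv, y_cv), callbacks=[mycallback])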

Hi,
Sorry, I raised it in the wrong issue chain, I guess.
I found a similar match to mine here: https://github.com/tensorflow/tensorflow/issues/32981 and have added mine there too.
Just wanted to post an update here on that.
Thanks
