Keras: Option for using dropout in the predict phase (as an approximation to Bayesian DL)

Created on 17 Feb 2018 · 22 comments · Source: keras-team/keras

As mentioned in issue #5357 (https://github.com/keras-team/keras/issues/5357#issuecomment-350276900) by @spearsem and @alexchao56, it would be nice if we could enable dropout in the prediction stage of the model and not just during training.

There is solid work motivating this use case as an approximation to Bayesian deep learning (http://proceedings.mlr.press/v48/gal16.pdf), in this case as a variational approximation to deep GPs.

Ideally, one would be able to run predict multiple times, use the mean of these predictions as the overall prediction, and use their standard deviation to quantify the uncertainty around it.
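
Concretely, writing f(x; W_t) for the t-th of T stochastic forward passes (each with an independently sampled dropout mask W_t), the estimates described above amount to

\hat{y}(x) \approx \frac{1}{T} \sum_{t=1}^{T} f(x; \hat{W}_t),
\qquad
\hat{\sigma}^2(x) \approx \frac{1}{T} \sum_{t=1}^{T} \left( f(x; \hat{W}_t) - \hat{y}(x) \right)^2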

Aside from the feature request, is there a way to work around the current setup in Keras to achieve this?

Most helpful comment

There is this feature in Keras: it's the training argument in the call of the Dropout layer.

Here's a model with a Dense layer and a Dropout layer that runs both in training and testing:

import keras

inputs = keras.Input(shape=(10,))
x = keras.layers.Dense(3)(inputs)
outputs = keras.layers.Dropout(0.5)(x, training=True)

model = keras.Model(inputs, outputs)

All 22 comments

Potential workaround:

import numpy as np
import keras.backend as K

# for some model with dropout: build a backend function that takes the input
# tensor plus the learning-phase flag, so dropout can be switched on explicitly
f = K.function([model.layers[0].input, K.learning_phase()],
               [model.layers[-1].output])

def predict_with_uncertainty(f, x, no_classes, n_iter=100):
    result = np.zeros((n_iter, x.shape[0], no_classes))

    # run n_iter stochastic forward passes with the learning phase set to 1 (training)
    for i in range(n_iter):
        result[i, :, :] = f((x, 1))[0]

    # the mean over passes is the prediction, the standard deviation the uncertainty
    prediction = result.mean(axis=0)
    uncertainty = result.std(axis=0)
    return prediction, uncertainty
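
For illustration, the helper above could be called like this (a sketch; the input array and class count are placeholders for whatever the model actually expects):

x_test = np.random.rand(32, 10).astype("float32")  # placeholder inputs matching the model's input shape
mean_prediction, uncertainty = predict_with_uncertainty(f, x_test, no_classes=3, n_iter=100)
print(mean_prediction.shape, uncertainty.shape)  # (32, 3) (32, 3)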

@franciscovargas That workaround seems to be correct, since it was used by Gal in the implementation of the experiments for the paper Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. See the implementation here.

Still, it would be nice to have this built into Keras so that it works nicely with the model's predict functions.

Thanks, I wish I had seen that earlier today :D ...

There is this feature in Keras: it's the training argument in the call of the Dropout layer.

Here's a model with a Dense layer and a Dropout layer that runs both in training and testing:

import keras

inputs = keras.Input(shape=(10,))
x = keras.layers.Dense(3)(inputs)
outputs = keras.layers.Dropout(0.5)(x, training=True)

model = keras.Model(inputs, outputs)
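
With dropout forced on like this, every call to predict samples a new dropout mask, so the Monte Carlo estimate from the issue description can be computed with plain predict calls. A minimal sketch (the test array and the number of passes here are placeholders):

import numpy as np

x_test = np.random.rand(32, 10).astype("float32")  # placeholder inputs matching Input(shape=(10,))
T = 100                                             # number of stochastic forward passes
samples = np.stack([model.predict(x_test) for _ in range(T)])  # shape (T, 32, 3)
mean_prediction = samples.mean(axis=0)
uncertainty = samples.std(axis=0)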

Maybe worth adding to the docs to save similar questions in the future, since I can't see it in the core layers documentation for Dropout; no such parameter is mentioned there. It was not immediately clear to me when reading the source that the training flag was for this.

https://keras.io/layers/core/

In the implementation with the training=True parameter on the Dropout layer, are the values scaled in the training phase? Are the values scaled in the prediction phase?
I am not sure what the training=True parameter is doing.

@franciscovargas Your method works for me but it seems to cause a memory leak. #10338

There is this feature in Keras: it's the training argument in the call of the Dropout layer.

Here's a model with a Dense layer and a Dropout layer that runs both in training and testing:

import keras

inputs = keras.Input(shape=(10,))
x = keras.layers.Dense(3)(inputs)
outputs = keras.layers.Dropout(0.5)(x, training=True)

model = keras.Model(inputs, outputs)

When I use LSTM(recurrent_dropout=0.5) and want to keep the recurrent dropout in the test phase, is the following code right?

import keras

inputs = keras.Input(shape=(10,))
x = keras.layers.LSTM(10,recurrent_dropout=0.5)(inputs, training=True)
x = keras.layers.Dense(3)(x)
outputs = keras.layers.Dropout(0.5)(x, training=True)

model = keras.Model(inputs, outputs)

@fchollet thanks a lot !!! works like a charm

Does the training=True option work with LSTM layers with recurrent_dropout as well?

This doesn't seem to work with SpatialDropout layers, any suggestions?

Great thread, but how can I use training=True in the Sequential API? For example:

model = Sequential()
model.add(LSTM(...))
model.add(Dropout(0.2))
...

Is this documented anywhere?

Great thread, but how can I use training=True in the Sequential API? For example:

model = Sequential()
model.add(LSTM(...))
model.add(Dropout(0.2))
...

Is this documented anywhere?

I've just stumbled across the same problem. The general question is how to get this behaviour when using the classical Sequential API rather than the functional call syntax.
My hacky quick fix was to inherit from the keras.layers.Dropout class and override its call method. In addition, I added a training kwarg to the __init__ method before calling super with the arguments expected by the base class.

import keras
import keras.backend as K


class Dropout(keras.layers.Dropout):
    """Applies Dropout to the input, optionally also at prediction time.
    Dropout consists in randomly setting
    a fraction `rate` of input units to 0 at each update during training time,
    which helps prevent overfitting.
    # Arguments
        rate: float between 0 and 1. Fraction of the input units to drop.
        training: if True, dropout is also applied at prediction time.
        noise_shape: 1D integer tensor representing the shape of the
            binary dropout mask that will be multiplied with the input.
            For instance, if your inputs have shape
            `(batch_size, timesteps, features)` and
            you want the dropout mask to be the same for all timesteps,
            you can use `noise_shape=(batch_size, 1, features)`.
        seed: A Python integer to use as random seed.
    # References
        - [Dropout: A Simple Way to Prevent Neural Networks from Overfitting](
           http://www.jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf)
    """
    def __init__(self, rate, training=None, noise_shape=None, seed=None, **kwargs):
        # Pass noise_shape and seed through to the base class (the original
        # snippet dropped them by hard-coding None) and remember the flag.
        super(Dropout, self).__init__(rate, noise_shape=noise_shape, seed=seed, **kwargs)
        self.training = training

    def call(self, inputs, training=None):
        if 0. < self.rate < 1.:
            noise_shape = self._get_noise_shape(inputs)

            def dropped_inputs():
                return K.dropout(inputs, self.rate, noise_shape,
                                 seed=self.seed)

            # Fall back to the flag passed at construction time when the
            # framework does not supply one (or supplies a falsy one).
            if not training:
                return K.in_train_phase(dropped_inputs, inputs,
                                        training=self.training)
            return K.in_train_phase(dropped_inputs, inputs, training=training)
        return inputs

Now you can just pass the argument when adding layers via the Sequential API, such as:

model.add(keras.layers.Dense(512, activation="relu"))
model.add(Dropout(rate=0.5, training=True))
model.add(keras.layers.Dense(256, activation="relu"))
model.add(Dropout(rate=0.5, training=True))
model.add(keras.layers.Dense(2, activation="softmax"))

There is this feature in Keras: it's the training argument in the call of the Dropout layer.

Here's a model with a Dense layer and a Dropout layer that runs both in training and testing:

import keras

inputs = keras.Input(shape=(10,))
x = keras.layers.Dense(3)(inputs)
outputs = keras.layers.Dropout(0.5)(x, training=True)

model = keras.Model(inputs, outputs)

Can you also switch back to the non-dropout prediction after compiling? Or is it compiled in and do you need to make a separate model and transfer the weights?

@franciscovargas thanks for the workaround.

One question I have is whether Keras rescales the weights during the test phase when dropout is 'enabled'. Theoretically, the average you obtain from MC dropout should be similar to the prediction you get when you use all the connections for the same input. However, in my case the output from MC dropout is always much smaller than the prediction without dropout.
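
For reference, the backend dropout used by Keras is "inverted" dropout: the surviving activations are already scaled up by 1/(1 - rate) whenever dropout is active, so no separate rescaling happens at prediction time. A quick illustrative check:

import keras.backend as K

x = K.ones((1, 10))
y = K.eval(K.dropout(x, 0.5))  # kept entries become 2.0, dropped entries 0.0
print(y)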

There is this feature in Keras: it's the training argument in the call of the Dropout layer.

Here's a model with a Dense layer and a Dropout layer that runs both in training and testing:

import keras

inputs = keras.Input(shape=(10,))
x = keras.layers.Dense(3)(inputs)
outputs = keras.layers.Dropout(0.5)(x, training=True)

model = keras.Model(inputs, outputs)

@fchollet If I use training=True to enable the Dropout, is it possible to turn it off in the testing phase when necessary?

potential work around

import keras.backend as K
# for some model with dropout ...
f = K.function([model.layers[0].input, K.learning_phase()],
               [model.layers[-1].output])

def predict_with_uncertainty(f, x, no_classes, n_iter=100):
    result = np.zeros((n_iter,) + (x.shape[0], no_classes) )

    for i in range(n_iter):
        result[i,:, :] = f((x, 1))[0]

    prediction = result.mean(axis=0)
    uncertainty = result.std(axis=0)
    return prediction, uncertainty    

The workaround fails (error in defining K.function) due to the issue mentioned in https://github.com/tensorflow/tensorflow/issues/34201

@MalteEbner : See my suggestion here: https://github.com/tensorflow/tensorflow/issues/34201#issuecomment-577596280

Has anything changed in tf now? I am getting the same predictions with the suggested snippet.

potential work around

import keras.backend as K
# for some model with dropout ...
f = K.function([model.layers[0].input, K.learning_phase()],
               [model.layers[-1].output])

def predict_with_uncertainty(f, x, no_classes, n_iter=100):
    result = np.zeros((n_iter,) + (x.shape[0], no_classes) )

    for i in range(n_iter):
        result[i,:, :] = f((x, 1))[0]

    prediction = result.mean(axis=0)
    uncertainty = result.std(axis=0)
    return prediction, uncertainty    

The workaround fails (error in defining K.function) due to the issue mentioned in tensorflow/tensorflow#34201

@gieses I was wondering too. The uncertainty is always zero.

There is this feature in Keras: it's the training argument in the call of the Dropout layer.
Here's a model with a Dense layer and a Dropout layer that runs both in training and testing:

import keras

inputs = keras.Input(shape=(10,))
x = keras.layers.Dense(3)(inputs)
outputs = keras.layers.Dropout(0.5)(x, training=True)

model = keras.Model(inputs, outputs)

When I use LSTM(recurrent_dropout=0.5) and want to keep the recurrent dropout in the test phase, is the following code right?

import keras

inputs = keras.Input(shape=(10,))
x = keras.layers.LSTM(10,recurrent_dropout=0.5)(inputs, training=True)
x = keras.layers.Dense(3)(x)
outputs = keras.layers.Dropout(0.5)(x, training=True)

model = keras.Model(inputs, outputs)

Did you figure it out?

http://www.cs.ox.ac.uk/people/yarin.gal/website/blog_2248.html

As mentioned in this blog post written by the inventor of MC dropout, fixing the dropped weights across all test inputs makes for better visualization.

Does anyone have a solution for fixing the dropped weights when using the Keras Dropout layer?
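
One possible way to approach this (a rough sketch; the layer sizes and array names are made up, and the weights would still need to come from a trained model) is to keep the mask in a backend variable and resample it manually between Monte Carlo passes, so that every test input within one pass sees the same mask:

import numpy as np
import keras
import keras.backend as K

rate = 0.5
units = 512
mask = K.variable(np.ones((1, units), dtype="float32"))  # one mask, broadcast over the batch

inputs = keras.Input(shape=(10,))
x = keras.layers.Dense(units, activation="relu")(inputs)
x = keras.layers.Lambda(lambda t: t * mask / (1. - rate))(x)  # fixed "dropout" with inverted scaling
outputs = keras.layers.Dense(1)(x)
model = keras.Model(inputs, outputs)

x_test = np.random.rand(32, 10).astype("float32")  # placeholder test inputs
predictions = []
for _ in range(20):  # 20 Monte Carlo passes, each with its own fixed mask
    K.set_value(mask, np.random.binomial(1, 1. - rate, size=(1, units)).astype("float32"))
    predictions.append(model.predict(x_test))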
