Keras: Option for using dropout in the predict phase (as an approximation to Bayesian DL)

Created on 17 Feb 2018 · 22 comments · Source: keras-team/keras

As mentioned in issue #5357 (https://github.com/keras-team/keras/issues/5357#issuecomment-350276900) by @spearsem and @alexchao56, it would be nice if we could enable dropout in the prediction stage of the model and not just during training.

There is solid work motivating this use case as an approximation to Bayesian deep learning (http://proceedings.mlr.press/v48/gal16.pdf), in this case as a variational approximation to deep GPs.

Ideally, one would be able to run predict multiple times, use the mean of these predictions as the overall prediction, and use their standard deviation to quantify the uncertainty around it.
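
Concretely, writing f(x; W_t) for the t-th of T stochastic forward passes (each with an independently sampled dropout mask W_t), the estimates described above amount to

\hat{y}(x) \approx \frac{1}{T} \sum_{t=1}^{T} f(x; \hat{W}_t),
\qquad
\hat{\sigma}^2(x) \approx \frac{1}{T} \sum_{t=1}^{T} \left( f(x; \hat{W}_t) - \hat{y}(x) \right)^2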

Aside from the feature request, is there a way to work around the current setup in Keras to achieve this?

Most helpful comment

There is this feature in Keras: it's the training argument in the call of the Dropout layer.

Here's a model with a Dense layer and a Dropout layer that runs both in training and testing:

import keras

inputs = keras.Input(shape=(10,))
x = keras.layers.Dense(3)(inputs)
outputs = keras.layers.Dropout(0.5)(x, training=True)

model = keras.Model(inputs, outputs)

All 22 comments

Potential workaround:

import numpy as np
import keras.backend as K

# for some model with dropout: build a backend function that takes the input
# tensor plus the learning-phase flag, so dropout can be switched on explicitly
f = K.function([model.layers[0].input, K.learning_phase()],
               [model.layers[-1].output])

def predict_with_uncertainty(f, x, no_classes, n_iter=100):
    result = np.zeros((n_iter, x.shape[0], no_classes))

    # run n_iter stochastic forward passes with the learning phase set to 1 (training)
    for i in range(n_iter):
        result[i, :, :] = f((x, 1))[0]

    # the mean over passes is the prediction, the standard deviation the uncertainty
    prediction = result.mean(axis=0)
    uncertainty = result.std(axis=0)
    return prediction, uncertainty
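
For illustration, the helper above could be called like this (a sketch; the input array and class count are placeholders for whatever the model actually expects):

x_test = np.random.rand(32, 10).astype("float32")  # placeholder inputs matching the model's input shape
mean_prediction, uncertainty = predict_with_uncertainty(f, x_test, no_classes=3, n_iter=100)
print(mean_prediction.shape, uncertainty.shape)  # (32, 3) (32, 3)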

@franciscovargas That workaround seems to be correct, since it was used by Gal in the implementation of the experiments for the paper Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. See the implementation here.

Still, it would be nice to have this built into Keras so that it works nicely with the model's predict functions.

Thanks, I wish I had seen that earlier today :D ...

There is this feature in Keras: it's the training argument in the call of the Dropout layer.

Here's a model with a Dense layer and a Dropout layer that runs both in training and testing:

import keras

inputs = keras.Input(shape=(10,))
x = keras.layers.Dense(3)(inputs)
outputs = keras.layers.Dropout(0.5)(x, training=True)

model = keras.Model(inputs, outputs)
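
With dropout forced on like this, every call to predict samples a new dropout mask, so the Monte Carlo estimate from the issue description can be computed with plain predict calls. A minimal sketch (the test array and the number of passes here are placeholders):

import numpy as np

x_test = np.random.rand(32, 10).astype("float32")  # placeholder inputs matching Input(shape=(10,))
T = 100                                             # number of stochastic forward passes
samples = np.stack([model.predict(x_test) for _ in range(T)])  # shape (T, 32, 3)
mean_prediction = samples.mean(axis=0)
uncertainty = samples.std(axis=0)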

Maybe worth adding to the docs to save similar questions in the future, since I can't see it in the core layers documentation for Dropout; no such parameter is mentioned there. It was not immediately clear to me when reading the source that the training flag was for this.

https://keras.io/layers/core/

In the implementation with the training=True parameter on the Dropout layer, are the values scaled in the training phase? Are the values scaled in the prediction phase?
I am not sure what the training=True parameter is doing.

@franciscovargas Your method works for me but it seems to cause a memory leak. #10338

There is this feature in Keras: it's the training argument in the call of the Dropout layer.

Here's a model with a Dense layer and a Dropout layer that runs both in training and testing:

import keras

inputs = keras.Input(shape=(10,))
x = keras.layers.Dense(3)(inputs)
outputs = keras.layers.Dropout(0.5)(x, training=True)

model = keras.Model(inputs, outputs)

When I use LSTM(recurrent_dropout=0.5) and want to keep the recurrent dropout in the test phase, is the following code right?

import keras

inputs = keras.Input(shape=(10,))
x = keras.layers.LSTM(10,recurrent_dropout=0.5)(inputs, training=True)
x = keras.layers.Dense(3)(x)
outputs = keras.layers.Dropout(0.5)(x, training=True)

model = keras.Model(inputs, outputs)

@fchollet thanks a lot !!! works like a charm

Does the training=True option work with LSTM layers with recurrent_dropout as well?

This doesn't seem to work with SpatialDropout layers, any suggestions?

Great thread, but how can I use training=True in the Sequential API? For example:

model = Sequential()
model.add(LSTM(...))
model.add(Dropout(0.2))
...

Is this documented anywhere?

Great thread, but how can I use training=True in the Sequential API? For example:

model = Sequential()
model.add(LSTM(...))
model.add(Dropout(0.2))
...

Is this documented anywhere?

I've just stumbled across the same problem. The general question is how to get this behaviour when using the classical Sequential API rather than the functional call syntax.
My hacky quick fix was to inherit from the keras.layers.Dropout class and override its call method. In addition, I added a training kwarg to the __init__ method before calling super with the arguments expected by the base class.

import keras
import keras.backend as K


class Dropout(keras.layers.Dropout):
    """Applies Dropout to the input, optionally also at prediction time.
    Dropout consists in randomly setting
    a fraction `rate` of input units to 0 at each update during training time,
    which helps prevent overfitting.
    # Arguments
        rate: float between 0 and 1. Fraction of the input units to drop.
        training: if True, dropout is also applied at prediction time.
        noise_shape: 1D integer tensor representing the shape of the
            binary dropout mask that will be multiplied with the input.
            For instance, if your inputs have shape
            `(batch_size, timesteps, features)` and
            you want the dropout mask to be the same for all timesteps,
            you can use `noise_shape=(batch_size, 1, features)`.
        seed: A Python integer to use as random seed.
    # References
        - [Dropout: A Simple Way to Prevent Neural Networks from Overfitting](
           http://www.jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf)
    """
    def __init__(self, rate, training=None, noise_shape=None, seed=None, **kwargs):
        # Pass noise_shape and seed through to the base class (the original
        # snippet dropped them by hard-coding None) and remember the flag.
        super(Dropout, self).__init__(rate, noise_shape=noise_shape, seed=seed, **kwargs)
        self.training = training

    def call(self, inputs, training=None):
        if 0. < self.rate < 1.:
            noise_shape = self._get_noise_shape(inputs)

            def dropped_inputs():
                return K.dropout(inputs, self.rate, noise_shape,
                                 seed=self.seed)

            # Fall back to the flag passed at construction time when the
            # framework does not supply one (or supplies a falsy one).
            if not training:
                return K.in_train_phase(dropped_inputs, inputs,
                                        training=self.training)
            return K.in_train_phase(dropped_inputs, inputs, training=training)
        return inputs

Now you can just pass the argument when adding layers via the Sequential API, such as:

model.add(keras.layers.Dense(512, activation="relu"))
model.add(Dropout(rate=0.5, training=True))
model.add(keras.layers.Dense(256, activation="relu"))
model.add(Dropout(rate=0.5, training=True))
model.add(keras.layers.Dense(2, activation="softmax"))

There is this feature in Keras: it's the training argument in the call of the Dropout layer.

Here's a model with a Dense layer and a Dropout layer that runs both in training and testing:

import keras

inputs = keras.Input(shape=(10,))
x = keras.layers.Dense(3)(inputs)
outputs = keras.layers.Dropout(0.5)(x, training=True)

model = keras.Model(inputs, outputs)

Can you also switch back to the non-dropout prediction after compiling? Or is it compiled in and do you need to make a separate model and transfer the weights?

@franciscovargas thanks for the workaround.

One question I have is whether Keras rescales the weights during the test phase when dropout is 'enabled'. Theoretically, the average you obtain from MC dropout should be similar to the prediction you get when you use all the connections for the same input. However, in my case the output from MC dropout is always much smaller than the prediction without dropout.
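
For reference, the backend dropout used by Keras is "inverted" dropout: the surviving activations are already scaled up by 1/(1 - rate) whenever dropout is active, so no separate rescaling happens at prediction time. A quick illustrative check:

import keras.backend as K

x = K.ones((1, 10))
y = K.eval(K.dropout(x, 0.5))  # kept entries become 2.0, dropped entries 0.0
print(y)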

There is this feature in Keras: it's the training argument in the call of the Dropout layer.

Here's a model with a Dense layer and a Dropout layer that runs both in training and testing:

import keras

inputs = keras.Input(shape=(10,))
x = keras.layers.Dense(3)(inputs)
outputs = keras.layers.Dropout(0.5)(x, training=True)

model = keras.Model(inputs, outputs)

@fchollet If I use training=True to enable the Dropout, is it possible to turn it off in the testing phase when necessary?

potential work around

import keras.backend as K
# for some model with dropout ...
f = K.function([model.layers[0].input, K.learning_phase()],
               [model.layers[-1].output])

def predict_with_uncertainty(f, x, no_classes, n_iter=100):
    result = np.zeros((n_iter,) + (x.shape[0], no_classes) )

    for i in range(n_iter):
        result[i,:, :] = f((x, 1))[0]

    prediction = result.mean(axis=0)
    uncertainty = result.std(axis=0)
    return prediction, uncertainty    

The workaround fails (error in defining K.function) due to the issue mentioned in https://github.com/tensorflow/tensorflow/issues/34201

@MalteEbner : See my suggestion here: https://github.com/tensorflow/tensorflow/issues/34201#issuecomment-577596280

Has anything changed in tf now? I am getting the same predictions with the suggested snippet.

potential work around

import keras.backend as K
# for some model with dropout ...
f = K.function([model.layers[0].input, K.learning_phase()],
               [model.layers[-1].output])

def predict_with_uncertainty(f, x, no_classes, n_iter=100):
    result = np.zeros((n_iter,) + (x.shape[0], no_classes) )

    for i in range(n_iter):
        result[i,:, :] = f((x, 1))[0]

    prediction = result.mean(axis=0)
    uncertainty = result.std(axis=0)
    return prediction, uncertainty    

The workaround fails (error in defining K.function) due to the issue mentioned in tensorflow/tensorflow#34201

@gieses I was wondering too. The uncertainty is always zero.

There is this feature in Keras: it's the training argument in the call of the Dropout layer.
Here's a model with a Dense layer and a Dropout layer that runs both in training and testing:

import keras

inputs = keras.Input(shape=(10,))
x = keras.layers.Dense(3)(inputs)
outputs = keras.layers.Dropout(0.5)(x, training=True)

model = keras.Model(inputs, outputs)

When I use LSTM(recurrent_dropout=0.5) and want to keep the recurrent dropout in the test phase, is the following code right?

import keras

inputs = keras.Input(shape=(10,))
x = keras.layers.LSTM(10,recurrent_dropout=0.5)(inputs, training=True)
x = keras.layers.Dense(3)(x)
outputs = keras.layers.Dropout(0.5)(x, training=True)

model = keras.Model(inputs, outputs)

Did you figure it out?

http://www.cs.ox.ac.uk/people/yarin.gal/website/blog_2248.html

As mentioned in this blog post written by the inventor of MC dropout, fixing the dropped weights across all test inputs makes for better visualization.

Does anyone have a solution for fixing the dropped weights when using the Keras Dropout layer?
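
One possible way to approach this (a rough sketch; the layer sizes and array names are made up, and the weights would still need to come from a trained model) is to keep the mask in a backend variable and resample it manually between Monte Carlo passes, so that every test input within one pass sees the same mask:

import numpy as np
import keras
import keras.backend as K

rate = 0.5
units = 512
mask = K.variable(np.ones((1, units), dtype="float32"))  # one mask, broadcast over the batch

inputs = keras.Input(shape=(10,))
x = keras.layers.Dense(units, activation="relu")(inputs)
x = keras.layers.Lambda(lambda t: t * mask / (1. - rate))(x)  # fixed "dropout" with inverted scaling
outputs = keras.layers.Dense(1)(x)
model = keras.Model(inputs, outputs)

x_test = np.random.rand(32, 10).astype("float32")  # placeholder test inputs
predictions = []
for _ in range(20):  # 20 Monte Carlo passes, each with its own fixed mask
    K.set_value(mask, np.random.binomial(1, 1. - rate, size=(1, units)).astype("float32"))
    predictions.append(model.predict(x_test))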
