Keras: dropout in training and testing

Created on 11 Feb 2017 · 17 Comments · Source: keras-team/keras

In this link, devinplatt gives the following way to include dropout in training:

model = Sequential()
model.add(Dropout(0.5, input_shape=(20,)))
model.add(Dense(64, init='uniform'))

In this post, the author mentions that “Finally, if the training has finished, you’d use the complete network for testing (or in other words, you set the dropout probability to 0).”

In terms of the Keras implementation, does that mean we have to modify the line model.add(Dropout(0.5, input_shape=(20,))) after loading the trained weights?

stale

wenouyang

Most helpful comment

@unrealwill There is another use case of dropout at testing or inference time: in order to get a notion of uncertainty and variability in the prediction of the network model, you might take a given input and run predict on it many times, each with different randomly assigned dropout neurons.

Say you run predict 100 times for a single test input. The average of these will approximate what you get with no dropout, the 'expected value' over different weight schemes. And various metrics like the standard deviation of these results will give you a sense of the error bounds of your estimate (conditioned on assumptions about the validity of the underlying model structure).

In this sense, it would be very useful to have the ability to re-activate the Dropout settings from training, but specifically during testing or regular inference.

spearsem on 8 Dec 2017

👍24 ❤1
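
The repeated-prediction idea can be sketched in plain NumPy. This is only an illustration, not Keras code: the two-layer network, its random weights, and the helper name `stochastic_forward` are all made up for the example, and dropout is applied to the hidden layer with the inverted scaling discussed later in the thread.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy network: 20 inputs -> 64 ReLU units -> 1 linear output.
W1 = rng.normal(size=(20, 64))
W2 = rng.normal(size=(64, 1)) / 64.0
RATE = 0.5  # dropout rate on the hidden layer


def stochastic_forward(x, rng):
    """One forward pass with a freshly sampled dropout mask."""
    hidden = np.maximum(x @ W1, 0.0)
    keep = rng.random(hidden.shape) >= RATE
    hidden = hidden * keep / (1.0 - RATE)  # inverted dropout scaling
    return hidden @ W2


x = rng.normal(size=(1, 20))

# 100 stochastic predictions for the same input.
samples = np.stack([stochastic_forward(x, rng) for _ in range(100)])
mc_mean = samples.mean(axis=0)  # estimate of the prediction
mc_std = samples.std(axis=0)    # spread, used as a rough uncertainty proxy

# Deterministic pass (dropout off) for comparison.
no_dropout = np.maximum(x @ W1, 0.0) @ W2
print(mc_mean, mc_std, no_dropout)
```

Because the layer after the dropout is linear here, the sample mean converges to the no-dropout output as the number of passes grows; with nonlinearities after the dropout layer the two would only agree approximately.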

All 17 comments

Hello,
Looking at the source code:
https://github.com/fchollet/keras/blob/master/keras/layers/core.py#L111
x = K.in_train_phase(dropped_inputs, lambda: x)

You can see that dropout is only applied in the training phase; in the test phase the input x is passed through unchanged.

unrealwill on 11 Feb 2017

That is correct - dropout should be applied during training (drop inputs with probability p), but there also needs to be a corresponding step that scales the weights at test time, as outlined in the referenced paper.

I guess this is not happening at the moment, at least the results I got thus far might indicate that there is an issue here. Will investigate this further and see if I can provide an example.

radekosmulski on 23 Feb 2017

Hello, @radekosmulski
This is not a problem. See issue https://github.com/fchollet/keras/issues/3305.
Keras uses inverted dropout: the units that are kept are scaled up by 1/(1 - p) during training, so nothing needs to be rescaled at test time.
See:

def dropped_inputs():
  return K.dropout(x, self.p, noise_shape, seed=self.seed)

https://github.com/fchollet/keras/blob/master/keras/layers/core.py#L110

unrealwill on 23 Feb 2017
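
As an illustration of the scaling described in the comment above, here is a minimal NumPy sketch of inverted dropout. It is not the Keras backend implementation; the function name `inverted_dropout` and its signature are assumptions made for the example.

```python
import numpy as np


def inverted_dropout(x, rate, training, rng):
    """Drop units with probability `rate` and rescale the survivors."""
    if not training:
        return x  # test phase: identity, no rescaling needed
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)


rng = np.random.default_rng(0)
x = np.ones((100_000,))

train_out = inverted_dropout(x, 0.5, training=True, rng=rng)
test_out = inverted_dropout(x, 0.5, training=False, rng=rng)

print(np.unique(train_out))  # units are either dropped or scaled up
print(train_out.mean())      # stays close to the undropped mean
print(test_out.mean())       # unchanged input
```

With a rate of 0.5 the surviving units are doubled, so the expected activation during training matches the unscaled activation used at test time.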

Thank you for your reply @unrealwill. I am new to Keras, so sorry if I misunderstand something. I still feel there is something unusual when running model.predict or model.evaluate when using dropout. Please see below:

import keras
import numpy as np

X = np.array(
    [[2, 1],
     [4, 2]])
y = np.array(
    [[5],
     [10]]
)

# Works as expected without dropout
model = keras.models.Sequential()
model.add(keras.layers.Dense(input_dim=2, output_dim=1))
model.compile(keras.optimizers.SGD(), loss='MSE')
model.fit(X, y, nb_epoch=10000, verbose=0)
model.evaluate(X, y) # => ~0

# With dropout
model = keras.models.Sequential()
model.add(keras.layers.Dense(input_dim=2, output_dim=1))
model.add(keras.layers.Dropout(0.5))
model.compile(keras.optimizers.SGD(), loss='MSE')
model.fit(X, y, nb_epoch=10000, verbose=0)
model.evaluate(X, y) # => converges to MSE of 15.625

model.predict(X) # => array([[ 2.5],
                 #          [ 5. ]], dtype=float32)

The MSE this converges to is due to the outputs being exactly half of what they should be: (2.5^2 + 5^2)/2 = 15.625.

radekosmulski on 23 Feb 2017
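
The factor of one half can be checked by hand, assuming the inverted-dropout behaviour described earlier in the thread (with p = 0.5, a kept unit is multiplied by 2). The symbols are introduced here for illustration: a is the Dense output for one example, y is its target, and the training output is 0 or 2a with equal probability.

```latex
\begin{aligned}
L(a) &= \mathbb{E}_{\text{mask}}\!\left[(\text{out} - y)^2\right]
      = \tfrac{1}{2}(0 - y)^2 + \tfrac{1}{2}(2a - y)^2, \\
\frac{dL}{da} &= 2(2a - y) = 0 \;\Longrightarrow\; a = \frac{y}{2}.
\end{aligned}
```

At test time dropout is switched off, so the model outputs a = y/2, that is 2.5 and 5 for the targets 5 and 10, which matches the predictions and the MSE of 15.625 reported above.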

@radekosmulski
Dropout noise introduces a bias because it is not symmetric.
Dropout shouldn't be used as the last layer (and normally isn't).
Because "mse" is convex, Jensen's inequality applies, and you end up training the model to learn the bias of the noise.

The bias from the dropout can then be removed by adding a dense layer after the dropout (=> average result = 7.5).
And if you have more hidden units (100), the noise averages out and you get what you want.

import keras
import numpy as np

X = np.array(
    [[2, 1],
     [4, 2]])
y = np.array(
    [[5],
     [10]]
)

# Works as expected without dropout
model = keras.models.Sequential()
model.add(keras.layers.Dense(input_dim=2, output_dim=1))
model.compile(keras.optimizers.SGD(), loss='MSE')
model.fit(X, y, nb_epoch=10000, verbose=0)
print(model.evaluate(X, y)) # => ~0

# With dropout
model = keras.models.Sequential()
model.add(keras.layers.Dense(input_dim=2, output_dim=100))
model.add(keras.layers.Dropout(0.5))
model.add(keras.layers.Dense(1))
model.compile(keras.optimizers.adam(), loss='MSE')
model.fit(X, y, nb_epoch=100000, verbose=0)
print(model.evaluate(X, y)) # => close to 0 now, no longer 15.625

print(model.predict(X)) # => array([[ 4.91],
                 #          [ 9.96 ]], dtype=float32)

unrealwill on 23 Feb 2017

👍1
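
The averaging argument can be checked numerically. The sketch below is not the Keras model from the comment: it assumes every hidden unit carries the same value (1.0) and simply averages n inverted-dropout masks, to show how the spread of the result shrinks as n grows from 1 to 100. The helper name `masked_average` is made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
RATE = 0.5
TRIALS = 10_000


def masked_average(n_units):
    """Average of n_units ones after inverted dropout, for many random masks."""
    keep = rng.random((TRIALS, n_units)) >= RATE
    return (keep / (1.0 - RATE)).mean(axis=1)


for n_units in (1, 100):
    out = masked_average(n_units)
    print(n_units, out.mean(), out.std())
```

The mean stays near 1 in both cases, while the standard deviation should fall by roughly a factor of ten (1/sqrt(100)) when going from 1 unit to 100.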

@unrealwill thank you very much for taking the time to reply, I really appreciate it. I understand now.

radekosmulski on 24 Feb 2017

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

stale[bot] on 25 May 2017

I'd also like to +1 the ability to turn on or off dropout layers during test time. This is the paper that @spearsem is referring to: "Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning" http://proceedings.mlr.press/v48/gal16.pdf

alexchao56 on 18 Jan 2018

👍20 ❤1

Any updates on this? Or, if it's not in core Keras, does anybody have a workaround to do this?

andrisecker on 8 Feb 2018

👍3

+1 on this too. It would be nice to be able to use dropout in the Bayesian sense at prediction time rather than just switch it off.

franciscovargas on 17 Feb 2018

👍2

I would also love this feature!

JamesAllingham on 17 Feb 2018

I recently tried to do something similar. This is a hacky and unoptimised way of enabling dropout at both test and training time but should do the trick.

import numpy as np
import tensorflow as tf
import keras.backend as K
from keras.layers import Layer  # base class used below

class Dropout_permanent(Layer):
    def __init__(self, rate, **kwargs):
        super(Dropout_permanent, self).__init__(**kwargs)
        self.rate = min(1., max(0., rate))
        self.supports_masking = True

    def call(self, inputs, training=None):
        if 0. < self.rate < 1.:
            retain_prob = 1. - self.rate

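            def dropped_inputs():
                # Note: the second argument is keep_prob in TensorFlow 1.x;
                # TensorFlow 2.x expects the drop rate (self.rate) instead.
                return tf.nn.dropout(inputs, retain_prob, None, seed=np.random.randint(10e6))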

            return K.in_train_phase(dropped_inputs, dropped_inputs, training=training)

        return inputs

    def get_config(self):
        config = {'rate': self.rate}
        base_config = super(Dropout_permanent, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))

    def compute_output_shape(self, input_shape):
        return input_shape

Lif3line on 20 Mar 2018

There is a training=True flag, so no hacks are needed; the feature exists (it's just not very well documented):
https://github.com/keras-team/keras/issues/9412#issuecomment-366487249

franciscovargas on 20 Mar 2018

👍5
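
A minimal sketch of that flag, assuming the `tf.keras` functional API rather than the standalone Keras version used earlier in the thread. The layer sizes and the untrained random weights are arbitrary choices for the example; in practice the model would be trained first.

```python
import numpy as np
import tensorflow as tf

inputs = tf.keras.Input(shape=(20,))
hidden = tf.keras.layers.Dense(64, activation="relu")(inputs)
# training=True pins this layer to its training behaviour, so units are
# also dropped when the model is called from predict() or evaluate().
hidden = tf.keras.layers.Dropout(0.5)(hidden, training=True)
outputs = tf.keras.layers.Dense(1)(hidden)
mc_model = tf.keras.Model(inputs, outputs)

x = np.random.rand(1, 20).astype("float32")

# Repeat the same input 100 times; each row gets its own dropout mask.
samples = mc_model.predict(np.repeat(x, 100, axis=0), verbose=0)
print(samples.mean(), samples.std())
```

The mean and standard deviation over the 100 rows play the role of the prediction and its spread described in the most helpful comment.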

Ah, makes sense. It wasn't obvious to me from the source/docs either and I missed that thread, thanks for pointing it out.

Lif3line on 20 Mar 2018