Keras: dropout in training and testing

Created on 11 Feb 2017 · 17 comments · Source: keras-team/keras

In this link, devinplatt gives the following way to include dropout during training:

from keras.models import Sequential
from keras.layers import Dense, Dropout
model = Sequential()
model.add(Dropout(0.5, input_shape=(20,)))
model.add(Dense(64, kernel_initializer='random_uniform'))

In this post, the author mentions that “Finally, if the training has finished, you’d use the complete network for testing (or in other words, you set the dropout probability to 0).”

In terms of the Keras implementation, does that mean we have to modify the line model.add(Dropout(0.5, input_shape=(20,))) after loading the trained weights?


Hello,
Looking at the source code:
https://github.com/fchollet/keras/blob/master/keras/layers/core.py#L111
x = K.in_train_phase(dropped_inputs, lambda: x)

You can see that dropout is only applied in train phase.

That is correct: dropout should be applied during training (dropping inputs with probability p), but there also needs to be a corresponding step that scales the weights at test time, as outlined in the referenced paper.

I suspect this is not happening at the moment; at least, the results I have gotten so far suggest there may be an issue here. I will investigate further and see if I can provide an example.

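Hello, @radekosmulski
This is not a problem. See issue https://github.com/fchollet/keras/issues/3305.
Keras uses inverted dropout: the scaling happens during training instead of at test time (the retained activations are multiplied by 1 / (1 - p) during training), so nothing needs to be rescaled at predict time.
See: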

def dropped_inputs():
  return K.dropout(x, self.p, noise_shape, seed=self.seed)

https://github.com/fchollet/keras/blob/master/keras/layers/core.py#L110
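A minimal NumPy sketch of the two conventions, for illustration only (the function names are hypothetical and this is not the Keras source). The classic scheme drops units during training and multiplies by the keep probability at test time; inverted dropout, which is what `K.dropout` does via `tf.nn.dropout`, divides by the keep probability during training and leaves activations untouched at test time. In both, the average training-time activation matches the test-time activation.

```python
import numpy as np

def standard_dropout(h, rate, training, rng):
    """Classic scheme: drop during training, scale by the keep probability at test time."""
    keep = 1.0 - rate
    if training:
        return h * (rng.random(h.shape) < keep)
    return h * keep

def inverted_dropout(h, rate, training, rng):
    """Inverted scheme: drop and rescale by 1 / keep during training, identity at test time."""
    keep = 1.0 - rate
    if training:
        return h * (rng.random(h.shape) < keep) / keep
    return h

rng = np.random.default_rng(0)
h = np.full(200_000, 3.0)  # many copies of a single activation value
rate = 0.5

for fn in (standard_dropout, inverted_dropout):
    train_mean = fn(h, rate, True, rng).mean()  # average over random masks
    test_value = fn(h, rate, False, rng)[0]     # deterministic test-time value
    print(fn.__name__, round(train_mean, 2), test_value)
```

Both schemes are consistent between training and testing; they differ only in whether the activations seen at test time are 1.5 or 3.0 here. With the inverted form, predict() can simply use the layer as the identity.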

Thank you for your reply @unrealwill. I am new to Keras, so apologies if I have misunderstood something. I still feel there is something unusual when running model.predict or model.evaluate with dropout. Please see below:

import keras
import numpy as np

X = np.array(
    [[2, 1],
     [4, 2]])
y = np.array(
    [[5],
     [10]]
)

# Works as expected without dropout
model = keras.models.Sequential()
model.add(keras.layers.Dense(units=1, input_dim=2))
model.compile(keras.optimizers.SGD(), loss='MSE')
model.fit(X, y, epochs=10000, verbose=0)
model.evaluate(X, y) # => ~0

# With dropout
model = keras.models.Sequential()
model.add(keras.layers.Dense(units=1, input_dim=2))
model.add(keras.layers.Dropout(0.5))
model.compile(keras.optimizers.SGD(), loss='MSE')
model.fit(X, y, epochs=10000, verbose=0)
model.evaluate(X, y) # => converges to MSE of 15.625

model.predict(X) # => array([[ 2.5],
                 #          [ 5. ]], dtype=float32)

The MSE converges to this value because the outputs are exactly half of what they should be: ((5 - 2.5)^2 + (10 - 5)^2) / 2 = (6.25 + 25) / 2 = 15.625.

@radekosmulski
The dropout noise introduces a bias because it is non-symmetric noise.
Dropout shouldn't be added as the last layer (which we normally don't do).
Because MSE is convex, Jensen's inequality applies, and you end up training the model to learn the bias of the noise.
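One way to see where the factor of one half comes from is a small worked sketch in NumPy (the helper name is hypothetical, and it models the network as a single scalar output w followed by inverted dropout, as in the example above). Writing z for the dropped-out output, E[(z - y)^2] = (E[z] - y)^2 + Var(z), with E[z] = w and Var(z) = w^2 · rate / (1 - rate). The variance term pulls w toward 0, and the minimizer is w* = (1 - rate) · y, which is what predict() returns once dropout is switched off.

```python
import numpy as np

def expected_mse_with_output_dropout(w, y, rate=0.5):
    """Expected squared error when inverted dropout is applied to a scalar output w.

    With probability 1 - rate the output is kept and scaled to w / (1 - rate);
    with probability rate it is dropped to 0.
    """
    keep = 1.0 - rate
    return keep * (w / keep - y) ** 2 + rate * (0.0 - y) ** 2

y = np.array([5.0, 10.0])           # targets from the example above
ws = np.linspace(0.0, 12.0, 12001)  # candidate pre-dropout outputs, step 0.001

# Grid-search the output that minimizes the expected training loss for each target.
losses = expected_mse_with_output_dropout(ws[:, None], y[None, :])
w_best = ws[np.argmin(losses, axis=0)]

# Closed form: d/dw = 2 * (w / keep - y) = 0, so w* = (1 - rate) * y.
w_closed = (1.0 - 0.5) * y

# At test time dropout is off, so the model outputs w* directly.
test_mse = np.mean((y - w_closed) ** 2)
print(w_best, w_closed, test_mse)
```

Because the two inputs in the example are collinear, a single Dense unit can hit both per-sample optima at once, which is why the predictions land exactly on 2.5 and 5.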

The bias of the dropout can then be removed by adding a Dense layer after the first layer (=> average result = 7.5).
And if you use more hidden units (100), you average the noise out and get what you want.
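To illustrate the "average the noise out" point, here is a rough NumPy sketch under a simplifying assumption: it treats the layer after dropout as a plain average over hidden units that all carry the value 1.0 (a trained Dense layer will weight them unequally, so this only shows the trend). The spread of the averaged output shrinks roughly like 1 / sqrt(n).

```python
import numpy as np

def dropout_noise_std(n_units, rate=0.5, n_trials=20_000, seed=0):
    """Std of the mean of n_units activations (each 1.0) after inverted dropout."""
    rng = np.random.default_rng(seed)
    keep = 1.0 - rate
    mask = rng.random((n_trials, n_units)) < keep
    dropped = mask / keep  # each unit becomes 0 or 1 / keep; its expectation stays 1.0
    return dropped.mean(axis=1).std()

for n in (1, 100):
    # Analytically the std is sqrt(rate / (keep * n)), i.e. about 1.0 and 0.1 here.
    print(n, dropout_noise_std(n))
```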

import keras
import numpy as np

X = np.array(
    [[2, 1],
     [4, 2]])
y = np.array(
    [[5],
     [10]]
)

# Works as expected without dropout
model = keras.models.Sequential()
model.add(keras.layers.Dense(units=1, input_dim=2))
model.compile(keras.optimizers.SGD(), loss='MSE')
model.fit(X, y, epochs=10000, verbose=0)
print(model.evaluate(X, y)) # => ~0

# With dropout
model = keras.models.Sequential()
model.add(keras.layers.Dense(units=100, input_dim=2))
model.add(keras.layers.Dropout(0.5))
model.add(keras.layers.Dense(1))
model.compile(keras.optimizers.Adam(), loss='MSE')
model.fit(X, y, epochs=100000, verbose=0)
print(model.evaluate(X, y)) # => small (no longer stuck at 15.625)

print(model.predict(X)) # => array([[ 4.91],
                        #          [ 9.96 ]], dtype=float32)

@unrealwill thank you very much for taking the time to reply, I really appreciate it. I understand now.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

@unrealwill There is another use case of dropout at testing or inference time: in order to get a notion of uncertainty and variability in the prediction of the network model, you might take a given input and run predict on it many times, each with different randomly assigned dropout neurons.

Say you run predict 100 times for a single test input. The average of these will approximate what you get with no dropout, the 'expected value' over different weight schemes. And various metrics like the standard deviation of these results will give you a sense of the error bounds of your estimate (conditioned on assumptions about the validity of the underlying model structure).

In this sense, it would be very useful to have the ability to re-activate Dropout settings from training, but specifically during testing or regular inference.
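A minimal NumPy sketch of the procedure described above (Monte Carlo dropout), not a Keras implementation: the tiny two-layer network and its random weights are hypothetical stand-ins for a trained model. Dropout stays on for every pass, and the mean and standard deviation over 100 passes serve as the prediction and a rough uncertainty estimate.

```python
import numpy as np

def forward(x, W1, b1, W2, b2, rate, rng=None):
    """One pass of a tiny MLP; dropout is applied to the hidden layer only if rng is given."""
    h = np.maximum(0.0, x @ W1 + b1)        # ReLU hidden layer
    if rng is not None:
        keep = 1.0 - rate
        mask = rng.random(h.shape) < keep  # keep each unit with probability 1 - rate
        h = h * mask / keep                # inverted dropout scaling
    return h @ W2 + b2

def mc_dropout_predict(x, params, rate=0.5, n_samples=100, seed=0):
    """Run n_samples stochastic passes and return their mean and standard deviation."""
    rng = np.random.default_rng(seed)
    samples = np.stack([forward(x, *params, rate, rng) for _ in range(n_samples)])
    return samples.mean(axis=0), samples.std(axis=0)

# Hypothetical fixed weights standing in for a trained model.
rng0 = np.random.default_rng(42)
params = (rng0.normal(size=(2, 64)), np.zeros(64),
          rng0.normal(size=(64, 1)) / 8, np.zeros(1))
x = np.array([[2.0, 1.0]])

mean, std = mc_dropout_predict(x, params, n_samples=100)
print("dropout off:", forward(x, *params, rate=0.5))
print("MC mean:", mean, "MC std:", std)
```

Here only linear operations follow the dropout mask, so the MC mean converges to the dropout-off output exactly as the number of passes grows; with nonlinearities after a dropout layer, the match is only approximate.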

I'd also like to +1 the ability to turn on or off dropout layers during test time. This is the paper that @spearsem is referring to: "Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning" http://proceedings.mlr.press/v48/gal16.pdf

Any updates on this? Or, if it's not in core Keras, does anybody have a workaround?

+1 on this too; it would be nice to be able to use dropout in the Bayesian sense at prediction time rather than just switching it off.

I would also love this feature!

I recently tried to do something similar. This is a hacky and unoptimised way of enabling dropout at both test and training time but should do the trick.

import keras.backend as K
from keras.layers import Layer

class Dropout_permanent(Layer):
    """Dropout that stays active in both training and inference (e.g. for MC dropout)."""

    def __init__(self, rate, **kwargs):
        super(Dropout_permanent, self).__init__(**kwargs)
        self.rate = min(1., max(0., rate))
        self.supports_masking = True

    def call(self, inputs, training=None):
        if 0. < self.rate < 1.:
            # Always drop, regardless of the learning phase. K.dropout takes the drop
            # rate (not the keep probability) and rescales kept units by 1 / (1 - rate).
            return K.dropout(inputs, self.rate)

        return inputs

    def get_config(self):
        config = {'rate': self.rate}
        base_config = super(Dropout_permanent, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))

    def compute_output_shape(self, input_shape):
        return input_shape

There is a flag, training=True. No need for hacks; the feature exists (it's just not very well documented):
https://github.com/keras-team/keras/issues/9412#issuecomment-366487249

Ah, makes sense. It wasn't obvious to me from the source or docs either, and I missed that thread. Thanks for pointing it out.

Not sure if that works for models built with Sequential(); it is documented for the functional API only.

If dropout is deactivated when predicting, does Keras scale the weights as this paper does?
