Keras: Setting dropout rate via layer.rate doesn't work

Created on 18 Dec 2017 · 15 Comments · Source: keras-team/keras

Hello there,

suppose you've defined a Keras Model with the functional API, and you want to change the dropout rate of the Dropout layers after you've instantiated the Model.
How do you do this?

I've tried to do the following:

from keras.layers import Dropout
for layer in model.layers:
    if isinstance(layer, Dropout):
        layer.rate = 0.0
        print(layer.get_config())

Based on the updated config of the Dropout layers, this should work:

{'noise_shape': None, 'rate': 0.2, 'trainable': True, 'seed': None, 'name': 'dropout_1'} -> {'noise_shape': None, 'rate': 0.0, 'trainable': True, 'seed': None, 'name': 'dropout_1'}

However, I can tell you that this does not work: during training, the old dropout values are still used.
I've also tried to compile the model again after the layer loop (model.compile()) or even make a new model (model = Model(inputs=model.input, outputs=model.output)), but the problem still persists.

This issue can be easily tested with a VGG-like CNN with dropout layers and a small data sample (e.g. 100 images): just try to overfit the data.
If you instantiate the net with a dropout rate of e.g. 0.2, the model will have a hard time overfitting the small data sample. Using the above code snippet, which should set the dropout rate to 0, does not change anything.
However, if you directly instantiate the net with a dropout rate of 0.0, it will immediately overfit on the data sample.

So setting layer.rate changes the dropout rate in the layer config, but somehow the old dropout rate is still used during training.

I've also tried to take a look into the Dropout layer sources.
The only thing I can think of is that maybe the __init__ of the Dropout layers is not called again after changing the rate, such that the old dropout rate is used in call:

    def __init__(self, rate, noise_shape=None, seed=None, **kwargs):
        super(Dropout, self).__init__(**kwargs)
        self.rate = min(1., max(0., rate))
        self.noise_shape = noise_shape
        self.seed = seed

But this is just a guess. I'm using Keras 2.1.2 with tensorflow backend.

Does anyone have an idea? Thanks a lot!

ViaFerrata

👍2

All 15 comments

Here is some sample code that checks whether the rate is changed:

import numpy as np
from keras import backend as K
from keras.layers import Dropout

dummy_input = np.ones((5,5))

K.set_learning_phase(1)
dropout_test = Dropout(0.3)
out_1 = dropout_test.call(dummy_input)
K.eval(out_1)

dropout_test.rate = 0.5
out_2 = dropout_test.call(dummy_input)
K.eval(out_2)

You can see from the outputs that the dropout rate is different.
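To make it clearer what "different outputs" means here, the following is a minimal plain-Python sketch of inverted dropout, which is what the TensorFlow backend applies: each value is zeroed with probability `rate` and the survivors are scaled by 1 / (1 - rate). The `dropout` helper and its `seed` argument are made up for this illustration and are not part of Keras.

```python
import random


def dropout(values, rate, seed=0):
    """Inverted dropout on a flat list: zero each value with probability
    `rate` and scale the surviving values by 1 / (1 - rate)."""
    rng = random.Random(seed)
    keep_prob = 1.0 - rate
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


ones = [1.0] * 25  # same number of elements as the 5x5 dummy input
for rate in (0.3, 0.5):
    out = dropout(ones, rate)
    survivors = sorted(set(v for v in out if v != 0.0))
    # With an input of ones, the surviving value directly reveals the rate.
    print(rate, survivors, "zeros:", out.count(0.0))
```

With an all-ones input, the non-zero entries are about 1.43 for a rate of 0.3 and exactly 2.0 for a rate of 0.5, so the rate in effect can be read off the output.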

mpariente on 20 Dec 2017

👍2

Thanks a lot! :)

I've tried your sample code with my loaded model as well (take one Dropout layer from the model), and changing the rate works in this case!
However, I couldn't test it for a rate of 0: you get an error with the TensorFlow backend ('numpy.ndarray' object has no attribute 'eval').

Unfortunately, this makes me wonder even more why the rate is not changed during training with fit_generator.
Based on K.eval I can see the correct update of the dropout rate, but during training, the rate update (e.g. from 0.3 to 0 or the other way round) doesn't affect the training at all.

I.e. if I set the dropout rate to 0 when instantiating the model, I can overfit a small data sample quickly. However, if I instantiate it with 0.3 and change it to 0 with layer.rate, it still behaves like the model with a rate of 0.3 and thus cannot overfit. As a note, fit_generator is called immediately after the K.eval test, so I'm quite puzzled.

ViaFerrata on 20 Dec 2017

Can you try with dropout_test.rate = K.epsilon() and tell me about the overfitting you talked about?

mpariente on 20 Dec 2017

I tried it with epsilon and it works fine for K.eval, but still not while training.
So it looks like the dropout rate just remains unchanged for the training, no matter what rate you set.

About the overfitting:
Suppose you have a VGG-like CNN with a small data sample of e.g. 1000 images with batchsize 32.
The dropout is applied after every convolutional block and you have a binary classification problem (two classes, loss for random guessing is ~ 0.693).
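The ~0.693 figure is just the binary cross-entropy of a model that predicts 0.5 for every sample, i.e. -ln(0.5) = ln(2). A short check (the helper name is only for this illustration):

```python
import math


def binary_cross_entropy(y_true, p_pred):
    """Cross-entropy of a single binary label against a predicted probability."""
    return -(y_true * math.log(p_pred) + (1 - y_true) * math.log(1 - p_pred))


# A model that always predicts 0.5 gets the same loss for either label.
chance_loss = binary_cross_entropy(1, 0.5)
print(round(chance_loss, 3))  # 0.693
```

A training loss that stays close to this value therefore means the network is not doing better than guessing.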

Now you can try out three different things:

1) Instantiate the model with a dropout rate of e.g. 0.2.
Then, it should be at least a little bit difficult for the network to overfit on the small data sample.
This can be seen in the training log for the first 4 epochs:
31/31 [==============================] - 17s 540ms/step - loss: 0.8700 - acc: 0.5192
Test sample results: [0.69226435115260465, 0.52721774193548387] (['loss', 'acc'])
31/31 [==============================] - 10s 314ms/step - loss: 0.7775 - acc: 0.4960
Test sample results: [0.69207690031297742, 0.52923387096774188] (['loss', 'acc'])
31/31 [==============================] - 10s 315ms/step - loss: 0.7352 - acc: 0.5121
Test sample results: [0.6918414677343061, 0.52923387096774188] (['loss', 'acc'])
31/31 [==============================] - 10s 315ms/step - loss: 0.7161 - acc: 0.4950
Test sample results: [0.69223312024147277, 0.52923387096774188] (['loss', 'acc'])

2) Instantiate the model with Dropout(0) layers. The network is now able to overfit easily:

31/31 [==============================] - 15s 496ms/step - loss: 0.7272 - acc: 0.5111
Test sample results: [0.69330322742462158, 0.501008064516129] (['loss', 'acc'])
31/31 [==============================] - 9s 282ms/step - loss: 0.6709 - acc: 0.5746
Test sample results: [0.69320519508854039, 0.50201612903225812] (['loss', 'acc'])
31/31 [==============================] - 9s 285ms/step - loss: 0.6246 - acc: 0.6754
Test sample results: [0.69458910149912678, 0.48185483870967744] (['loss', 'acc'])
31/31 [==============================] - 9s 284ms/step - loss: 0.5719 - acc: 0.7651
Test sample results: [0.6960476886841559, 0.4848790322580645] (['loss', 'acc'])

3) Instantiate the model with a dropout rate of 0.2. After that, change the rate of each dropout layer to 0 via layer.rate = 0.
The config for each dropout layer says that it's successful: e.g. {'noise_shape': None, 'rate': 0.0, 'trainable': True, 'seed': None, 'name': 'dropout_1'}

Now, the network should be able to overfit again, but in practice it can't:
```
31/31 [==============================] - 17s 542ms/step - loss: 0.8120 - acc: 0.4859
Test sample results: [0.69336761390009238, 0.48790322580645162] (['loss', 'acc'])
31/31 [==============================] - 10s 315ms/step - loss: 0.7656 - acc: 0.5010
Test sample results: [0.69374309432122017, 0.47681451612903225] (['loss', 'acc'])
31/31 [==============================] - 10s 315ms/step - loss: 0.7337 - acc: 0.5282
Test sample results: [0.69283238341731412, 0.51209677419354838] (['loss', 'acc'])
31/31 [==============================] - 10s 316ms/step - loss: 0.7366 - acc: 0.5060
Test sample results: [0.69180465898206156, 0.52923387096774188] (['loss', 'acc'])
```

Actually, if you look at the starting loss in the first epoch, you can also see that 3) does not work: the initial loss of a network without Dropout should be lower than the initial loss of a network with Dropout (compare with 2)).

Anyway, I would have thought that I was just using the wrong syntax or something similar, but it looks like that isn't the case.

ViaFerrata on 20 Dec 2017

👍1

Thanks for your experiments. They were very useful. I believe the issue is that the variable that you are trying to change in the Dropout Layer is not a tensorflow variable, so it never gets updated in the backend. I did some similar experiments with a slightly modified Dropout layer and associated callback and it seems to work:

from keras import backend as K
from keras.layers import Layer
from keras.legacy import interfaces


class MyDropout(Layer):
    """Dropout whose rate is a backend variable, so that it can be
    changed with K.set_value after the model has been built."""

    @interfaces.legacy_dropout_support
    def __init__(self, rate, noise_shape=None, seed=None, **kwargs):
        super(MyDropout, self).__init__(**kwargs)
        # Store the rate as a backend variable instead of a Python float.
        self.rate = K.variable(min(1., max(0., rate)))
        self.noise_shape = noise_shape
        self.seed = seed
        self.supports_masking = True

    def _get_noise_shape(self, inputs):
        if self.noise_shape is None:
            return self.noise_shape

        symbolic_shape = K.shape(inputs)
        noise_shape = [symbolic_shape[axis] if shape is None else shape
                       for axis, shape in enumerate(self.noise_shape)]
        return tuple(noise_shape)

    def call(self, inputs, training=None):
        # This check runs only once, when the layer is added to the graph,
        # so the initial rate decides whether a dropout op is created at all.
        if 0. < K.get_value(self.rate) < 1.:
            noise_shape = self._get_noise_shape(inputs)

            def dropped_inputs():
                return K.dropout(inputs, self.rate, noise_shape,
                                 seed=self.seed)
            return K.in_train_phase(dropped_inputs, inputs,
                                    training=training)
        return inputs

    def get_config(self):
        # Report the rate as a plain Python float rather than a NumPy scalar.
        config = {'rate': float(K.get_value(self.rate)),
                  'noise_shape': self.noise_shape,
                  'seed': self.seed}
        base_config = super(MyDropout, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))

    def compute_output_shape(self, input_shape):
        return input_shape

from keras import backend as K
from keras.callbacks import Callback
from keras.models import Model


class DropoutReducer(Callback):
    """Multiplies the rate of every MyDropout layer (defined above) by
    reduce_rate when the monitored score has stopped improving."""

    def __init__(self, patience=0, reduce_rate=0.5, verbose=1,
                 monitor='val_loss', **kwargs):
        super(DropoutReducer, self).__init__(**kwargs)
        self.patience = patience
        self.wait = 0
        self.best_score = -1.
        self.reduce_rate = reduce_rate
        self.verbose = verbose
        self.monitor = monitor
        self.TAG = "DROPOUT REDUCER: "
        self.callno = -1
        self.dropout_rate = -1

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        current_score = logs.get(self.monitor)
        if self.verbose == 2:
            print(self.TAG + "---Current score: {:.4f} vs best score is: {:.4f}"
                  .format(current_score, self.best_score))
        self.callno += 1
        if self.callno == 0:
            # First epoch: nothing to compare against yet.
            self.best_score = current_score
        elif current_score < self.best_score:
            self.best_score = current_score
            self.wait = 0
        else:
            if self.wait >= self.patience:
                if self.verbose:
                    print(self.TAG + '---Reducing Dropout Rate...')
                found_layers = 0
                # Only layers inside nested models are searched
                # (this is what multi-gpu models look like).
                for layer in self.model.layers:
                    if isinstance(layer, Model):
                        for lay in layer.layers:
                            if self.verbose == 2:
                                print(lay)
                            if isinstance(lay, MyDropout):
                                self.dropout_rate = self.reduce_rate * K.get_value(lay.rate)
                                K.set_value(lay.rate, self.dropout_rate)
                                found_layers = found_layers + 1
                if self.verbose:
                    print(self.TAG + 'Found {} Dropout layers and reduced dropout rate to {}.'
                          .format(found_layers, self.dropout_rate))
                self.wait = 0
            else:
                self.wait += 1

civilinformer on 22 May 2018

Sorry for the bad formatting here. The weird search through the layers is needed for multi-gpu models.
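For anyone trying to follow the patience logic without running Keras, here is a plain-Python replay of the bookkeeping that DropoutReducer does in on_epoch_end. It is only a sketch: `simulate_dropout_reducer` is a made-up helper, the scores are invented, and no model is involved.

```python
def simulate_dropout_reducer(scores, rate, patience=0, reduce_rate=0.5):
    """Replay the callback's bookkeeping on a list of per-epoch scores and
    return the dropout rate in effect after each epoch."""
    best_score = None
    wait = 0
    history = []
    for score in scores:
        if best_score is None or score < best_score:
            # First epoch or an improvement: remember it, reset the counter.
            best_score = score
            wait = 0
        elif wait >= patience:
            # No improvement for long enough: shrink the rate.
            rate *= reduce_rate
            wait = 0
        else:
            wait += 1
        history.append(rate)
    return history


val_losses = [0.70, 0.65, 0.66, 0.67, 0.60]  # invented validation losses
print(simulate_dropout_reducer(val_losses, rate=0.2, patience=0))
print(simulate_dropout_reducer(val_losses, rate=0.2, patience=1))
```

With patience=0 the rate is halved after every epoch that does not improve on the best score; with patience=1 one such epoch is tolerated before the first reduction.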

civilinformer on 22 May 2018

Is it possible to fix the formatting? I'm having trouble digesting the entire code with current formatting.

moondra2017 on 12 Jun 2018

Fixed formatting.

civilinformer on 12 Jun 2018

👍2

LTTP

Thanks for your experiments. They were very useful. I believe the issue is that the variable that you are trying to change in the Dropout Layer is not a tensorflow variable, so it never gets updated in the backend.

I just wanted to say that I experienced the same when changing the l1 and l2 values of a kernel regularizer while training. In the source code you can see that they are not backend variables, so it does not matter if you change them after compiling the model.

In the end, I had to do the same. Define a custom regularizer and use tf.keras.backend.variable for l1 and l2 members.
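The difference between a plain attribute and a backend variable can be pictured with a plain-Python analogy. This is not Keras code: `Variable`, `build_with_constant` and `build_with_variable` are invented names, and "building" stands in for the one-time graph construction.

```python
class Variable:
    """Stand-in for a backend variable: a mutable box that is read at run time."""

    def __init__(self, value):
        self.value = value


def build_with_constant(rate):
    # `rate` is a plain float, so its value at build time is frozen in.
    keep_prob = 1.0 - rate
    return lambda x: x / keep_prob


def build_with_variable(rate_var):
    # The box is looked up on every call, so later updates are visible.
    return lambda x: x / (1.0 - rate_var.value)


rate = 0.5
frozen_step = build_with_constant(rate)  # "build" once
rate = 0.0                               # changing the float afterwards ...
print(frozen_step(1.0))                  # ... has no effect: still scales by 1 / 0.5

rate_var = Variable(0.5)
live_step = build_with_variable(rate_var)
rate_var.value = 0.0                     # updating the box ...
print(live_step(1.0))                    # ... is seen on the next call
```

In this picture, assigning to layer.rate (or to the l1/l2 members of a regularizer) corresponds to the first case, while K.set_value on a backend variable corresponds to the second.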

fediazgon on 2 Nov 2018

LTTP

Thanks for your experiments. They were very useful. I believe the issue is that the variable that you are trying to change in the Dropout Layer is not a tensorflow variable, so it never gets updated in the backend.

I just wanted to say that I experienced the same when changing the l1 and l2 values of a kernel regularizer while training. In the source code you can see that they are not backend variables, so it does not matter if you change them after compiling the model.

In the end, I had to do the same. Define a custom regularizer and use tf.keras.backend.variable for l1 and l2 members.

Do you have a link to your code?
I want to avoid making changes to the source code if possible.

moondra2017 on 16 Jan 2019

Fixed formatting.

Thank you. Is this still the only way to do so? I would like to avoid making changes to the source code as it may break something and I may not be able to locate the source of the bug!

moondra2017 on 16 Jan 2019

Now that a year has passed, I've found out that you can use the Keras clone_model function to change the dropout rate "easily":

1) Change the rate via layer.rate
2) Use ks.models.clone_model to clone the model (this rebuilds it; I had been doing that manually until now)
3) set_weights of the cloned model with get_weights

ViaFerrata on 18 Feb 2019

ViaFerrata, after you clone do you make other updates or modifications to the cloned model before using it? When I use clone_model and set weights as you describe, there are many things missing from the model such as optimizer, loss, metrics, loss_weights, etc.

These four steps seem to work:

1) Change the rates in the layers
2) Use keras.models.clone_model
3) Compile the clone
4) set_weights of the cloned model with get_weights
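The four steps could be wrapped up roughly as follows. This is a sketch under assumptions: `rebuild_with_dropout_rate` is a made-up helper, the toy model and compile arguments are placeholders, and only top-level Dropout layers are handled (not layers inside nested models).

```python
from keras.layers import Dense, Dropout, Input
from keras.models import Model, clone_model


def rebuild_with_dropout_rate(model, rate, **compile_kwargs):
    """Return a compiled copy of `model` whose Dropout layers use `rate`."""
    # 1) Change the rate on the existing layers; this only updates their config.
    for layer in model.layers:
        if isinstance(layer, Dropout):
            layer.rate = rate
    # 2) Rebuild the model from the updated layer configs.
    clone = clone_model(model)
    # 3) The clone is a fresh model, so it has to be compiled again.
    clone.compile(**compile_kwargs)
    # 4) The clone starts with newly initialised weights; copy the old ones over.
    clone.set_weights(model.get_weights())
    return clone


# Toy functional model, only here to exercise the helper.
inputs = Input(shape=(3,))
hidden = Dropout(0.2)(Dense(4, activation="relu")(inputs))
outputs = Dense(1, activation="sigmoid")(hidden)
model = Model(inputs=inputs, outputs=outputs)
model.compile(optimizer="adam", loss="binary_crossentropy")

clone = rebuild_with_dropout_rate(
    model, 0.0, optimizer="adam", loss="binary_crossentropy")
```

Note that the helper also leaves the new rate in the config of the original model's layers, and that the clone gets a fresh optimizer.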

daarong on 25 Mar 2019

Yes, you still have to compile it, since it's a fresh model. I just forgot to mention that.

There's also a problem with the required compilation:
By compiling again, you actually lose the weights of your optimizer, e.g. if you use Adam. I haven't found an easy way to fix this so far.

ViaFerrata on 26 Mar 2019

Can your custom Dropout layer be used to change the rate adaptively according to the epoch number?
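One possible shape for an epoch-based schedule, written in plain Python so it runs without Keras. Everything here is hypothetical: `EpochDropoutSchedule`, `linear_decay` and the `set_rate` hook are invented for this sketch. In a real model the class would presumably subclass keras.callbacks.Callback, and `set_rate` would wrap K.set_value on the rate variable of each MyDropout layer, as DropoutReducer does above.

```python
class EpochDropoutSchedule:
    """Calls set_rate(schedule(epoch)) at the start of every epoch."""

    def __init__(self, schedule, set_rate):
        self.schedule = schedule  # function: epoch number -> dropout rate
        self.set_rate = set_rate  # function that applies the rate somewhere

    def on_epoch_begin(self, epoch, logs=None):
        # Clamp to [0, 1], like the constructor of the Dropout layer does.
        rate = min(1.0, max(0.0, self.schedule(epoch)))
        self.set_rate(rate)
        return rate


def linear_decay(epoch, start=0.5, end=0.1, n_epochs=5):
    """Move linearly from `start` to `end` over `n_epochs`, then stay at `end`."""
    fraction = min(epoch, n_epochs) / float(n_epochs)
    return start + (end - start) * fraction


applied = []  # records the rates instead of touching a real layer
callback = EpochDropoutSchedule(linear_decay, applied.append)
for epoch in range(7):
    callback.on_epoch_begin(epoch)
print(applied)
```

One caveat that follows from the MyDropout code above: its call method checks 0 < rate < 1 only once, when the layer is built, so the initial rate has to lie inside that interval for the dropout op to exist at all.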