Keras: Difference between Activity and weight regularizer

Created on 17 Jul 2016 · 10 comments · Source: keras-team/keras

Hi, this is more of a question than an issue. I want to know conceptually: what is the difference between an activity regularizer and a weight regularizer?

How do I decide between using either of them?

I'm fine-tuning AlexNet for a problem. As usual, I'm popping off the last Dense layer and adding a new Dense layer. I don't want to freeze the initial feature layers, but I want them to be updated at a slower rate than the new Dense layer I added. I got the impression that weight regularizers are the way to do that (apply different regularization values at different layers).

Also, just to clarify: higher regularizer values imply slower updates, right? Since a higher regularization coefficient means more emphasis on the regularization term and less on the error term.

Label: stale

Most helpful comment

Hi,

I think I can answer the question. Weight regularizers are used to regularize the weights in the neural network. Activity regularizers, however, are used to regularize the output of a neural network. That is, if the loss function = DataLoss + RegularizationLoss:

For a weight regularizer, RegularizationLoss = f(weights in the network).
For an activity regularizer, RegularizationLoss = f(predicted outputs from the network). I think a scenario to use this would be when you know the distribution of your test dataset?

Regards,
R.V.
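
To make that split concrete, here is a minimal numpy sketch (the names W, x, y, lam, and the ReLU layer are illustrative assumptions, not from the thread): the weight penalty is a function of the layer's parameters only, while the activity penalty is a function of the layer's output on a batch.

import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))              # layer weights
x = rng.normal(size=(8, 4))              # a batch of inputs
y = np.maximum(x @ W, 0.0)               # layer output (ReLU)

lam = 0.01
weight_penalty = lam * np.sum(W ** 2)    # weight regularizer: f(weights)
activity_penalty = lam * np.sum(y ** 2)  # activity regularizer: f(outputs)

# total loss = DataLoss + weight_penalty    (weight regularization)
# total loss = DataLoss + activity_penalty  (activity regularization)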

All 10 comments

Hi,

I think I can answer the question. Weight regularizers are used to regularize the weights in the neural network. Activity regularizers, however, are used to regularize the output of a neural network. That is, if the loss function = DataLoss + RegularizationLoss:

For a weight regularizer, RegularizationLoss = f(weights in the network).
For an activity regularizer, RegularizationLoss = f(predicted outputs from the network). I think a scenario to use this would be when you know the distribution of your test dataset?

Regards,
R.V.

An activity regularizer will tend to make the output of the layer smaller. What that means for the weights depends on the network (nonlinearities, topology) and its configuration (weight values).
If you use it on the output layer, it will bias the distribution of outputs.
A weight regularizer, on the other hand, will constantly decay the weights.
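
A small sketch of that "constant decay" point (illustrative numbers; assumes plain SGD on an L2 penalty lam * sum(W**2), whose gradient is 2 * lam * W):

import numpy as np

lam, lr = 0.01, 0.1
W = np.array([1.0, -2.0, 0.5])
data_grad = np.zeros_like(W)   # pretend the data loss is flat here

for _ in range(1000):
    # each step multiplies W by (1 - 2*lr*lam): steady shrinkage toward zero
    W -= lr * (data_grad + 2 * lam * W)

print(W)   # roughly 0.998**1000 ~ 0.135 of the starting weights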

I think what you're looking for, fine-tuning-wise, is here: https://github.com/fchollet/keras/pull/3004

A weight regularizer is there to regularize your weights (simply put, your matrices), which leads to robustness against data sensitivity. However, if you normalize your target and add batch normalization to your net, you may be able to get away without this regularizer.

An activity regularizer is there to regularize your hidden units, e.g. in a sparse encoder (see the sketch below).
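
For the sparse-encoder case, a minimal sketch in the Keras 1.x API used elsewhere in this thread (the 784/32 sizes and the 1e-5 coefficient are placeholders in the style of the Keras autoencoder tutorial):

from keras.layers import Input, Dense
from keras.models import Model
from keras.regularizers import activity_l1

input_img = Input(shape=(784,))
# the L1 activity penalty pushes most encoder outputs toward zero -> sparse codes
encoded = Dense(32, activation='relu',
                activity_regularizer=activity_l1(1e-5))(input_img)
decoded = Dense(784, activation='sigmoid')(encoded)

autoencoder = Model(input=input_img, output=decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')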

And what is the kernel regularizer? I couldn't find docs for a weight regularizer here: https://keras.io/regularizers/

Maybe add some use cases for each to /examples?

@pswpswpsw

However, if you normalize your target and add batch normalization to your net, you may be able to get away without this regularizer.

Could you expand on that or point to a source? I'd like to know more about why.

@Zhomart The kernel regularizer is another name for the weight regularizer. From the docs: kernel_regularizer: Regularizer function applied to the kernel weights matrix.

The docs should definitely mention this, along with definitions of all three (see the sketch below).
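
For reference, the Keras 2 names on a Dense layer (the 0.01 coefficients are placeholders): kernel_regularizer penalizes the weight matrix, bias_regularizer the bias vector, and activity_regularizer the layer output.

from keras import regularizers
from keras.layers import Dense

layer = Dense(64,
              kernel_regularizer=regularizers.l2(0.01),    # f(weights), the old W_regularizer
              bias_regularizer=regularizers.l2(0.01),      # f(bias), the old b_regularizer
              activity_regularizer=regularizers.l1(0.01))  # f(outputs)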

I am having a problem with this as well. I have attempted to use the activity_regularizer parameter, and it appears that the LSTM layer does not support it.

encoded = LSTM(latent_dim, activity_regularizer = activity_l1l2(l1=0.01,l2=0.01))(inputs)

I have used the exact code from the autoencoder example in the Keras documentation. The error I get is the following:


TypeError                                 Traceback (most recent call last)
<ipython-input-...> in <module>()
     13 inputs = Input(shape=(timesteps, input_dim))
     14 # They get encoded into a certain latent dimension
---> 15 encoded = LSTM(latent_dim, activity_regularizer = activity_l1l2(l1=0.01,l2=0.01))(inputs)
     16
     17 #we then take the encoded vector and run it through another lstm

/home/carnd/anaconda3/envs/dl/lib/python3.5/site-packages/keras/layers/recurrent.py in __init__(self, output_dim, init, inner_init, forget_bias_init, activation, inner_activation, W_regularizer, U_regularizer, b_regularizer, dropout_W, dropout_U, **kwargs)
    696         if self.dropout_W or self.dropout_U:
    697             self.uses_learning_phase = True
--> 698         super(LSTM, self).__init__(**kwargs)
    699
    700     def build(self, input_shape):

/home/carnd/anaconda3/envs/dl/lib/python3.5/site-packages/keras/layers/recurrent.py in __init__(self, weights, return_sequences, go_backwards, stateful, unroll, consume_less, input_dim, input_length, **kwargs)
    180         if self.input_dim:
    181             kwargs['input_shape'] = (self.input_length, self.input_dim)
--> 182         super(Recurrent, self).__init__(**kwargs)
    183
    184     def get_output_shape_for(self, input_shape):

/home/carnd/anaconda3/envs/dl/lib/python3.5/site-packages/keras/engine/topology.py in __init__(self, **kwargs)
    324         for kwarg in kwargs.keys():
    325             if kwarg not in allowed_kwargs:
--> 326                 raise TypeError('Keyword argument not understood:', kwarg)
    327         name = kwargs.get('name')
    328         if not name:

TypeError: ('Keyword argument not understood:', 'activity_regularizer')
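
One possible workaround, as an untested sketch against the Keras 1.x API shown in the traceback: since this LSTM version rejects activity_regularizer as a keyword, apply the penalty through the standalone ActivityRegularization layer on the LSTM's output instead. The sizes below are placeholders.

from keras.layers import Input, LSTM
from keras.layers.core import ActivityRegularization

timesteps, input_dim, latent_dim = 10, 32, 16   # placeholder sizes

inputs = Input(shape=(timesteps, input_dim))
encoded = LSTM(latent_dim)(inputs)
# same penalty, applied to the LSTM's output rather than via the kwarg
encoded = ActivityRegularization(l1=0.01, l2=0.01)(encoded)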

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.
