I am training a model with a merge layer. I assign [0.5, 0.5] as the "loss_weights" for the two loss functions, since I want the loss to be
loss = m1_loss * 0.5 + m2_loss * 0.5
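For context, a rough sketch of my setup (hypothetical output names m1 and m2 and plain mse losses here; the actual model uses a merge layer and different losses, and argument names may differ across Keras versions):

from keras.layers import Input, Dense
from keras.models import Model

inp = Input(shape=(32,))
shared = Dense(64, activation="relu")(inp)
m1 = Dense(1, name="m1")(shared)
m2 = Dense(1, name="m2")(shared)
model = Model(inp, [m1, m2])

# One loss per output; loss_weights=[0.5, 0.5] maps 1:1 to the outputs,
# so the total should be loss = m1_loss * 0.5 + m2_loss * 0.5.
model.compile(optimizer="adam", loss=["mse", "mse"], loss_weights=[0.5, 0.5])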
But the losses for one epoch look like this:
- loss: 2.2328 - m1_loss: 0.3732 - m2_loss: 0.2399
After checking the documentation, I still have no clue.
loss_weights: Optional list or dictionary specifying scalar coefficients (Python floats) to weight the loss contributions of different model outputs. The loss value that will be minimized by the model will then be the weighted sum of all individual losses, weighted by the loss_weights coefficients. If a list, it is expected to have a 1:1 mapping to the model's outputs. If a dict, it is expected to map output names (strings) to scalar coefficients.
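For reference, the dictionary form described there maps output names to coefficients, e.g. (reusing the hypothetical m1/m2 output names from the sketch above):

model.compile(optimizer="adam",
              loss={"m1": "mse", "m2": "mse"},
              loss_weights={"m1": 0.5, "m2": 0.5})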
Besides,
The loss value that will be minimized by the model will then be the weighted sum of all individual losses, weighted by the loss_weights coefficients.
sounds awkward. Please check the grammar.
Thanks.
I observed the same phenomenon and found that if some of your layers use regularization, the corresponding regularization losses are also added to the total loss but are not reported anywhere. This is stated in the documentation for regularizers:
Regularizers allow you to apply penalties on layer parameters or layer activity during optimization. These penalties are incorporated in the loss function that the network optimizes.
If you happen to use L2 regularization, your loss actually is
loss = m1_loss * 0.5 + m2_loss * 0.5 + l2_reg * ||w||^2,
where l2_reg is the parameter of the regularizer, serving as a loss weight, and w is the weight tensor of the regularized layer(s).
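Here is a minimal sketch of where that hidden term comes from (hypothetical architecture and regularization rate; Keras 2-style argument names): the per-layer penalties are collected by the model and added to the loss being minimized, but they are not reported as separate metrics during training.

from keras import regularizers
from keras.layers import Input, Dense
from keras.models import Model

inp = Input(shape=(32,))
# L2 penalty on this layer's weights (rate 0.01 chosen arbitrarily).
shared = Dense(64, activation="relu",
               kernel_regularizer=regularizers.l2(0.01))(inp)
m1 = Dense(1, name="m1")(shared)
m2 = Dense(1, name="m2")(shared)
model = Model(inp, [m1, m2])
model.compile(optimizer="adam", loss=["mse", "mse"], loss_weights=[0.5, 0.5])

# One penalty tensor per regularized layer; their sum is added to the total
# loss, which is why the reported loss exceeds m1_loss * 0.5 + m2_loss * 0.5.
print(model.losses)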
This issue has been answered, hence closing it. Please add your comments if any and we will reopen. Thanks!