I am training a model with a merge layer. I assign [0.5, 0.5] as the "loss_weights" for the two loss functions, since I want the loss to be
loss = m1_loss * 0.5 + m2_loss * 0.5
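For context, a rough sketch of my setup (hypothetical output names m1 and m2 and plain mse losses here; the actual model uses a merge layer and different losses, and argument names may differ across Keras versions):

from keras.layers import Input, Dense
from keras.models import Model

inp = Input(shape=(32,))
shared = Dense(64, activation="relu")(inp)
m1 = Dense(1, name="m1")(shared)
m2 = Dense(1, name="m2")(shared)
model = Model(inp, [m1, m2])

# One loss per output; loss_weights=[0.5, 0.5] maps 1:1 to the outputs,
# so the total should be loss = m1_loss * 0.5 + m2_loss * 0.5.
model.compile(optimizer="adam", loss=["mse", "mse"], loss_weights=[0.5, 0.5])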
But the losses for one epoch look like this:
- loss: 2.2328 - m1_loss: 0.3732 - m2_loss: 0.2399
After checking the documentation, I still have no clue.
loss_weights: Optional list or dictionary specifying scalar coefficients (Python floats) to weight the loss contributions of different model outputs. The loss value that will be minimized by the model will then be the weighted sum of all individual losses, weighted by the loss_weights coefficients. If a list, it is expected to have a 1:1 mapping to the model's outputs. If a dict, it is expected to map output names (strings) to scalar coefficients.
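For reference, the dictionary form described there maps output names to coefficients, e.g. (reusing the hypothetical m1/m2 output names from the sketch above):

model.compile(optimizer="adam",
              loss={"m1": "mse", "m2": "mse"},
              loss_weights={"m1": 0.5, "m2": 0.5})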
Besides,
The loss value that will be minimized by the model will then be the weighted sum of all individual losses, weighted by the loss_weights coefficients.
sounds awkward. Please check the grammar.
Thanks.
I observed the same phenomenon and found that if some of your layers use regularization, the corresponding regularization losses are also added to the total loss but are not reported anywhere. This is stated in the documentation for regularizers:
Regularizers allow you to apply penalties on layer parameters or layer activity during optimization. These penalties are incorporated in the loss function that the network optimizes.
If you happen to use L2 regularization, your loss actually is
loss = m1_loss * 0.5 + m2_loss * 0.5 + l2_reg * ||w||^2,
where l2_reg is the parameter of the regularizer, serving as a loss weight, and w is the weight tensor of the regularized layer(s).
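Here is a minimal sketch of where that hidden term comes from (hypothetical architecture and regularization rate; Keras 2-style argument names): the per-layer penalties are collected by the model and added to the loss being minimized, but they are not reported as separate metrics during training.

from keras import regularizers
from keras.layers import Input, Dense
from keras.models import Model

inp = Input(shape=(32,))
# L2 penalty on this layer's weights (rate 0.01 chosen arbitrarily).
shared = Dense(64, activation="relu",
               kernel_regularizer=regularizers.l2(0.01))(inp)
m1 = Dense(1, name="m1")(shared)
m2 = Dense(1, name="m2")(shared)
model = Model(inp, [m1, m2])
model.compile(optimizer="adam", loss=["mse", "mse"], loss_weights=[0.5, 0.5])

# One penalty tensor per regularized layer; their sum is added to the total
# loss, which is why the reported loss exceeds m1_loss * 0.5 + m2_loss * 0.5.
print(model.losses)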
This issue has been answered, hence closing it. Please add your comments if any and we will reopen. Thanks!