The documentation says the l1 and l2 parameters of the ActivityRegularization layer should be “positive float”, but the default value for both is 0.0. Shouldn’t the documentation say “nonnegative float”?
It sure should. But why is there a constraint on the sign at all?
I checked, and ActivityRegularization does not complain about negative values, which I think is a good thing, but I'm leaving that undocumented for now.
total loss = loss(w) + r * ||w||, where loss is the base loss (MSE, CE, etc.), r is the activity regularizer constant, w is the weights, and || || is some norm.
If r is negative you can minimize the total loss by setting all of the weights to infinity.
Regularizers are a penalty meant to shrink the weights, which is why r >= 0.
"If r is negative you can minimize the total loss by setting all of the weights to infinity."
Not if loss(w) -> +inf faster. MSE and CE aren't the only possible losses.
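
To make both positions concrete, here is a minimal one-dimensional sketch in plain Python, not Keras code. It assumes a single scalar weight w, an L1 penalty r * |w| with r = -2, and two hypothetical stand-in losses (abs_loss and sq_loss, names made up for illustration). With the linearly growing loss, the negative penalty wins and the total goes to -inf as w grows; with the quadratically growing loss, the base loss wins and the total stays bounded below.

```python
# Illustrative 1-D sketch: total(w) = loss(w) + r * |w| with a negative r.
# abs_loss and sq_loss are hypothetical stand-ins, not part of any library.

def abs_loss(w, target=1.0):
    """Absolute error: grows linearly in |w|."""
    return abs(w - target)

def sq_loss(w, target=1.0):
    """Squared error: grows quadratically in |w|."""
    return (w - target) ** 2

def total_loss(loss_fn, w, r):
    """Base loss plus an L1 penalty scaled by r (r may be negative)."""
    return loss_fn(w) + r * abs(w)

r = -2.0                          # negative regularization constant (assumed value)
ws = [1.0, 10.0, 100.0, 1000.0]   # increasingly large weights

# Linear loss: the penalty dominates, so the total keeps decreasing.
abs_totals = [total_loss(abs_loss, w, r) for w in ws]

# Quadratic loss: the base loss dominates, so the total keeps increasing.
sq_totals = [total_loss(sq_loss, w, r) for w in ws]

print(abs_totals)  # [-2.0, -11.0, -101.0, -1001.0]
print(sq_totals)   # [-2.0, 61.0, 9601.0, 996001.0]
```

In the quadratic case the minimum over w > 0 sits at w = 2 with a total of -3, so a negative r shifts the optimum but does not send the weights to infinity. Whether that holds in a real model depends on how fast the chosen loss grows relative to the chosen norm.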