I read your paper, Concrete Dropout, and found an inconsistency between your code and the paper.
The regularizer of the kernel matrix should be proportional to 1-p (Eq. (3) of your paper).
But in your code, it is inversely proportional to 1-p:
kernel_regularizer = self.weight_regularizer * K.sum(K.square(weight)) / (1. - self.p)
I am not sure whether I misunderstand your paper or code.
That's because we reparametrise Wz (with z~Bern(p)^K) as Wz/(1-p) for it to have mean W. K.square(weight) then picks up an extra factor of 1/(1-p)^2, which cancels the 1-p and leaves 1/(1-p).
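To make the cancellation concrete, here is a minimal numeric sketch in plain Python. It is not the repository's code: the function names, the toy weight matrix, and the values of p and the regularization constant are all assumptions chosen for illustration. It only checks the algebra described above, namely that a term proportional to 1-p applied to the rescaled weights W/(1-p) equals the sum of squares of W divided by 1-p.

```python
# Hypothetical helper names and toy values; a sketch of the algebra only.

def paper_style_regularizer(weight, p, weight_regularizer):
    """Term proportional to (1 - p), applied to the rescaled weights W / (1 - p)."""
    rescaled = [[w / (1.0 - p) for w in row] for row in weight]
    sq_norm = sum(m * m for row in rescaled for m in row)
    return weight_regularizer * (1.0 - p) * sq_norm


def code_style_regularizer(weight, p, weight_regularizer):
    """Form of the quoted line: sum of squares of W, divided by (1 - p)."""
    sq_norm = sum(w * w for row in weight for w in row)
    return weight_regularizer * sq_norm / (1.0 - p)


weight = [[0.5, -1.0], [2.0, 0.25]]  # toy 2x2 kernel (assumed values)
p = 0.2                              # dropout probability (assumed value)
lam = 1e-3                           # stands in for weight_regularizer

paper_value = paper_style_regularizer(weight, p, lam)
code_value = code_style_regularizer(weight, p, lam)

# The two forms agree up to floating-point rounding.
print(paper_value, code_value)
```

The factor 1/(1-p)^2 from squaring the rescaled weights, multiplied by 1-p, leaves 1/(1-p), which is why the two functions return the same value for any p in [0, 1).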
Could you please explain how to derive Eq. (3) in this paper?
XinDongol, it's Proposition 1 in Appendix 1 of Dropout as a Bayesian Approximation (Gal's previous paper).