I've taken a look at Keras's batch normalization code, and it looks like it's normalising along the wrong axis:
https://github.com/fchollet/keras/blob/master/keras/layers/normalization.py#L64
This means that the `axis` parameter specifies the dimension to leave alone, rather than the dimensions to compute the mean and std over.
It's very simple:
- For `Dense` layers, all RNN layers, and most other types of layers, the default of `axis=-1` is what you should use.
- For `Convolution2D` layers with `dim_ordering="th"` (the default), use `axis=1`.
- For `Convolution2D` layers with `dim_ordering="tf"`, use `axis=-1` (i.e. the default).
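To illustrate the semantics above, here is a minimal NumPy sketch (not Keras's actual implementation) of the interpretation described: statistics are computed over every dimension *except* the given `axis`, so each feature along `axis` gets its own mean and std. The function name `batch_norm_stats` and the shapes are hypothetical, chosen only to match the three cases listed.

```python
import numpy as np

def batch_norm_stats(x, axis=-1):
    """Per-feature mean/std: reduce over all axes except `axis`
    (the axis that is "left alone")."""
    keep = axis % x.ndim
    reduce_axes = tuple(i for i in range(x.ndim) if i != keep)
    return x.mean(axis=reduce_axes), x.std(axis=reduce_axes)

# Dense output (batch, features): axis=-1 keeps the feature axis.
mean, std = batch_norm_stats(np.random.randn(32, 64), axis=-1)
assert mean.shape == (64,)

# Convolution2D, dim_ordering="th" (batch, channels, H, W): axis=1.
mean, std = batch_norm_stats(np.random.randn(32, 16, 28, 28), axis=1)
assert mean.shape == (16,)

# Convolution2D, dim_ordering="tf" (batch, H, W, channels): axis=-1.
mean, std = batch_norm_stats(np.random.randn(32, 28, 28, 16), axis=-1)
assert mean.shape == (16,)
```

In each case the result has one statistic per feature/channel, which is why `axis` should point at the channel dimension for convolutional layers.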