BN (Batch Normalization) is a popular technique in neural networks, as it often reduces training time and improves generalization to some extent.
During inference, BN uses fixed per-channel estimates of the mean and variance, so it becomes a per-channel scale and shift that compilers can merge into the preceding convolution layer. This saves computational resources and simplifies the network.
The compiler already fuses Convolution and BN. Going further, I want to see whether BN can also be fused with TCONV (transposed convolution), which looks similar to convolution.
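To make the merge concrete, here is a minimal numpy sketch of the folding arithmetic. It is not the compiler's actual pass: `fold_bn` is a hypothetical helper, the 1x1 convolution is written as a matrix product, and the output channel is assumed to be the last axis of the weight.

```python
import numpy as np

def fold_bn(weight, bias, gamma, beta, mean, var, eps=1e-3):
    """Fold inference-mode BN into the preceding layer (hypothetical helper).

    weight has the output channel on its last axis; the other arguments
    are per-output-channel vectors.
    """
    factor = gamma / np.sqrt(var + eps)        # per-channel scale
    return weight * factor, (bias - mean) * factor + beta

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))                    # 5 pixels, 3 input channels
w = rng.normal(size=(3, 4))                    # 1x1 conv as a matrix, 4 output channels
b = rng.normal(size=4)
gamma, beta = rng.normal(size=4), rng.normal(size=4)
mean, var = rng.normal(size=4), rng.uniform(0.5, 2.0, size=4)
eps = 1e-3

# Reference: layer followed by BN with fixed statistics.
y_ref = gamma * ((x @ w + b) - mean) / np.sqrt(var + eps) + beta

# Folded: a single layer with rescaled weights and a new bias.
w_f, b_f = fold_bn(w, b, gamma, beta, mean, var, eps)
y_fold = x @ w_f + b_f

max_err = np.max(np.abs(y_ref - y_fold))
print(max_err)
```

The two results should agree up to floating-point rounding, since BN with fixed statistics is just an affine map applied per output channel.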
import tensorflow as tf
import numpy as np
tf.compat.v1.disable_eager_execution()
def batch_norm(X, scale, offset):
    mean = np.mean(X)
    std = np.std(X)
    return (X - mean) / std * scale + offset
X = np.arange(1,5).reshape((1,2,2,1))
W = np.ones(9).reshape((3,3,1,1))
input_ = tf.compat.v1.constant(X, dtype=tf.float32)
filter_ = tf.compat.v1.constant(W, dtype=tf.float32)
tconv_ = tf.compat.v1.nn.conv2d_transpose(input_, filter_, output_shape=(1,4,4,1), strides=[1,1,1,1], padding='VALID')
with tf.compat.v1.Session() as sess:
    tconv_out = sess.run(tconv_)
scale = 1.05
offset = 0.015
# BN with numpy
print('-' * 10, 'numpy batch_norm', '-' * 10)
print(batch_norm(tconv_out, scale, offset))
# BN with tensorflow
print('-' * 10, 'tensorflow batch_norm', '-' * 10)
scale_ = tf.compat.v1.constant([1.05], dtype=tf.float32)
offset_ = tf.compat.v1.constant([0.015], dtype=tf.float32)
mean_ = tf.compat.v1.constant([np.mean(tconv_out)], dtype=tf.float32)
variance_ = tf.compat.v1.constant([np.var(tconv_out)], dtype=tf.float32)
bn_out, _, _ = tf.compat.v1.nn.fused_batch_norm(tconv_, scale_, offset_, mean=mean_, variance=variance_, epsilon=0, is_training=False)
with tf.compat.v1.Session() as sess:
    bn_out = sess.run(bn_out)
print(bn_out)
# BN folding in TCONV
folded_W = W * scale / np.std(tconv_out)
folded_offset = offset - scale * np.mean(tconv_out) / np.std(tconv_out)
folded_filter = tf.compat.v1.constant(folded_W, dtype=tf.float32)
folded_tconv = tf.compat.v1.nn.conv2d_transpose(input_, folded_filter, output_shape=(1,4,4,1), strides=[1,1,1,1], padding='VALID')
with tf.compat.v1.Session() as sess:
    folded_out = sess.run(folded_tconv)
print('-' * 10, 'folded tconv', '-' * 10)
print(folded_out + folded_offset)
---------- numpy batch_norm ----------
[[[[-1.6051569 ]
[-0.90454847]
[-0.90454847]
[-1.2548528 ]]
[[-0.5542443 ]
[ 1.5475807 ]
[ 1.5475807 ]
[ 0.14636408]]
[[-0.5542443 ]
[ 1.5475807 ]
[ 1.5475807 ]
[ 0.14636408]]
[[-0.90454847]
[ 0.49666822]
[ 0.49666822]
[-0.5542443 ]]]]
---------- tensorflow batch_norm ----------
[[[[-1.6051562 ]
[-0.90454805]
[-0.90454805]
[-1.254852 ]]
[[-0.55424404]
[ 1.5475801 ]
[ 1.5475801 ]
[ 0.1463641 ]]
[[-0.55424404]
[ 1.5475801 ]
[ 1.5475801 ]
[ 0.1463641 ]]
[[-0.90454805]
[ 0.49666798]
[ 0.49666798]
[-0.55424404]]]]
---------- folded tconv ----------
[[[[-1.6051569 ]
[-0.9045485 ]
[-0.9045485 ]
[-1.2548528 ]]
[[-0.5542443 ]
[ 1.5475811 ]
[ 1.5475811 ]
[ 0.1463641 ]]
[[-0.5542443 ]
[ 1.5475811 ]
[ 1.5475811 ]
[ 0.1463641 ]]
[[-0.9045485 ]
[ 0.49666822]
[ 0.49666822]
[-0.5542443 ]]]]
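The values seem to be the same :)
There are only minor differences in the last digits :) For example, the last element of each output (numpy, tensorflow, folded tconv):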
[-0.5542443 ]]]]
[-0.55424404]]]]
[-0.5542443 ]]]]

There are many ways to think about transposed convolution. One of them is from the perspective of a single cell in the output. If you follow one output cell while computing TCONV, you will notice that the input is multiplied by the flipped kernel. This is why we call it Transposed Convolution, although it isn't actually "transposed".
Therefore, for stride 1, computing TCONV is like computing a normal convolution. And since we can think of BN as a 1x1 convolution, fusing it into the TCONV layer is possible.
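The flipped-kernel view can be checked with a small numpy sketch. This is only an illustration under simplifying assumptions (single channel, VALID padding); `tconv2d_scatter` and `conv2d_flipped` are hypothetical helper names, not TensorFlow or compiler functions.

```python
import numpy as np

def tconv2d_scatter(x, k):
    """Stride-1 transposed convolution: every input cell adds a scaled
    copy of the kernel into the output."""
    ih, iw = x.shape
    kh, kw = k.shape
    out = np.zeros((ih + kh - 1, iw + kw - 1))
    for i in range(ih):
        for j in range(iw):
            out[i:i + kh, j:j + kw] += x[i, j] * k
    return out

def conv2d_flipped(x, k):
    """Output-cell view: zero-pad the input by (kernel - 1) on each side
    and slide the flipped kernel over it like a normal convolution."""
    kh, kw = k.shape
    xp = np.pad(x, ((kh - 1, kh - 1), (kw - 1, kw - 1)))
    kf = k[::-1, ::-1]
    oh, ow = xp.shape[0] - kh + 1, xp.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * kf)
    return out

x = np.arange(1, 5, dtype=np.float64).reshape(2, 2)
k = np.arange(9, dtype=np.float64).reshape(3, 3)   # asymmetric, so the flip matters

a = tconv2d_scatter(x, k)
b = conv2d_flipped(x, k)
same = np.allclose(a, b)
print(same)
```

An asymmetric kernel is used on purpose: with the all-ones kernel from the experiment above, flipping would make no visible difference.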
import tensorflow as tf
import numpy as np
tf.compat.v1.disable_eager_execution()
input_ = tf.compat.v1.placeholder(tf.float32, shape=(1,2,2,1), name="Hole")
W = np.ones(9).reshape((3,3,1,1))
filter_ = tf.compat.v1.constant(W, dtype=tf.float32)
tconv_ = tf.compat.v1.nn.conv2d_transpose(input_, filter_, output_shape=(1,4,4,1), strides=[1,1,1,1], padding='VALID')
scale_ = tf.compat.v1.constant([1.0177339315414429], dtype=tf.float32)
offset_ = tf.compat.v1.constant([0.015628524124622345], dtype=tf.float32)
mean_ = tf.compat.v1.constant([1.027155211195349693], dtype=tf.float32)
variance_ = tf.compat.v1.constant([0.25580066442489624], dtype=tf.float32)
bn_out, _, _ = tf.compat.v1.nn.fused_batch_norm(tconv_, scale_, offset_, mean=mean_, variance=variance_, epsilon=0.0010000000474974513, is_training=False)


After #3857, the value test works well, but there's a difference in the names.
$ h5diff -d 0.001 Net_TConv_BN_000.expected.h5 Net_TConv_BN_000.opt.expected.h5
attribute: <0 of </name>> and <0 of </name>>
15 differences found
Can we check circle.TConv with bias in netron-circle?
@cgbahk Yes, you can check it from build/compiler/circle2circle-dredd-recipe-test/Net_TConv_BN_000.opt.circle.

FYI, as of now, there's no test that covers the TCONV_BN_folding pass with stride or padding. It's going to be added soon.
Could you please check https://github.com/Samsung/ONE/pull/4022#discussion_r478798361 as well? :smiley:
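Until such a test exists, the arithmetic for strides can at least be sanity-checked in plain numpy. The sketch below is not the pass or its test recipe: `tconv2d` is a hypothetical reference implementation (VALID padding only), and the filter layout `(KH, KW, Cout, Cin)` is assumed to match `tf.nn.conv2d_transpose`.

```python
import numpy as np

def tconv2d(x, w, stride):
    """Reference transposed convolution, VALID padding.
    x: (H, W, Cin), w: (KH, KW, Cout, Cin)."""
    ih, iw, _ = x.shape
    kh, kw, cout, _ = w.shape
    out = np.zeros(((ih - 1) * stride + kh, (iw - 1) * stride + kw, cout))
    for i in range(ih):
        for j in range(iw):
            r, c = i * stride, j * stride
            # (KH, KW, Cout, Cin) @ (Cin,) -> (KH, KW, Cout)
            out[r:r + kh, c:c + kw, :] += w @ x[i, j]
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 3, 2))
w = rng.normal(size=(3, 3, 4, 2))
gamma, beta = rng.normal(size=4), rng.normal(size=4)
mean, var = rng.normal(size=4), rng.uniform(0.5, 2.0, size=4)
eps = 1e-3

factor = gamma / np.sqrt(var + eps)
w_folded = w * factor[None, None, :, None]   # scale along the output-channel axis
bias = beta - mean * factor

errs = {}
for stride in (1, 2, 3):
    ref = (tconv2d(x, w, stride) - mean) * factor + beta   # TCONV followed by BN
    folded = tconv2d(x, w_folded, stride) + bias           # TCONV with folded weights + bias
    errs[stride] = np.max(np.abs(ref - folded))
print(errs)
```

The errors should stay at rounding level for every stride, because TCONV is linear in its weights and BN with fixed statistics is a per-channel affine map. Padding is not covered by this sketch.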