One: Compiler FE: Batch Norm folding in Transposed Conv2D

Created on 30 Jul 2020 · 8 Comments · Source: Samsung/ONE

BN (Batch Normalization) is a popular method in neural networks, as it often reduces training time and improves generalization to some extent.

During inference, BN uses fixed per-channel estimates of the mean and variance, so it reduces to a per-channel multiply and add. This lets compilers merge it into the preceding convolution layer, which saves computation and simplifies the network.
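As a minimal sketch of why inference-time BN can be merged at all, the snippet below collapses BN into one per-channel multiplier and one addend and checks that the result matches the direct formula. The helper names (`batch_norm_inference`, `bn_as_mul_add`), the sample statistics, and the epsilon value are illustrative assumptions, not code or values from the ONE compiler.

```python
import numpy as np

def batch_norm_inference(x, gamma, beta, mean, var, eps=1e-3):
    """Inference-time BN over the last (channel) axis, using stored statistics."""
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def bn_as_mul_add(gamma, beta, mean, var, eps=1e-3):
    """Collapse BN into per-channel constants so that y = x * mul + add."""
    mul = gamma / np.sqrt(var + eps)
    add = beta - mean * mul
    return mul, add

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4, 4, 3))        # NHWC feature map with 3 channels
gamma = np.array([1.05, 0.9, 1.2])       # made-up BN parameters
beta = np.array([0.015, -0.1, 0.3])
mean = np.array([0.5, -0.2, 1.0])
var = np.array([0.25, 1.5, 0.8])

mul, add = bn_as_mul_add(gamma, beta, mean, var)
reference = batch_norm_inference(x, gamma, beta, mean, var)
folded = x * mul + add

# Largest element-wise deviation between the two formulations.
max_abs_diff = float(np.max(np.abs(reference - folded)))
print(max_abs_diff)
```

Once BN is in this mul+add form, the multiplier can be pushed into the weights of a preceding linear layer and the addend into its bias.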

Fusion of Convolution and BN is already done by the compiler. Going further, I want to see whether BN can also be fused with TCONV (Transposed Convolution), which looks similar to convolution.

TODO

[x] Mathematically check if it is possible
[x] Check at the code level
[x] Write a small network including TCONV + BN(mul+add)
[x] Write an optimization pass : (TCONV+mul fusion) + add
[x] Value test with ~~runtime(NYI)~~ luci-interpreter
[x] Create TCONV IR having bias attribute
[x] Update the optimization pass : TCONV+mul+add fusion


mhs4670go

👍1

All 8 comments

Check at the code level

import tensorflow as tf
import numpy as np

tf.compat.v1.disable_eager_execution()

def batch_norm(X, scale, offset):
    mean = np.mean(X)
    std = np.std(X)
    return (X - mean) / std * scale + offset

X = np.arange(1,5).reshape((1,2,2,1))
W = np.ones(9).reshape((3,3,1,1))
input_ = tf.compat.v1.constant(X, dtype=tf.float32)
filter_ = tf.compat.v1.constant(W, dtype=tf.float32)
tconv_ = tf.compat.v1.nn.conv2d_transpose(input_, filter_, output_shape=(1,4,4,1), strides=[1,1,1,1], padding='VALID')

with tf.compat.v1.Session() as sess:
    tconv_out = sess.run(tconv_)

scale = 1.05
offset = 0.015
# BN with numpy
print('-' * 10, 'numpy batch_norm', '-' * 10)
print(batch_norm(tconv_out, scale, offset))

# BN with tensorflow
print('-' * 10, 'tensorflow batch_norm', '-' * 10)
scale_ = tf.compat.v1.constant([1.05], dtype=tf.float32)
offset_ = tf.compat.v1.constant([0.015], dtype=tf.float32)
mean_ = tf.compat.v1.constant([np.mean(tconv_out)], dtype=tf.float32)
variance_ = tf.compat.v1.constant([np.var(tconv_out)], dtype=tf.float32)
bn_out, _, _ = tf.compat.v1.nn.fused_batch_norm(tconv_, scale_, offset_, mean=mean_, variance=variance_, epsilon=0, is_training=False)

with tf.compat.v1.Session() as sess:
    bn_out = sess.run(bn_out)

print(bn_out)

# BN folding in TCONV
folded_W = W * scale / np.std(tconv_out)
folded_offset = offset - scale * np.mean(tconv_out) / np.std(tconv_out)
folded_filter = tf.compat.v1.constant(folded_W, dtype=tf.float32)
folded_tconv = tf.compat.v1.nn.conv2d_transpose(input_, folded_filter, output_shape=(1,4,4,1), strides=[1,1,1,1], padding='VALID')

with tf.compat.v1.Session() as sess:
    folded_out = sess.run(folded_tconv)

print('-' * 10, 'folded tconv', '-' * 10)
print(folded_out + folded_offset)

output

---------- numpy batch_norm ----------
[[[[-1.6051569 ]
   [-0.90454847]
   [-0.90454847]
   [-1.2548528 ]]

  [[-0.5542443 ]
   [ 1.5475807 ]
   [ 1.5475807 ]
   [ 0.14636408]]

  [[-0.5542443 ]
   [ 1.5475807 ]
   [ 1.5475807 ]
   [ 0.14636408]]

  [[-0.90454847]
   [ 0.49666822]
   [ 0.49666822]
   [-0.5542443 ]]]]
---------- tensorflow batch_norm ----------
[[[[-1.6051562 ]
   [-0.90454805]
   [-0.90454805]
   [-1.254852  ]]

  [[-0.55424404]
   [ 1.5475801 ]
   [ 1.5475801 ]
   [ 0.1463641 ]]

  [[-0.55424404]
   [ 1.5475801 ]
   [ 1.5475801 ]
   [ 0.1463641 ]]

  [[-0.90454805]
   [ 0.49666798]
   [ 0.49666798]
   [-0.55424404]]]]
---------- folded tconv ----------
[[[[-1.6051569 ]
   [-0.9045485 ]
   [-0.9045485 ]
   [-1.2548528 ]]

  [[-0.5542443 ]
   [ 1.5475811 ]
   [ 1.5475811 ]
   [ 0.1463641 ]]

  [[-0.5542443 ]
   [ 1.5475811 ]
   [ 1.5475811 ]
   [ 0.1463641 ]]

  [[-0.9045485 ]
   [ 0.49666822]
   [ 0.49666822]
   [-0.5542443 ]]]]

The values seem to be the same :)

mhs4670go on 31 Jul 2020

👍1

The last one has a minor diff :)

   [-0.5542443 ]]]]
   [-0.55424404]]]]
   [-0.5542443 ]]]]
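A difference in the seventh significant digit is at the level of float32 rounding, so a comparison like this is usually done with a tolerance rather than exact equality. The sketch below uses the last element of the outputs quoted above; the variable names and the `1e-3` threshold are only illustrative choices (the same delta appears in the `h5diff -d 0.001` call later in this thread).

```python
import numpy as np

# Last element of each printed result, as quoted in the comment above.
numpy_bn = np.float32(-0.5542443)
tf_bn = np.float32(-0.55424404)
folded = np.float32(-0.5542443)

# Absolute difference between the TensorFlow BN result and the folded TCONV result.
abs_diff = abs(float(tf_bn) - float(folded))
print(abs_diff)

# Exact equality fails, but a tolerance-based comparison passes.
exactly_equal = bool(tf_bn == folded)
close_default = bool(np.allclose(tf_bn, folded))   # default rtol=1e-5, atol=1e-8
close_loose = abs_diff < 1e-3
print(exactly_equal, close_default, close_loose)
```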

seanshpark on 31 Jul 2020

Mathematically check

There are many ways to think about transposed convolution. One of them is from the perspective of a single cell in the output. If you follow one output cell while computing TCONV, you will notice that the input is multiplied by the flipped kernel. This is why we call it Transposed Convolution. It isn't actually "transposed" though.

Therefore, TCONV can be computed like a normal convolution (limited to stride 1). And since BN can be viewed as a 1x1 convolution, fusing it into the TCONV layer is possible.
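The argument above can be checked numerically with a small NumPy sketch. It implements a naive transposed convolution (VALID padding, filter laid out as `[height, width, out_channels, in_channels]` like `tf.nn.conv2d_transpose`), compares the stride-1 case against sliding the flipped kernel over a zero-padded input, and then folds a per-channel multiplier into the filter. The function names are hypothetical, the data is random, and this is not the ONE optimization pass itself.

```python
import numpy as np

def tconv2d_valid(x, w, stride=1):
    """Naive transposed conv, VALID padding.
    x: (N, H, W, C_in); w: (KH, KW, C_out, C_in)."""
    n, h, wd, _ = x.shape
    kh, kw, c_out, _ = w.shape
    out = np.zeros((n, (h - 1) * stride + kh, (wd - 1) * stride + kw, c_out))
    for i in range(h):
        for j in range(wd):
            # Each input pixel scatters a kernel-shaped patch into the output.
            patch = np.einsum('nc,hwoc->nhwo', x[:, i, j, :], w)
            out[:, i * stride:i * stride + kh, j * stride:j * stride + kw, :] += patch
    return out

def correlate_full_flipped(x2d, k2d):
    """Stride-1 view: zero-pad the input by (k - 1) and slide the flipped kernel."""
    kh, kw = k2d.shape
    xp = np.pad(x2d, ((kh - 1, kh - 1), (kw - 1, kw - 1)))
    flipped = k2d[::-1, ::-1]
    oh, ow = x2d.shape[0] + kh - 1, x2d.shape[1] + kw - 1
    out = np.zeros((oh, ow))
    for p in range(oh):
        for q in range(ow):
            out[p, q] = np.sum(xp[p:p + kh, q:q + kw] * flipped)
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 2, 2, 2))       # N, H, W, C_in
w = rng.normal(size=(3, 3, 3, 2))       # KH, KW, C_out, C_in
mul = np.array([1.05, 0.9, 1.2])        # per-output-channel BN multiplier (made up)
add = np.array([0.015, -0.1, 0.3])      # per-output-channel BN addend (made up)

# 1) Single channel, stride 1: TCONV equals sliding the flipped kernel.
same_as_flipped = bool(np.allclose(
    tconv2d_valid(x[..., :1], w[:, :, :1, :1])[0, :, :, 0],
    correlate_full_flipped(x[0, :, :, 0], w[:, :, 0, 0])))

# 2) Folding: scale the filter along its output-channel axis (axis 2 here).
fold_ok = {}
for stride in (1, 2):
    reference = tconv2d_valid(x, w, stride) * mul + add
    folded = tconv2d_valid(x, w * mul[None, None, :, None], stride) + add
    fold_ok[stride] = bool(np.allclose(reference, folded))

print(same_as_flipped, fold_ok)
```

Because TCONV is linear in its filter, scaling output channel `c` of the result is the same as scaling the filter slice for that channel. In this sketch the identity also holds for stride 2, even though the flipped-kernel picture is only stated for stride 1. Note that with this filter layout the multiplier broadcasts along axis 2, not the last axis.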

mhs4670go on 31 Jul 2020

❤1

Write a small network including TCONV + BN

import tensorflow as tf
import numpy as np

tf.compat.v1.disable_eager_execution()
input_ = tf.compat.v1.placeholder(tf.float32, shape=(1,2,2,1), name="Hole")
W = np.ones(9).reshape((3,3,1,1))
filter_ = tf.compat.v1.constant(W, dtype=tf.float32)
tconv_ = tf.compat.v1.nn.conv2d_transpose(input_, filter_, output_shape=(1,4,4,1), strides=[1,1,1,1], padding='VALID')

scale_ = tf.compat.v1.constant([1.0177339315414429], dtype=tf.float32)
offset_ = tf.compat.v1.constant([0.015628524124622345], dtype=tf.float32)
mean_ = tf.compat.v1.constant([1.027155211195349693], dtype=tf.float32)
variance_ = tf.compat.v1.constant([0.25580066442489624], dtype=tf.float32)
bn_out, _, _ = tf.compat.v1.nn.fused_batch_norm(tconv_, scale_, offset_, mean=mean_, variance=variance_, epsilon=0.0010000000474974513, is_training=False)

pb

tflite

mhs4670go on 31 Jul 2020

After #3857, the value test passes, but there is a difference in the names.

$ h5diff -d 0.001 Net_TConv_BN_000.expected.h5 Net_TConv_BN_000.opt.expected.h5
attribute: <0 of </name>> and <0 of </name>>
15 differences found

mhs4670go on 24 Aug 2020

Can we check circle.TConv with bias in netron-circle?

cgbahk on 28 Aug 2020

@cgbahk Yes, you can check it from build/compiler/circle2circle-dredd-recipe-test/Net_TConv_BN_000.opt.circle.

FYI, as of now, no test exercises the TCONV_BN_folding pass with stride or padding. One will be added soon.

mhs4670go on 28 Aug 2020

👍1

Could you please check https://github.com/Samsung/ONE/pull/4022#discussion_r478798361 as well? :smiley:

cgbahk on 28 Aug 2020
