When using tensorflow.keras to create a model, the first BatchNormalization layer appears to be connected to all other batch normalization layers in the graph. I think the graph is rendered incorrectly rather than built incorrectly, but I have not been able to prove that.
Below is code that builds the same model with pure TensorFlow and with tensorflow.keras, along with the graph TensorBoard renders in each case.
This issue is probably related to this unanswered StackOverflow post:
https://stackoverflow.com/questions/52586853/batchnormalization-nodes-wrongfully-linked-with-each-other
and possibly related to this tensorflow issue:
https://github.com/tensorflow/tensorflow/issues/17985
Graph produced by pure tensorflow
Graph produced with keras model
Tensorflow code
import tensorflow as tf
import numpy as np
logdir="usingtf"
num_classes=10
x = tf.placeholder(tf.float32, shape=[None, 28,28,1], name="data_in")
y = tf.placeholder(tf.int32, shape=[None, num_classes], name="target_labels")
conv_1 =tf.layers.conv2d(inputs=x,filters=32,kernel_size=(3,3),name="Conv1")
bn_1 =tf.layers.batch_normalization(inputs=conv_1)
rl_1 =tf.nn.relu(bn_1)
conv_2 =tf.layers.conv2d(inputs=rl_1,filters=64,kernel_size=(3,3),name="Conv2")
bn_2 =tf.layers.batch_normalization(inputs=conv_2)
rl_2 =tf.nn.relu(bn_2)
maxpool_1 =tf.layers.max_pooling2d(inputs=rl_2,pool_size=2,strides=2,name="Pool1")
dropout_1 =tf.layers.dropout(inputs=maxpool_1,rate=0.25,name="Drop1")
flatten_1 =tf.layers.flatten(dropout_1)
dense_1 =tf.layers.dense(inputs=flatten_1,units=128,activation=tf.nn.relu,name="Dense1")
bn_3 =tf.layers.batch_normalization(inputs=dense_1)
rl_3 =tf.nn.relu(bn_3)
dropout_2 =tf.layers.dropout(rl_3,rate=0.5,name="Drop2")
dense_2 =tf.layers.dense(dropout_2,units=num_classes,name="Final")
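with tf.Session() as sess:
    # Write the graph definition so TensorBoard can render it
    tbwriter = tf.summary.FileWriter(logdir)
    tbwriter.add_graph(sess.graph)
    tbwriter.close()  # flush the event file to disk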
Keras equivalent
import tensorflow as tf
import tensorflow.keras as keras
import numpy as np
import tensorflow.keras.backend as K
sess=tf.Session()
K.set_session(sess)
logdir="usingkeras"
num_classes=10
model=keras.models.Sequential()
model.add(keras.layers.Conv2D(input_shape=(28,28,1),filters=32,kernel_size=(3,3),name="Conv1"))
#conv_1 =tf.layers.conv2d(inputs=x,filters=32,kernel_size=(3,3),name="Conv1")
model.add(keras.layers.BatchNormalization(name="FirstBatchnorm"))
#bn_1 =tf.layers.batch_normalization(inputs=conv_1)
model.add(keras.layers.Activation("relu"))
#rl_1 =tf.nn.relu(bn_1)
model.add(keras.layers.Conv2D(filters=64,kernel_size=(3,3),name="Conv2"))
#conv_2 =tf.layers.conv2d(inputs=rl_1,filters=64,kernel_size=(3,3),name="Conv2")
model.add(keras.layers.BatchNormalization())
#bn_2 =tf.layers.batch_normalization(inputs=conv_2)
model.add(keras.layers.Activation("relu"))
#rl_2 =tf.nn.relu(bn_2)
model.add(keras.layers.MaxPooling2D(pool_size=2,strides=2,name="Pool1"))
#maxpool_1 =tf.layers.max_pooling2d(inputs=rl_2,pool_size=2,strides=2,name="Pool1")
model.add(keras.layers.Dropout(0.25))
#dropout_1 =tf.layers.dropout(inputs=maxpool_1,rate=0.25,name="Drop1")
model.add(keras.layers.Flatten())
#flatten_1 =tf.layers.flatten(dropout_1)
model.add(keras.layers.Dense(units=128,activation="relu",name="Dense1"))
#dense_1 =tf.layers.dense(inputs=flatten_1,units=128,activation=tf.nn.relu,name="Dense1")
model.add(keras.layers.BatchNormalization())
#bn_3 =tf.layers.batch_normalization(inputs=dense_1)
model.add(keras.layers.Activation("relu"))
#rl_3 =tf.nn.relu(bn_3)
model.add(keras.layers.Dropout(0.5))
#dropout_2 =tf.layers.dropout(rl_3,rate=0.5,name="Drop2")
model.add(keras.layers.Dense(units=num_classes,name="Dense2"))
#dense_2 =tf.layers.dense(dropout_2,units=num_classes,name="Final")
model.compile("adam","categorical_crossentropy")
tbwriter=tf.summary.FileWriter(logdir)
tbwriter.add_graph(sess.graph)
tbwriter.close()  # flush the event file to disk
model.summary()
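One way to test whether the graph is rendered wrongly or built wrongly is to list the actual op-level edges that cross between batch-norm name scopes. The helper below is a minimal sketch that works on plain (op_name, input_names) pairs so it runs without TensorFlow; the names `top_scope` and `cross_scope_edges` are hypothetical. It assumes each layer's ops live under a top-level name scope equal to the layer name, which is typical in TF 1.x graph mode but not guaranteed. A non-empty result means the link is a real data or control dependency, and the source op name shows what carries it; an empty result would point toward a rendering artifact.

```python
def top_scope(op_name):
    """Return the top-level name scope of an op, e.g. 'Conv1' for 'Conv1/Conv2D'."""
    return op_name.split("/", 1)[0]


def cross_scope_edges(ops, scopes):
    """List op-level edges (src, dst) whose endpoints lie in *different*
    top-level scopes, considering only the scopes in `scopes`.

    ops:    iterable of (op_name, [input_op_names]) pairs
    scopes: set of top-level scope names to watch (e.g. the batch-norm layers)
    """
    edges = []
    for dst, inputs in ops:
        dst_scope = top_scope(dst)
        if dst_scope not in scopes:
            continue
        for src in inputs:
            src_scope = top_scope(src)
            if src_scope in scopes and src_scope != dst_scope:
                edges.append((src, dst))
    return edges


# Usage with TF 1.x graph mode (not executed here), after building the Keras model:
# ops = [(op.name, [t.op.name for t in op.inputs] + [c.name for c in op.control_inputs])
#        for op in sess.graph.get_operations()]
# bn_scopes = {l.name for l in model.layers
#              if isinstance(l, keras.layers.BatchNormalization)}
# for src, dst in cross_scope_edges(ops, bn_scopes):
#     print(src, "->", dst)
```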
pinging @nuance-research
Same here. It must be a rendering issue, because I see the same thing (the first batch norm layer connected to all other batch norm layers) in completely different networks I built a long time ago. Also, in models trained without batch normalization, the first dropout layer instead appears to be connected to all other dropout layers.
I just encountered the same problem today, and after some experimenting I think I have found the real root of this behaviour.
As far as I understand it, the layers are not _wrongfully linked_ per se; the links seem to come from internal optimizations or something similar.
Keras uses Batch Normalization (and perhaps Dropout) layer implementations that depend on the learning_phase flag (a boolean value), because these layers must behave differently during fitting and during evaluation. This flag appears to be stored as an input of the first layer that uses it. Even if we set it to false manually (e.g., by calling keras.backend.set_learning_phase(0)), it is still used for some primary flag calculations, which are propagated from there to all other layers with a similar internal structure.
You can see this, I think, just by looking at the structure of a batch norm layer.
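The mechanism described above can be illustrated with a toy model of how a grouped graph view collapses ops into their top-level name scopes. Assumptions: the op names (`keras_learning_phase`, `cond/pred_id`, `batch_normalization_1`, ...) and the helpers `collapse_to_scopes` and `bn_pred_edges` are hypothetical, not taken from a real graph dump, and TensorBoard's real grouping is hierarchical and more involved than this one-level collapse. The point is only that one shared flag node living inside the first BN layer's scope is enough to make that whole scope appear wired to every other BN layer, while the same flag at the root scope shows up as a separate node with one edge per layer.

```python
def collapse_to_scopes(edges):
    """Collapse op-level edges (src, dst) into edges between top-level name
    scopes. Edges that stay inside one scope disappear, as they would when
    the scope is drawn as a single collapsed box."""
    scoped = set()
    for src, dst in edges:
        s, d = src.split("/", 1)[0], dst.split("/", 1)[0]
        if s != d:
            scoped.add((s, d))
    return scoped


def bn_pred_edges(phase_node, bn_scopes):
    """Hypothetical op-level edges: each batch-norm layer's conditional
    reads the single shared learning-phase flag."""
    return [(phase_node, f"{scope}/cond/pred_id") for scope in bn_scopes]


bn_scopes = ["FirstBatchnorm", "batch_normalization_1", "batch_normalization_2"]

# Flag created inside the first BN layer's scope: BN-to-BN edges appear.
inside = collapse_to_scopes(
    bn_pred_edges("FirstBatchnorm/keras_learning_phase", bn_scopes))

# Flag created at the root scope: edges come from a standalone node instead.
root = collapse_to_scopes(bn_pred_edges("keras_learning_phase", bn_scopes))

print(sorted(inside))
print(sorted(root))
```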
The real question is: how do we disable that flag propagation? Is it even possible to at least make it reducible by optimization? For example, when learning_phase is manually set to zero, the input to the keras_learning_phase node is static, so the propagated flags could be static as well.
Or, to phrase the question so it relates more to this repo: is it possible for TensorBoard to automatically hide these Keras-specific, autogenerated links that are unrelated to the actual logic of the graph?
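The static-flag idea in the question above can be illustrated with a toy model of the "smart cond" pattern: when the flag's value is already known while the graph is being built, the branch is chosen in Python and no runtime dependency on the flag is created. TensorFlow has internal helpers along these lines, but `toy_smart_cond` and `build_bn_layers` below are hypothetical stand-ins, not their API. Whether a given Keras version actually takes the static path after keras.backend.set_learning_phase(0), and whether it must be called before the model is built, is not established here.

```python
def toy_smart_cond(pred, true_fn, false_fn, deps):
    """Toy stand-in for a smart_cond-style helper.

    If `pred` is a plain Python bool (known at graph-construction time), pick a
    branch immediately and record no dependency on any flag node. Otherwise
    `pred` is treated as the name of a runtime flag node, and a dependency on
    it is appended to `deps`, mimicking a real conditional op in the graph.
    """
    if isinstance(pred, bool):
        return true_fn() if pred else false_fn()
    deps.append(pred)
    return f"cond({pred}, {true_fn()}, {false_fn()})"


def build_bn_layers(n_layers, training):
    """Build `n_layers` toy batch-norm layers and return (outputs, deps)."""
    deps = []
    outputs = [
        toy_smart_cond(training, lambda: "batch_stats", lambda: "moving_stats", deps)
        for _ in range(n_layers)
    ]
    return outputs, deps


# Static flag: branch resolved at build time, no edges to a flag node.
static_out, static_deps = build_bn_layers(3, False)

# Runtime flag: every layer depends on the same shared flag node.
dynamic_out, dynamic_deps = build_bn_layers(3, "keras_learning_phase")

print(static_out, static_deps)
print(dynamic_out, dynamic_deps)
```

With a static flag there is nothing for a graph view to draw between layers; with a runtime flag every layer gains an edge to the single shared node, which is the situation the thread describes.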
I think I have the same issue here

I think I had the same problem when converting a tensorflow.model to a keras model.
I also have the problem that using Keras BatchNormalization produces 4 nodes in the Functions graph in TensorBoard for each BatchNormalization layer.
A Keras model with just a few BatchNormalization layers makes TensorBoard graphs extremely slow and buggy; I suspect the rendering takes too long because of all the Functions nodes.
It would be nice if the Functions graph were hidden by default.
For example, this code:
import numpy as np
import tensorflow as tf
input_shape = (128, 128, 1)
inp = tf.keras.layers.Input(input_shape)
bn = tf.keras.layers.BatchNormalization()
out = bn(inp)
model = tf.keras.models.Model(inputs=inp, outputs=out)
model.compile(optimizer="Adam", loss="mse")
output_shape = out.shape
feature_batch = np.zeros([1] + list(input_shape))
label_batch = np.zeros([1] + list(output_shape)[1:])
log_dir = "/tmp/tensorboard_model/"
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir)
model.fit(feature_batch, label_batch, callbacks=[tensorboard_callback])
results in
