Training my network works fine, but both the training and validation loss stop decreasing after around 60 epochs. I wanted to visualize the gradients through the TensorBoard callback, but received this error message:
ValueError: An operation has None for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
The callback I used:
tbCallBack = callbacks.TensorBoard(
    log_dir=path + '/run_' + str(curr_run),
    histogram_freq=20,
    write_graph=False,
    write_images=False,
    batch_size=batch_size,
    write_grads=True,
)
My network is a three-layer CNN followed by an LSTM:
from keras.layers import (Input, TimeDistributed, Conv2D, MaxPooling2D,
                          BatchNormalization, Flatten, CuDNNLSTM, Dense,
                          concatenate)
from keras.models import Model

# NOTE: input_sample, stateful, batch_size and seq_len are defined
# elsewhere in the script (not shown here).

# Three TimeDistributed convolution blocks over the input frames
x = TimeDistributed(Conv2D(16, (1, 8), activation='relu', padding='same',
                           kernel_initializer='he_normal'))(input_sample)
x = TimeDistributed(MaxPooling2D((1, 4), padding='same'))(x)
x = TimeDistributed(BatchNormalization())(x)
x = TimeDistributed(Conv2D(32, (1, 4), activation='relu', padding='same',
                           kernel_initializer='he_normal'))(x)
x = TimeDistributed(MaxPooling2D((1, 4), padding='same'))(x)
x = TimeDistributed(BatchNormalization())(x)
x = TimeDistributed(Conv2D(64, (1, 4), activation='relu', padding='same',
                           kernel_initializer='he_normal'))(x)
x = TimeDistributed(MaxPooling2D((1, 4), padding='same'))(x)
x = TimeDistributed(Flatten())(x)

# Second input carrying the timestep index.
# Input() accepts shape OR batch_shape, not both, so only batch_shape
# is passed in the stateful case.
if stateful:
    input_timesteps = Input(batch_shape=(batch_size, seq_len, 1),
                            name='input_timesteps')
else:
    input_timesteps = Input(shape=(None, 1), name='input_timesteps')

x = BatchNormalization()(x)
x = CuDNNLSTM(64, stateful=stateful, name='lstm_layer1',
              return_sequences=True)(x)
x = BatchNormalization()(x)
x = concatenate([x, input_timesteps])
out = TimeDistributed(Dense(1, activation='linear',
                            kernel_initializer='he_normal'),
                      name='output_layer')(x)
model = Model(inputs=[input_sample, input_timesteps], outputs=[out])
I have no idea which operation is causing the gradient to be None. I'm running the latest version of Keras (2.2.2) and tensorflow-gpu/tensorboard (1.10.0).
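One way to narrow it down yourself: a minimal diagnostic sketch, assuming the model above has already been compiled (so model.total_loss exists), that lists every weight whose gradient with respect to the total loss is None. Unlike the callback, tf.gradients() returns None for such weights instead of raising.

import tensorflow as tf

# Print every weight the TensorBoard callback would choke on.
grads = tf.gradients(model.total_loss, model.weights)
for weight, grad in zip(model.weights, grads):
    if grad is None:
        print('No gradient for:', weight.name)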
I have exactly the same problem. Did you find any clue about the error? From a quick look at the Keras source code, all I can see is that the tf.gradients() call returns None.
Same problem. Oddly enough, removing histogram_freq=20 from the TensorBoard constructor makes the error go away.
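A minimal sketch of that workaround, reusing path, curr_run and batch_size from the original callback. With histogram_freq=0 (the default) Keras never builds the gradient/histogram ops, so the None-gradient check is never reached; the trade-off is that you lose weight and gradient histograms in TensorBoard.

from keras import callbacks

tbCallBack = callbacks.TensorBoard(
    log_dir=path + '/run_' + str(curr_run),
    histogram_freq=0,   # histogram (and gradient) logging disabled
    write_graph=False,
    write_images=False,
    batch_size=batch_size,
)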
Same.
I have implemented a ResNet architecture and hit the same error. It comes from a bug in the released version 2.2.4 at this line.

Line on master:

if self.write_grads and weight in layer.trainable_weights:

Line in the 2.2.4 distribution:

if self.write_grads:

The moving mean of the BatchNormalization layer is not trainable and thus has no gradient, but the 2.2.4 check tries to fetch a gradient for it anyway.
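A small standalone sketch (not the callback code itself) illustrating the distinction the master fix relies on: for a BatchNormalization layer, gamma and beta are trainable, while moving_mean and moving_variance are not, so only the former should be passed to the gradient lookup.

from keras.layers import BatchNormalization, Input
from keras.models import Model

inp = Input(shape=(8,))
bn = BatchNormalization()
m = Model(inp, bn(inp))

# gamma/beta print True; moving_mean/moving_variance print False
for weight in bn.weights:
    print(weight.name, weight in bn.trainable_weights)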
I am facing the same issue. Does anybody know whether any released version includes a fix?