Hi all,
I was wondering if someone could help me find a problem I am having.
I tried to make the network as simple as possible:
one hidden RNN layer with a linear activation function and an output dimension of two.
I compared the prediction from Keras with my own calculation, but they are different.
import numpy as np
from keras.models import Sequential
from keras.layers import Activation, SimpleRNN
Du = 3; Dy = 2
model = Sequential()
model.add(SimpleRNN(Dy, return_sequences=True, input_shape=(None, Du)))
model.add(Activation("linear"))
# Input data (2 time steps)
xx = np.random.random((1, 2, 3))
# prediction using model.predict
Xpred1 = model.predict(xx)
# prediction using actual calculation
W = model.get_weights()
Xpred2 = np.empty((1, 2, 2))
h1 = np.zeros((2,))
h1 = np.dot(W[0].T, xx[0][0]) + np.dot(W[1].T, h1) # I didn't include the bias since it is zero at initialization
Xpred2[0][0] = h1
h1 = np.dot(W[0].T, xx[0][1]) + np.dot(W[1].T, h1)
Xpred2[0][1] = h1
# print values
print(Xpred1)
print(Xpred2)
[[[ 0.70926803 -0.122134 ]
[-0.39438242 0.55906796]]]
[[[ 0.8857093 -0.12274676]
[-0.51366907 0.79976374]]]
I took a similar approach with a 'Dense' network, and Xpred1 and Xpred2 were the same.
But for SimpleRNN, Xpred1 and Xpred2 are different.
The manual calculation involves the initialized hidden (recurrent) state h1, whose dimension is the same as the output dimension (2).
Can anyone help me find where I made a mistake in the calculation?
Thanks.
# prediction using actual calculation
W = model.get_weights()
Xpred2 = np.empty((1, 2, 2))
h1 = np.zeros((2,))
h1 = np.dot(W[0].T, xx[0][0]) + np.dot(W[1].T, h1) # I didn't include the bias since it is zero at initialization
_exp = lambda x: np.exp(-2 * x)
_h1_exp = _exp(h1)
h1 = (1 - _h1_exp) / (1 + _h1_exp)  # apply tanh: tanh(x) = (1 - e^(-2x)) / (1 + e^(-2x))
Xpred2[0][0] = h1
h1 = np.dot(W[0].T, xx[0][1]) + np.dot(W[1].T, h1)
_h1_exp = _exp(h1)
h1 = (1 - _h1_exp) / (1 + _h1_exp)  # tanh applied again at the second time step
Xpred2[0][1] = h1
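For reference, the same per-step update can be written more compactly with np.tanh; a minimal sketch, assuming W still holds [kernel, recurrent_kernel, bias] from model.get_weights() and xx is the same input as above:

h = np.zeros((2,))
Xpred3 = np.empty((1, 2, 2))
for t in range(2):
    # per-step recurrence: h_t = tanh(x_t . W_in + h_{t-1} . W_rec + b)
    h = np.tanh(np.dot(W[0].T, xx[0][t]) + np.dot(W[1].T, h) + W[2])
    Xpred3[0][t] = h
print(Xpred3)  # should match Xpred1 from model.predict(xx)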
Oh...
This is more than the perfect answer. I tried to debug the whole code myself, but it was not easy even to find where the calculation happens.
Thank you so much!
But... I used 'linear' as the activation function.
It seems there is a tanh-like hidden activation function that I didn't know about.
Is that tanh-like function set by default, or is it common in RNN-like networks?
Thank you!!!!!
(The Keras version is 1.0.8 -- recently updated)
Somehow, I found the reason.
The two ways of specifying the activation function behave differently:
Model 1:
model.add(SimpleRNN(Dy, return_sequences=True, input_shape=(None, Du)))
model.add(Activation("linear"))
Model 2:
model.add(SimpleRNN(Dy, return_sequences=True, input_shape=(None, Du), activation='linear'))
But according to the keras.io manual, they should act in the same way
(referring to: https://keras.io/activations/).
Is this a bug, or am I still missing something?
Thanks.
A simple code snippet is given here:
import numpy as np
from keras.models import Sequential
from keras.layers import Activation, SimpleRNN
Du = 3; Dy = 2
# input data
X_test = np.random.random((1, 4, 3))
# model 1
model1 = Sequential()
model1.add(SimpleRNN(Dy, return_sequences=True, input_shape=(None, Du)))
model1.add(Activation("linear"))
model1.save_weights('my_model_weights.h5')
Xpred1 = model1.predict(X_test)
# model 2
model2 = Sequential()
model2.add(SimpleRNN(Dy, return_sequences=True, input_shape=(None, Du), activation="linear"))
model2.load_weights('my_model_weights.h5')
Xpred2 = model2.predict(X_test)
print(Xpred1)
print(Xpred2)
For model1, the configuration still has the default 'tanh' activation inside:
[{'class_name': 'SimpleRNN',
'config': {'U_regularizer': None,
'W_regularizer': None,
'activation': 'tanh',
'b_regularizer': None,
'batch_input_shape': (None, None, 3),
'consume_less': 'cpu',
'dropout_U': 0.0,
'dropout_W': 0.0,
'go_backwards': False,
'init': 'glorot_uniform',
'inner_init': 'orthogonal',
'input_dim': 3,
'input_dtype': 'float32',
'input_length': None,
'name': 'simplernn_20',
'output_dim': 2,
'return_sequences': True,
'stateful': False,
'trainable': True,
'unroll': False}},
{'class_name': 'Activation',
'config': {'activation': 'linear',
'name': 'activation_9',
'trainable': True}}]
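For anyone who wants to reproduce this check, the layer configuration can be printed like this (a minimal sketch, assuming the Keras 1.x Sequential.get_config() API):

from pprint import pprint
pprint(model1.get_config())  # the SimpleRNN entry shows 'activation': 'tanh'
pprint(model2.get_config())  # here the SimpleRNN entry shows 'activation': 'linear'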
It doesn't act the same way for RNNs. The activation of an RNN is applied to the output at each time step, which means the previous output (h_tm1) has already been passed through tanh, and the next output is conditioned on that.
But if you simply add an Activation layer on top, that activation is applied once, after all the RNN computation, not in between each time step. Hope it makes sense.
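To make the difference concrete, here is a minimal sketch that reproduces both models by hand; it assumes the [kernel, recurrent_kernel, bias] ordering returned by get_weights() in Keras 1.x:

def rnn_forward(x, weights, step_activation):
    # run a SimpleRNN by hand, applying step_activation at every time step
    Wk, Wr, b = weights
    h = np.zeros(Wr.shape[0])
    outs = []
    for t in range(x.shape[1]):
        h = step_activation(np.dot(x[0, t], Wk) + np.dot(h, Wr) + b)
        outs.append(h)
    return np.array(outs)[None, ...]

weights = model1.get_weights()
# model1: tanh inside the recurrence; the extra Activation('linear') layer is a no-op
out1 = rnn_forward(X_test, weights, np.tanh)
# model2: linear at every step, so the state is never squashed in between
out2 = rnn_forward(X_test, weights, lambda z: z)
# out1 should match Xpred1, and out2 should match Xpred2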
Thank you for your explanation. It helped a lot. Thanks.
Hi
What is the meaning of this line in your code:
h1 = (1 - _h1_exp) / (1 + _h1_exp)
Thanks
It first computes the exponential e^(-2x) of the value, then applies the tanh activation function using the identity tanh(x) = (1 - e^(-2x)) / (1 + e^(-2x)).
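Put differently, those two lines are just tanh written out explicitly; a quick, self-contained sanity check:

import numpy as np
x = np.linspace(-3, 3, 7)
manual = (1 - np.exp(-2 * x)) / (1 + np.exp(-2 * x))  # tanh written out via the identity above
print(np.allclose(manual, np.tanh(x)))  # True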