Hi all,
I was wondering if someone could help me find a problem I am having.
I tried to make the network as simple as possible:
one hidden RNN layer with a linear activation function and an output dimension of two.
I compared the prediction from Keras with my own calculation, but they are different.
import numpy as np
from keras.models import Sequential
from keras.layers import Activation, SimpleRNN
Du = 3; Dy = 2
model = Sequential()
model.add(SimpleRNN(Dy, return_sequences=True, input_shape=(None, Du)))
model.add(Activation("linear"))
# Input data (2 time steps)
xx = np.random.random((1, 2, 3))
# prediction using model.predict
Xpred1 = model.predict(xx)
# prediction using actual calculation
W = model.get_weights()
Xpred2 = np.empty((1, 2, 2))
h1 = np.zeros((2,))
h1 = np.dot(W[0].T, xx[0][0]) + np.dot(W[1].T, h1) # I didn't include the bias since it is zero at initialization
Xpred2[0][0] = h1
h1 = np.dot(W[0].T, xx[0][1]) + np.dot(W[1].T, h1)
Xpred2[0][1] = h1
# print values
print(Xpred1)
print(Xpred2)
[[[ 0.70926803 -0.122134 ]
[-0.39438242 0.55906796]]]
[[[ 0.8857093 -0.12274676]
[-0.51366907 0.79976374]]]
I took a similar approach with a 'Dense' network, and Xpred1 and Xpred2 were the same.
But for SimpleRNN, Xpred1 and Xpred2 are different.
The manual calculation involves the initialized hidden (recurrent) state h1, whose dimension is the same as the output dimension (2).
Can anyone help me find where I made a mistake in the calculation?
Thanks.
# prediction using actual calculation
W = model.get_weights()
Xpred2 = np.empty((1, 2, 2))
h1 = np.zeros((2,))
h1 = np.dot(W[0].T, xx[0][0]) + np.dot(W[1].T, h1) # I didn't include the bias since it is zero at initialization
_exp = lambda x: np.exp(-2 * x)
_h1_exp = _exp(h1)
h1 = (1 - _h1_exp) / (1 + _h1_exp)  # apply tanh: tanh(x) = (1 - e^(-2x)) / (1 + e^(-2x))
Xpred2[0][0] = h1
h1 = np.dot(W[0].T, xx[0][1]) + np.dot(W[1].T, h1)
_h1_exp = _exp(h1)
h1 = (1 - _h1_exp) / (1 + _h1_exp)  # tanh applied again at the second time step
Xpred2[0][1] = h1
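For reference, the same per-step update can be written more compactly with np.tanh; a minimal sketch, assuming W still holds [kernel, recurrent_kernel, bias] from model.get_weights() and xx is the same input as above:

h = np.zeros((2,))
Xpred3 = np.empty((1, 2, 2))
for t in range(2):
    # per-step recurrence: h_t = tanh(x_t . W_in + h_{t-1} . W_rec + b)
    h = np.tanh(np.dot(W[0].T, xx[0][t]) + np.dot(W[1].T, h) + W[2])
    Xpred3[0][t] = h
print(Xpred3)  # should match Xpred1 from model.predict(xx)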
Oh...
This is more than the perfect answer. I tried to debug the whole code myself, but it was not easy even to find where the calculation happens.
Thank you so much!
But... I used 'linear' as the activation function.
It seems there is a tanh-like hidden activation function that I didn't know about.
Is that tanh-like function set by default, or is it common in RNN-like networks?
Thank you!!!!!
(The Keras version is 1.0.8 -- recently updated)
Somehow, I found the reason.
The two ways of specifying the activation function behave differently:
Model 1:
model.add(SimpleRNN(Dy, return_sequences=True, input_shape=(None, Du)))
model.add(Activation("linear"))
Model 2:
model.add(SimpleRNN(Dy, return_sequences=True, input_shape=(None, Du), activation='linear'))
But according to the keras.io manual, they should act in the same way
(referring to: https://keras.io/activations/).
Is this a bug, or am I still missing something?
Thanks.
A simple code snippet is given here:
import numpy as np
from keras.models import Sequential
from keras.layers import Activation, SimpleRNN
Du = 3; Dy = 2
# input data
X_test = np.random.random((1, 4, 3))
# model 1
model1 = Sequential()
model1.add(SimpleRNN(Dy, return_sequences=True, input_shape=(None, Du)))
model1.add(Activation("linear"))
model1.save_weights('my_model_weights.h5')
Xpred1 = model1.predict(X_test)
# model 2
model2 = Sequential()
model2.add(SimpleRNN(Dy, return_sequences=True, input_shape=(None, Du), activation="linear"))
model2.load_weights('my_model_weights.h5')
Xpred2 = model2.predict(X_test)
print(Xpred1)
print(Xpred2)
For model1, the configuration still has the default 'tanh' activation inside:
[{'class_name': 'SimpleRNN',
'config': {'U_regularizer': None,
'W_regularizer': None,
'activation': 'tanh',
'b_regularizer': None,
'batch_input_shape': (None, None, 3),
'consume_less': 'cpu',
'dropout_U': 0.0,
'dropout_W': 0.0,
'go_backwards': False,
'init': 'glorot_uniform',
'inner_init': 'orthogonal',
'input_dim': 3,
'input_dtype': 'float32',
'input_length': None,
'name': 'simplernn_20',
'output_dim': 2,
'return_sequences': True,
'stateful': False,
'trainable': True,
'unroll': False}},
{'class_name': 'Activation',
'config': {'activation': 'linear',
'name': 'activation_9',
'trainable': True}}]
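For anyone who wants to reproduce this check, the layer configuration can be printed like this (a minimal sketch, assuming the Keras 1.x Sequential.get_config() API):

from pprint import pprint
pprint(model1.get_config())  # the SimpleRNN entry shows 'activation': 'tanh'
pprint(model2.get_config())  # here the SimpleRNN entry shows 'activation': 'linear'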
It doesn't act the same way for RNNs. The activation of an RNN is applied to the output at each time step, which means the previous output (h_tm1) has already been passed through tanh, and the next output is conditioned on that.
But if you simply add an Activation layer on top, that activation is applied once, after all the RNN computation, not in between each time step. Hope it makes sense.
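To make the difference concrete, here is a minimal sketch that reproduces both models by hand; it assumes the [kernel, recurrent_kernel, bias] ordering returned by get_weights() in Keras 1.x:

def rnn_forward(x, weights, step_activation):
    # run a SimpleRNN by hand, applying step_activation at every time step
    Wk, Wr, b = weights
    h = np.zeros(Wr.shape[0])
    outs = []
    for t in range(x.shape[1]):
        h = step_activation(np.dot(x[0, t], Wk) + np.dot(h, Wr) + b)
        outs.append(h)
    return np.array(outs)[None, ...]

weights = model1.get_weights()
# model1: tanh inside the recurrence; the extra Activation('linear') layer is a no-op
out1 = rnn_forward(X_test, weights, np.tanh)
# model2: linear at every step, so the state is never squashed in between
out2 = rnn_forward(X_test, weights, lambda z: z)
# out1 should match Xpred1, and out2 should match Xpred2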
Thank you for your explanation. It helped a lot. Thanks.
Hi
What is the meaning of this line in your code:
h1 = (1 - _h1_exp) / (1 + _h1_exp)
Thanks
It first computes the exponential e^(-2x) of the value, then applies the tanh activation function using the identity tanh(x) = (1 - e^(-2x)) / (1 + e^(-2x)).
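Put differently, those two lines are just tanh written out explicitly; a quick, self-contained sanity check:

import numpy as np
x = np.linspace(-3, 3, 7)
manual = (1 - np.exp(-2 * x)) / (1 + np.exp(-2 * x))  # tanh written out via the identity above
print(np.allclose(manual, np.tanh(x)))  # True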