Keras: Cell States of an LSTM

Created on 10 Feb 2018 · 17 comments · Source: keras-team/keras

Hi there,

I have a problem with LSTMs. I need to get the cell states out of an LSTM for each time step. Unfortunately, it is only possible to get the output for each time step with return_sequences=True; return_state=True only gives me the cell state for the last time step...

Is there any hack/modification to get the cell states for each time step?

Greetings


All 17 comments

During prediction you can get the states for each time step by unrolling the RNN - basically you do a Python for loop over the LSTMCell instead of using the TF/Theano scan ops that K.rnn calls.

import keras.backend as K
from keras.layers import Input, LSTM, Lambda

maxlen = 10
input_dim = 10
units = 5

inputs = Input((maxlen, input_dim))

rnn = LSTM(units, return_state=True)

states = []  # list of (h, c) tuples, one per time step
outputs = []

state = None

def get_indexer(t):
    # returns a layer computing x[:, t, :]
    return Lambda(lambda x, t: x[:, t, :], arguments={'t': t},
                  output_shape=lambda s: (s[0], s[2]))

def expand(x):
    return K.expand_dims(x, 1)

expand_layer = Lambda(expand, output_shape=lambda s: (s[0], 1, s[1]))

for t in range(maxlen):
    input_t = get_indexer(t)(inputs)  # basically input_t = inputs[:, t, :]
    input_t = expand_layer(input_t)   # restore the time axis: (batch, 1, input_dim)
    output_t, h, c = rnn(input_t, initial_state=state)
    state = [h, c]
    states.append(state)
    outputs.append(output_t)

Caveat: this ignores masking.
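
To actually read the values, wrap the collected tensors in a Model and call predict; a minimal sketch (the input batch here is made up):

from keras.models import Model
import numpy as np

cell_states = [c for (h, c) in states]  # each tensor has shape (batch, units)
model = Model(inputs, cell_states)

x = np.random.rand(3, maxlen, input_dim)  # hypothetical batch of 3 sequences
per_step_c = model.predict(x)             # list of maxlen arrays, shape (3, units) each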

Thanks for your answer, I think this will help me. Did you test your code? Unfortunately, it doesn't work:

output_t, (h, c) = cell(input_t, (h, c))
TypeError: __call__() takes 2 positional arguments but 3 were given

updated

Again, thank you so much! Now there's:

File "...\engine\topology.py", line 717, in _add_inbound_node
    output_tensors[i]._keras_shape = output_shapes[i]
AttributeError: 'tuple' object has no attribute '_keras_shape'

Here is some example code for accessing all of the states after each timestep:

import keras.backend as K

statesAll = []
for layer in model.layers:
    if getattr(layer, 'stateful', False):
        if hasattr(layer, 'states'):
            for state in layer.states:
                statesAll.append(K.get_value(state))
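
Note that this only reads the states of stateful layers, so the model must be built with stateful=True and fed one step at a time; a hedged sketch of how it might be used (the model here is hypothetical):

from keras.models import Sequential
from keras.layers import LSTM, Dense
import numpy as np

# stateful RNNs need a fixed batch size
model = Sequential([
    LSTM(5, batch_input_shape=(2, 1, 10), stateful=True),
    Dense(1),
])

for step in range(10):
    x = np.random.rand(2, 1, 10)  # one time step per call
    model.predict(x)
    # run the snippet above here to capture the states after this step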

Hmm, it seems cells can't be called like that; I have updated the code to use the LSTM layer instead. Can you try it now?

Thank you so much for your patience and support! It doesn't return any error, so I will try to modify it for my problem. Thank you! :)

@brain1995 @farizrahman4u Hi, I have another related question: how can I compute the output of the LSTMCell for each word of a sentence individually, and then optimize over the mini-batch at the end? Specifically, output_t currently has a batch dimension, but at each time step I only want the LSTMCell output for a single sample. The reason is that I want to control whether a given word participates in the LSTM update. Finally, I want to sum all the losses over a batch and optimize. Something like the pseudo-code below:

import numpy as np

maxlen = 10
input_dim = 10
units = 5
batch_size = 32

inputs = Input((maxlen, input_dim))

rnn = LSTM(units, return_state=True)

states = []  # list of (h, c) tuples
outputs = []

state = None

def get_indexer(t):
    return Lambda(lambda x, t: x[:, t, :], arguments={'t': t},
                  output_shape=lambda s: (s[0], s[2]))

def expand(x):
    return K.expand_dims(x, 1)

def decision(x):  # just an example; may be more complex in a real application
    return np.random.choice([0, 1])

expand_layer = Lambda(expand, output_shape=lambda s: (s[0], 1, s[1]))

for i in range(batch_size):  # here, I want to compute the LSTMCell per sample
    for t in range(maxlen):
        input_t = get_indexer(t)(inputs)  # my hope: input_t = inputs[i, t, :]
        input_t = expand_layer(input_t)
        des = decision(input_t)
        if des == 0:
            # if `decision` says this word contributes nothing,
            # it should not participate in the LSTM update
            continue
        output_t, h, c = rnn(input_t, initial_state=state)
        state = [h, c]
        states.append(state)
        outputs.append(output_t)

So, how should I modify the example code above? Thanks!
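
One way to express such a per-sample decision inside the graph, rather than with a Python-level if, is to gate the state update with a 0/1 mask, so samples whose gate is 0 simply keep their previous state. This is a hedged sketch, not from the thread; the decision rule and the get_step/gate helpers are made-up examples:

import keras.backend as K
from keras.layers import Input, LSTM, Lambda
from keras.models import Model

maxlen, input_dim, units = 10, 10, 5

inputs = Input((maxlen, input_dim))
rnn = LSTM(units, return_state=True)

def get_step(t):
    # keep the time axis so the LSTM sees a single-step sequence
    return Lambda(lambda x, t: x[:, t:t + 1, :], arguments={'t': t},
                  output_shape=lambda s: (s[0], 1, s[2]))

# hypothetical per-sample gate: (batch, 1) tensor of 0s and 1s;
# toy rule "update only when the step's mean feature is positive"
decision = Lambda(lambda x: K.expand_dims(
                      K.cast(K.mean(x, axis=[1, 2]) > 0, K.floatx()), -1),
                  output_shape=lambda s: (s[0], 1))

# d * new_state + (1 - d) * old_state, broadcasting (batch, 1) over (batch, units)
gate = Lambda(lambda z: z[0] * z[1] + (1.0 - z[0]) * z[2],
              output_shape=lambda s: s[1])

# run step 0 unconditionally to create the initial state
out, h, c = rnn(get_step(0)(inputs))
for t in range(1, maxlen):
    x_t = get_step(t)(inputs)
    d = decision(x_t)
    out_new, h_new, c_new = rnn(x_t, initial_state=[h, c])
    h = gate([d, h_new, h])  # samples with d == 0 keep their old h
    c = gate([d, c_new, c])

model = Model(inputs, [h, c])

Because the gate is part of the graph, the whole batch is still processed at once and the loss can be summed over the batch as usual.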

@farizrahman4u is _state_ supposed to be a list? Also, where do you provide the desired training/testing set to the model? Thanks for the help!

What does the "Lambda" function above do?
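
For context, Keras's Lambda layer wraps an arbitrary backend function as a layer so that plain tensor expressions can be used inside a functional model; extra keyword arguments go in arguments, and output_shape tells Keras the resulting shape when it cannot infer it. A minimal sketch:

import keras.backend as K
import numpy as np
from keras.layers import Input, Lambda
from keras.models import Model

inp = Input((4,))
doubled = Lambda(lambda x: 2 * x)(inp)  # an elementwise op as a layer
summed = Lambda(lambda x: K.sum(x, axis=1, keepdims=True),
                output_shape=lambda s: (s[0], 1))(inp)

m = Model(inp, [doubled, summed])
print(m.predict(np.ones((2, 4))))  # [[2,2,2,2], ...] and [[4.], [4.]]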

Got it, thanks.

@farizrahman4u Could you help me translate your chunk of code into TensorFlow? I have tried several options but still get an error:

import keras
from keras.layers import Input, LSTM, Lambda
import tensorflow as tf

maxlen = 10
input_dim = 10
units = 5

inputs = Input((maxlen, input_dim), dtype=tf.float32)

rnn = LSTM(units, return_state=True)

def get_indexer(t):
    return Lambda(lambda x, t: x[:, t, :], arguments={'t':t}, output_shape=lambda s: (s[0], s[2]))

def expand(x):
    return keras.backend.expand_dims(x, 1)

expand_layer = Lambda(expand, output_shape=lambda s: (s[0], 1, s[1]))
state = tf.Variable(tf.zeros([10]))
states = tf.TensorArray(dtype=tf.float32, size=0, dynamic_size=True, name='states')
iters = tf.constant(10, name='iters')

def cond(i, iters, states):
    return tf.less(i, iters)

def body(i, iters, states):
    input_t = get_indexer(i)(inputs)  # basically input_t = inputs[:, t, :]
    input_t = expand_layer(input_t)
    output_t, h, c = rnn(input_t, initial_state=state)
    temp_state = h, c
    assign_op = tf.assign(state, temp_state)
    states = states.write(step, state)
    return states

states = tf.while_loop(cond, body, [0, iters, states])

ValueError: Initializer for variable while_18/lstm_6/kernel/ is from inside a control-flow construct, such as a loop or conditional. When creating a variable inside a loop or conditional, use a lambda as the initializer.
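
A hedged note, not from the thread: the ValueError appears because the LSTM's weight variables are created inside the tf.while_loop body. One way around it is to skip tf.while_loop entirely and unroll with a Python for loop, exactly as in the Keras answer above; a minimal plain-TensorFlow 1.x sketch:

import tensorflow as tf

maxlen, input_dim, units = 10, 10, 5

inputs = tf.placeholder(tf.float32, [None, maxlen, input_dim])
cell = tf.nn.rnn_cell.LSTMCell(units)
state = cell.zero_state(tf.shape(inputs)[0], tf.float32)

states, outputs = [], []
for t in range(maxlen):
    # the cell creates its variables on the first call and reuses them afterwards
    output_t, state = cell(inputs[:, t, :], state)  # state is an LSTMStateTuple(c, h)
    states.append(state)
    outputs.append(output_t)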

I have exactly the same question!
And I don't know how to use K.rnn to do this. Can you give me an example?

Please tell me how to use K.rnn to get those states from the Lambda layer.

a burning question for a newbie

Or how to feed actual values into this Lambda layer.

During prediction you can get the states for each time step by unrolling the RNN - basically you do a Python for loop over the LSTMCell instead of using the TF/Theano scan ops that K.rnn calls.

Are the results I get from model.predict below the states for each time step?

time_steps = 11
input_dim = 17
units = 128

inputs = Input((time_steps, input_dim))

rnn = GRU(units, return_state=True)

states = []  # list of state tensors, one per time step
outputs = []

state = None

def get_indexer(t):
    return Lambda(lambda x, t: x[:, t, :],
                  arguments={'t': t},
                  output_shape=lambda s: (s[0], s[2]))

def expand(x):
    return K.expand_dims(x, 1)

expand_layer = Lambda(expand, output_shape=lambda s: (s[0], 1, s[1]))

for t in range(time_steps):
    input_t = get_indexer(t)(inputs)  # basically input_t = inputs[:, t, :]
    input_t = expand_layer(input_t)
    output_t, h = rnn(input_t, initial_state=state)  # GRU returns a single state
    state = h
    states.append(state)
    outputs.append(output_t)

modelX = Model(inputs, states)
every_time_step_states = modelX.predict(My_real_data_input)
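
A quick way to check (a hedged sketch; the input data here is hypothetical): since a GRU's output and hidden state are the same tensor, each array in the returned list should be the state after that time step, with shape (num_samples, units).

import numpy as np

My_real_data_input = np.random.rand(4, time_steps, input_dim)  # hypothetical batch of 4
every_time_step_states = modelX.predict(My_real_data_input)

assert len(every_time_step_states) == time_steps
assert every_time_step_states[0].shape == (4, units)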
