Keras: Cell States of an LSTM

Created on 10 Feb 2018 · 17 comments · Source: keras-team/keras

Hi there,

I have a problem with LSTMs. I need to get the cell states out of an LSTM for each time step. Unfortunately, it is only possible to get the output for each time step with return_sequences=True; return_state=True only gives me the cell state for the last time step...

Is there any hack/modification to get the cell states for each time step?

Greetings


All 17 comments

During prediction you can get the states for each time step by unrolling the RNN - basically you do a Python for loop over the LSTMCell instead of using the TF/Theano scan ops that K.rnn calls.

import keras.backend as K
from keras.layers import Input, LSTM, Lambda

maxlen = 10
input_dim = 10
units = 5

inputs = Input((maxlen, input_dim))

rnn = LSTM(units, return_state=True)

states = []  # list of (h, c) tuples, one per time step
outputs = []

state = None

def get_indexer(t):
    # returns a layer computing x[:, t, :]
    return Lambda(lambda x, t: x[:, t, :], arguments={'t': t},
                  output_shape=lambda s: (s[0], s[2]))

def expand(x):
    return K.expand_dims(x, 1)

expand_layer = Lambda(expand, output_shape=lambda s: (s[0], 1, s[1]))

for t in range(maxlen):
    input_t = get_indexer(t)(inputs)  # basically input_t = inputs[:, t, :]
    input_t = expand_layer(input_t)   # restore the time axis: (batch, 1, input_dim)
    output_t, h, c = rnn(input_t, initial_state=state)
    state = [h, c]
    states.append(state)
    outputs.append(output_t)

Caveat: this ignores masking.
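
To actually read the values, wrap the collected tensors in a Model and call predict; a minimal sketch (the input batch here is made up):

from keras.models import Model
import numpy as np

cell_states = [c for (h, c) in states]  # each tensor has shape (batch, units)
model = Model(inputs, cell_states)

x = np.random.rand(3, maxlen, input_dim)  # hypothetical batch of 3 sequences
per_step_c = model.predict(x)             # list of maxlen arrays, shape (3, units) each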

Thanks for your answer, I think this will help me. Did you test your code? Unfortunately, it doesn't work:

output_t, (h, c) = cell(input_t, (h, c))
TypeError: __call__() takes 2 positional arguments but 3 were given

updated

Again, thank you so much! Now there's:

File "...\engine\topology.py", line 717, in _add_inbound_node
    output_tensors[i]._keras_shape = output_shapes[i]
AttributeError: 'tuple' object has no attribute '_keras_shape'

Here is some example code for accessing all of the states after each timestep:

import keras.backend as K

statesAll = []
for layer in model.layers:
    if getattr(layer, 'stateful', False):
        if hasattr(layer, 'states'):
            for state in layer.states:
                statesAll.append(K.get_value(state))
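
Note that this only reads the states of stateful layers, so the model must be built with stateful=True and fed one step at a time; a hedged sketch of how it might be used (the model here is hypothetical):

from keras.models import Sequential
from keras.layers import LSTM, Dense
import numpy as np

# stateful RNNs need a fixed batch size
model = Sequential([
    LSTM(5, batch_input_shape=(2, 1, 10), stateful=True),
    Dense(1),
])

for step in range(10):
    x = np.random.rand(2, 1, 10)  # one time step per call
    model.predict(x)
    # run the snippet above here to capture the states after this step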

Hmm, it seems cells can't be called like that; I have updated the code to use the LSTM layer instead. Can you try it now?

Thank you so much for your patience and support! It doesn't return any error, so I will try to modify it for my problem. Thank you! :)

@brain1995 @farizrahman4u Hi, I have another related question: how can I compute the output of the LSTMCell for each word of a sentence individually, and then optimize over the mini-batch at the end? Specifically, output_t currently has a batch dimension, but at each time step I only want the LSTMCell output for a single sample. The reason is that I want to control whether a given word participates in the LSTM update. Finally, I want to sum all the losses over a batch and optimize. Something like the pseudo-code below:

import numpy as np

maxlen = 10
input_dim = 10
units = 5
batch_size = 32

inputs = Input((maxlen, input_dim))

rnn = LSTM(units, return_state=True)

states = []  # list of (h, c) tuples
outputs = []

state = None

def get_indexer(t):
    return Lambda(lambda x, t: x[:, t, :], arguments={'t': t},
                  output_shape=lambda s: (s[0], s[2]))

def expand(x):
    return K.expand_dims(x, 1)

def decision(x):  # just an example; may be more complex in a real application
    return np.random.choice([0, 1])

expand_layer = Lambda(expand, output_shape=lambda s: (s[0], 1, s[1]))

for i in range(batch_size):  # here, I want to compute the LSTMCell per sample
    for t in range(maxlen):
        input_t = get_indexer(t)(inputs)  # my hope: input_t = inputs[i, t, :]
        input_t = expand_layer(input_t)
        des = decision(input_t)
        if des == 0:
            # if `decision` says this word contributes nothing,
            # it should not participate in the LSTM update
            continue
        output_t, h, c = rnn(input_t, initial_state=state)
        state = [h, c]
        states.append(state)
        outputs.append(output_t)

So, how should I modify the example code above? Thanks!
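
One way to express such a per-sample decision inside the graph, rather than with a Python-level if, is to gate the state update with a 0/1 mask, so samples whose gate is 0 simply keep their previous state. This is a hedged sketch, not from the thread; the decision rule and the get_step/gate helpers are made-up examples:

import keras.backend as K
from keras.layers import Input, LSTM, Lambda
from keras.models import Model

maxlen, input_dim, units = 10, 10, 5

inputs = Input((maxlen, input_dim))
rnn = LSTM(units, return_state=True)

def get_step(t):
    # keep the time axis so the LSTM sees a single-step sequence
    return Lambda(lambda x, t: x[:, t:t + 1, :], arguments={'t': t},
                  output_shape=lambda s: (s[0], 1, s[2]))

# hypothetical per-sample gate: (batch, 1) tensor of 0s and 1s;
# toy rule "update only when the step's mean feature is positive"
decision = Lambda(lambda x: K.expand_dims(
                      K.cast(K.mean(x, axis=[1, 2]) > 0, K.floatx()), -1),
                  output_shape=lambda s: (s[0], 1))

# d * new_state + (1 - d) * old_state, broadcasting (batch, 1) over (batch, units)
gate = Lambda(lambda z: z[0] * z[1] + (1.0 - z[0]) * z[2],
              output_shape=lambda s: s[1])

# run step 0 unconditionally to create the initial state
out, h, c = rnn(get_step(0)(inputs))
for t in range(1, maxlen):
    x_t = get_step(t)(inputs)
    d = decision(x_t)
    out_new, h_new, c_new = rnn(x_t, initial_state=[h, c])
    h = gate([d, h_new, h])  # samples with d == 0 keep their old h
    c = gate([d, c_new, c])

model = Model(inputs, [h, c])

Because the gate is part of the graph, the whole batch is still processed at once and the loss can be summed over the batch as usual.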

@farizrahman4u is _state_ supposed to be a list? Also, where do you provide the desired training/testing set to the model? Thanks for the help!

What does the "Lambda" function above do?
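
For context, Keras's Lambda layer wraps an arbitrary backend function as a layer so that plain tensor expressions can be used inside a functional model; extra keyword arguments go in arguments, and output_shape tells Keras the resulting shape when it cannot infer it. A minimal sketch:

import keras.backend as K
import numpy as np
from keras.layers import Input, Lambda
from keras.models import Model

inp = Input((4,))
doubled = Lambda(lambda x: 2 * x)(inp)  # an elementwise op as a layer
summed = Lambda(lambda x: K.sum(x, axis=1, keepdims=True),
                output_shape=lambda s: (s[0], 1))(inp)

m = Model(inp, [doubled, summed])
print(m.predict(np.ones((2, 4))))  # [[2,2,2,2], ...] and [[4.], [4.]]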

Got it, thanks.

@farizrahman4u Could you help me translate your chunk of code into TensorFlow? I have tried several options but still get an error:

import keras
from keras.layers import Input, LSTM, Lambda
import tensorflow as tf

maxlen = 10
input_dim = 10
units = 5

inputs = Input((maxlen, input_dim), dtype=tf.float32)

rnn = LSTM(units, return_state=True)

def get_indexer(t):
    return Lambda(lambda x, t: x[:, t, :], arguments={'t':t}, output_shape=lambda s: (s[0], s[2]))

def expand(x):
    return keras.backend.expand_dims(x, 1)

expand_layer = Lambda(expand, output_shape=lambda s: (s[0], 1, s[1]))
state = tf.Variable(tf.zeros([10]))
states = tf.TensorArray(dtype=tf.float32, size=0, dynamic_size=True, name='states')
iters = tf.constant(10, name='iters')

def cond(i, iters, states):
    return tf.less(i, iters)

def body(i, iters, states):
    input_t = get_indexer(i)(inputs)  # basically input_t = inputs[:, t, :]
    input_t = expand_layer(input_t)
    output_t, h, c = rnn(input_t, initial_state=state)
    temp_state = h, c
    assign_op = tf.assign(state, temp_state)
    states = states.write(step, state)
    return states

states = tf.while_loop(cond, body, [0, iters, states])

ValueError: Initializer for variable while_18/lstm_6/kernel/ is from inside a control-flow construct, such as a loop or conditional. When creating a variable inside a loop or conditional, use a lambda as the initializer.
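
A hedged note, not from the thread: the ValueError appears because the LSTM's weight variables are created inside the tf.while_loop body. One way around it is to skip tf.while_loop entirely and unroll with a Python for loop, exactly as in the Keras answer above; a minimal plain-TensorFlow 1.x sketch:

import tensorflow as tf

maxlen, input_dim, units = 10, 10, 5

inputs = tf.placeholder(tf.float32, [None, maxlen, input_dim])
cell = tf.nn.rnn_cell.LSTMCell(units)
state = cell.zero_state(tf.shape(inputs)[0], tf.float32)

states, outputs = [], []
for t in range(maxlen):
    # the cell creates its variables on the first call and reuses them afterwards
    output_t, state = cell(inputs[:, t, :], state)  # state is an LSTMStateTuple(c, h)
    states.append(state)
    outputs.append(output_t)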

I have exactly the same question!
And I don't know how to use K.rnn to do this. Can you give me an example?

Please tell me how to use K.rnn to get those states from the Lambda layer.

a burning question for a newbie

Or how to feed actual values into this Lambda layer.

During prediction you can get the states for each time step by unrolling the RNN - basically you do a Python for loop over the LSTMCell instead of using the TF/Theano scan ops that K.rnn calls.

Are the results I get from model.predict below the states for each time step?

time_steps = 11
input_dim = 17
units = 128

inputs = Input((time_steps, input_dim))

rnn = GRU(units, return_state=True)

states = []  # list of state tensors, one per time step
outputs = []

state = None

def get_indexer(t):
    return Lambda(lambda x, t: x[:, t, :],
                  arguments={'t': t},
                  output_shape=lambda s: (s[0], s[2]))

def expand(x):
    return K.expand_dims(x, 1)

expand_layer = Lambda(expand, output_shape=lambda s: (s[0], 1, s[1]))

for t in range(time_steps):
    input_t = get_indexer(t)(inputs)  # basically input_t = inputs[:, t, :]
    input_t = expand_layer(input_t)
    output_t, h = rnn(input_t, initial_state=state)  # GRU returns a single state
    state = h
    states.append(state)
    outputs.append(output_t)

modelX = Model(inputs, states)
every_time_step_states = modelX.predict(My_real_data_input)
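
A quick way to check (a hedged sketch; the input data here is hypothetical): since a GRU's output and hidden state are the same tensor, each array in the returned list should be the state after that time step, with shape (num_samples, units).

import numpy as np

My_real_data_input = np.random.rand(4, time_steps, input_dim)  # hypothetical batch of 4
every_time_step_states = modelX.predict(My_real_data_input)

assert len(every_time_step_states) == time_steps
assert every_time_step_states[0].shape == (4, units)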
