Keras: Integrating Recurrentshop functionalities to Keras

Created on 20 Apr 2017 · 38 Comments · Source: keras-team/keras

This place is for discussing how to add Recurrentshop's features to Keras.

Recurrentshop is a framework for building complex RNNs using Keras.

https://github.com/datalogai/recurrentshop

See readme and docs/ for more info on the functionalities that recurrentshop provides.

A couple of caveats:

When unroll=False we can not have a Dropout layer in the RNN on theano backend. This is some issue on theano's end. RandomStreams op is not supported inside scan it seems. I tried feeding the updates from scan back to the model's updates, didn't help. Perhaps @nouiz can help?
Teacher forcing is not working in tensorflow. The error says something about the ground truth tensor being created in a different frame. I don't know what that means. @fchollet?

Interested folks please go through the documentation and post your thoughts here. Once we have a well defined roadmap, we can get to the actual work.

Cheers!

@fchollet @EderSantana @abhaikollara @malaikannan @Joshua-Chin

Enhancement contributions welcome

farizrahman4u

🎉3 👍1

Most helpful comment

Other features : (Copy pasting from docs)

Decoder

Here we decode a vector into a sequence of vectors. The input could also be a sequence, as in attention models, where the whole input sequence is available to the RNN at every time step.

In this case, the input to the RNN is a 2D tensor of shape (batch, features), not a sequence:

rnn = RecurrentSequential(decode=True, output_length=10)
rnn.add(SimpleRNNCell(25, input_dim=20))

x = Input((20,))
y = rnn(x)

print(K.int_shape(y))  # >> (None, 10, 25)
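As a rough illustration of what decode mode amounts to, here is a minimal plain-Python sketch. It assumes the same input vector is fed to the cell at every one of the output_length steps; that is an assumption made for illustration, and recurrentshop's internals may differ. The names simple_rnn_step and decode, and the toy weights, are hypothetical.

```python
import math

def simple_rnn_step(x, h, w_x, w_h):
    """One tanh RNN step: h_t = tanh(w_x @ x + w_h @ h). Weights are lists of rows."""
    return [
        math.tanh(sum(wx * xi for wx, xi in zip(row_x, x)) +
                  sum(wh * hi for wh, hi in zip(row_h, h)))
        for row_x, row_h in zip(w_x, w_h)
    ]

def decode(x, output_length, w_x, w_h):
    """Unfold one vector into a sequence by feeding it at every time step (assumed behavior)."""
    h = [0.0] * len(w_h)  # zero initial state, one entry per hidden unit
    outputs = []
    for _ in range(output_length):
        h = simple_rnn_step(x, h, w_x, w_h)
        outputs.append(h)
    return outputs

# Toy sizes: 2-d input, 3 hidden units, 4 output steps (hypothetical weights).
w_x = [[0.1, 0.2], [0.0, -0.1], [0.3, 0.3]]
w_h = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.5]]
sequence = decode([1.0, 2.0], output_length=4, w_x=w_x, w_h=w_h)
print(len(sequence), len(sequence[0]))  # time steps, units
```

The output has one vector per time step, which mirrors the (None, 10, 25) shape in the snippet above with the batch dimension left out.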

Readout

Readout lets you feed the output of your RNN from the previous time step back to the current time step.

Readout in RecurrentSequential

rnn = RecurrentSequential(readout='add')
rnn.add(LSTMCell(10, input_dim=10))
rnn.add(GRUCell(10))
rnn.add(SimpleRNNCell(10))

The output from the previous time step will be added to the current input. Other available modes are mul, avg and max. (Note: since these are element-wise ops, the output shape and input shape of the RNN should be the same.)
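The merge modes can be pictured with a small plain-Python sketch. The function merge_readout is a hypothetical name, not part of recurrentshop; it only illustrates the element-wise combination of the current input with the previous output, and why the two shapes have to match.

```python
def merge_readout(x_t, y_tm1, mode="add"):
    """Combine the current input with the previous output element-wise."""
    if len(x_t) != len(y_tm1):
        raise ValueError("readout needs matching input and output shapes")
    ops = {
        "add": lambda a, b: a + b,
        "mul": lambda a, b: a * b,
        "avg": lambda a, b: (a + b) / 2.0,
        "max": max,
    }
    return [ops[mode](a, b) for a, b in zip(x_t, y_tm1)]

x_t = [1.0, 2.0, 3.0]      # input at time t
y_tm1 = [0.5, 4.0, -1.0]   # output at time t-1
for mode in ("add", "mul", "avg", "max"):
    print(mode, merge_readout(x_t, y_tm1, mode))
```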

Readout in RecurrentModel

In case you want to do something more complex than just merge your readout with input, you can wire things up with functional API and use RecurrentModel to build your RNN.

x = Input((10,))
h_tm1 = Input((10,))
c_tm1 = Input((10,))
readout_input = Input((10,))


# Here, I simply add half the readout to the input.. you can do whatever you want.

readout_half = Lambda(lambda x: 0.5 * x)(readout_input)
lstms_input = add([x, readout_half])

# Deep LSTM
depth = 3

cells = [LSTMCell(10) for _ in range(depth)]

lstms_output, h, c = lstms_input, h_tm1, c_tm1


for cell in cells:
    lstms_output, h, c = cell([lstms_output, h, c])

y = lstms_output

rnn = RecurrentModel(input=x, initial_states=[h_tm1, c_tm1], output=y, final_states=[h, c], readout_input=readout_input)

State initialization

By default, all states are initialized to zeros. If you want to use a different distribution (such as random normal), you can use the state_initializer argument available for both RecurrentModel and RecurrentSequential.

Here the random_normal distribution will be used to initialize all the 3 states of the RNN (2 from LSTM and 1 from GRU) :

rnn = RecurrentSequential(state_initializer='random_normal')
rnn.add(LSTMCell(10, input_dim=5))
rnn.add(GRUCell(10))

Here the random_normal distribution will be used to initialize the first state of the RNN. The rest will be initialized with zeros:

rnn = RecurrentSequential(state_initializer=['random_normal'])
rnn.add(LSTMCell(10, input_dim=5))
rnn.add(GRUCell(10))

Here each state gets a different initializer. Note that we have specified the batch dimension as well.. this is because glorot_uniform initialization in Keras does not support symbolic shapes:

rnn = RecurrentSequential(state_initializer=['random_normal', 'zeros', 'glorot_uniform'])
rnn.add(LSTMCell(10, batch_input_shape=(32, 10)))
rnn.add(GRUCell(10))
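The three forms of state_initializer shown above can be summarized with a small sketch. The helper resolve_initializers is hypothetical and only mirrors the rules as described here: a single name applies to every state, and a list shorter than the number of states is padded with zeros.

```python
def resolve_initializers(spec, num_states):
    """Expand a state_initializer-style spec into one initializer name per state."""
    if spec is None:
        return ["zeros"] * num_states          # default: all zeros
    if isinstance(spec, str):
        return [spec] * num_states             # one name for every state
    if len(spec) > num_states:
        raise ValueError("more initializers than states")
    # A shorter list covers the first states; the rest fall back to zeros.
    return list(spec) + ["zeros"] * (num_states - len(spec))

# LSTMCell (2 states) + GRUCell (1 state) = 3 states
print(resolve_initializers("random_normal", 3))
print(resolve_initializers(["random_normal"], 3))
print(resolve_initializers(["random_normal", "zeros", "glorot_uniform"], 3))
```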

State Sync

In a RecurrentSequential, each cell is given a separate state space. For example:

rnn = RecurrentSequential()
rnn.add(GRUCell(10, input_dim=5))
rnn.add(Dense(5))
rnn.add(LSTMCell(10))
print(rnn.num_states)  # >> 3

GRUCell : 1 state
Dense : Not an RNNCell, so 0 states
LSTMCell : 2 states
Total : 3

With state sync, you can have a common state space for all the cells in a RecurrentSequential.
For this, all RNNCells in the RecurrentContainer should be state homogeneous, i.e., they should all have the same number of states, and corresponding states should have the same shapes. This means you can not have both LSTMCell and GRUCell in a state-synced RecurrentSequential, because LSTMCells have 2 states whereas GRUCells have only 1 state.

rnn = RecurrentSequential(state_sync=True)
rnn.add(LSTMCell(10, input_dim=5))
rnn.add(LSTMCell(10))
rnn.add(LSTMCell(10))
rnn.add(LSTMCell(10))
print(rnn.num_states)  # >> 2

All the LSTMCells share the same set of 2 states.
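The state counts in the two examples can be reproduced with a short sketch. The table STATES_PER_CELL and the function num_states are hypothetical stand-ins for illustration; the sketch checks only the number of states per cell, not their shapes.

```python
# States per cell type, as listed in the breakdown above.
STATES_PER_CELL = {"SimpleRNNCell": 1, "GRUCell": 1, "LSTMCell": 2}

def num_states(layers, state_sync=False):
    """Count the states of a stack; non-cell layers such as Dense contribute none."""
    counts = [STATES_PER_CELL[name] for name in layers if name in STATES_PER_CELL]
    if not state_sync:
        return sum(counts)                     # every cell owns its states
    if len(set(counts)) > 1:
        raise ValueError("state_sync requires state-homogeneous cells")
    return counts[0] if counts else 0          # one shared set of states

print(num_states(["GRUCell", "Dense", "LSTMCell"]))
print(num_states(["LSTMCell"] * 4, state_sync=True))
```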

Teacher forcing

One issue you might face with RNNs with readout is the accumulation of error. On every time step, you feed in the output of the previous time step; but during training, the output of the previous time step (which is usually a probability distribution for a class label) will not be accurate initially. Over long sequences, this error might accumulate and lead to poor performance.

You can rectify this issue by teacher forcing, where you feed the ground truth at the previous time step to the current time step as readout (instead of the prediction at previous time step).

rnn = RecurrentSequential(readout='add', teacher_force=True, return_sequences=True)
rnn.add(LSTMCell(10, input_dim=10))
rnn.add(LSTMCell(10))
rnn.add(LSTMCell(10))
rnn.add(Activation('softmax'))


x = Input((7, 10))
y_true = Input((7, 10))  # This is where you feed the ground truth values

y = rnn(x, ground_truth=y_true)

model = Model([x, y_true], y)

model.compile(loss='categorical_crossentropy', optimizer='sgd')

# Training

X_true = np.random.random((32, 7, 10))
Y_true = np.random.random((32, 7, 10))

model.fit([X_true, Y_true], Y_true)  # Note that Y_true is part of both input and output


# Prediction

X = np.random.random((32, 7, 10))

model.predict(X)  # >> Error! the graph still has an input for ground truth.. 

zeros = np.zeros((32, 7, 10)) # Need not be zeros.. any array of same shape would do

model.predict([X, zeros])
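The difference between teacher-forced and free-running readout can be sketched without Keras. Everything in this block is hypothetical: run_with_readout is not a recurrentshop function, the step function is stateless for brevity, and the training flag is an assumed way of switching the ground truth input off at prediction time.

```python
def run_with_readout(inputs, step, ground_truth=None, training=False):
    """Run a readout RNN; feed the ground truth as readout only while training."""
    readout = [0.0] * len(inputs[0])
    outputs = []
    for t, x_t in enumerate(inputs):
        merged = [a + b for a, b in zip(x_t, readout)]  # readout='add'
        y_t = step(merged)
        outputs.append(y_t)
        if training and ground_truth is not None:
            readout = ground_truth[t]   # teacher forcing: true y_t feeds step t+1
        else:
            readout = y_t               # free running: own prediction feeds step t+1
    return outputs

def halve(v):
    """Stand-in for the cell stack (stateless toy function)."""
    return [0.5 * a for a in v]

xs = [[1.0], [1.0], [1.0]]
ys = [[2.0], [2.0], [2.0]]
forced = run_with_readout(xs, halve, ys, training=True)
free = run_with_readout(xs, halve, ys, training=False)
print(forced)
print(free)
```

With training=False the ground truth argument is ignored entirely, which is why any array of the right shape works as a placeholder in the predict call above.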

farizrahman4u on 29 Apr 2017

👍2

All 38 comments

Please post a list of the API additions you propose for Keras (note that they can only be additions, since the current API must be supported for the long term).

fchollet on 20 Apr 2017

I was thinking we make everything cells + container based? Existing recurrent layers (LSTM, GRU) would simply map to Containers with a single cell, and we keep the base Recurrent class so that custom layers of users are not broken.

farizrahman4u on 20 Apr 2017

Or simply, we replace recurrent.py with directory recurrent. We add the files from Recurrentshop to this directory. Also we move the contents of recurrent.py to __init__.py of the new directory.

farizrahman4u on 20 Apr 2017

Hi @farizrahman4u, I'm the author of PR#6314. For the RNN network that I was working on I needed to have access to the states from the network to make a prediction from them and then update the next input (t + 1) based on that predicted value. In order to accomplish that I made a new LSTM Layer that only returns the state from the current timepoint called TimeStepLSTM. The TimeStepLSTM uses a similar call statement as the regular LSTM but uses K.single_step_rnn() from the backend.

In my original design of the network, I was working with RecurrentShop and designed the Layer as a recurrent container called TimeStepRecurrentContainer and added an LSTMCell to it. The advantage of that approach is that you can use a GRUCell or LSTMCell. When I built this version, Keras was not at 2.0.2 yet and upgrading to that broke my code... In cleaning up stuff I found it much neater and tidier just to build the TimeStepLSTM within Keras as a Layer (I wasn't actually using a Seq2Seq style, so it was just added complexity for what I wanted).

In regard to your posts above, I think that, long term, moving towards Containers and Cells would be the best approach. I want to experiment with deep RNNs for my problem, which I could accomplish by feeding one RNN with return_sequences=True into another RNN, but the stacked cell implementation provided by RecurrentShop makes much more sense and is easier to read and understand. Going forward with my code, I wanted to suggest turning my TimeStepLSTM into a TimeStep wrapper that can be applied to other types of RNN components (LSTM, Simple, or GRU). This fits with the Container method quite nicely.

slaterb1 on 27 Apr 2017

👍2

When unroll=False we can not have a Dropout layer in the RNN on theano backend. This is some issue on theano's end. RandomStreams op is not supported inside scan it seems. I tried feeding the updates from scan back to the model's updates, didn't help. Perhaps @nouiz can help?

Dropout inside RNNs is not the best practice anyway. We should be fine here.

I was thinking we make everything cells + container based? Existing recurrent layers (LSTM, GRU) would simply map to Containers with a single cell, and we keep the base Recurrent class so that custom layers of users are not broken.

Let me go off on a tangent here: can we have Lambda Containers instead, i.e., Lambda layers + weights? This might be too crazy, but beyond Recurrentshop, this might allow us to use layers and models written with any other lower-level framework. Essentially we would be exposing more instead of abstracting away.

More user focused questions for an RNN API would be:

1) Keras historically made it easy to do the main things in Deep Learning. Next, we may ask how do we support Seq2Seq (with and without attention)? What are the smallest changes we can make and still support the main things?

2) What did the DyNet people do right? How did they win over the RNN/language people? In that respect, there must be something we can learn from them.

EderSantana on 28 Apr 2017

Hi Eder!

To implement Seq2seq, we need the following things:

All cells in a stack (encoder and decoder) would share the same state.
Decoder RNNs (currently we use RepeatVector layer.. not good).
Readout
Teacher forcing
RNNs should have return_states option.

Regarding Lambda containers.. I don't know.. serialization, collecting weights, updates... would be a mess.. Maybe possible in the TF-only version.

One simple thing you could add to Keras to completely eliminate custom layers : A Weight layer, wrapped by a weight function. It would simply output a weight. Now you can wire your custom layer logic using functional API.

farizrahman4u on 28 Apr 2017

One simple thing you could add to Keras to completely eliminate custom layers : A Weight layer, wrapped by a weight function. It would simply output a weight. Now you can wire your custom layer logic using functional API.

This is what I do! I use an Embedding layer though :D It would be nice if that had a better interface...

To implement Seq2seq, we need the following things:

So that is what Recurrentshop offers right now, right? So the idea would be to write the LSTM layer as a wrapper of the cells? This way the API stays the same and we can reuse the cell code elsewhere for the Recurrent model. That sounds sensible.

How about we start with Recurrentshop as a contrib or namespace? People can git clone recursive and start using it. Just getting the full audience to try Recurrentshop would help us figure out what the main needs and difficulties are.

EderSantana on 28 Apr 2017

So the idea would be to write the LSTM layer as a wrapper of the cells

Either we leave the 3 recurrent layers as they are, or they map to containers with a single cell. The former would be easier and would ensure total backward compatibility.

How about we start with Recurrentshop as a contrib or namespace?

That would be the easiest solution. That way, nothing breaks. We would first have to match the code, docs quality and test coverage of recurrentshop with that of Keras, but that's just refactoring and shouldn't take long.

@fchollet What do you think?

farizrahman4u on 28 Apr 2017

Can we start with a list of APIs from RecurrentShop you plan on introducing in Keras? Then we can discuss what changes we should make and what potential incompatibilities there might be.

One constraint I want us to keep in mind is that on the TF-Keras side, there are plans to "merge" TF RNNCells (currently in tf.contrib) and Keras RNN layers, in particular by having RNNCells subclass Layer (ongoing) and having Keras RNN layers rely on RNNCells internally. The choices we make for multi-backend Keras should stay close to these ideas, even if we don't rely on TF RNNCells here.

fchollet on 29 Apr 2017

Can we start with a list of APIs from RecurrentShop you plan on introducing in Keras?

Cool.

First of all, Cells. A cell is a layer which works on a single timestep. It accepts a list of tensors [x_t, state1_tm1, state2_tm1, ...] and outputs [y_t, state1_t, state2_t, ...]. Check out the code for LSTMCell and GRUCell here and here.
RecurrentSequential : This is the recurrent analog of the Sequential model. Basically, you can stack cells (and other layers such as Dense and Activation) using .add().

rnn = RecurrentSequential(unroll=False, return_sequences=False)
rnn.add(SimpleRNNCell(10, input_dim=5))
rnn.add(LSTMCell(12))
rnn.add(Dense(5))
rnn.add(GRUCell(8))
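The cell contract and the stacking behavior described above can be sketched in plain Python. The two toy cells below use scalar arithmetic with no learned weights or gates, and run_stack is a hypothetical helper, so this shows only how inputs and states flow, not how recurrentshop implements it.

```python
import math

def simple_rnn_cell(tensors):
    """[x_t, h_tm1] -> [y_t, h_t] with a toy scalar update (no learned weights)."""
    x_t, h_tm1 = tensors
    h_t = math.tanh(x_t + h_tm1)
    return [h_t, h_t]

def lstm_like_cell(tensors):
    """[x_t, h_tm1, c_tm1] -> [y_t, h_t, c_t]; gates omitted, toy arithmetic only."""
    x_t, h_tm1, c_tm1 = tensors
    c_t = 0.5 * c_tm1 + 0.5 * x_t
    h_t = math.tanh(c_t + h_tm1)
    return [h_t, h_t, c_t]

def run_stack(cells, state_counts, sequence):
    """Run the cells in order at each time step; every cell keeps its own states."""
    states = [[0.0] * n for n in state_counts]
    y = None
    for x_t in sequence:
        y = x_t
        for i, cell in enumerate(cells):
            out = cell([y] + states[i])
            y, states[i] = out[0], out[1:]   # first item is the output, the rest are new states
    return y, states

final_y, final_states = run_stack(
    [simple_rnn_cell, lstm_like_cell], state_counts=[1, 2], sequence=[0.1, 0.2, 0.3])
print(final_y, final_states)
```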

RecurrentModel : This gives the user more flexibility than simply stacking cells. You can wire your custom RNN using the functional API, and convert it to a standard Recurrent instance. For example, you could write a SimpleRNN from scratch:

# The RNN logic is written using Keras's functional API.
# Which means we use Keras layers instead of theano/tensorflow ops
import numpy as np
from keras.layers import *
from keras.models import *
from recurrentshop import *

x_t = Input((5,))  # The input to the RNN at time t
h_tm1 = Input((10,))  # Previous hidden state

# Compute new hidden state
h_t = add([Dense(10)(x_t), Dense(10, use_bias=False)(h_tm1)])

# tanh activation
h_t = Activation('tanh')(h_t)

# Build the RNN
rnn = RecurrentModel(input=x_t, initial_states=[h_tm1], output=h_t, final_states=[h_t])

# rnn is a standard Keras `Recurrent` instance. RecurrentModel also accepts arguments such as unroll, return_sequences etc

# Run the RNN over a random sequence

x = Input((7,5))
y = rnn(x)

model = Model(x, y)
model.predict(np.random.random((1, 7, 5)))  # a batch of 1 sequence with 7 time steps

You can also reuse cells in RecurrentModels. Check out this ensemble of LSTM and GRU:

from recurrentshop import *
from keras.layers import *
from keras.models import Model

input = Input((5,))
state1_tm1 = Input((10,))
state2_tm1 = Input((10,))
state3_tm1 = Input((10,))

lstm_output, state1_t, state2_t = LSTMCell(10)([input, state1_tm1, state2_tm1])
gru_output, state3_t = GRUCell(10)([input, state3_tm1])

output = add([lstm_output, gru_output])
output = Activation('tanh')(output)

rnn = RecurrentModel(input=input, initial_states=[state1_tm1, state2_tm1, state3_tm1], output=output, final_states=[state1_t, state2_t, state3_t])

farizrahman4u on 29 Apr 2017

👍2

@fchollet

farizrahman4u on 29 Apr 2017

These look like good API extensions. I don't see any breaking changes to the API we already have.

Questions:
1) how do the Recurrent models work with the other models? Say, can I use the output of Model as the state initialization?

2) Can I use a Sequential or a Model as part of a RecurrentModel or vice-versa (you know... users...)?

Teacher forcing

We already support teacher forcing in the default Keras RNN behavior, right? The problem is to turn off teacher forcing during test time. In the case above, if I use teacher forcing during training, how do I turn it off for testing?

For continuous prediction, practical alternatives to teacher forcing are things like curriculum learning and/or hidden state warmup (run the RNN for a few steps to get a hidden state, but do not backprop through those states). Here we go back to question (1) above about using hidden states coming from another model.

Conclusions:
Overall this looks really impressive! So impressive it might be a bit overwhelming at first. I'd roll things out slowly, around examples. For instance, first we write a proper char-RNN and add the features required. Later, a Seq2Seq for machine translation, adding the features required. Then the same with attention. Dynamic neural computers, etc... We should write a blog post introducing each new feature. This is gonna be great!!!

EderSantana on 3 May 2017

how do the Recurrent models work with the other models?

Works flawlessly with core Keras components.

Say, can I use the output of Model as the state initialization?

Yes. (FYI, this is possible with core Keras too)

Can I use a Sequential or a Model as part of a RecurrentModel or vice-versa (you know... users...)?

Yes!

We already support teacher forcing in the default Keras RNN behavior, right?

Teacher forcing can't be properly done in Keras right now, because there is no readout.. (A deep LSTM is simply a stack of independent LSTM layers, and information can only flow in one direction)

The problem is to turn off teacher forcing during test time. In the case above, if I use teacher forcing during training, how do I turn it off for testing?

It will turn off automatically. (Similar to how dropout works.)

farizrahman4u on 3 May 2017

RecurrentModel, RecurrentSequential

I don't think we need RecurrentSequential or any such additional classes.

In general, the "class zoo" paradigm is not conducive to a great UX; better to have fewer classes and to combine them in a modular way. Truth be told, we shouldn't even have Sequential, it's there for historical reasons.

It seems like we have the following API option:

Recurrent(inputs=list_of_inputs,
          initial_states=list_of_initial_states,
          outputs=list_of_outputs,
          final_states=list_of_final_states)

[Side note: we have one problem, which is that we need the keywords inputs and outputs to match Model, but we also need initial_state to match call in recurrent layers, which itself is meant for compatibility with TF RNNs. Singular/plural mismatch :(]

Here's another possibility:

Recurrent(step_function, initial_states=list_of_states)

The step_function would be allowed to create layers, and the weights/updates/etc of these layers would be automatically collected. The advantage of such a method is that you would not need to already create a graph of ops in order to instantiate your recurrent model/layer --the ops are created when calling step_function. Having garbage ops in the graph that end up never being used is a pretty serious problem with Keras model templating right now.

One more possibility: using Model/Sequential to handle layer topology, and having a wrapper class to turn it into a recurrent layer. We would just need to figure out how to specify states (among inputs/outputs).

cells

Cells are good to have; they should be API-compatible with RNNCells in tf.contrib. Since RNNCells are already subclassing Layer, that should be totally doable.

Other features

We will consider that later.

fchollet on 3 May 2017

One more possibility: using Model/Sequential to handle layer topology, and having a wrapper class to turn it into a recurrent layer.

That is exactly how it works right now. RecurrentModel is a wrapper around a standard Keras model. (You can access this model using recurrent_model.model. )

There is also an RNNCellFromModel method which will return a cell given a Keras model.

farizrahman4u on 3 May 2017

👍 for these features to be integrated in Keras!

tboquet on 3 May 2017

👍1

I think there is a problem with characterizing a RNN via:

inputs=list_of_inputs,
initial_states=list_of_initial_states,
outputs=list_of_outputs,
final_states=list_of_final_states

The problem is that RNNs are not isomorphic to generic NNs in the functional API; in the functional API the input set and output set may well be completely different, and there is no guarantee of a 1:1 mapping between inputs and outputs. In RNNs, however, there should be a 1:1 mapping between inputs and outputs as well as between initial_states and final_states. Hence specifying all of these seems redundant.

Arguably what really characterizes a RNN is:

initial_state(s)
step function

Everything else can be derived from these.

Thus I would propose:

y = Recurrent(RNNCell(units=...))(list_of_inputs, initial_state=list_of_initial_states)

The RNNCell would be a step function, i.e. any layer-like instance (could be a model or a layer) that accepts a list of tensors [2d_input, 2d_state_1, 2d_state_2, ...] and returns [2d_output, new_2d_state_1, new_2d_state_2, ...]

e.g.

inp_2d = Input(...)
state = Input(...)
x = Dense(...)(inp_2d)
s = Dense(...)(state)
output, new_state = SimpleRNNCell(...)(x, state=s)
# also works: output, new_state = SimpleRNNCell(...)([x, s])
rnn_cell = Model([inp_2d, state], [output, new_state])
rnn = Recurrent(rnn_cell)
output_3d = rnn(input_3d, initial_state=None)  # states that are not specified are 0-initialized?
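To make the proposal concrete, here is a plain-Python sketch of a wrapper that turns a step function into a sequence runner. It is not Keras code: the Recurrent class below is a toy stand-in, and the state_sizes argument is an assumption added so that unspecified states can be zero-initialized.

```python
class Recurrent:
    """Wrap a step function [x_t, *states] -> [y_t, *new_states] into a sequence runner."""

    def __init__(self, cell, state_sizes, return_sequences=False):
        self.cell = cell
        self.state_sizes = state_sizes          # hypothetical: needed for zero init
        self.return_sequences = return_sequences

    def __call__(self, inputs, initial_state=None):
        if initial_state is None:
            states = [[0.0] * n for n in self.state_sizes]  # 0-initialized by default
        else:
            states = [list(s) for s in initial_state]
        outputs = []
        for x_t in inputs:
            y_t, *states = self.cell([x_t] + states)
            outputs.append(y_t)
        return outputs if self.return_sequences else outputs[-1]

def running_sum_cell(tensors):
    """Toy cell: the state is the running sum of the inputs, and so is the output."""
    x_t, total = tensors
    new_total = [a + b for a, b in zip(x_t, total)]
    return [new_total, new_total]

rnn = Recurrent(running_sum_cell, state_sizes=[2], return_sequences=True)
print(rnn([[1.0, 2.0], [3.0, 4.0]]))
print(rnn([[1.0, 2.0]], initial_state=[[10.0, 10.0]]))
```

In this form the wrapper needs only the step function and the initial states; the inputs, outputs and final states all follow from running the loop.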

What would be some defects and limitations of such an approach?

fchollet on 8 May 2017

👍1

There is no redundancy in the current recurrentshop API.

current RS API:
rnn = RecurrentModel(input, output, initial_states, output_states)
In the API you propose:
rnn_cell = Model([input] + initial_states, [output] + final_states)
rnn = Recurrent(rnn_cell)

Same set of information (tensors) is used to build the RNN in both cases.

What would be some defects and limitations of such an approach?

The issue is when you factor in features like readout + teacher forcing. In this case the rnn will take an extra input state, and I can't figure out a neat way to do that in the proposed API.

https://github.com/datalogai/recurrentshop/blob/master/docs/readout.md
https://github.com/datalogai/recurrentshop/blob/master/docs/teacher_force.md

Even if we are not adding these features right now, we should be able to accommodate them in the future without API changes.

states that are not specified are 0-initialized?

Yes, also see https://github.com/datalogai/recurrentshop/blob/master/docs/state_init.md

Also, regarding RecurrentSequential, it can let you do a lot of things which can be hard to do using RecurrentModel. Are you sure about ditching it?

farizrahman4u on 8 May 2017

Also, regarding RecurrentSequential, it can let you do a lot of things which can be hard to do using RecurrentModel. Are you sure about ditching it?

Can you give a couple of use cases? I find it hard to wrap my head around "cell stacking": it's easy to understand stacking when you always have a single input and a single output (i.e. Sequential), but for RNNs, what happens with states? A RNN cell always has at least 2 inputs and 2 outputs (i.e. [input, state]).

In general I am not a big fan of Sequential (outside of the context of RNNs): if Keras had been written with the functional API on day one, I would not have included it (conceptually we shouldn't need both a more general and a less general way to do the same things).

fchollet on 8 May 2017

I just want to offer my perspective on the current state of this discussion (This is partially to offer my 2 cents and better understand where we are at).

Note: I just reviewed RecurrentShop and apparently it has changed a lot since I was using it a few months ago... I'm leaving most of the points I've written because I think they get to the bottom of the issue we are currently on.

Having reworked the source code from both Keras and RecurrentShop / Seq2Seq to implement my TimeStepLSTM, I ultimately went with my Keras solution because the code was much neater and easier to trace and understand. The beauty of using Keras as a Deep Learning toolset is that each more complicated Layer has a simple set of "tweaks" to override the "basic" instructions it inherits from its parent and its parent's parent, etc. I know that RecurrentShop does something similar, but it's more complicated in that it is Cells within Containers (which are pseudo stacked Layers, kinda like Model but not). Note1: this is not an attack on RecurrentShop; I'm just giving a user perspective. Note2: I see that RecurrentShop is using Recurrent Models instead of Containers now.

What I like about the solution that @fchollet just proposed is that it takes the complexity of RecurrentShop and simplifies the logic to mesh it better into Keras (although I don't love the Recurrent Model within a Model solution).

I haven't tested the solution proposed by @farizrahman4u (earlier in the discussion) that tweaked the cells to return after single steps in the RNN, which is a great solution to incorporate the kind of stuff I want to get out of the RNN, but in reading the code it was more complex and less "user friendly" (I'm not exactly sure how I could add a Dense layer to act on the current state to then update the input sequence before handing it back to the RNN).

From what I've seen in this thread, it seems like we are trying to create a "Recurrent Model" that holds the "Recurrent Cells" that can be embedded in a larger NN Model solution. Is that a correct interpretation? Why was there the switch from Containers to Models?

Instead of building "RNN Models" that get packaged inside the overall Model, could we keep the original Container idea from RecurrentShop and only use Model for the overall network input and output? I think that would mesh RecurrentShop with Keras a little more seamlessly in terms of the logic.

Current Keras:

"""
in1 = Input(...)
lstm1 = LSTM(units=..., ...)(in1)
out1 = Dense(...)(lstm1)
model = Model(in1, out1)
"""

Proposed "Container" Keras:

"""
in1 = Input(...)
lstm1 = RecurrentContainer(LSTMCell, ...)(in1) # can add run instructions such as: initial_states, whether to process all time steps or use single time step methods, etc
out1 = Dense(...)(lstm1)
model = Model(in1, out1)
"""

slaterb1 on 8 May 2017

@fchollet

Can you give a couple of use cases?

When you write a deep LSTM (or any deep RNN), there are 2 ways in which states are managed:

Each cell gets its own state space. Similar to how it works in Keras right now, when you repeatedly do model.add(LSTM(...)) or x = LSTM(...)(x) (Each cell is an independent layer)
The cells share a common state space. Similar to how it is done in the original Seq2seq paper. Can not be done in Keras right now, without writing a custom layer. (In other words, the final state of the nth cell becomes the initial state of the (n+1)th cell)

Implementing the second one can get a bit tricky. Also, you can't easily toggle between the two modes for experimentation. In RecurrentSequential, this behavior is controlled by the state_sync argument (True means the second case; the default is False).

See : https://github.com/datalogai/recurrentshop/blob/master/docs/state_sync.md
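The two ways of managing states can be contrasted in a short plain-Python sketch of a single time step. The names step_stack and accumulate are hypothetical, and the toy cell has no weights; the point is only that with a shared state space the state written by one cell is the state read by the next.

```python
def step_stack(cells, x_t, states, state_sync):
    """Advance a stack of cells by one time step."""
    y = x_t
    if state_sync:
        shared = states                      # one state list threaded through all cells
        for cell in cells:
            y, *shared = cell([y] + shared)
        return y, shared
    new_states = []
    for cell, own in zip(cells, states):     # one state list per cell
        y, *own_next = cell([y] + own)
        new_states.append(own_next)
    return y, new_states

def accumulate(tensors):
    """Toy cell with a single state: output and new state are both x + s."""
    x, s = tensors
    return [x + s, x + s]

cells = [accumulate, accumulate]
separate = step_stack(cells, 1.0, [[0.0], [0.0]], state_sync=False)
synced = step_stack(cells, 1.0, [0.0], state_sync=True)
print(separate)
print(synced)
```

With separate states the second cell starts from its own zero state; with synced states it starts from the state the first cell just produced, so the two runs give different outputs for the same input.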

Also, things like simple readout (as in Seq2seq) can be hard to implement correctly using the functional API, but with RecurrentSequential, this can be done with readout=True.

conceptually we shouldn't need both a more general and a less general way to do the same things

But in the RNN context, we actually use the knowledge that the model is linear to do things which are otherwise impossible (or very difficult).

farizrahman4u on 8 May 2017

@slaterb1

Why was there the switch from Containers to Models?

This was to support non-sequential RNNs. This enabled the framework to handle any arbitrary RNN. So now you have RecurrentSequential for simple linear models (Behaves similar to the old container) and RecurrentModel for more complex stuff.

I'm not exactly sure how I could add a Dense layer to act on the current state to then update the input sequence before handing it back to the RNN

You can simply do state = Dense(output_dim)(state). If you are using RecurrentSequential, you can add a dense layer with recurrent_sequential.add(Dense(output_dim)) after you have added your cells.

farizrahman4u on 8 May 2017

Gotcha, I understand now. Thanks for clearing that up @farizrahman4u!

In terms of the "keep or ditch" RecurrentSequential, I would say that it is worth having it because most RNNs are structured in a linear manner (i.e. follow these steps to compute hidden states and outputs). I liked being able to use recurrent_sequential.add(LSTMCell) to create depth in my RNN network, and based on the new implementation of Cells, it would be clear how to add the Dense Layer and a Lambda Layer after my LSTMCell to update the sequence within the RecurrentSequential model. It makes the RNN logic ("run instruction set") easier to trace.

Overall, I much prefer the Model functional API over Sequential, but that does not prevent me from calling the RecurrentSequential inside a network structured via the functional API.

slaterb1 on 8 May 2017

The API

output, new_state = SimpleRNNCell(...)(x, state=s)
# and
y = Recurrent(RNNCell(units=...))(list_of_inputs, initial_state=list_of_initial_states)

is immediately compatible with TF RNN cells, in particular via the API (x, state=s) (with the latest update, cells are layers, and have a trainable_weights attribute which can be collected by the wrapper model).

Additionally, it is in continuity with the existing API, the only addition being

Recurrent(RNNCell(units=...))
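As a rough illustration of the calling convention being proposed, the sketch below mimics it in plain Python. ToyRNNCell and this Recurrent class are hypothetical stand-ins written for this example, not Keras or TF classes; they only show that a cell handles one step and the wrapper handles the loop over time.

```python
import math

class ToyRNNCell:
    """Hypothetical one-step cell: cell(x, state=s) -> (output, new_state)."""
    def __init__(self, w_in=0.5, w_rec=0.5):
        self.w_in = w_in
        self.w_rec = w_rec

    def __call__(self, x, state):
        new_state = math.tanh(self.w_in * x + self.w_rec * state)
        return new_state, new_state

class Recurrent:
    """Hypothetical wrapper that applies a cell to each time step in turn."""
    def __init__(self, cell):
        self.cell = cell

    def __call__(self, inputs, initial_state=0.0):
        state = initial_state
        outputs = []
        for x in inputs:
            out, state = self.cell(x, state=state)
            outputs.append(out)
        return outputs, state

cell = ToyRNNCell()
# Single step, mirroring: output, new_state = SimpleRNNCell(...)(x, state=s)
output, new_state = cell(1.0, state=0.0)
# Whole sequence, mirroring: Recurrent(RNNCell(units=...))(inputs, initial_state=...)
ys, final_state = Recurrent(cell)([1.0, 0.0, -1.0], initial_state=0.0)
```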

The issue is when you factor in features like readout + teacher forcing.

Can you expand on this with specific examples?

I think the API in RecurrentShop for readout involves a fairly high cognitive load. I.e. even though it reduces the quantity of code needed for specific use cases, it requires non-obvious mental models in order to be used, and will not be self-evident to most users. What we should really try to optimize for is cognitive load itself --reducing lines of code may not always correlate with a better UX.

The ideal API would be one that:

is modular, and thus flexible + expressive
does not require additional mental models besides what Keras already requires
is immediately understandable to the average user, after quickly glancing at a single example
can be memorized and reproduced from mind after just using it a few times

The last point is important --a good UX means extreme simplicity. Sometimes this means less code for the user to write, but sometimes less code is actually a step backwards.

fchollet on 8 May 2017

To give a specific example, I believe it is not possible for the average Keras user to just read the following code snippet and immediately understand what it does (only based on knowledge of the current Keras abstractions and API), without further explanation:

rnn = RecurrentSequential(readout='add')
rnn.add(LSTMCell(10, input_dim=10))
rnn.add(GRUCell(10))
rnn.add(SimpleRNNCell(10))

fchollet on 8 May 2017

Good points, what would an ideal API for readout look like to you?

farizrahman4u on 8 May 2017

Note that you can wire up an RNN with readout using the functional API, but readouts are usually used with teacher forcing (otherwise it's just another state), which means the user should then take care of the following things:

Maintain an additional state (in addition to readout) to count what time step it is at.
Slice the t-1 timestep from the ground truth (indexing of tensors is slightly different across backends, so that has to be taken care of).
Use K.in_train_phase to switch between the prediction and the ground truth from the last timestep.
Also, if the timestep is zero, don't index the ground truth (t-1 = -1). Use K.switch to do this; oh, wait, you have to cast to int to use K.switch.

This is a lot more cognitive load than rnn = RecurrentSequential(readout='add', teacher_force=True)
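The bookkeeping listed above can be sketched in plain Python, outside any backend. This is only an illustration of the logic, not recurrentshop's code: run_with_readout and toy_cell are hypothetical names, the cell is a scalar toy, and the readout is combined with the input by addition (as in the 'add' mode).

```python
import math

def toy_cell(x, h):
    """Toy scalar RNN cell: maps (input, state) to (output, new_state)."""
    new_h = math.tanh(0.5 * x + 0.5 * h)
    return new_h, new_h

def run_with_readout(cell, inputs, ground_truth=None, training=False):
    """Feed back the previous output, or the previous target when teacher forcing."""
    state = 0.0
    prev_output = 0.0
    outputs = []
    for t, x in enumerate(inputs):  # t is the extra "time step" counter
        if t == 0:
            readout = 0.0  # there is no t-1 step to read from yet
        elif training and ground_truth is not None:
            readout = ground_truth[t - 1]  # teacher forcing: previous target
        else:
            readout = prev_output  # plain readout: previous prediction
        prev_output, state = cell(x + readout, state)
        outputs.append(prev_output)
    return outputs

inputs = [1.0, 0.5, -0.5]
targets = [0.2, 0.4, 0.6]
free_run = run_with_readout(toy_cell, inputs)
forced = run_with_readout(toy_cell, inputs, ground_truth=targets, training=True)
```

In a real backend each branch becomes a tensor op (a counter state, a slice, a train/test switch), which is where the extra load described above comes from.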

farizrahman4u on 9 May 2017

See : https://github.com/datalogai/recurrentshop/blob/master/recurrentshop/engine.py#L641

farizrahman4u on 9 May 2017

I agree on going with Recurrent(RNNCell(units=...)) since it would be compatible with TF immediately.

farizrahman4u on 9 May 2017

Going along the lines of "user friendliness" of incorporating RecurrentShop into Keras, I agree that presenting the RecurrentSequential in the way shown by @fchollet would be hard to follow, and it would be even harder to understand how to customise the RNN functionality.

Could we integrate RecurrentShop in a way that does not use the terminology or Layer / Model names from RecurrentShop? I think that would make things more user friendly.

i.e. Include a "CustomRNN" as a Layer (or Model) that the user can populate with Cells via ".add()" in the recurrent.py script?

That way if someone wants to build a regular LSTM, they do not have to unlearn the existing Keras to build it in the new RecurrentShop Keras.

slaterb1 on 9 May 2017

Or does that risk making things even more complicated and more difficult to understand?

slaterb1 on 9 May 2017

Well, regular RNN layers should still work the way they used to, anyway.

farizrahman4u on 9 May 2017

True, but it would be more clear that you do not have to switch to using the RecurrentSequential to build the same kind of network that was built before in Keras.

slaterb1 on 9 May 2017

Where is this issue at?

phuicy on 14 Feb 2018

Since Keras now uses a cell-based approach for RNNs, I think we can close this issue. Feel free to comment if something isn't solved and I'll reopen it.

gabrieldemarmiesse on 11 Oct 2018

I am trying to use LSTMCell (and GRUCell) to build a custom RNN. Would be great to see an example of this functionality.

from keras.layers import Input, Dense, LSTMCell, GRUCell
input = Input((5,))
state0 = Input((10,))
state1 = Input((10,))

u, h = LSTMCell(10)( inputs=input, states=[state0, state1] )
model = keras.models.Model( inputs=input, outputs=[ output] )

No success so far. If someone can help out I can make a fully working example with LSTMCell?

Here is my error:

Traceback (most recent call last):
  File "7_rnn_toy_lstmcell.py", line 56, in <module>
    u, h =  LSTMCell( 10 )( inputs=input, states=[state0, state1] )
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/base_layer.py", line 497, in __call__
    arguments=user_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/base_layer.py", line 565, in _add_inbound_node
    output_tensors[i]._keras_shape = output_shapes[i]
AttributeError: 'tuple' object has no attribute '_keras_shape'

mpkuse on 22 Oct 2018

Have you solved the problem?
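For the LSTMCell question above, one approach that may work is to wrap the cell in the RNN layer instead of calling the cell directly on Input tensors, and to pass the two state inputs through initial_state. The sketch below assumes tensorflow.keras (TF 2.x), not the standalone Keras on Python 2.7 from the traceback; the sequence length of 7 and batch size of 2 are arbitrary values chosen for the demo.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Sequence input: (batch, time, features). The cell itself only sees one step.
inputs = keras.Input(shape=(None, 5))
# An LSTM cell carries two states: hidden state h and cell state c.
h0 = keras.Input(shape=(10,))
c0 = keras.Input(shape=(10,))

# RNN runs the cell over the time axis; return_state exposes the final states.
rnn = layers.RNN(layers.LSTMCell(10), return_state=True)
output, h, c = rnn(inputs, initial_state=[h0, c0])

# Every Input that feeds the graph has to be listed in the model's inputs.
model = keras.Model(inputs=[inputs, h0, c0], outputs=[output, h, c])

batch = [np.zeros((2, 7, 5)), np.zeros((2, 10)), np.zeros((2, 10))]
out_val, h_val, c_val = model.predict(batch, verbose=0)
```

Note that the original snippet also passes only input to Model and refers to an undefined output; both state inputs need to be model inputs as well.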