Keras: Need more flexibility with RNNs -- stateless RNN with arbitrary input length or stateful RNN with arbitrary batch size

Created on 12 Jun 2016 · 13 comments · Source: keras-team/keras

Hi,

I want to train a language model with more flexibility. I am using Keras with TensorFlow so that I can train easily on multiple GPUs and use the nice visualization in TensorBoard. But I ran into several problems.

I looked into two examples.
[Stateless LSTM] https://github.com/fchollet/keras/blob/master/examples/lstm_text_generation.py
[Stateful LSTM] https://github.com/fchollet/keras/blob/master/examples/stateful_lstm.py

In the stateless example, I do not want to fix the input length. In particular, for prediction I want to provide a seed of arbitrary length. The example uses a seed of the same length as the input length (i.e., it gives the first 40 characters) and predicts the characters after that. But I want to give only one or two characters as a seed. The only way I can think of is to pad zeros before the seed to bring it up to the input length, but that is a waste of resources. So I turned to the stateful RNN.

The stateful example seems to solve the issue above. BUT now I need to fix the batch size in advance, which wastes resources in another way. For example, if I need a prediction for only one sample, I waste (batch size - 1) predictions by padding with zeros.

In short, I want to request a stateless RNN with arbitrary input length or a stateful RNN with arbitrary batch size.

Best,
Satoshi

stale

All 13 comments

I want to request stateless RNN with arbitrary input length

We have that already.

stateful RNN with arbitrary batch size.

That doesn't make any sense. Hopefully you understand why?

For the stateful RNN, I still think it is inconvenient that we need to fix the batch size in advance. Here is the error message I got when I tried.

Exception: If a RNN is stateful, a complete input_shape must be provided (including batch size).

But I want to use different batch sizes when training and predicting.

As for the stateless RNN with arbitrary input length, I am happy to hear Keras already supports that, but I could not figure out how. Can you show me an example?

Please do not forget that I want to use the TensorFlow backend. When I tried to add an LSTM layer without specifying the input length, I got this error message.

Exception: When using TensorFlow, you should define explicitly the number of timesteps of your sequences.
If your first layer is an Embedding, make sure to pass it an "input_length" argument. Otherwise, make sure the first layer has an "input_shape" or "batch_input_shape" argument, including the time axis. Found input shape at layer lstm_4: (None, None, 10)
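For what it's worth, on later Keras/TensorFlow versions the timestep dimension can simply be left as `None`, which gives a stateless model that accepts arbitrary input lengths. A minimal sketch (`n_features` is a hypothetical feature count, not a value from this thread):

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense

n_features = 10  # hypothetical feature count

model = Sequential()
model.add(LSTM(128, input_shape=(None, n_features)))  # None = any number of timesteps
model.add(Dense(n_features, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
# Each predict() call may use a different sequence length, although all
# sequences within a single batch must share the same length.
```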

stateful RNN with arbitrary batch size.

That doesn't make any sense. Hopefully you understand why?

Could you give a hint as to why this does not make sense? Let's say I want to statefully predict incoming real-time data vector by vector with input shape (1, 1, nr_features), but train with more reasonable batch sizes that actually give some results.
Is there currently a way to do this in Keras?

I'm guessing that, since a stateful LSTM takes the last cell/hidden state at sample index i and feeds it into the same LSTM at index i in the next batch, the number of samples cannot change from batch to batch (each index i needs to match up). This doesn't matter too much in practice, though: you could always train with a large-enough batch size (for speed), then recompile the model with batch size 1 and reload the weights (or just pad with empty samples and ignore them, with no recompiling). A sketch of the recompile approach is below.

Is that close to what you are asking about? Or rather, is that really what the problem is @fchollet ?
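For illustration, here is a minimal sketch of the recompile-with-batch-size-1 approach (the layer sizes, `maxlen`, and `n_features` are hypothetical placeholders, not values from this thread):

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense

n_features, maxlen = 10, 40  # hypothetical feature count and sequence length

# Training model: stateless, fixed sequence length, large batches for speed.
train_model = Sequential()
train_model.add(LSTM(128, input_shape=(maxlen, n_features)))
train_model.add(Dense(n_features, activation='softmax'))
train_model.compile(loss='categorical_crossentropy', optimizer='adam')
# train_model.fit(X, y, batch_size=128, nb_epoch=10)

# Prediction model: same architecture, but stateful, batch size 1,
# one timestep per call. The LSTM weights depend only on the unit count
# and input dimension, so they can be copied across directly.
pred_model = Sequential()
pred_model.add(LSTM(128, batch_input_shape=(1, 1, n_features), stateful=True))
pred_model.add(Dense(n_features, activation='softmax'))
pred_model.set_weights(train_model.get_weights())
# Call pred_model.reset_states() before feeding a new sequence.
```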

I tried to explain my point in more detail here:

https://groups.google.com/forum/#!msg/keras-users/hwmPWu7Piug/nuJEEvQYBAAJ
(+ correction: I'm using many-to-many, not many-to-one)

@phdowling Could you please elaborate more on how to recompile the model to have batch size 1?

Do you mean recompile the model, and simply load the weight from the trained model?

@pvmilk Yes, that's pretty much it. Build a new model with the same architecture, but an input layer with sequence length 1, and make the model stateful. Then, load the stored weights of the model trained before.

You may however want to compare this approach to that of simply choosing a long sequence length, pre-padding with 0s and then shifting the input left as new characters are added to make new predictions (i.e. the inefficient method). From what I've heard, stateful models do not necessarily achieve the same accuracy for some people; I'm not sure what the status of this is at the moment.
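For reference, a sketch of that pre-padding approach might look like this; `maxlen`, `n_chars`, and `char_to_idx` are hypothetical placeholders:

```python
import numpy as np

maxlen, n_chars = 40, 57  # hypothetical sequence length and vocabulary size

def encode_seed(seed, char_to_idx):
    """One-hot encode a short seed, left-padded with zeros to maxlen."""
    s = seed[-maxlen:]                  # keep at most the last maxlen characters
    x = np.zeros((1, maxlen, n_chars))  # zero rows act as the left padding
    for t, ch in enumerate(s):
        x[0, maxlen - len(s) + t, char_to_idx[ch]] = 1.0
    return x

# After each prediction, append the sampled character to the seed and
# re-encode it; the window effectively shifts left by one character.
```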

@phdowling thank you very much for the swift reply, and for the heads-up on stateless vs. stateful. I am actually experiencing exactly the same issue, where the model learns when stateless but not when stateful. However, I am too inexperienced in both RNNs and Keras to file a new issue on that.

I am planning to solve the issue by training a stateless model and then copying the weights to a stateful model for real-time prediction. Does anyone have any idea/experience whether they are compatible?

They are; that's what I meant in my post before. But I'm not 100% sure that the stateful prediction model will always produce the same output; that's what you should test to make sure.

@pvmilk could you please report whether you succeeded with any of the discussed approaches? It did not work in my case when copying the weights to a single-input model, as indicated in the groups.google post above.

@Cdfghglz I can confirm that you can use the following approach:

  • train the model as a stateless model;
  • copy the weights to a stateful model and use that model to perform the prediction
    (at least in my case, the predictions from this stateful model give the same results as the original stateless model).

For your reference, I simply use new_model.set_weights(old_model.get_weights()) to copy the trained parameters.
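A quick way to sanity-check that claim, reusing the `train_model` / `pred_model` sketch from earlier in the thread (shapes are the same hypothetical placeholders):

```python
import numpy as np

seq = np.random.rand(1, maxlen, n_features).astype('float32')  # one test sequence

p_stateless = train_model.predict(seq)       # full sequence in one call

pred_model.set_weights(train_model.get_weights())
pred_model.reset_states()                    # clear state before a new sequence
for t in range(maxlen):
    p_stateful = pred_model.predict(seq[:, t:t + 1, :])  # one timestep per call

# The final one-step prediction should match the full-sequence prediction
# up to floating-point noise.
assert np.allclose(p_stateless, p_stateful, atol=1e-5)
```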

I would like to use a variable batch_size. My code:

```python
with tf.variable_scope('forward'):
    cellL = tf.nn.rnn_cell.BasicLSTMCell(hidden_size, forget_bias=1.0)

with tf.variable_scope("init_variable_L"):
    init_L = tf.get_variable("init_L", initializer=tf.zeros([1, 2 * hidden_size]))
    state_init_L = tf.tile(init_L, [batch_size, 1])

with tf.variable_scope("RNNL"):
    for time_step in range(num_steps - 1):
        if time_step > 0:
            tf.get_variable_scope().reuse_variables()
        (cell_output, stateL) = cellL(inputs[:, time_step, :], stateL)
        outputsL = tf.concat(1, [outputsL, cell_output])
```

If I replace batch_size with tf.shape(input_data)[0], the error is:
ValueError: initial_value must have a shape specified: Tensor("model/init_variable_L/zeros:0", shape=(?, 100), dtype=float32, device=/device:GPU:0)

How can I fix this?
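The thread went stale before this was answered, but with the TF 1.x API used above the usual fix is to keep the variable's shape static and make only the tile multiple dynamic. A hedged sketch (`input_data` stands for the model's input placeholder):

```python
import tensorflow as tf

hidden_size = 100  # matches the shape (?, 100) in the error message above

with tf.variable_scope("init_variable_L"):
    # The variable itself must have a static shape...
    init_L = tf.get_variable("init_L",
                             initializer=tf.zeros([1, 2 * hidden_size]))
    # ...but the tile multiples may be a dynamic tensor, so the batch
    # size can vary between runs.
    dynamic_batch = tf.shape(input_data)[0]
    state_init_L = tf.tile(init_L, [dynamic_batch, 1])
```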

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs, but feel free to re-open it if needed.
