Keras: Add dynamic RNN to Tensorflow backend

Created on 8 Apr 2016 · 16 Comments · Source: keras-team/keras

Tensorflow currently packs a dynamic RNN function and an experimental scan op. @fchollet, do you think we should add a "scan-based" K.rnn implementation?

I am currently working with some networks that process really long sequences and unrolling the loops in TensorFlow takes forever, but compiling the scan in Theano is fairly quick, so that's an interesting use-case for this. I can run some benchmarks as well.

stale

jfsantos

All 16 comments

Yes, let's do this. Keras 1.0 is scheduled for release early next week, do you think you can add the scan-based RNN version to rnn() in tensorflow_backend by then?

fchollet on 8 Apr 2016

I'll do my best! :)

jfsantos on 8 Apr 2016

👍7

Any news on this? I'd like this added asap, so if you already have a PR that'd be great. Otherwise I'll take care of it.

fchollet on 16 May 2016

👍1

Sorry, I did not have time to look into this, as I've been restricted to working with the Theano backend because I need Windows support for my current project.

jfsantos on 17 May 2016

Any updates on this? I'd be willing to give it a shot if @fchollet hasn't had time to look into it yet. The compilation time for long sequences is indeed pretty annoying.

bnaul on 15 Jun 2016

You can go ahead, it's on my backlog but I haven't started working on it yet.

fchollet on 15 Jun 2016

I'm a bit unsure of how to handle the differences between the Theano and TensorFlow scans: my understanding is that Theano's scan returns both the outputs and the states, whereas in TensorFlow, scan would create the list of states and map_fn would turn those into outputs (cf. this for example). Does this mean it's necessary to have two loops calling step_function, one to get the states and one for the outputs? Or would it be better to somehow cram both pieces of information together so that we can get it all in one pass of scan and just unpack it after the fact?

bnaul on 16 Jun 2016

If you had to call step_function twice, wouldn't this require some sort of change in the step_function signature in order to specify whether you are asking for the outputs or for the hidden states? If yes, then I guess this would make the Theano/TensorFlow abstraction a bit harder, wouldn't it?

Can't the outputs and the hidden states simply be concatenated into a single tensor, then unpacked later? I suppose this would be better than calling step_function twice. Plus, any inefficiency that might be caused by this looks like something that could eventually be handled by TensorFlow as its development team implements more graph optimizations.
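
The single-pass option discussed here can be sketched in plain Python, independent of either backend. The `scan` helper below is a hypothetical stand-in that mimics an accumulator-style scan (each step receives the previous accumulator and the current element), and `step_function` is a toy recurrence, not Keras code. The idea is that the accumulator carries the output and the states packed together, so `step_function` is called once per time step and the two are separated afterwards.

```python
def scan(fn, elems, initializer):
    """Hypothetical stand-in for an accumulator-style scan: returns every
    intermediate accumulator, one per element of `elems`."""
    acc = initializer
    history = []
    for x in elems:
        acc = fn(acc, x)
        history.append(acc)
    return history


def step_function(x, states):
    """Toy recurrence: one state h, and an output that differs from it."""
    (h,) = states
    new_h = 0.5 * h + x
    output = 2.0 * new_h
    return output, [new_h]


def rnn_single_pass(step_function, inputs, initial_states):
    n_states = len(initial_states)

    def _step(packed, x):
        # Slot 0 holds the previous output; the remaining slots hold states.
        prev_states = list(packed[1:])
        output, new_states = step_function(x, prev_states)
        return [output] + list(new_states)

    # The output slot of the initializer is a placeholder that is never read.
    packed_init = [0.0] + list(initial_states)
    history = scan(_step, inputs, packed_init)

    # Unpack after the fact: slot 0 is the output, the rest are states.
    outputs = [packed[0] for packed in history]
    new_states = history[-1][1:1 + n_states]
    return outputs[-1], outputs, new_states


last_output, outputs, new_states = rnn_single_pass(
    step_function, [1.0, 2.0, 3.0], [0.0])
```

With real tensors, packing into one accumulator would presumably require the output and the states to have compatible shapes, which is likely where the "hacky" feeling mentioned in the thread comes from.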

cesarsouza on 28 Jun 2016

@cesarsouza I haven't looked at the situation so I can't answer you. Have you made any progress? Otherwise I will have a shot at it.

fchollet on 29 Jun 2016

@fchollet no progress yet, probably won't be able to til after SciPy in a couple of weeks.

@cesarsouza those are the two options I was referring to above; I agree that concatenating things is better since it'd be far more efficient, it'll just make the code quite hacky-feeling.

bnaul on 29 Jun 2016

Hmmm... So, I have written a minimal implementation (without support for masking) by packing and unpacking the hidden states inside tf.scan. With it I can set unroll=False and leave both the batch size and temporal length unspecified, although currently it only works with consume_less='mem'.

However, since I am quite new to Keras, TensorFlow and recurrent nets altogether, I am not perfectly sure it is entirely correct.

def rnn(step_function, inputs, initial_states,
        go_backwards=False, mask=None, constants=None,
        unroll=False, input_length=None):
    '''Iterates over the time dimension of a tensor.

    # Arguments
        inputs: tensor of temporal data of shape (samples, time, ...)
            (at least 3D).
        step_function:
            Parameters:
                input: tensor with shape (samples, ...) (no time dimension),
                    representing input for the batch of samples at a certain
                    time step.
                states: list of tensors.
            Returns:
                output: tensor with shape (samples, ...) (no time dimension),
                new_states: list of tensors, same length and shapes
                    as 'states'.
        initial_states: tensor with shape (samples, ...) (no time dimension),
            containing the initial values for the states used in
            the step function.
        go_backwards: boolean. If True, do the iteration over
            the time dimension in reverse order.
        mask: binary tensor with shape (samples, time, 1),
            with a zero for every element that is masked.
        constants: a list of constant values passed at each step.
        unroll: boolean. If True, the loop is unrolled into the graph;
            if False, tf.scan is used (masking is not supported in that case).
        input_length: not relevant in the TensorFlow implementation.
            Must be specified if using unrolling with Theano.

    # Returns
        A tuple (last_output, outputs, new_states).

        last_output: the latest output of the rnn, of shape (samples, ...)
        outputs: tensor with shape (samples, time, ...) where each
            entry outputs[s, t] is the output of the step function
            at time t for sample s.
        new_states: list of tensors, latest states returned by
            the step function, of shape (samples, ...).
    '''
    ndim = len(inputs.get_shape())
    assert ndim >= 3, "Input should be at least 3D."
    axes = [1, 0] + list(range(2, ndim))
    inputs = tf.transpose(inputs, (axes))
    if constants is None:
        constants = []

    if unroll:
        states = initial_states
        successive_states = []
        successive_outputs = []

        input_list = tf.unpack(inputs)
        if go_backwards:
            input_list.reverse()

        if mask is not None:
            # Transpose not supported by bool tensor types, hence round-trip to uint8.
            mask = tf.cast(mask, tf.uint8)
            if len(mask.get_shape()) == ndim-1:
                mask = expand_dims(mask)
            mask = tf.cast(tf.transpose(mask, axes), tf.bool)
            mask_list = tf.unpack(mask)

            if go_backwards:
                mask_list.reverse()

            for input, mask_t in zip(input_list, mask_list):
                output, new_states = step_function(input, states + constants)

                # tf.select needs its condition tensor to be the same shape as its two
                # result tensors, but in our case the condition (mask) tensor is
                # (nsamples, 1), and A and B are (nsamples, ndimensions). So we need to
                # broadcast the mask to match the shape of A and B. That's what the
                # tile call does, is just repeat the mask along its second dimension
                # ndimensions times.
                tiled_mask_t = tf.tile(mask_t, tf.pack([1, tf.shape(output)[1]]))

                if len(successive_outputs) == 0:
                    prev_output = zeros_like(output)
                else:
                    prev_output = successive_outputs[-1]

                output = tf.select(tiled_mask_t, output, prev_output)

                return_states = []
                for state, new_state in zip(states, new_states):
                    # (see earlier comment for tile explanation)
                    tiled_mask_t = tf.tile(mask_t, tf.pack([1, tf.shape(new_state)[1]]))
                    return_states.append(tf.select(tiled_mask_t, new_state, state))

                states = return_states
                successive_outputs.append(output)
                successive_states.append(states)

        else: # Mask is None
            for input in input_list:
                output, states = step_function(input, states + constants)
                successive_outputs.append(output)
                successive_states.append(states)

        # Computed for both the masked and unmasked unrolled paths.
        last_output = successive_outputs[-1]
        outputs = tf.pack(successive_outputs)
        new_states = successive_states[-1]

    else: # Unroll is False

        if mask is not None:
            raise NotImplementedError('Masking is not yet implemented for non-unrolled (tf.scan) loops.')
        else: # Mask is None
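            def _step(prev, input):
                # Only the states are carried through tf.scan; the step output is
                # dropped, which assumes it equals the first state (as in LSTM).
                _, new_states = step_function(input, tf.unpack(prev) + constants)
                return tf.pack(new_states)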

            results = tf.scan(_step,
                              inputs, 
                              initializer=tf.pack(initial_states), 
                              swap_memory=True, 
                              name='rnn_scan',
                              back_prop=True, 
                              parallel_iterations=10)

            successive_outputs = results[:, 0, :, :]
            successive_states  = results[:, 1:, :, :]

            outputs = successive_outputs
            last_output = tf.reverse(successive_outputs, [True, False, False])[0, :, :]
            new_states  = tf.reverse(successive_states, [True, False, False, False])[0, :, :, :]

    axes = [1, 0] + list(range(2, len(outputs.get_shape())))
    outputs = tf.transpose(outputs, axes)
    return last_output, outputs, new_states
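
The non-unrolled branch above raises `NotImplementedError` when a mask is given. One possible way to fill that gap, mirroring what the unrolled branch does with `tf.select`, would be to feed the mask through the loop alongside the inputs and keep the previous output and states wherever the mask is zero. The sketch below shows only that carry-forward rule, in plain Python on a scalar sequence for a single sample; `masked_rnn` and `running_sum_step` are hypothetical names, and nothing here is TensorFlow API.

```python
def masked_rnn(step_function, inputs, mask, initial_states):
    """Runs step_function over `inputs`, skipping masked (0) time steps.

    On a masked step the previous output and states are carried forward,
    which is the same rule the unrolled branch applies per sample.
    """
    states = list(initial_states)
    prev_output = 0.0  # mirrors zeros_like(output) before the first step
    outputs = []
    for x, m in zip(inputs, mask):
        output, new_states = step_function(x, states)
        if m:
            states = list(new_states)
        else:
            output = prev_output
        prev_output = output
        outputs.append(output)
    return outputs[-1], outputs, states


def running_sum_step(x, states):
    # Toy step: the single state is a running sum, and the output equals it.
    new_h = states[0] + x
    return new_h, [new_h]


last_output, outputs, final_states = masked_rnn(
    running_sum_step,
    inputs=[1.0, 2.0, 3.0, 4.0],
    mask=[1, 1, 0, 1],
    initial_states=[0.0],
)
```

Here the third time step is masked, so its input is ignored and both the output and the state from the second step are repeated.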

cesarsouza on 4 Jul 2016

It seems to work on the LSTM demo, though:

# ...
print('Build model...')
model = Sequential()
model.add(Embedding(max_features, 128, dropout=0.2))
model.add(LSTM(128, dropout_W=0.2, dropout_U=0.2, unroll=False, consume_less='mem'))  # try using a GRU instead, for fun
model.add(Dense(1))
model.add(Activation('sigmoid'))
# ...

Loading data...
20000 train sequences
5000 test sequences
Pad sequences (samples x time)
X_train shape: (20000, 80)
X_test shape: (5000, 80)
Build model...
____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
embedding_1 (Embedding)          (None, None, 128)     2560000     embedding_input_1[0][0]          
____________________________________________________________________________________________________
lstm_1 (LSTM)                    (None, 128)           131584      embedding_1[0][0]                
____________________________________________________________________________________________________
dense_1 (Dense)                  (None, 1)             129         lstm_1[0][0]                     
____________________________________________________________________________________________________
activation_1 (Activation)        (None, 1)             0           dense_1[0][0]                    
====================================================================================================
Total params: 2691713
____________________________________________________________________________________________________
None
Train...
(20000, 80)
(20000,)
/opt/CV_tools/conda/envs/cdesouza/lib/python2.7/site-packages/tensorflow/python/ops/gradients.py:89: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
Train on 20000 samples, validate on 5000 samples
Epoch 1/15
 7904/20000 [==========>...................] - ETA: 140s - loss: 0.6624 - acc: 0.5902

cesarsouza on 4 Jul 2016

I am curious if this is still on the roadmap

shagunsodhani on 9 Aug 2017

Feel free to use my implementation if you need it; I agree to license it under the MIT license for anyone who would like to use it. (It has been a while since I delved into Keras, but is it the case that Keras still doesn't support dynamic RNNs with the TensorFlow backend? I would find it a bit surprising if it hasn't been implemented after more than a year.)

cesarsouza on 12 Aug 2017

It was implemented about a year ago.

fchollet on 12 Aug 2017

@fchollet Are you referring to the Mask + RNN combination?

shagunsodhani on 12 Aug 2017
