I am using Keras for sequence classification, and I would like to add temporal pooling to the classification task. That is, rather than handing off only the last hidden output from the RNN layer, I would like to hand off the full sequence of hidden outputs and pool over the time steps to produce a single feature vector. The input to the pooling layer would have shape (samples, steps, features) and the output would have shape (samples, features). It should be able to handle masked input. It would look like the following:
X = [[0,0,0,1,2,5], [0,2,7,3,5,1], [0,0,5,5,1,3]] # this is returned from pad_sequences
y = [1, 0, 1]
model = Sequential()
model.add(Embedding(input_dim=10, output_dim=5, mask_zero=True))
model.add(LSTM(output_dim=5, return_sequences=True))
model.add(TemporalPooling())
model.add(Dense(output_dim=1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop')
model.fit(X, y)
Does anything like this exist? I have written what I think it would look like (this is the max pooling case):
import numpy as np
from theano import tensor as T
from keras.layers.core import MaskedLayer

class TemporalPooling(MaskedLayer):
    def __init__(self):
        super(TemporalPooling, self).__init__()
        self.input = T.tensor3()

    @property
    def output_shape(self):
        # remove the temporal dimension
        return (self.input_shape[0], self.input_shape[2])

    def get_output_mask(self, train=False):
        return None

    def get_output(self, train=False):
        data = self.get_input(train)
        mask = self.get_input_mask(train)
        if mask is None:
            mask = T.sum(T.ones_like(data), axis=-1)
        mask = mask.dimshuffle(0, 1, "x")
        # push masked time steps to -inf so they never win the max
        masked_data = T.switch(T.eq(mask, 0), -np.inf, data)
        return masked_data.max(axis=1)
from keras.models import Graph
from keras.layers.core import Masking, Permute, Flatten
from keras.layers.recurrent import LSTM
from keras.layers.convolutional import MaxPooling1D

g = Graph()
g.add_input(name='in', input_shape=track_input_shape)
g.add_node(Masking(mask_value=0.), name='mask', input='in')
g.add_node(LSTM(128, return_sequences=True), name='forward', input='mask')
g.add_node(LSTM(128, return_sequences=True, go_backwards=True), name='backward', input='mask')
g.add_node(Permute((2, 1)), inputs=['forward', 'backward'], merge_mode='sum', name='permute')
g.add_node(MaxPooling1D(pool_length=128), input='permute', name='max_pool')
g.add_node(Flatten(), input='max_pool', name='flat')
Try this?
Hi @jiumem,
Thanks for the tip, and for suggesting a bidirectional model. If I may ask: why did you leave out the embedding layer? And what exactly does the Permute layer do?
Thanks!
-Reuben
@rfeinman
Permute is a core layer of Keras http://keras.io/layers/core/#permute
My data doesn't need the Embedding layer; you can add it as the first node if you like.
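For reference, here is a minimal shape check (a sketch assuming a Keras 1.x-style Sequential; the shapes are just illustrative) showing that Permute((2, 1)) only swaps the time and feature axes and leaves the samples axis alone:

from keras.models import Sequential
from keras.layers.core import Permute

model = Sequential()
# input: (nb_samples, nb_timesteps=6, nb_features=5)
model.add(Permute((2, 1), input_shape=(6, 5)))
print(model.output_shape)  # (None, 5, 6): timesteps and features are swapped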
@jiumem
I'm trying to understand what exactly you are doing with the Permute + MaxPooling1D layers. If I understand correctly, the LSTM layers should give an output of shape (n_samples, n_timesteps, n_features). Permute((2, 1)) swaps the last two axes, giving (n_samples, n_features, n_timesteps). Then you are using MaxPooling1D, which pools over the second dimension, which is now n_features. Why are you pooling over n_features? Don't we want to pool over n_timesteps?
Thanks,
Reuben
@rfeinman Have you solved this issue? I am dealing with the same problem, and I don't think there is a built-in Keras layer that does this.
@jiumem I don't think what you are suggesting with MaxPooling1D solves the masking issue. MaxPooling1D does not take the masked time steps into account, so instead of pooling over only the non-masked time steps it pools over all of them, which is basically wrong.
@ersinyar Yes, I solved it with the custom layer I wrote (shown in my initial comment). It takes care of the issue you mention because it pushes the values at all masked time steps to negative infinity, so they never contribute to the pooled output.
However, if you are using bleeding-edge Keras, the layer API has changed, so please see my updated layer below.
import numpy as np
from theano import tensor as T
from keras.engine.topology import Layer, InputSpec

class TemporalMaxPooling(Layer):
    """
    This is a custom Keras layer. This pooling layer accepts the temporal
    sequence output by a recurrent layer and performs temporal pooling,
    looking at only the non-masked portion of the sequence. The pooling
    layer converts the entire variable-length hidden vector sequence
    into a single hidden vector, and then feeds its output to the Dense
    layer.

    input shape:  (nb_samples, nb_timesteps, nb_features)
    output shape: (nb_samples, nb_features)
    """
    def __init__(self, **kwargs):
        super(TemporalMaxPooling, self).__init__(**kwargs)
        self.supports_masking = True
        self.input_spec = [InputSpec(ndim=3)]

    def get_output_shape_for(self, input_shape):
        return (input_shape[0], input_shape[2])

    def call(self, x, mask=None):
        if mask is None:
            mask = T.sum(T.ones_like(x), axis=-1)
        mask = mask.dimshuffle(0, 1, "x")
        # push masked time steps to -inf so they never win the max
        masked_data = T.switch(T.eq(mask, 0), -np.inf, x)
        return masked_data.max(axis=1)

    def compute_mask(self, input, mask):
        return None
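As a usage sketch (assuming the layer above and a Keras 1.x-style API), it can drop into the model from the original post in place of the hypothetical TemporalPooling layer:

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

model = Sequential()
model.add(Embedding(input_dim=10, output_dim=5, mask_zero=True))
model.add(LSTM(output_dim=5, return_sequences=True))
model.add(TemporalMaxPooling())  # (nb_samples, nb_timesteps, 5) -> (nb_samples, 5)
model.add(Dense(output_dim=1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop')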
@rfeinman Thank you for your reply. That helped me a lot. Do you also have mean pooling version of your code?
@ersinyar I haven't solved mean pooling yet. Can't use the negative infinity trick to deal with masked inputs for that one. Let me know if you come up with anything.
If you mask your inputs with all zeros, you can use my code:
import numpy as np
import theano
from theano import tensor as T
from keras.engine.topology import Layer, InputSpec

class TemporalMeanPooling(Layer):
    """
    This is a custom Keras layer. This pooling layer accepts the temporal
    sequence output by a recurrent layer and performs temporal pooling,
    looking at only the non-masked portion of the sequence. The pooling
    layer converts the entire variable-length hidden vector sequence
    into a single hidden vector, and then feeds its output to the Dense
    layer.

    input shape:  (nb_samples, nb_timesteps, nb_features)
    output shape: (nb_samples, nb_features)
    """
    def __init__(self, **kwargs):
        super(TemporalMeanPooling, self).__init__(**kwargs)
        self.supports_masking = True
        self.input_spec = [InputSpec(ndim=3)]

    def get_output_shape_for(self, input_shape):
        return (input_shape[0], input_shape[2])

    def call(self, x, mask=None):  # mask: (nb_samples, nb_timesteps)
        if mask is None:
            mask = T.mean(T.ones_like(x), axis=-1)
        # sum over time, then divide by the number of non-masked time steps
        ssum = T.sum(x, axis=-2)                    # (nb_samples, nb_features)
        rcnt = T.sum(mask, axis=-1, keepdims=True)  # (nb_samples, 1)
        rcnt = T.tile(rcnt, x.shape[-1])
        return (ssum / rcnt).astype(theano.config.floatX)

    def compute_mask(self, input, mask):
        return None
The test code:
from keras.models import Sequential
from keras.layers import Masking
import numpy as np
from MeanLayer import *  # the TemporalMeanPooling layer above

model = Sequential([
    Masking(0.0, input_shape=(5, 3)),
    TemporalMeanPooling(),
])
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

A = np.array([[1, 2, 3], [4, 5, 6], [0, 0, 0], [0, 0, 0], [0, 0, 0]])
B = np.array([[1, 3, 0], [4, 0, 0], [0, 0, 1], [0, 0, 0], [0, 0, 0]])
C = np.array([A, B])
print("The input is " + str(C))
print("The output is " + str(model.predict(C)))
Outputs of test code:
The input is [[[1 2 3]
[4 5 6]
[0 0 0]
[0 0 0]
[0 0 0]]
[[1 3 0]
[4 0 0]
[0 0 1]
[0 0 0]
[0 0 0]]]
The output is [[ 2.5 3.5 4.5 ]
[ 1.66666663 1. 0.33333334]]
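(Sanity check: the first sample averages only [1, 2, 3] and [4, 5, 6], giving [2.5, 3.5, 4.5]; the second averages its three non-masked rows [1, 3, 0], [4, 0, 0], [0, 0, 1], giving [5/3, 1, 1/3], which matches the printed output.)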
@rfeinman @ersinyar Thanks for the temporal max pooling code. In case you are interested, I've modified it so it works on the TensorFlow backend as well (K.switch() doesn't work the same way in tf), and updated the layer definition to match the Keras 2.0 spec.
https://gist.github.com/nigeljyng/881ae30e7c35ca2b77f6975e50736493
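For anyone who just wants the idea inline, a rough backend-agnostic sketch in the Keras 2 layer API might look like the following (a sketch only, not necessarily identical to the code in the gist):

from keras import backend as K
from keras.engine.topology import Layer

class TemporalMaxPooling(Layer):
    """Masked max pooling over time: (samples, timesteps, features) -> (samples, features).
    Sketch for the Keras 2 API; not necessarily what the linked gist contains."""
    def __init__(self, **kwargs):
        super(TemporalMaxPooling, self).__init__(**kwargs)
        self.supports_masking = True

    def compute_output_shape(self, input_shape):
        return (input_shape[0], input_shape[2])

    def call(self, inputs, mask=None):
        if mask is None:
            # no mask: plain max over the time axis
            return K.max(inputs, axis=1)
        mask = K.expand_dims(K.cast(mask, K.floatx()), axis=-1)
        # push masked time steps to a large negative value instead of -inf,
        # which avoids the K.switch / TensorFlow issue mentioned above
        inputs = inputs * mask + (1.0 - mask) * (-1e9)
        return K.max(inputs, axis=1)

    def compute_mask(self, inputs, mask=None):
        return None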