I am using Keras for sequence classification, and I would like to add temporal pooling to the classification task. That is, rather than handing off only the last hidden output from the RNN layer, I would like to hand off the full sequence of hidden outputs and pool over the time steps to produce a single feature vector. The input to the pooling layer would have shape (samples, steps, features) and the output would have shape (samples, features). It should be able to handle masked input. It would look like the following:
X = [[0,0,0,1,2,5], [0,2,7,3,5,1], [0,0,5,5,1,3]] # this is returned from pad_sequences
y = [1, 0, 1]
model = Sequential()
model.add(Embedding(input_dim=10, output_dim=5, mask_zero=True))
model.add(LSTM(output_dim=5, return_sequences=True))
model.add(TemporalPooling())
model.add(Dense(output_dim=1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop')
model.fit(X, y)
Does anything like this exist? I have written what I think it would look like (this is the max pooling case):
import numpy as np
from theano import tensor as T
from keras.layers.core import MaskedLayer

class TemporalPooling(MaskedLayer):
    def __init__(self):
        super(TemporalPooling, self).__init__()
        self.input = T.tensor3()

    @property
    def output_shape(self):
        # remove the temporal dimension
        return (self.input_shape[0], self.input_shape[2])

    def get_output_mask(self, train=False):
        return None

    def get_output(self, train=False):
        data = self.get_input(train)
        mask = self.get_input_mask(train)
        if mask is None:
            mask = T.sum(T.ones_like(data), axis=-1)
        mask = mask.dimshuffle(0, 1, "x")
        # push masked time steps to -inf so they never win the max
        masked_data = T.switch(T.eq(mask, 0), -np.inf, data)
        return masked_data.max(axis=1)
from keras.models import Graph
from keras.layers.core import Masking, Permute, Flatten
from keras.layers.recurrent import LSTM
from keras.layers.convolutional import MaxPooling1D

g = Graph()
g.add_input(name='in', input_shape=track_input_shape)
g.add_node(Masking(mask_value=0.), name='mask', input='in')
g.add_node(LSTM(128, return_sequences=True), name='forward', input='mask')
g.add_node(LSTM(128, return_sequences=True, go_backwards=True), name='backward', input='mask')
g.add_node(Permute((2, 1)), inputs=['forward', 'backward'], merge_mode='sum', name='permute')
g.add_node(MaxPooling1D(pool_length=128), input='permute', name='max_pool')
g.add_node(Flatten(), input='max_pool', name='flat')
Try this?
Hi @jiumem,
Thanks for the tip, and for suggesting a bidirectional model. If I may ask: why did you leave out the embedding layer? And what exactly does the Permute layer do?
Thanks!
-Reuben
@rfeinman
Permute is a core layer of Keras http://keras.io/layers/core/#permute
My data doesn't need the Embedding layer; you can add it as the first node if you like.
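For reference, here is a minimal shape check (a sketch assuming a Keras 1.x-style Sequential; the shapes are just illustrative) showing that Permute((2, 1)) only swaps the time and feature axes and leaves the samples axis alone:

from keras.models import Sequential
from keras.layers.core import Permute

model = Sequential()
# input: (nb_samples, nb_timesteps=6, nb_features=5)
model.add(Permute((2, 1), input_shape=(6, 5)))
print(model.output_shape)  # (None, 5, 6): timesteps and features are swapped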
@jiumem
I'm trying to understand what exactly you are doing with the Permute + MaxPooling1D layers. If I understand correctly, the LSTM layers should give an output of shape (n_samples, n_timesteps, n_features). Permute((2, 1)) swaps the last two axes, giving (n_samples, n_features, n_timesteps). Then you are using MaxPooling1D, which pools over the second dimension, which is now n_features. Why are you pooling over n_features? Don't we want to pool over n_timesteps?
Thanks,
Reuben
@rfeinman Have you solved this issue? I am dealing with the same problem, and I don't think there is a built-in Keras layer that does this.
@jiumem I don't think what you are suggesting with MaxPooling1D solves the masking issue. MaxPooling1D does not take the masked time steps into account, so instead of pooling over only the non-masked time steps it pools over all of them, which is basically wrong.
@ersinyar Yes, I solved it with the custom layer I wrote (shown in my initial comment). It takes care of the issue you mention because it pushes the values at all masked time steps to negative infinity, so they never contribute to the pooled output.
However, if you are using bleeding-edge Keras, the layer API has changed, so please see my updated layer below.
import numpy as np
from theano import tensor as T
from keras.engine.topology import Layer, InputSpec

class TemporalMaxPooling(Layer):
    """
    This is a custom Keras layer. This pooling layer accepts the temporal
    sequence output by a recurrent layer and performs temporal pooling,
    looking at only the non-masked portion of the sequence. The pooling
    layer converts the entire variable-length hidden vector sequence
    into a single hidden vector, and then feeds its output to the Dense
    layer.

    input shape:  (nb_samples, nb_timesteps, nb_features)
    output shape: (nb_samples, nb_features)
    """
    def __init__(self, **kwargs):
        super(TemporalMaxPooling, self).__init__(**kwargs)
        self.supports_masking = True
        self.input_spec = [InputSpec(ndim=3)]

    def get_output_shape_for(self, input_shape):
        return (input_shape[0], input_shape[2])

    def call(self, x, mask=None):
        if mask is None:
            mask = T.sum(T.ones_like(x), axis=-1)
        mask = mask.dimshuffle(0, 1, "x")
        # push masked time steps to -inf so they never win the max
        masked_data = T.switch(T.eq(mask, 0), -np.inf, x)
        return masked_data.max(axis=1)

    def compute_mask(self, input, mask):
        return None
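As a usage sketch (assuming the layer above and a Keras 1.x-style API), it can drop into the model from the original post in place of the hypothetical TemporalPooling layer:

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

model = Sequential()
model.add(Embedding(input_dim=10, output_dim=5, mask_zero=True))
model.add(LSTM(output_dim=5, return_sequences=True))
model.add(TemporalMaxPooling())  # (nb_samples, nb_timesteps, 5) -> (nb_samples, 5)
model.add(Dense(output_dim=1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop')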
@rfeinman Thank you for your reply. That helped me a lot. Do you also have mean pooling version of your code?
@ersinyar I haven't solved mean pooling yet. Can't use the negative infinity trick to deal with masked inputs for that one. Let me know if you come up with anything.
If you mask your inputs with all zeros, you can use my code:
import numpy as np
import theano
from theano import tensor as T
from keras.engine.topology import Layer, InputSpec

class TemporalMeanPooling(Layer):
    """
    This is a custom Keras layer. This pooling layer accepts the temporal
    sequence output by a recurrent layer and performs temporal pooling,
    looking at only the non-masked portion of the sequence. The pooling
    layer converts the entire variable-length hidden vector sequence
    into a single hidden vector, and then feeds its output to the Dense
    layer.

    input shape:  (nb_samples, nb_timesteps, nb_features)
    output shape: (nb_samples, nb_features)
    """
    def __init__(self, **kwargs):
        super(TemporalMeanPooling, self).__init__(**kwargs)
        self.supports_masking = True
        self.input_spec = [InputSpec(ndim=3)]

    def get_output_shape_for(self, input_shape):
        return (input_shape[0], input_shape[2])

    def call(self, x, mask=None):  # mask: (nb_samples, nb_timesteps)
        if mask is None:
            mask = T.mean(T.ones_like(x), axis=-1)
        # sum over time, then divide by the number of non-masked time steps
        ssum = T.sum(x, axis=-2)                    # (nb_samples, nb_features)
        rcnt = T.sum(mask, axis=-1, keepdims=True)  # (nb_samples, 1)
        rcnt = T.tile(rcnt, x.shape[-1])
        return (ssum / rcnt).astype(theano.config.floatX)

    def compute_mask(self, input, mask):
        return None
The test code:
from keras.models import Sequential
from keras.layers import Masking
import numpy as np
from MeanLayer import *  # the TemporalMeanPooling layer above

model = Sequential([
    Masking(0.0, input_shape=(5, 3)),
    TemporalMeanPooling(),
])
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

A = np.array([[1, 2, 3], [4, 5, 6], [0, 0, 0], [0, 0, 0], [0, 0, 0]])
B = np.array([[1, 3, 0], [4, 0, 0], [0, 0, 1], [0, 0, 0], [0, 0, 0]])
C = np.array([A, B])
print("The input is " + str(C))
print("The output is " + str(model.predict(C)))
Outputs of test code:
The input is [[[1 2 3]
[4 5 6]
[0 0 0]
[0 0 0]
[0 0 0]]
[[1 3 0]
[4 0 0]
[0 0 1]
[0 0 0]
[0 0 0]]]
The output is [[ 2.5 3.5 4.5 ]
[ 1.66666663 1. 0.33333334]]
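(Sanity check: the first sample averages only [1, 2, 3] and [4, 5, 6], giving [2.5, 3.5, 4.5]; the second averages its three non-masked rows [1, 3, 0], [4, 0, 0], [0, 0, 1], giving [5/3, 1, 1/3], which matches the printed output.)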
@rfeinman @ersinyar Thanks for the temporal max pooling code. In case you are interested, I've modified it so it works on the TensorFlow backend as well (K.switch() doesn't work the same way in tf), and updated the layer definition to match the Keras 2.0 spec.
https://gist.github.com/nigeljyng/881ae30e7c35ca2b77f6975e50736493
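For anyone who just wants the idea inline, a rough backend-agnostic sketch in the Keras 2 layer API might look like the following (a sketch only, not necessarily identical to the code in the gist):

from keras import backend as K
from keras.engine.topology import Layer

class TemporalMaxPooling(Layer):
    """Masked max pooling over time: (samples, timesteps, features) -> (samples, features).
    Sketch for the Keras 2 API; not necessarily what the linked gist contains."""
    def __init__(self, **kwargs):
        super(TemporalMaxPooling, self).__init__(**kwargs)
        self.supports_masking = True

    def compute_output_shape(self, input_shape):
        return (input_shape[0], input_shape[2])

    def call(self, inputs, mask=None):
        if mask is None:
            # no mask: plain max over the time axis
            return K.max(inputs, axis=1)
        mask = K.expand_dims(K.cast(mask, K.floatx()), axis=-1)
        # push masked time steps to a large negative value instead of -inf,
        # which avoids the K.switch / TensorFlow issue mentioned above
        inputs = inputs * mask + (1.0 - mask) * (-1e9)
        return K.max(inputs, axis=1)

    def compute_mask(self, inputs, mask=None):
        return None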