Keras: Using a Masking layer with a TimeDistributed Layer

Created on 20 Jun 2016 · 1 comment · Source: keras-team/keras

I am trying to build a tagging system where there is a label for each word, but the input is a list of lists: each word is represented by its chars.

This network works with some success:

from keras.models import Model
from keras.layers import Input, Embedding, LSTM, SimpleRNN, Dropout, Dense, TimeDistributed, merge

inputs = Input(shape=(MAX_WORDS, MAX_LEN), dtype='int32', name='main_input')
embLayer = TimeDistributed(Embedding(output_dim=300, input_dim=len(charDict), input_length=MAX_LEN, mask_zero=True))(inputs)
wordLayer = TimeDistributed(LSTM(128, return_sequences=False))(embLayer)  # one 128-dim vector per word

left = SimpleRNN(128, return_sequences=True)(Dropout(0.2)(wordLayer))
right = SimpleRNN(128, return_sequences=True, go_backwards=True)(Dropout(0.2)(wordLayer))
merged = merge([left, right], mode='sum')
predictions = TimeDistributed(Dense(3, activation='softmax'))(merged)
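For completeness, the model gets assembled and compiled along these lines (the optimizer and loss below are just placeholders, not necessarily what I use):

model = Model(input=inputs, output=predictions)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
# y_train has shape (nb_samples, MAX_WORDS, 3): one one-hot label per word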

The problem is that I noticed that the last few outputs change value, even if all the input chars are 0. This makes me think that masking is not being done at the "word" level.
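(For reference, this is roughly how I checked; the toy batch below is only an illustration.)

import numpy as np

x = np.zeros((1, MAX_WORDS, MAX_LEN), dtype='int32')             # one sentence, padded to MAX_WORDS words
x[0, :3, :5] = np.random.randint(1, len(charDict), size=(3, 5))  # only the first 3 words contain real chars

preds = model.predict(x)   # shape (1, MAX_WORDS, 3)
print(preds[0, 3:])        # the rows for the all-zero words keep changing value,
                           # which suggests they are not being masked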

I have tried to add a Masking layer, but I am making a mistake somewhere.

The updated network is:

inputs = Input(shape=(MAX_WORDS,MAX_LEN), dtype='int32', name='main_input')
inputs = Masking(mask_value=0)(inputs)
embLayer = TimeDistributed(Embedding(output_dim=300, input_dim=len(charDict), input_length=MAX_LEN, mask_zero=True))(inputs)

and the error I get is:

/Users/pedro/anaconda/envs/dev/lib/python2.7/site-packages/keras/engine/topology.pyc in __call__(self, x, mask)
    483         if inbound_layers:
    484             # this will call layer.build() if necessary
--> 485             self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
    486             input_added = True
    487 

/Users/pedro/anaconda/envs/dev/lib/python2.7/site-packages/keras/engine/topology.pyc in add_inbound_node(self, inbound_layers, node_indices, tensor_indices)
    541         # creating the node automatically updates self.inbound_nodes
    542         # as well as outbound_nodes on inbound layers.
--> 543         Node.create_node(self, inbound_layers, node_indices, tensor_indices)
    544 
    545     def get_output_shape_for(self, input_shape):

site-packages/keras/engine/topology.pyc in create_node(cls, outbound_layer, inbound_layers, node_indices, tensor_indices)
    146 
    147         if len(input_tensors) == 1:
--> 148             output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
    149             output_masks = to_list(outbound_layer.compute_mask(input_tensors[0], input_masks[0]))
    150             # TODO: try to auto-infer shape if exception is raised by get_output_shape_for

site-packages/keras/layers/wrappers.pyc in call(self, X, mask)
    129                 input_length = K.shape(X)[1]
    130             X = K.reshape(X, (-1, ) + input_shape[2:])  # (nb_samples * timesteps, ...)
--> 131             y = self.layer.call(X)  # (nb_samples * timesteps, ...)
    132             # (nb_samples, timesteps, ...)
    133             output_shape = self.get_output_shape_for(input_shape)

site-packages/keras/layers/embeddings.pyc in call(self, x, mask)
    133         else:
    134             W = self.W
--> 135         out = K.gather(W, x)
    136         return out
    137 

site-packages/keras/backend/theano_backend.pyc in gather(reference, indices)
    164     Return: a tensor of same type as reference.
    165     '''
--> 166     return reference[indices]
    167 
    168 

site-packages/theano/tensor/var.pyc in __getitem__(self, args)
    501                             TensorVariable, TensorConstant,
    502                             theano.tensor.sharedvar.TensorSharedVariable))):
--> 503                 return self.take(args[axis], axis)
    504             else:
    505                 return theano.tensor.subtensor.advanced_subtensor(self, *args)

site-packages/theano/tensor/var.pyc in take(self, indices, axis, mode)
    533 
    534     def take(self, indices, axis=None, mode='raise'):
--> 535         return theano.tensor.subtensor.take(self, indices, axis, mode)
    536 
    537     # COPYING

site-packages/theano/tensor/subtensor.pyc in take(a, indices, axis, mode)
   2390                 [a.shape[:axis], indices.shape, a.shape[axis + 1:]])
   2391         ndim = a.ndim + indices.ndim - 1
-> 2392     return take(a, indices.flatten(), axis, mode).reshape(shape, ndim)

site-packages/theano/tensor/subtensor.pyc in take(a, indices, axis, mode)
   2368             return advanced_subtensor1(a.flatten(), indices)
   2369         elif axis == 0:
-> 2370             return advanced_subtensor1(a, indices)
   2371         else:
   2372             if axis < 0:

site-packages/theano/gof/op.pyc in __call__(self, *inputs, **kwargs)
    609         """
    610         return_list = kwargs.pop('return_list', False)
--> 611         node = self.make_node(*inputs, **kwargs)
    612 
    613         if config.compute_test_value != 'off':

site-packages/theano/tensor/subtensor.pyc in make_node(self, x, ilist)
   1685         ilist_ = theano.tensor.as_tensor_variable(ilist)
   1686         if ilist_.type.dtype[:3] not in ('int', 'uin'):
-> 1687             raise TypeError('index must be integers')
   1688         if ilist_.type.ndim != 1:
   1689             raise TypeError('index must be vector')

TypeError: index must be integers

I looked around, and nothing seems similar to what I am trying to achieve.

Any ideas on how to solve this?

Most helpful comment

I want to make sure I have the intended sizes straight:

input: (batch, max_words, max_len)
embedding: (batch, max_words, max_len, embedding_size)
wordlayer: (batch, max_words, embedding_size)
left/right/merged: (batch, max_words, 128)
predictions: (batch, max_words, 3)

A couple of notes:

You don't need to TimeDistribute the Embedding. Indexing into a weight matrix (which is what an Embedding does) always just extends the last dim, no matter how many leading dims there are.
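To make that concrete, the lookup is just a gather at the backend level, so the rank of the index tensor doesn't matter (rough, untested sketch using the Keras backend directly):

from keras import backend as K
import numpy as np

W = K.variable(np.random.rand(100, 300))                    # stand-in weight matrix: vocab of 100, 300-dim vectors
x = K.placeholder(shape=(None, MAX_WORDS, MAX_LEN), dtype='int32')
emb = K.gather(W, x)                                        # (batch, MAX_WORDS, MAX_LEN, 300): the lookup just
                                                            # appends the embedding dim to whatever rank x has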

From the embedding layer to the word layer, you are reducing the number of dimensions by one.

The issue is that, currently, TimeDistributed just passes the mask it was given forward. Since you TimeDistributed everything starting from the input, the mask created inside the Embedding never got passed forward.

(You can see the code here: there isn't a function there for computing the mask, so the default behavior, given that supports_masking is True, is to just pass the mask forward.)

Even if you did as I suggested and embedded the full tensor, the mask that gets passed forward becomes invalid because of the dimension reduction.
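In other words, the mask you want after the word layer has one entry per word, not per char. At the backend level it amounts to something like this (sketch; `inputs` is the int32 char tensor from your first snippet):

from keras import backend as K

char_mask = K.not_equal(inputs, 0)      # (batch, MAX_WORDS, MAX_LEN): True wherever there is a real char
word_mask = K.any(char_mask, axis=-1)   # (batch, MAX_WORDS): a word is kept if it has at least one real char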

I have done similar things, and I have a couple of layers to accomplish them; they are basically just modified TimeDistributeds.

Specifically, Summarize expects you to remove the last dimension in just the way you did.

Also, I have changed the masking behavior in my TimeDistributed: https://github.com/braingineer/keras/blob/dev/keras/layers/wrappers.py#L109
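As a very rough, untested sketch of just one piece of that idea (reducing an incoming char-level mask to a word-level one; this is not the linked implementation):

from keras import backend as K
from keras.layers import TimeDistributed

class TimeDistributedWithWordMask(TimeDistributed):
    def compute_mask(self, x, mask=None):
        # a full version also needs to surface the mask produced inside the wrapped layer;
        # here we only collapse a char-level mask (batch, words, chars) into a word-level one (batch, words)
        if mask is None:
            return None
        return K.any(mask, axis=-1)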

