Keras: Is there anyway to pass one-hot binary vector as input and embed it?

Created on 26 Apr 2016 · 8 comments · Source: keras-team/keras

I have a situation where I want to pass one-hot vectors as input and then embed them into a lower dimension.

For example, I have a single-feature input vector like this:

a = [0, 1, 0, 2, 3, 3]

As one can see, there are four classes (values 0-3) and 6 samples, hence its one-hot encoding will be:

b =
[1 0 0 0,
0 1 0 0,
1 0 0 0,
0 0 1 0,
0 0 0 1,
0 0 0 1
]

I want to pass b as input to the model and then embed it using an embedding layer.
The current Embedding layer seems to only support a sequence of categorical (integer) values as input, not one-hot vectors.
In fact, I am able to successfully pass a and get an embedding with something like this:

from keras.layers import Input, Embedding

a = Input(shape=(1,), dtype='int32', name='input1')
em = Embedding(output_dim=3, input_dim=4, input_length=1)(a)

But I cannot see how to modify this to pass b as input instead of a.
Please let me know if there is any way to work around this.

Thanks


All 8 comments

In general, you can only index into tensors with indices. If you want to embed a binary vector, you could just multiply the embedding matrix by the binary feature vectors. This will have the same effect. I'm not sure on the difference in performance (though I've always meant to profile this in my own work...). This is basically the same thing as having a Dense layer and removing the bias vector.
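
(Not from the original thread, but to make "a Dense layer without the bias vector" concrete: Keras 2's Dense has a use_bias argument, so a bias-free linear Dense can express this directly. A minimal sketch:)

    from keras.layers import Input, Dense
    from keras.models import Model

    # one-hot input of width 4, mapped into 3 dimensions by a bias-free linear layer;
    # the layer's weight matrix then plays the role of the embedding matrix
    x = Input(shape=(4,))
    emb = Dense(3, use_bias=False)(x)   # activation defaults to linear
    model = Model(inputs=x, outputs=emb)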

@braingineer: Thanks for the quick response. While I understand your idea, I don't think I get it fully. Specifically, I did not understand the part about multiplying the embedding matrix by the binary feature vectors. Do I need to write my own embedding layer for that? Or do you mean it can be done with the existing API?

If it is the latter, can you provide a code snippet showing how to do it for the example case I provided above?

Thanks

Sure. I'll show a quick example using numpy and then how it's handled in Keras.

In numpy, if you index with a numpy matrix of indices, then you select the rows that the numbers correspond to.

# (`a` is not shown in the original transcript; the outputs below imply a = np.arange(9).reshape((3,3)))
In [48]: embedding1 = np.array([2,2,0])

In [50]: embedding2 = np.array([[0,0,1], [0,0,1], [1,0,0]])
In [57]: a[embedding1]
Out[57]: 
array([[6, 7, 8],
       [6, 7, 8],
       [0, 1, 2]])

In [58]: embedding2.dot(a)
Out[58]: 
array([[6, 7, 8],
       [6, 7, 8],
       [0, 1, 2]])

In Keras, you can use the Embedding layer, which will do exactly what In [57] shows above. But if you want to use binary vectors, then you can do what In [58] shows. This is exactly how the Dense layer behaves. The following code is from the Dense layer:

    def call(self, x, mask=None):
        return self.activation(K.dot(x, self.W) + self.b)

So, you can see that if you set the activation to be linear and self.b to be all zeros, then it will return the exact same thing.

Though, I just noticed something troublesome: you can't actually set b to None in Keras. Hmm. This should probably be changed. An easy quick fix is to subclass Dense and override the call function and the setting of the trainable weights, i.e.:

class BinaryEmbedding(Dense):
    def build(self, input_shape):
        super(BinaryEmbedding, self).build(input_shape)
        self.trainable_weights = [self.W]

    def call(self, x, mask=None):
        return self.activation(K.dot(x, self.W))
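
(For completeness, here is a sketch of how this subclass might be wired up, using the Keras 1-era functional API that the snippet above targets; the names are illustrative:)

    from keras.layers import Input
    from keras.models import Model

    # b from the question: one-hot rows of width 4, embedded into 3 dimensions
    inp = Input(shape=(4,), name='onehot_input')
    emb = BinaryEmbedding(output_dim=3)(inp)   # linear activation is the Dense default
    model = Model(input=inp, output=emb)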

As an aside, why aren't you able to use integers to do the embeddings? I've used this type of solution before because I had multi-hots, and so could either integer-embed and sum over the tensor or binary-embed and just do the dot product. Is there a particular reason you can't use integers?
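
(To make the multi-hot point concrete, a small numpy sketch, not from the original thread: summing the integer-indexed rows of an embedding matrix and dotting a multi-hot vector with the same matrix give identical results.)

    import numpy as np

    W = np.arange(12).reshape(4, 3)        # embedding matrix: 4 categories -> 3 dimensions

    # integer route: look up each active category, then sum over them
    active = np.array([1, 3])
    summed = W[active].sum(axis=0)

    # binary route: dot the multi-hot vector with the same matrix
    multi_hot = np.array([0, 1, 0, 1])
    dotted = multi_hot.dot(W)

    assert np.array_equal(summed, dotted)  # identical embeddings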

@braingineer: First, you are awesome. Thanks for the great explanation.

I think this should work for me. I will try it out and see.

As for the binary embedding requirement, the reasons are the following:

1.) Another implementation (not based on Keras) that was developed internally uses one-hot binary vectors as input, so I need to compare against that implementation.

2.) I am a little confused about integer vs. binary embedding in Keras. That is probably because I was trying to work directly with the provided Embedding layer. As one can see, if I provide binary input to that layer, it messes up the dimensions. To describe my confusion:

In my example above, if I use the embedding layer with a like below:

a = Input(shape=(1,), dtype='int32', name='input1')
em = Embedding(output_dim=3, input_dim=4, input_length=1)(a)

it works fine, as the dimension of em will now be (None, 1, 3).

But if I provide a binary one-hot vector to the same layer, it gets very strange:

a = Input(shape=(4,), dtype='int32', name='input1')
em = Embedding(output_dim=3, input_dim=4, input_length=4)(a)

The problem is two-fold: 1.) the input shape now becomes 4 = input_dim of the embedding layer, and 2.) input_length in the Embedding layer, which is actually meant to be the sequence length of words (in one training sample), now becomes 4? This messes up a whole lot of things.

3.) This is a question for you: with integer embedding, will the embedding matrix be of shape (output_dim, input_dim)?

So if I just plainly use integer embedding through the provided Embedding layer, should I expect the embedding matrix to be of size (3, 4)? And of course, the sample size will be another dimension in the tensor.

Again, your answer is very helpful and I will certainly use it for my binary experimentation.

I have a couple more related questions, for which I will open another issue. Please respond to them if you are able to help.

Thanks

Hmm. Yeah, any time you pass a matrix as the indices, it will embed all of the indices and tack the embedding on as the last dimension. For example, if you took a (4, 3) binary matrix and tried to use it in the embedding I showed above:

In [65]: embedding2 = np.array([[0,0,1], [0,0,1], [1,0,0], [0,1,0]])

In [66]: embedding2.shape
Out[66]: (4, 3)

In [67]: a[embedding2].shape
Out[67]: (4, 3, 3)

## or, to make it even clearer:
In [68]: a = np.arange(15).reshape((3,5))

In [69]: a[embedding2].shape
Out[69]: (4, 3, 5)

Embedding takes two arguments: input_dim and output_dim. These refer to the rows and columns of the embedding matrix. Since you are indexing with integers, input_dim has to be one more than the largest integer you will pass in. Think of it like indexing into an array: if the array is 15 long, then the largest index that will work is 14.

In your example above, if you want to embed integers that can take the values 0-3, then the input dim would be 4. If you wanted to embed them into a space of dimension 3, the output dim would be 3. The input length refers to how many integers you are embedding per sample. If there is a variable number of integers to embed, then you can handle that using _masking_, as in the sketch below.
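
(A minimal sketch of that masking setup, not from the original thread; mask_zero=True reserves index 0 for padding, so the real categories start at 1 and input_dim grows by one:)

    from keras.layers import Input, Embedding

    # sequences padded with 0s to length 10; four real categories stored as 1..4
    seq = Input(shape=(10,), dtype='int32')
    emb = Embedding(input_dim=5, output_dim=3, input_length=10, mask_zero=True)(seq)
    # emb has shape (None, 10, 3); mask-aware downstream layers skip the padded timesteps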

And sure. Just mention me on it so I get pinged. I kind of randomly select issues to comment on when I'm getting burnt out on coding.

B.

Yes, this makes everything pretty clear. Thanks a lot for this discussion and very helpful explanation.

Since the Dense object has since changed, here is a solution using a custom layer:

from keras import backend as K
from keras.layers import Layer  # in older Keras versions: from keras.engine.topology import Layer


class OnehotEmbedding(Layer):

    def __init__(self, Nembeddings, **kwargs):
        self.Nembeddings = Nembeddings
        super(OnehotEmbedding, self).__init__(**kwargs)

    def build(self, input_shape):
        # Create a trainable weight variable for this layer.
        self.kernel = self.add_weight(name='kernel',
                                      shape=(input_shape[2], self.Nembeddings),
                                      initializer='uniform',
                                      trainable=True)
        super(OnehotEmbedding, self).build(input_shape)  # Be sure to call this at the end

    def call(self, x):
        return K.dot(x, self.kernel)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], input_shape[1], self.Nembeddings)

The input shape is (batch_size, features, onehot_dimensions) and the output is (batch_size, features, Nembeddings).
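
(A usage sketch under the same shape assumptions, with illustrative numbers:)

    from keras.layers import Input
    from keras.models import Model

    # e.g. 6 categorical features per sample, each one-hot over 4 possible values
    x = Input(shape=(6, 4))
    emb = OnehotEmbedding(Nembeddings=3)(x)   # -> (None, 6, 3)
    model = Model(inputs=x, outputs=emb)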


Hi,
I have just started learning Keras and I am facing this same issue, and I haven't really understood the solution. Could you please help me understand?

My one-hot encoding has 72 features and 83k rows/records, so a is of size 83000 x 72.

a = Input(shape=(72,), dtype='int32', name='input1')
em = Embedding(output_dim=20, input_dim=83001, input_length=83000)(a)

Can you let me know what I am doing incorrectly here?

I would really appreciate any help on this. Thanks!
