Keras: OneHot Layer

Created on 3 Sep 2016 · 3 Comments · Source: keras-team/keras

In my current modeling tasks, it would be useful to be able to encode a categorical feature either in one-hot format or in embedding format (using the Embedding layer) right in the model construction phase, instead of creating dummy columns in advance for one-hot encoding (the Embedding layer already takes the zero-based integers directly). Although a Lambda layer can be used for this purpose, I think a dedicated OneHot layer would be more convenient. I have already written the code for the proposed OneHot layer, which just calls K.one_hot() internally. Feel free to share your thoughts on whether such a layer should be added to Keras. I am happy to contribute the code via a PR. Thanks.

The pseudo-code would look like this:

models = []
for feature in features:
    if is_categorical(feature):
        model = Sequential()
        if to_encode(feature) == 'one_hot':
            model.add(OneHot())
        else:
            model.add(Embedding())
        models.append(model)
    else:
        model = Sequential()
        model.add(Dense())
        models.append(model)

model = Sequential()
model.add(Merge(models, mode='concat'))
...more layers added...
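
To make the two encodings concrete, here is a minimal NumPy sketch (not Keras code, and not the code from the PR). The data, the embedding matrix `E`, and the helper `one_hot` are hypothetical names chosen for illustration. It shows that an embedding lookup gives the same result as multiplying the one-hot vector by the embedding matrix, and that the encoded categorical feature is then concatenated with the numeric one along the last axis, which is the role Merge(mode='concat') plays in the pseudo-code.

```python
import numpy as np


def one_hot(indices, num_classes):
    """Hypothetical helper: one-hot encode integer ids along a new last axis."""
    return np.eye(num_classes, dtype="float32")[indices]


rng = np.random.default_rng(0)

# Hypothetical data: 4 samples with one categorical feature (3 categories)
# and one numeric feature.
category = np.array([0, 2, 1, 2])
numeric = rng.normal(size=(4, 1)).astype("float32")

# Option 1: one-hot encoding, shape (4, 3).
cat_one_hot = one_hot(category, num_classes=3)

# Option 2: embedding lookup into a (3, 2) matrix, shape (4, 2).
E = rng.normal(size=(3, 2)).astype("float32")
cat_embedded = E[category]

# Looking up a row of E is the same as multiplying the one-hot vector by E.
assert np.allclose(cat_one_hot @ E, cat_embedded)

# Concatenate the encoded categorical feature with the numeric feature.
features_one_hot = np.concatenate([cat_one_hot, numeric], axis=-1)
features_embedded = np.concatenate([cat_embedded, numeric], axis=-1)
```

With one-hot encoding the width of the categorical block equals the number of categories, while with an embedding it equals the chosen embedding dimension.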

I created a PR https://github.com/fchollet/keras/pull/3846

Most helpful comment

There are a few catches when using Lambda(K.one_hot), but generally it's possible:

  • the input must be integer (uint8, int32, int64), not float32
  • you have to specify the number of classes explicitly
  • you have to specify the output shape explicitly

from keras import backend as K
from keras.layers import Input, Lambda

input_shape = (10,)  # sequences of length 10
nb_classes = 20
output_shape = (input_shape[0], nb_classes)
input = Input(shape=input_shape, dtype='uint8')
# Keras 1.x names the K.one_hot argument `nb_classes`; Keras 2 renamed it to `num_classes`.
x_ohe = Lambda(K.one_hot, arguments={'nb_classes': nb_classes}, output_shape=output_shape)(input)

Try it like this:

import numpy as np
from keras.models import Model
# 5 sequences of length 10
X_classes = np.random.randint(0, 20, size=(5, 10))
# predict() returns an array, so compare its shape rather than the array itself
assert Model(input, x_ohe).predict(X_classes).shape == (5, 10, 20)
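
As a cross-check that does not need Keras, the expected result can be sketched in plain NumPy. This is only an illustration of what the one-hot output should look like under the same assumptions (20 classes, 5 sequences of length 10); the name `X_ohe` is introduced here and is not part of the snippet above.

```python
import numpy as np

nb_classes = 20
# 5 sequences of length 10, integer class ids in [0, nb_classes)
X_classes = np.random.randint(0, nb_classes, size=(5, 10))

# Indexing rows of the identity matrix adds a trailing axis of size nb_classes.
X_ohe = np.eye(nb_classes, dtype="float32")[X_classes]

assert X_ohe.shape == (5, 10, 20)
# Every position holds exactly one 1, located at the index of its class id.
assert (X_ohe.sum(axis=-1) == 1).all()
assert (X_ohe.argmax(axis=-1) == X_classes).all()
```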

All 3 comments

Use Lambda(K.one_hot) instead, as suggested by @fchollet.
