From my current modeling tasks, I find it would be useful to have the flexibility to encode a categorical feature either in one-hot format or in embedding format (using the Embedding layer) right at model construction time, instead of creating dummy columns in advance for one-hot encoding (zero-based integer inputs already suffice for Embedding). Although a Lambda layer can be used for this purpose, I think a dedicated OneHot layer would be more convenient. I have already written the code for the proposed OneHot layer, which simply calls K.one_hot() internally. Feel free to share your thoughts on whether such a layer should be added to Keras. I am happy to contribute the code via a PR. Thanks.
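Roughly, the idea is a thin wrapper around K.one_hot. A minimal sketch of such a layer (not the exact code from my PR; the nb_classes argument name and the Keras 1.x custom-layer API are just illustrative):

from keras import backend as K
from keras.engine.topology import Layer

class OneHot(Layer):
    """Turns integer class indices into one-hot vectors via K.one_hot."""

    def __init__(self, nb_classes, **kwargs):
        self.nb_classes = nb_classes
        super(OneHot, self).__init__(**kwargs)

    def call(self, x, mask=None):
        # x: integer tensor of shape (batch, ...) -> one-hot tensor of shape (batch, ..., nb_classes)
        return K.one_hot(x, self.nb_classes)

    def get_output_shape_for(self, input_shape):
        return input_shape + (self.nb_classes,)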
The pseudo-code would look like this:
models = []
for feature in features:
    if is_categorical(feature):
        model = Sequential()
        if to_encode(feature) == 'one_hot':
            model.add(OneHot())
        else:
            model.add(Embedding())
        models.append(model)
    else:
        model = Sequential()
        model.add(Dense())
        models.append(model)
model = Sequential()
model.add(Merge(models, mode='concat'))
# ...more layers added...
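To make this concrete, here is a minimal sketch of the same wiring with the Keras 1.x functional API, using Lambda(K.one_hot) in place of the proposed OneHot layer (the feature shapes, nb_classes and embedding size are made up for illustration):

from keras import backend as K
from keras.layers import Input, Lambda, Embedding, Dense, Flatten, merge
from keras.models import Model

nb_classes = 20

# categorical feature, one-hot encoded inside the model
cat_ohe_in = Input(shape=(1,), dtype='uint8')
ohe = Lambda(K.one_hot, arguments={'nb_classes': nb_classes},
             output_shape=(1, nb_classes))(cat_ohe_in)
ohe = Flatten()(ohe)

# categorical feature, encoded with an Embedding layer (zero-based integers in)
cat_emb_in = Input(shape=(1,), dtype='int32')
emb = Flatten()(Embedding(input_dim=nb_classes, output_dim=8, input_length=1)(cat_emb_in))

# plain numeric feature
num_in = Input(shape=(1,))
num = Dense(4)(num_in)

merged = merge([ohe, emb, num], mode='concat')
output = Dense(1)(merged)
model = Model(input=[cat_ohe_in, cat_emb_in, num_in], output=output)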
I created a PR https://github.com/fchollet/keras/pull/3846
Using Lambda(K.one_hot) instead, as suggested by @fchollet.
There are a few catches when using Lambda(K.one_hot), but generally it's possible:
from keras import backend as K
from keras.layers import Input, Lambda

input_shape = (10, )  # sequences of length 10
nb_classes = 20
output_shape = (input_shape[0], nb_classes)

# K.one_hot expects integer class indices, hence the integer dtype
input = Input(shape=input_shape, dtype='uint8')
# output_shape has to be given explicitly, since Keras cannot always infer it for a Lambda
x_ohe = Lambda(K.one_hot, arguments={'nb_classes': nb_classes}, output_shape=output_shape)(input)
Try it like this:
import numpy as np
from keras.models import Model

# 5 sequences of length 10
X_classes = np.random.randint(0, 20, size=(5, 10))
assert Model(input, x_ohe).predict(X_classes).shape == (5, 10, 20)
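Since the point is to do the encoding inside the model, the one-hot output can then feed further layers directly. A minimal sketch continuing from the snippet above (the LSTM size, loss and optimizer are arbitrary choices):

from keras.layers import LSTM, Dense
from keras.models import Model

# run an LSTM over the sequence of one-hot vectors produced inside the model
h = LSTM(32)(x_ohe)
out = Dense(1, activation='sigmoid')(h)
model = Model(input, out)
model.compile(optimizer='adam', loss='binary_crossentropy')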
Full example in a gist: https://gist.github.com/bzamecnik/a33052ec46ee7efeb217856d98a4fb5f