The current implementation of sparse_categorical_crossentropy is:
def sparse_categorical_crossentropy(output, target, from_logits=False):
    target = T.cast(T.flatten(target), 'int32')
    target = T.extra_ops.to_one_hot(target, nb_class=output.shape[-1])
    target = reshape(target, shape(output))
    return categorical_crossentropy(output, target, from_logits)
However, that is a very inefficient implementation, because Theano already provides crossentropy_categorical_1hot. Theano's categorical_crossentropy will even select it automatically when the target's ndim is one less than the output's. This is the Theano code:
def categorical_crossentropy(coding_dist, true_dist):
    if true_dist.ndim == coding_dist.ndim:
        return -tensor.sum(true_dist * tensor.log(coding_dist),
                           axis=coding_dist.ndim - 1)
    elif true_dist.ndim == coding_dist.ndim - 1:
        return crossentropy_categorical_1hot(coding_dist, true_dist)
    else:
        raise TypeError('rank mismatch between coding and true distributions')
So, it really should use crossentropy_categorical_1hot instead.
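For illustration, here is a minimal sketch (not the actual Keras backend code) of what that could look like: the integer targets are passed straight through to Theano, and the reshape of output to a 2-D (samples, classes) matrix is my assumption, since the one-hot op expects a matrix of coding distributions.

import theano.tensor as T

def sparse_categorical_crossentropy(output, target, from_logits=False):
    # Collapse any leading dimensions so output is a (samples, classes)
    # matrix, which is what the one-hot cross-entropy op expects.
    output = T.reshape(output, (-1, output.shape[-1]))
    if from_logits:
        output = T.nnet.softmax(output)
    # Flatten targets to a 1-D vector of integer class indices.
    target = T.cast(T.flatten(target), 'int32')
    # Because target.ndim == output.ndim - 1, Theano's categorical_crossentropy
    # dispatches to crossentropy_categorical_1hot internally.
    return T.nnet.categorical_crossentropy(output, target)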
Feel free to submit a PR.
crossentropy_categorical_1hot uses the CrossentropyCategorical1Hot op, which seems to be implemented only for the CPU, so I don't know if it is a good idea to use it directly.
However, there is crossentropy_softmax_1hot and related functions (cross entropy combined with softmax), which use the CrossentropySoftmaxArgmax1HotWithBias op, and that one has a GPU implementation.
There are also some graph optimizations, e.g. crossentropy_to_crossentropy_with_softmax, which replaces the combination crossentropy_categorical_1hot(softmax(...)) with crossentropy_softmax_1hot, so if you use the two in combination, everything should run on the GPU.
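To see that optimization in action, here is a small standalone sketch (assumed usage, not code from Keras): the graph is built as crossentropy_categorical_1hot(softmax(...)), and after compilation the optimizer should have rewritten it into the fused op.

import theano
import theano.tensor as T

x = T.matrix('x')   # raw scores (logits), shape (batch, classes)
y = T.ivector('y')  # integer class indices, shape (batch,)

probs = T.nnet.softmax(x)
loss = T.nnet.crossentropy_categorical_1hot(probs, y).mean()

f = theano.function([x, y], loss)
# Inspecting the optimized graph should show the fused
# CrossentropySoftmaxArgmax1HotWithBias op instead of the CPU-only
# CrossentropyCategorical1Hot op.
theano.printing.debugprint(f)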