The current implementation of sparse_categorical_crossentropy is:
def sparse_categorical_crossentropy(output, target, from_logits=False):
    target = T.cast(T.flatten(target), 'int32')
    target = T.extra_ops.to_one_hot(target, nb_class=output.shape[-1])
    target = reshape(target, shape(output))
    return categorical_crossentropy(output, target, from_logits)
However, that is a very inefficient implementation, because Theano already provides crossentropy_categorical_1hot. Theano's categorical_crossentropy will even select it automatically when the target's ndim is one less than the output's. This is the Theano code:
def categorical_crossentropy(coding_dist, true_dist):
    if true_dist.ndim == coding_dist.ndim:
        return -tensor.sum(true_dist * tensor.log(coding_dist),
                           axis=coding_dist.ndim - 1)
    elif true_dist.ndim == coding_dist.ndim - 1:
        return crossentropy_categorical_1hot(coding_dist, true_dist)
    else:
        raise TypeError('rank mismatch between coding and true distributions')
So, it really should use crossentropy_categorical_1hot instead.
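For illustration, here is a minimal sketch (not the actual Keras backend code) of what that could look like: the integer targets are passed straight through to Theano, and the reshape of output to a 2-D (samples, classes) matrix is my assumption, since the one-hot op expects a matrix of coding distributions.

import theano.tensor as T

def sparse_categorical_crossentropy(output, target, from_logits=False):
    # Collapse any leading dimensions so output is a (samples, classes)
    # matrix, which is what the one-hot cross-entropy op expects.
    output = T.reshape(output, (-1, output.shape[-1]))
    if from_logits:
        output = T.nnet.softmax(output)
    # Flatten targets to a 1-D vector of integer class indices.
    target = T.cast(T.flatten(target), 'int32')
    # Because target.ndim == output.ndim - 1, Theano's categorical_crossentropy
    # dispatches to crossentropy_categorical_1hot internally.
    return T.nnet.categorical_crossentropy(output, target)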
Feel free to submit a PR.
crossentropy_categorical_1hot uses the CrossentropyCategorical1Hot op, which seems to be implemented only for the CPU, so I don't know if it is a good idea to use it directly.
However, there is crossentropy_softmax_1hot and related functions (cross entropy combined with softmax), which use the CrossentropySoftmaxArgmax1HotWithBias op, and that one has a GPU implementation.
There are also some graph optimizations, e.g. crossentropy_to_crossentropy_with_softmax, which replaces the combination crossentropy_categorical_1hot(softmax(...)) with crossentropy_softmax_1hot, so if you use the two in combination, everything should run on the GPU.
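To see that optimization in action, here is a small standalone sketch (assumed usage, not code from Keras): the graph is built as crossentropy_categorical_1hot(softmax(...)), and after compilation the optimizer should have rewritten it into the fused op.

import theano
import theano.tensor as T

x = T.matrix('x')   # raw scores (logits), shape (batch, classes)
y = T.ivector('y')  # integer class indices, shape (batch,)

probs = T.nnet.softmax(x)
loss = T.nnet.crossentropy_categorical_1hot(probs, y).mean()

f = theano.function([x, y], loss)
# Inspecting the optimized graph should show the fused
# CrossentropySoftmaxArgmax1HotWithBias op instead of the CPU-only
# CrossentropyCategorical1Hot op.
theano.printing.debugprint(f)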