Keras: sparse_categorical_crossentropy only works on single output

Created on 24 May 2016 · 7 comments · Source: keras-team/keras

So I have this model that tries to predict multiple outputs:

    ...
    decoded_outputs = []
    for i, size in enumerate(target_sizes):
        dense = TimeDistributed(Dense(size, activation='softmax'), name='out_{0}'.format(i))
        decoded_outputs.append(dense(decoded))

    autoencoder = Model(input=[encoded_input], output=decoded_outputs, name='autoencoder')
    autoencoder.compile(loss='sparse_categorical_crossentropy', optimizer='adam')

but standardize_input_data() keeps complaining about a shape mismatch.

As a workaround, I have to use a smaller batch_size (my original idea was to reduce memory usage for the training data) and categorical_crossentropy. But it would be great if sparse_categorical_crossentropy could handle multiple outputs.
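
For reference, compile() also accepts one loss per output; a minimal sketch of that form, reusing the decoded_outputs list from the snippet above:

    # Illustration only, not from the original post: a list of losses, one per
    # output, is what a single loss string gets broadcast to internally.
    autoencoder.compile(
        loss=['sparse_categorical_crossentropy'] * len(decoded_outputs),
        optimizer='adam')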

Most helpful comment

I only found out after 20 minutes of debugging and searching, so I'll leave this here:

This issue has been fixed in commit 25c10af59694c6a61778f9111e3ca97dfd8971b4.

All 7 comments

Did you try updating Keras? As specified in the checklist that you deleted when posting this.

Did you try updating Keras?

Yes, I have tested it with 1.0.3 (installed from pip) and also against the master branch.

This is the simplified model that I'm trying to build:

import numpy as np
from keras.layers import Input, LSTM, RepeatVector, TimeDistributed, Dense
from keras.models import Model

input_dim = 1
timestep = 10
latent_dim = 512
target_sizes = (300, 400)

input_ = Input(shape=(timestep, input_dim))
encoded = LSTM(latent_dim)(input_)

decoded = RepeatVector(timestep)(encoded)
decoded = LSTM(latent_dim, return_sequences=True)(decoded)
output_ = []
for size in target_sizes:
    dense = TimeDistributed(Dense(size, activation='softmax'))
    output_.append(dense(decoded))

model = Model(input=[input_], output=output_)
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')

X_train = np.random.randint(target_sizes[1], size=(50, timestep, 1))
yy_train = np.random.randint(target_sizes[0], size=(50, timestep, 1))
zz_train = np.random.randint(target_sizes[1], size=(50, timestep, 1))

model.fit([X_train], [yy_train, zz_train], nb_epoch=10, batch_size=32, shuffle=True)

It raises the exception: Error when checking model target: expected timedistributed_2 to have shape (None, 10, 400) but got array with shape (50, 10, 1). In other words, the target check compares the integer labels' trailing dimension (1) against the softmax output size (400) instead of accepting the sparse label format.

This one, however, works:

output_2 = TimeDistributed(Dense(target_sizes[0], activation='softmax'))(decoded)
model_2 = Model(input=input_, output=output_2)
model_2.compile(loss='sparse_categorical_crossentropy', optimizer='adam')

model_2.fit(X_train, yy_train, nb_epoch=10, batch_size=32, shuffle=True)

And so does this one:

yy_train_1 = np.zeros((50, timestep, target_sizes[0]))
for ii in range(yy_train.shape[0]):
    for jj in range(yy_train.shape[1]):
        kk = yy_train[ii, jj][0]
        yy_train_1[ii, jj, kk] = 1.

zz_train_1 = np.zeros((50, timestep, target_sizes[1]))
for ii in range(zz_train.shape[0]):
    for jj in range(zz_train.shape[1]):
        kk = zz_train[ii, jj][0]
        zz_train_1[ii, jj, kk] = 1.

model.compile(loss='categorical_crossentropy', optimizer='adam')
model.fit([X_train], [yy_train_1, zz_train_1], nb_epoch=10, batch_size=32, shuffle=True)
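
For what it's worth, the two conversion loops above can be replaced by a single NumPy indexing operation; a sketch equivalent to the code above:

    # Index an identity matrix with the integer labels:
    # (50, timestep, 1) integer targets -> (50, timestep, size) one-hot targets.
    yy_train_1 = np.eye(target_sizes[0])[yy_train[:, :, 0]]
    zz_train_1 = np.eye(target_sizes[1])[zz_train[:, :, 0]]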

I only found out after 20 minutes of debugging and searching, so I'll leave this here:

This issue has been fixed in commit 25c10af59694c6a61778f9111e3ca97dfd8971b4.

Thanks @mbollmann!

Hi,

I seem to be getting the same error. I'm using Keras version 1.2 with the TensorFlow backend. Here's my code:

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dropout, Dense, TimeDistributed

# num_steps (the sequence length) is defined earlier in the notebook.
print('Build model...')
model = Sequential()
model.add(Embedding(10000, 100, dropout=0.2, input_length=num_steps))
model.add(LSTM(50, return_sequences=True))
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(10000, activation='softmax')))
model.compile(loss='sparse_categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

print(model.summary())

This is the error trace:

TypeError                                 Traceback (most recent call last)
<ipython-input-52-2b5d7607b126> in <module>()
      5 model.add(Dropout(0.2))
      6 model.add(TimeDistributed(Dense(10000, activation='softmax')))
----> 7 model.compile(loss='sparse_categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
      8 
      9 print(model.summary())

/home/rudraksh/anaconda2/lib/python2.7/site-packages/keras/models.pyc in compile(self, optimizer, loss, metrics, sample_weight_mode, **kwargs)
    587                            metrics=metrics,
    588                            sample_weight_mode=sample_weight_mode,
--> 589                            **kwargs)
    590         self.optimizer = self.model.optimizer
    591         self.loss = self.model.loss

/home/rudraksh/anaconda2/lib/python2.7/site-packages/keras/engine/training.pyc in compile(self, optimizer, loss, metrics, loss_weights, sample_weight_mode, **kwargs)
    617             loss_weight = loss_weights_list[i]
    618             output_loss = weighted_loss(y_true, y_pred,
--> 619                                         sample_weight, mask)
    620             if len(self.outputs) > 1:
    621                 self.metrics_tensors.append(output_loss)

/home/rudraksh/anaconda2/lib/python2.7/site-packages/keras/engine/training.pyc in weighted(y_true, y_pred, weights, mask)
    305     def weighted(y_true, y_pred, weights, mask=None):
    306         # score_array has ndim >= 2
--> 307         score_array = fn(y_true, y_pred)
    308         if mask is not None:
    309             # Cast the mask to floatX to avoid float64 upcasting in theano

/home/rudraksh/anaconda2/lib/python2.7/site-packages/keras/objectives.pyc in sparse_categorical_crossentropy(y_true, y_pred)
     43     If you get a shape error, add a length-1 dimension to labels.
     44     '''
---> 45     return K.sparse_categorical_crossentropy(y_pred, y_true)
     46 
     47 

/home/rudraksh/anaconda2/lib/python2.7/site-packages/keras/backend/tensorflow_backend.pyc in sparse_categorical_crossentropy(output, target, from_logits)
   1991     if len(output_shape) == 3:
   1992         # if our output includes timesteps we need to reshape
-> 1993         return tf.reshape(res, [-1, int(output_shape[-2])])
   1994     else:
   1995         return res

TypeError: __int__ returned non-int (type NoneType)

It works fine if I use categorical_crossentropy.
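
For anyone who wants the dense-label route in the meantime, here is a minimal sketch, assuming y_train is an integer label array of shape (samples, num_steps) (that array is not shown in the comment above):

    # Hedged sketch of the categorical_crossentropy alternative: flatten the
    # integer labels, one-hot encode them, then restore the time dimension.
    # Note this materialises a (samples, num_steps, 10000) float target array.
    from keras.utils.np_utils import to_categorical

    y_dense = to_categorical(y_train.ravel(), 10000)
    y_dense = y_dense.reshape(y_train.shape[0], y_train.shape[1], 10000)

    model.compile(loss='categorical_crossentropy', optimizer='rmsprop',
                  metrics=['accuracy'])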

Update Keras to the newest GitHub version; it's fixed in this commit.

Thanks a lot! It works now.
