Keras: Error when checking model target with sparse_categorical_crossentropy in 1.0.4

Created on 17 Jun 2016  ·  20 Comments  ·  Source: keras-team/keras

The following script fails with Keras 1.0.4, but worked with 1.0.3:

```
from keras.layers import Dense, Activation
from keras.models import Sequential

model = Sequential([
    Dense(32, input_dim=2),
    Activation('relu'),
    Dense(10),
    Activation('softmax'),
])
model.compile(loss='sparse_categorical_crossentropy', optimizer='sgd')
model.fit([[0,1], [1,1], [1,0]], [1,2,3])
```

gives the following exception:

Exception: Error when checking model target: expected activation_6 to have shape (None, 10) but got array with shape (3, 1)

The check doesn't seem to take the sparse categorical crossentropy loss into account (which takes a single integer class index per training example, so the target [1, 2, 3] above arrives with shape (3, 1) rather than the one-hot shape (3, 10)). This has been tested with both the TensorFlow and Theano backends.

Please make sure that the boxes below are checked before you submit your issue. Thank you!

  • [X] Check that you are up-to-date with the master branch of Keras. You can update with:
    pip install git+git://github.com/fchollet/keras.git --upgrade --no-deps
  • [X] If running on Theano, check that you are up-to-date with the master branch of Theano. You can update with:
    pip install git+git://github.com/Theano/Theano.git --upgrade --no-deps
  • [X] Provide a link to a GitHub Gist of a Python script that can reproduce your issue (or just copy the script here if it is short).


All 20 comments

Your code runs for me. Have you tried syncing to the master branch?

Are you by any chance using Windows? I experience the same problem when I use it on my Windows 10 machine, even with the most recent master branch. However, on my Linux machine your code runs just fine. Does anyone have any suggestions?

Most likely that's an installation problem on your Windows machine. There's no reason this code wouldn't run if you are actually running the latest master branch.

This was tested on Ubuntu with a fresh venv install and pip install keras==1.0.4.

However, I just ran with master and the same code runs fine. Reverting to the PyPI 1.0.4 release still throws the above exception, though.

I'll close this issue for the time being. Many thanks!

Hi,

I've synced to the latest master branch (Keras 1.0.5) and get exactly the same exception.

I'm running the code in a conda environment with the TensorFlow backend (on Mac OS X).

I'm running the following (basically the same as above):

```
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation

X_train = np.array([[1,2], [6,5], [8,2]])
y_train = np.array([2,3,7])
input_dim = X_train.shape[1]

model = Sequential()

model.add(Dense(output_dim=64, input_dim=input_dim))
model.add(Activation("relu"))
model.add(Dense(output_dim=10))
model.add(Activation("softmax"))

model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])

model.fit(X_train, y_train, nb_epoch=5, batch_size=32)
```

The exception I am getting is

Exception: Error when checking model target: expected activation_2 to have shape (None, 10) but got array with shape (3, 1)

I'm synced with the master branch, but have also tried this with Keras 1.0.3; neither works. Do you have any idea what could be causing this issue?

Cheers

What's up @EmilienDupont, try sparse_categorical_crossentropy and reshaping with y_train = y_train.reshape((-1, 1)).
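
In other words, something like this (a sketch of that fix applied to the snippet above):

```
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation

X_train = np.array([[1, 2], [6, 5], [8, 2]])
# integer class labels as a column vector, shape (3, 1)
y_train = np.array([2, 3, 7]).reshape((-1, 1))

model = Sequential()
model.add(Dense(output_dim=64, input_dim=X_train.shape[1]))
model.add(Activation("relu"))
model.add(Dense(output_dim=10))
model.add(Activation("softmax"))

# sparse_categorical_crossentropy takes one class index per example,
# so the (3, 1) targets now match what the loss expects
model.compile(loss='sparse_categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
model.fit(X_train, y_train, nb_epoch=5, batch_size=32)
```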

Cheers @lukedeo, that did the trick 👍

I'm new to Python and Keras, but am running into something similar with my code – what does a shape of (None, 10) actually look like? Would that not just be a basic list with 10 elements?

@EmilienDupont could you explain what you did? I also have the same problem.

@damianhinch
In the model above, the final output is 10-dimensional (Dense(output_dim=10)). Because of the softmax layer, this can be interpreted as a probability distribution over the 10 classes I am trying to predict (e.g. the 10 digits if using MNIST). A typical output would then look like [0.1, 0.05, 0.3, ..., 0.02].

The problem is that I am trying to fit this output, let's call it y_pred, to y_train. But each example in y_train is just a digit (e.g. 2), so it has shape (1,), whereas my y_pred has shape (10,), so there is a mismatch.

There are two ways to solve this. Either you encode y_train as a one-hot vector, i.e. a vector which is 1 at the index of the digit represented and 0 everywhere else; for 2 this would be [0, 0, 1, 0, 0, ..., 0]. You can then use this one-hot encoding to fit the model with categorical_crossentropy as above.

Alternatively, you can use sparse_categorical_crossentropy, which takes care of this transformation for you internally. The word "sparse" is used here because 2 is a sparse representation of [0, 0, 1, 0, ..., 0], in the sense that it refers to the index of the non-zero element.
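
To make the one-hot option concrete, a minimal sketch (assuming 10 classes; in the Keras 1.x releases discussed here, to_categorical lives in keras.utils.np_utils):

```
import numpy as np
from keras.utils.np_utils import to_categorical

y_train = np.array([2, 3, 7])            # integer labels, shape (3,)
y_one_hot = to_categorical(y_train, 10)  # one-hot matrix, shape (3, 10)
# y_one_hot[0] is [0., 0., 1., 0., 0., 0., 0., 0., 0., 0.]
```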

@Emerson (None, 10) is just a placeholder for an array with an unknown number of rows and 10 columns. During training this would typically take the shape (<batch_size>, 10), depending on the size of your batch.
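
For example, inspecting the model defined above:

```
print(model.output_shape)  # (None, 10): the batch size is unknown until training time
```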

Hi, I got the same error, shown below:
ValueError: Error when checking target: expected sequential_1 to have 4 dimensions, but got array with shape (1481, 3).
The input shapes are X_train: (1481, 64, 64, 3) and y_train: (1481, 3), where y_train is a one-hot encoded array like [[0 1 0], [1 0 0], ...].

and the model-building code is below:

```
from keras.layers import Input, Dense
from keras.models import Sequential, Model
from keras.applications.vgg16 import VGG16

image_size = (64, 64)
input_image = Input(shape=(*image_size, 3))
base_model = VGG16(input_tensor=input_image, include_top=False)
top_model = Sequential()
top_model.add(Dense(3, input_shape=base_model.output_shape[1:], activation="softmax"))
model = Model(inputs=base_model.input, outputs=top_model(base_model.output))
```

I understand that this error means the shape of y_train is not what the model expects, but I don't know how to fix this. Would you give me some advice if possible?

I had this same error before! I found out that the last layer is treated as the output layer, so make sure to change the last layer to Dense(OUTPUT_SIZE).
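
In the model above, the likely cause is that Dense(3) is applied to the 4-D convolutional output of VGG16, so the model output is still 4-D while y_train is 2-D. One common fix is to flatten before the classifier; a minimal sketch under that assumption (not tested against the exact data above):

```
from keras.layers import Input, Dense, Flatten
from keras.models import Model
from keras.applications.vgg16 import VGG16

input_image = Input(shape=(64, 64, 3))
base_model = VGG16(input_tensor=input_image, include_top=False)

# flatten the 4-D conv output to 2-D so the model output
# matches y_train's (n_samples, 3) shape
x = Flatten()(base_model.output)
predictions = Dense(3, activation="softmax")(x)
model = Model(inputs=base_model.input, outputs=predictions)
```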

Hi, I use a CNN for text classification and I got an error similar to the previous ones, but I couldn't solve it with the previous solutions. Here is the error:
ValueError: Error when checking input: expected input_11 to have shape (None, 185) but got array with shape (1665, 35)
and here is the information:
x_train shape: (1665, 35)
x_test shape: (185, 35)
Vocabulary Size: 4825
y_train 1665
y_test 185
Initializing embedding layer with word2vec weights, shape (4825, 300)

I would be grateful if you could help me figure it out.

@simarad Could you please share your code so that I could look into the issue?

Yes, I used the code in https://github.com/alexander-rakhlin/CNN-for-Sentence-Classification-in-Keras, and my main code is:

```
"""
Train convolutional network for sentiment analysis on IMDB corpus. Based on
"Convolutional Neural Networks for Sentence Classification" by Yoon Kim
http://arxiv.org/pdf/1408.5882v2.pdf
For "CNN-rand" and "CNN-non-static" gets to 88-90%, and "CNN-static" - 85% after 2-5 epochs with following settings:
embedding_dim = 50
filter_sizes = (3, 8)
num_filters = 10
dropout_prob = (0.5, 0.8)
hidden_dims = 50
"""
import numpy as np
from keras.models import Sequential, Model
from keras.layers import Dense, Dropout, Flatten, Input, MaxPooling1D, Convolution1D, Embedding
from keras.layers.merge import Concatenate
from keras.preprocessing import sequence

np.random.seed(0)

# ---------------------- Parameters section -------------------

# Model Hyperparameters
embedding_dim = 300
filter_sizes = (3, 8)
num_filters = 10
dropout_prob = (0.5, 0.8)
hidden_dims = 50

# Training parameters
batch_size = 64
num_epochs = 10

# Preprocessing parameters
sequence_length = 400
max_words = 5000

# Word2Vec parameters (see train_word2vec)
min_word_count = 1
context = 10

# ---------------------- Parameters end -----------------------

# model_type, x, y, vocabulary, vocabulary_inv and embedding_weights
# are assumed to be defined earlier in the script
x, y, vocabulary, vocabulary_inv_list = x, y, vocabulary, vocabulary_inv
vocabulary_inv = {key: value for key, value in enumerate(vocabulary_inv_list)}
train_len = int(len(x) * 0.9)
x_train = x[:train_len]
y_train = y[:train_len]
x_test = x[train_len:]
y_test = y[train_len:]

# Data Preparation
print("Load data...")
if sequence_length != x_test.shape[0]:
    print("Adjusting sequence length for actual size")
    sequence_length = x_test.shape[0]

print("x_train shape:", x_train.shape)
print("x_test shape:", x_test.shape)
print("Vocabulary Size: {:d}".format(len(vocabulary_inv)))
print("y_train", len(y_train))
print("y_test", len(y_test))

# Prepare embedding layer weights and convert inputs for static model
print("x_train static shape:", x_train.shape)
print("x_test static shape:", x_test.shape)

# Build model
if model_type == "CNN-static":
    input_shape = (sequence_length, embedding_dim)
else:
    input_shape = (sequence_length,)

model_input = Input(shape=input_shape)

if model_type == "CNN-static":
    z = model_input
else:
    z = Embedding(len(vocabulary_inv), embedding_dim, input_length=sequence_length, name="embedding")(model_input)

z = Dropout(dropout_prob[0])(z)

# Convolutional block
conv_blocks = []
for sz in filter_sizes:
    conv = Convolution1D(filters=num_filters,
                         kernel_size=sz,
                         padding="valid",
                         activation="relu",
                         strides=1)(z)
    conv = MaxPooling1D(pool_size=2)(conv)
    conv = Flatten()(conv)
    conv_blocks.append(conv)
z = Concatenate()(conv_blocks) if len(conv_blocks) > 1 else conv_blocks[0]

z = Dropout(dropout_prob[1])(z)
z = Dense(hidden_dims, activation="relu")(z)
model_output = Dense(1, activation="sigmoid")(z)

model = Model(model_input, model_output)
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

# Initialize weights with word2vec
if model_type == "CNN-non-static":
    weights = np.array([v for v in embedding_weights.values()])
    print("Initializing embedding layer with word2vec weights, shape", weights.shape)
    embedding_layer = model.get_layer("embedding")
    embedding_layer.set_weights([weights])

# Train the model
model.fit(x_train, y_train, batch_size=batch_size, validation_data=(x_test, y_test), epochs=num_epochs, verbose=2)
```

Keras 1.2.0
Python 2.7

```
from keras import backend as K
from keras.engine.topology import Layer
from keras.models import Sequential, Model
from keras.layers.core import Activation, Flatten
from keras.layers import convolutional

class Bias(Layer):
    """Custom keras layer that simply adds a scalar bias to each location in the input

    Largely copied from the keras docs:
    http://keras.io/layers/writing-your-own-keras-layers/#writing-your-own-keras-layers
    """

    def __init__(self, **kwargs):
        super(Bias, self).__init__(**kwargs)

    def build(self, input_shape):
        self.W = K.zeros(input_shape[1:])
        self.trainable_weights = [self.W]

    def call(self, x, mask=None):
        return x + self.W


defaults = {
    "board": 10,
    "filters_per_layer": 128,
    "layers": 12,
    "filter_width_1": 5
}
# copy defaults, but override with anything in kwargs
params = defaults

network = Sequential()
# create first layer
network.add(convolutional.Convolution2D(
    input_shape=(6, 10, 10),
    nb_filter=128,
    nb_row=5,
    nb_col=5,
    init='uniform',
    activation='relu',
    border_mode='same'))

# create all other layers
for i in range(2, 13):
    # use filter_width_K if it is there, otherwise use 3
    filter_key = "filter_width_%d" % i
    filter_width = params.get(filter_key, 3)

    # use filters_per_layer_K if it is there, otherwise use default value
    filter_count_key = "filters_per_layer_%d" % i
    filter_nb = params.get(filter_count_key, 128)

    network.add(convolutional.Convolution2D(
        nb_filter=filter_nb,
        nb_row=filter_width,
        nb_col=filter_width,
        init='uniform',
        activation='relu',
        border_mode='same'))

# the last layer maps each <filters_per_layer> feature to a number
network.add(convolutional.Convolution2D(
    nb_filter=1,
    nb_row=1,
    nb_col=1,
    init='uniform',
    border_mode='same'))
# reshape output to be board x board
network.add(Flatten())
# add a bias to each board location
network.add(Bias())
# softmax makes it into a probability distribution
network.add(Activation('softmax'))
```

gives the following exception:

ValueError: Error when checking model target: expected activation_1 to have shape (None, 60) but got array with shape (10, 100)

The training data is a (10, 6, 10, 10) array, so why does the model need (None, 60)?
If I change input_shape=(6, 10, 10) to input_shape=(10, 10, 10), I get:

ValueError: Error when checking model input: expected convolution2d_input_1 to have shape (None, 10, 10, 10) but got array with shape (10, 6, 10, 10)

@EmilienDupont Can you help me? thank you very much.

Hi Guys,

Please help

This is the code from my script:

```
from numpy import array
from keras.models import Model
from keras.layers import Input, LSTM, Dense

# data_object, embeddingDim, maxNumWords, lossFunction, inp_x, inp_y,
# out_t and epochs are defined elsewhere in the script
encoder_inputs = Input(shape=(None,))
x = data_object.embeddingLayer(encoder_inputs)
x, state_h, state_c = LSTM(embeddingDim, return_state=True)(x)
encoder_states = [state_h, state_c]

decoder_inputs = Input(shape=(None,))
x = data_object.embeddingLayer(decoder_inputs)
x = LSTM(embeddingDim, return_sequences=True)(x, initial_state=encoder_states)
decoder_outputs = Dense(maxNumWords, activation='softmax')(x)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

model.compile(optimizer='rmsprop', loss=lossFunction)

model.fit([array(inp_x), array(inp_y)], array(out_t), epochs=epochs)
```

returns the error:

ValueError: Error when checking input: expected input_1 to have 2 dimensions, but got array with shape (99, 20, 1001)

The shapes of inp_x, inp_y, and out_t are all (99, 20, 1001); I am not sure how to solve the error.

@Phetsa
Have you been able to solve your issue? It seems I am having the same problem. If you have, may I know how you went about it?

I was able to take care of this error by adding a reshape layer at the end. Example:

```
from keras.layers import Dense, Activation, Reshape
from keras.models import Sequential

input_dim = 2
output_dim = 10

model = Sequential([
    Dense(32, input_dim=input_dim),
    Activation('relu'),
    Dense(output_dim),
    Activation('softmax'),
    Reshape((output_dim,)),
])
model.compile(loss='sparse_categorical_crossentropy', optimizer='sgd')
model.fit([[0,1], [1,1], [1,0]], [1,2,3])
```
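
(Reshaping the targets instead, as suggested earlier in the thread with y_train.reshape((-1, 1)), should work as well, without adding an extra layer.)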