For example, imdb_lstm_text_2_classification:
print('Build model...')
model = Sequential()
model.add(Embedding(max_features, 128))
model.add(LSTM(128, 128)) # try using a GRU instead, for fun
model.add(Dropout(0.5))
model.add(Dense(128, 1))
model.add(Activation('sigmoid'))
# try using different optimizers and different optimizer configs
model.compile(loss='binary_crossentropy', optimizer='adam', class_mode="binary")
print("Train...")
model.fit(X_train, y_train, batch_size=batch_size, nb_epoch=4, validation_data=(X_test, y_test), show_accuracy=True)
score, acc = model.evaluate(X_test, y_test, batch_size=batch_size, show_accuracy=True)
print('Test score:', score)
print('Test accuracy:', acc)
Note that the last layer of this model is Dense(128, 1), so it has only one output. If you need multiple outputs, use Dense(128, number_of_classes) and a correspondingly shaped y_test. The cost function also changes to categorical_crossentropy if only one class can be active at a time; check this example out: https://github.com/fchollet/keras/blob/master/examples/mnist_mlp.py#L46
One more thing: you will need to set class_mode='categorical' in compile. It's the default, so you could also just remove class_mode='binary'.
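A minimal sketch of those changes, keeping the old-style Keras API used above and assuming a hypothetical number_of_classes variable (the integer labels are converted to one-hot vectors with np_utils.to_categorical):
from keras.utils import np_utils
# one-hot targets: shape (nb_samples, number_of_classes)
Y_train = np_utils.to_categorical(y_train, number_of_classes)
Y_test = np_utils.to_categorical(y_test, number_of_classes)
# replace the Dense(128, 1) + sigmoid head with one unit per class + softmax
model.add(Dense(128, number_of_classes))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', class_mode='categorical')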
@fchollet After removing class_mode='binary', the accuracy becomes 1.0000. Why? I didn't modify any other code.
My whole code is as follows (a modification of imdb_cnn).
I want to classify texts into 6 categories: X contains texts from the 6 categories, and y holds the labels.
X_train = X[:int(len(X)*(1-test_split))]
y_train = labels[:int(len(X)*(1-test_split))]
X_test = X[int(len(X)*(1-test_split)):]
y_test = labels[int(len(X)*(1-test_split)):]
print(len(X_train), 'train sequences')
print(len(X_test), 'test sequences')
# maxlen = 100
print("Pad sequences (samples x time)")
X_train = sequence.pad_sequences(X_train, maxlen=maxlen)
X_test = sequence.pad_sequences(X_test, maxlen=maxlen)
print('X_train shape:', X_train.shape)
print('X_test shape:', X_test.shape)
print('Build model...')
model = Sequential()
# we start off with an efficient embedding layer which maps
# our vocab indices into embedding_dims dimensions
model.add(Embedding(max_features, embedding_dims))
model.add(Dropout(0.25))
# we add a Convolution1D, which will learn nb_filters
# word group filters of size filter_length:
model.add(Convolution1D(input_dim=embedding_dims,
nb_filter=nb_filters,
filter_length=filter_length,
border_mode="valid",
activation="relu",
subsample_length=1))
# we use standard max pooling (halving the output of the previous layer):
model.add(MaxPooling1D(pool_length=2))
# We flatten the output of the conv layer, so that we can add a vanilla dense layer:
model.add(Flatten())
# Computing the output shape of a conv layer can be tricky;
# for a good tutorial, see: http://cs231n.github.io/convolutional-networks/
output_size = nb_filters * (((maxlen - filter_length) / 1) + 1) / 2
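# (A hedged worked example, using the values that appear later in this thread:
#  maxlen=200, filter_length=3, nb_filters=250. The 'valid' convolution gives
#  (200 - 3)/1 + 1 = 198 time steps, MaxPooling1D(pool_length=2) halves that
#  to 99, so the flattened size is 250 * 99 = 24750.)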
# We add a vanilla hidden layer:
model.add(Dense(output_size, hidden_dims))
model.add(Dropout(0.25))
model.add(Activation('relu'))
# We project onto a single unit output layer, and squash it with a sigmoid:
model.add(Dense(hidden_dims, 1))
model.add(Activation('sigmoid'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
model.fit(X_train, y_train, batch_size=batch_size, nb_epoch=nb_epoch,show_accuracy=True, validation_data=(X_test, y_test))
score, acc = model.evaluate(X_test, y_test, batch_size=batch_size, show_accuracy=True)
print('Test score:', score)
print('Test accuracy:', acc)
output:
Loading data...
max_features23823
len(words)23820
9600 train sequences
2400 test sequences
Pad sequences (samples x time)
X_train shape: (9600L, 200L)
X_test shape: (2400L, 200L)
Build model...
Train on 9600 samples, validate on 2400 samples
Epoch 0
32/9600 [..............................] - ETA: 1900s - loss: 0.6991 - acc: 1.0000
64/9600 [..............................] - ETA: 1900s - loss: -0.4477 - acc: 1.0000
96/9600 [..............................] - ETA: 1898s - loss: -4.1642 - acc: 1.0000
@EderSantana I modified the code as you said:
model.add(Dense(hidden_dims, 6))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
but:
Traceback (most recent call last):
File "D:\workspace\search\src\CNN\text_classification_test.py", line 293, in
model.fit(X_train, y_train, batch_size=batch_size, nb_epoch=nb_epoch,show_accuracy=True, validation_data=(X_test, y_test))
File "D:\Python27\lib\site-packages\keras\models.py", line 413, in fit
validation_split=validation_split, val_f=val_f, val_ins=val_ins, shuffle=shuffle, metrics=metrics)
File "D:\Python27\lib\site-packages\keras\models.py", line 168, in _fit
outs = f(*ins_batch)
File "D:\Python27\lib\site-packages\theanocompile\function_module.py", line 606, in __call__
storage_map=self.fn.storage_map)
File "D:\Python27\lib\site-packages\theanocompile\function_module.py", line 595, in call
outputs = self.fn()
ValueError: Input dimension mis-match. (input[0].shape[1] = 1, input[1].shape[1] = 6)
Apply node that caused the error: Elemwise{Composite{(i0 * log((i1 / i2)))}}(AdvancedSubtensor1.0, Elemwise{clip,no_inplace}.0, InplaceDimShuffle{0,x}.0)
Inputs types: [TensorType(float64, matrix), TensorType(float64, matrix), TensorType(float64, col)]
Inputs shapes: [(32L, 1L), (32L, 6L), (32L, 1L)]
Inputs strides: [(8L, 8L), (48L, 8L), (8L, 8L)]
Inputs values: ['not shown', 'not shown', 'not shown']
HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
@Imorton-zd, see the ValueError:
ValueError: Input dimension mis-match. (input[0].shape[1] = 1, input[1].shape[1] = 6)
You are probably still using a desired signal y with only one class (last dimension equal to 1).
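A minimal sketch of the fix, assuming integer labels 0..5 and the np_utils helper that ships with Keras:
from keras.utils import np_utils
nb_classes = 6                                            # six text categories
Y_train = np_utils.to_categorical(y_train, nb_classes)    # shape (n, 6), one-hot
Y_test = np_utils.to_categorical(y_test, nb_classes)
model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch,
          show_accuracy=True, validation_data=(X_test, Y_test))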
Btw, just a little note: if posting Python code, please consider surrounding it with:
`...code goes inside...`
It will give it nice syntax highlighting and won't mess up the # signs.
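For example, a fenced block looks like this (three backticks, optionally followed by a language name for highlighting):
```python
model.add(Dense(hidden_dims, 6))
model.add(Activation('softmax'))
```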
Thanks for your help, @EderSantana! I can now classify multi-category texts. However, a new problem has come up: the accuracy on the training samples is as good as I expected, but the accuracy on the test samples is very low, as follows:
9600/9600 [==============================] - 2098s - loss: 0.5812 - acc: 0.7870 - val_loss: 1.4665 - val_acc: 0.5363
My complete code (omitting the data reading):
batch_size = 32
embedding_dims = 100
nb_filters = 250
filter_length = 3
hidden_dims = 250
nb_epoch = 10
test_split=0.2
seed=113
nb_words=None
skip_top=0
maxlen=200
start_char=1
oov_char=2
index_from=3
nb_classes = 6
# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)
print(len(X_train), 'train sequences')
print(len(X_test), 'test sequences')
# maxlen = 100
print("Pad sequences (samples x time)")
X_train = sequence.pad_sequences(X_train, maxlen=maxlen)
X_test = sequence.pad_sequences(X_test, maxlen=maxlen)
print('X_train shape:', X_train.shape)
print('X_test shape:', X_test.shape)
print('Build model...')
model = Sequential()
# we start off with an efficient embedding layer which maps
# our vocab indices into embedding_dims dimensions
model.add(Embedding(max_features, embedding_dims))
model.add(Dropout(0.25))
# we add a Convolution1D, which will learn nb_filters
# word group filters of size filter_length:
model.add(Convolution1D(input_dim=embedding_dims,
nb_filter=nb_filters,
filter_length=filter_length,
border_mode="valid",
activation="relu",
subsample_length=1))
# we use standard max pooling (halving the output of the previous layer):
model.add(MaxPooling1D(pool_length=2))
# We flatten the output of the conv layer, so that we can add a vanilla dense layer:
model.add(Flatten())
# Computing the output shape of a conv layer can be tricky;
# for a good tutorial, see: http://cs231n.github.io/convolutional-networks/
output_size = nb_filters * (((maxlen - filter_length) / 1) + 1) / 2
# We add a vanilla hidden layer:
model.add(Dense(output_size, hidden_dims))
model.add(Dropout(0.25))
model.add(Activation('relu'))
# We project onto nb_classes output units and squash them with a softmax:
model.add(Dense(hidden_dims, nb_classes))
model.add(Activation('softmax'))
# model.add(Dense(128, 1, init='normal'))
# model.add(Activation('relu'))
# sgd = SGD(l2=0.0,lr=0.005, decay=1e-6, momentum=0.9, nesterov=True)
# model.compile(loss='categorical_crossentropy', optimizer=sgd, class_mode="categorical")
# model.compile(loss='categorical_crossentropy', optimizer='sgd', class_mode="categorical")
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch,show_accuracy=True, validation_data=(X_test, Y_test))
score, acc = model.evaluate(X_test, Y_test, batch_size=batch_size, show_accuracy=True)
print('Test score:', score)
print('Test accuracy:', acc)
How should I set the parameters? My computer has 8 GB of memory; if I use too many samples, or anything else that occupies most of the memory, I'm afraid my computer will crash. What can I do? Thanks.
How big is your dataset? 250 filters seems like a lot; you may be overfitting the training data, in which case you will need a bigger dataset or a smaller net. If you don't have enough labeled data for your problem, some unsupervised training on a larger dataset might help.
The following video lecture should give you some insight into that: https://www.youtube.com/watch?v=jCGplSKrl2Y
@EderSantana My dataset has 12000 samples: 9600 for training and 2400 for testing, classified into 6 classes. If I use a smaller net, is it OK to just reduce the number of filters? At first I wanted to add a layer, but I don't know whether it would be effective or which layer I could add.
well, welcome to neural networks
the only way to answer these questions is with experience
you will have to run as many experiments as you can and build your intuition from the results. It also helps to read papers and watch video lectures (did you see Hinton's Coursera and De Freitas' YouTube videos?). Try starting with an architecture used for a similar problem with a similar dataset size.
But yeah, your dataset is small; overfitting it is the first step, and now you have to regularize your model so it can generalize well. You should try fewer filters and maybe a dropout of 0.5. Instead of a single 250-filter layer, try two layers with 20 filters each, for example (a sketch follows below).
Good luck!
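A hedged sketch of that suggestion (a smaller net: two stacked Convolution1D + MaxPooling1D blocks with 20 filters each and a dropout of 0.5), reusing the old-style API and variable names from the code above; the output_size arithmetic follows the same pattern as before and is worth double-checking against the actual layer shapes:
nb_filters = 20                      # far fewer filters than 250
model = Sequential()
model.add(Embedding(max_features, embedding_dims))
model.add(Dropout(0.5))
# first convolution + pooling block
model.add(Convolution1D(input_dim=embedding_dims,
          nb_filter=nb_filters,
          filter_length=filter_length,
          border_mode="valid",
          activation="relu",
          subsample_length=1))
model.add(MaxPooling1D(pool_length=2))
# second convolution + pooling block stacked on top of the first
model.add(Convolution1D(input_dim=nb_filters,
          nb_filter=nb_filters,
          filter_length=filter_length,
          border_mode="valid",
          activation="relu",
          subsample_length=1))
model.add(MaxPooling1D(pool_length=2))
model.add(Flatten())
# each 'valid' convolution shortens the sequence by filter_length - 1,
# and each pooling halves it
len1 = ((maxlen - filter_length) + 1) / 2
len2 = ((len1 - filter_length) + 1) / 2
output_size = nb_filters * len2
model.add(Dense(output_size, hidden_dims))
model.add(Dropout(0.5))
model.add(Activation('relu'))
model.add(Dense(hidden_dims, nb_classes))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')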
@EderSantana Hinton's Coursera course is actually indigestible. For a CNN doing text classification, how many samples per category are suitable? I have read the Keras documentation, in which the input of the convolutional layer is described in terms of 'nb_samples', not 'embedding_dims' as in the CNN example. Why? Also, I want to add another Convolution1D layer after the first convolutional and max-pooling layers, hoping for higher accuracy. Based on the CNN example above, what code do I need to add? Many thanks.
By the way, why is 'output_size' calculated the way it is in the code above? If I add another convolutional layer and max-pooling layer, what should 'output_size' be?
Hi @Imorton-zd
What is the shape of X_train and X_test in your code?
Many thanks,