Keras: Using Keras for multi-label classification

Created on 25 May 2017 · 6 comments · Source: keras-team/keras

I am trying to train a multi-label classifier. I used sigmoid units in the output layer and the "binary_crossentropy" loss. The problem is that the training and testing results look ideal; the loss and accuracy values are great. But when I use model.predict() to predict labels, the output does not match the real label values. How should I change the code to fix this? The shape of the training and testing sets is (-1, 1, 300, 300), and the shape of the target labels is (-1, 478); I have 478 labels in total.
My complete code:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation, Conv2D, MaxPooling2D,Dropout, Flatten
from keras.optimizers import Adam

X = np.load('./data/X_train.npy')
y = np.load('./data/Y_train.npy')
X_train, y_train = X[:8000], y[:8000]
X_test, y_test = X[8000:], y[8000:]

model = Sequential()

model.add(Conv2D(input_shape=(1, 300, 300), padding='same', filters=32, kernel_size=(300, 5)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(1, 2), padding='same'))

model.add(Flatten())
model.add(Dense(2048))
model.add(Activation('relu'))
model.add(Dense(478))  # one sigmoid unit per label (478 labels in total)
model.add(Activation('sigmoid'))
model.compile(optimizer=Adam(lr=1e-4), loss='binary_crossentropy', metrics=['accuracy'])

print('\nTraining ------------')
model.fit(X_train, y_train, epochs=2, batch_size=100, verbose=1)
model.save('model.h5')

Could you help me to find a solution? Thanks!
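
For reference, a minimal sketch of how the sigmoid outputs of such a model are usually turned into 0/1 label vectors; the 0.5 threshold and the X_test / y_test names are assumptions for illustration, not part of the original code:

import numpy as np
from keras.models import load_model

# Load the model saved above.
model = load_model('model.h5')

# predict() returns one sigmoid probability per label, shape (n_samples, 478).
probs = model.predict(X_test)

# Threshold the probabilities to obtain multi-hot 0/1 label vectors;
# 0.5 is a common but tunable choice for multi-label problems.
pred_labels = (probs > 0.5).astype(int)

# Fraction of samples whose entire label vector is predicted exactly.
exact_match = np.mean(np.all(pred_labels == y_test, axis=1))
print('exact-match ratio:', exact_match)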

stale

All 6 comments

Consider the fact that the first dense layer in your model has over 3.5 million free parameters. Given that you only have 2000 training samples, your model is very likely overfitting. Try reducing the dimensionality further before feeding the representation to the fully-connected layers. Also consider using more forms of regularization besides dropout, such as batch normalization and weight regularizers. Also, if your labels are all mutually exclusive, consider using softmax and categorical crossentropy instead.
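
A minimal sketch of what those suggestions could look like in Keras; the layer sizes, regularization strengths, and two-block layout are illustrative assumptions (keeping the channels-first (1, 300, 300) input of the original snippet), not a prescription:

from keras.models import Sequential
from keras.layers import (Dense, Activation, Conv2D, MaxPooling2D,
                          Flatten, BatchNormalization, Dropout)
from keras.regularizers import l2

model = Sequential()
# Small kernels plus repeated pooling shrink the feature map before Flatten.
model.add(Conv2D(32, (3, 3), padding='same', data_format='channels_first',
                 input_shape=(1, 300, 300)))
model.add(BatchNormalization(axis=1))   # axis=1 is the channel axis here
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2), data_format='channels_first'))

model.add(Conv2D(64, (3, 3), padding='same', data_format='channels_first'))
model.add(BatchNormalization(axis=1))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2), data_format='channels_first'))

model.add(Flatten())
# Weight regularization on the dense layer, plus dropout.
model.add(Dense(256, kernel_regularizer=l2(1e-4)))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(478))                   # one sigmoid unit per label
model.add(Activation('sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])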

@kgrm Thanks for your suggestion, I will try it. If I still have problems, I may need your help again! Thank you.

@kgrm Thanks for your answer, but how did you get the number of parameters in the first dense layer? Many thanks!

You downsample the input (300, 300) image twice using max-pooling with stride 2, therefore its spatial dimensionality is (75, 75) at that point. The final convolutional layer has 32 output filters, therefore the shape of the input tensor to the Flatten layer is (75, 75, 32). The input to the first Dense layer therefore has shape 75 * 75 * 32 = (180000,). The number of parameters for a dense layer is calculated as (number of inputs + 1) x (number of outputs), which, in this case, is actually 184.3 million. You can also verify this with model.summary(). This large a parameter space is obviously completely inappropriate for such a small dataset.
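
As a small worked example of that rule (the 1024-unit dense width below is an assumption chosen to match the 184.3 million figure above):

from keras.models import Sequential
from keras.layers import Dense

# (number of inputs + 1) * number of outputs; the +1 accounts for the bias.
flat = 75 * 75 * 32                  # flattened feature map size
units = 1024                         # assumed dense width
print((flat + 1) * units)            # 184321024, i.e. roughly 184.3 million

# model.summary() reports the same rule; a tiny example:
toy = Sequential([Dense(4, input_dim=10)])
toy.summary()                        # Dense layer: Param # = (10 + 1) * 4 = 44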

@kgrm I need your help. My last description of the problem was not very clear. I am using Keras and CNNs for multi-label classification of English text. I use NLTK and word2vec to represent each text as a 300*300 matrix, and I increased the sample size: I now have 8000 training samples and 2000 testing samples. I revised the code: the filter shape is (300, 5), the number of filters is 64, and the code has been updated. I use sigmoid and binary_crossentropy because in my project a text corresponds to more than one tag, but with softmax, increasing the score for one label lowers all the others. I have reduced the input to the fully connected layers from over a million values to 4,800 (150 * 1 * 32) and changed some parameters, but it still has no effect. I don't know what else to do; please help me again! Thanks!

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.
