Not really an issue, more of a question to confirm (for myself and others) that the output of the .predict_proba() function for a multi-label classification problem is being interpreted correctly.
So here's a toy problem:
import numpy as np
from sklearn import preprocessing

# generate some sample data
X = np.array([[4, 5, 6, 7, 8],
              [0, 5, 6, 2, 3],
              [1, 2, 6, 5, 8],
              [6, 1, 1, 1, 3],
              [2, 5, 3, 2, 0]])
y = [['blue', 'red'],
     ['red'],
     ['red', 'green'],
     ['blue', 'green'],
     ['orange']]
X_test = np.array([[4, 6, 1, 2, 8],
                   [0, 0, 1, 5, 1]])
# binarize text labels
mlb = preprocessing.MultiLabelBinarizer()
y = mlb.fit_transform(y)
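For reference, MultiLabelBinarizer sorts the class labels alphabetically, so each column of the binarized y corresponds to one color:
>>> mlb.classes_
array(['blue', 'green', 'orange', 'red'], dtype=object)
>>> y
array([[1, 0, 0, 1],
       [0, 0, 0, 1],
       [0, 1, 0, 1],
       [1, 1, 0, 0],
       [0, 0, 1, 0]])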
Using a basic Keras Sequential() model:
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(output_dim=10, input_dim=5, init='uniform', activation='tanh'))
model.add(Dense(output_dim=10, input_dim=10, init='uniform', activation='tanh'))
model.add(Dense(output_dim=10, input_dim=10, init='uniform', activation='tanh'))
model.add(Dense(output_dim=4, init='uniform', activation='sigmoid'))
model.compile(optimizer='adadelta', loss='binary_crossentropy')
model.fit(X, y)
proba = model.predict_proba(X_test)
Which outputs:
>>> proba
[[ 0.39656317 0.41439512 0.03391508 0.90610588]
[ 0.40581116 0.41944474 0.05669538 0.86803496]]
So how do I interpret this output? I'm used to scikit-learn classifiers, which return probability output for multi-label problems with shape [n_samples, n_classes], where n_samples is the number of test instances and n_classes is 2 (for classes 0 and 1). But it looks like the model above outputs shape [n_samples, n_classes], where the samples are the test instances and the classes are the unique classes in y.
Could someone clarify this output for me?
You have an output of 4 binary variables, one for each color. If you take a look at your preprocessed y variable, it has the same shape (n_samples, n_classes). Your output can be read as p(color_k=1|x_i), where i is in {1,2} (your samples) and k is in {1,2,3,4} (your colors), giving an output of shape (2, 4). For a binary logistic regression (sigmoid activation), having two outputs is redundant since one is the complement of the other. Keras only outputs p(color_k=1|x_i), not p(color_k=0|x_i) as scikit-learn does.
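As a concrete follow-up, one way to turn those per-color probabilities back into label sets is to threshold each sigmoid output and invert the binarizer (a minimal sketch; the 0.5 threshold is an assumption, not something the model dictates):
# threshold each independent sigmoid output at 0.5 (assumed threshold)
predictions = (proba > 0.5).astype(int)
# map the binary indicator matrix back to the color names
print(mlb.inverse_transform(predictions))
# for the proba above this gives [('red',), ('red',)]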
Got it, thanks!
@tboquet I don't understand this part:
"Keras only outputs p(color_k=1|x_i) and not p(color_k=0|x_i) like in scikit learn."
For single-label problems, the class index in y starts from 0. Is multi-label different because the index of y starts from 1 there? When I set my y labels, I also start from 0.
For example, if x is [[1, 2, 3], [4, 5, 6], [7, 8, 9]] and y is [[1, 2], [1, 3], [3]], then y needs to be converted to:
y = [[1, 1, 0],
     [1, 0, 1],
     [0, 0, 1]]
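That conversion is exactly what the MultiLabelBinarizer from the original post does, e.g. (a quick sketch):
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
y = mlb.fit_transform([[1, 2], [1, 3], [3]])
# mlb.classes_ is [1, 2, 3], so each column corresponds to one label:
# [[1, 1, 0],
#  [1, 0, 1],
#  [0, 0, 1]]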
@ralston3 Thanks for sharing. Do you also use predict_proba?
What about predict? Thanks.
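If it helps: in the Keras versions of that era, Sequential.predict_proba was, to my understanding, essentially a thin wrapper around predict, so with a sigmoid output layer both return the same per-class probabilities (a quick sketch under that assumption):
proba = model.predict_proba(X_test)  # per-class probabilities in [0, 1]
preds = model.predict(X_test)        # raw network outputs, identical here
# predict_classes, by contrast, collapses each sample to a single hard
# class, which throws away the multi-label structure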