Not really an issue, more of a question to confirm (for myself and others) that the output of the .predict_proba() function for a multi-label classification problem is being interpreted correctly.
So here's a toy problem:
import numpy as np
from sklearn import preprocessing

# generate some sample data
X = np.array([[4, 5, 6, 7, 8],
              [0, 5, 6, 2, 3],
              [1, 2, 6, 5, 8],
              [6, 1, 1, 1, 3],
              [2, 5, 3, 2, 0]])
y = [['blue', 'red'],
     ['red'],
     ['red', 'green'],
     ['blue', 'green'],
     ['orange']]
X_test = np.array([[4, 6, 1, 2, 8],
                   [0, 0, 1, 5, 1]])
# binarize text labels
mlb = preprocessing.MultiLabelBinarizer()
y = mlb.fit_transform(y)
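For reference, MultiLabelBinarizer sorts the class labels alphabetically, so each column of the binarized y corresponds to one color:
>>> mlb.classes_
array(['blue', 'green', 'orange', 'red'], dtype=object)
>>> y
array([[1, 0, 0, 1],
       [0, 0, 0, 1],
       [0, 1, 0, 1],
       [1, 1, 0, 0],
       [0, 0, 1, 0]])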
Using a basic Keras Sequential() model:
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(output_dim=10, input_dim=5, init='uniform', activation='tanh'))
model.add(Dense(output_dim=10, input_dim=10, init='uniform', activation='tanh'))
model.add(Dense(output_dim=10, input_dim=10, init='uniform', activation='tanh'))
model.add(Dense(output_dim=4, init='uniform', activation='sigmoid'))
model.compile(optimizer='adadelta', loss='binary_crossentropy')
model.fit(X, y)
proba = model.predict_proba(X_test)
Which outputs:
>>> proba
[[ 0.39656317 0.41439512 0.03391508 0.90610588]
[ 0.40581116 0.41944474 0.05669538 0.86803496]]
So how do I interpret this output? I'm used to scikit-learn classifiers, which return probability output for multi-label problems with shape [n_samples, n_classes], where n_samples is the number of test instances and n_classes is 2 (for classes 0 and 1). But it looks like the model above outputs shape [n_samples, n_classes], where the samples are the test instances and the classes are the unique classes in y.
Could someone clarify this output for me?
You have an output of 4 binary variables, one for each color. If you take a look at your preprocessed y variable, it has the same shape (n_samples, n_classes). Your output can be read as p(color_k=1|x_i), where i is in {1,2} (your samples) and k is in {1,2,3,4} (your colors), giving an output of shape (2, 4). For a binary logistic regression (sigmoid activation), having two outputs is redundant since one is the complement of the other. Keras only outputs p(color_k=1|x_i), not p(color_k=0|x_i) as scikit-learn does.
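As a concrete follow-up, one way to turn those per-color probabilities back into label sets is to threshold each sigmoid output and invert the binarizer (a minimal sketch; the 0.5 threshold is an assumption, not something the model dictates):
# threshold each independent sigmoid output at 0.5 (assumed threshold)
predictions = (proba > 0.5).astype(int)
# map the binary indicator matrix back to the color names
print(mlb.inverse_transform(predictions))
# for the proba above this gives [('red',), ('red',)]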
Got it, thanks!
@tboquet I don't understand this part:
"Keras only outputs p(color_k=1|x_i) and not p(color_k=0|x_i) like in scikit learn."
For single-label problems, the class index in y starts from 0. Is multi-label different because the index of y starts from 1 there? When I set my y labels, I also start from 0.
For example, if x is [[1, 2, 3], [4, 5, 6], [7, 8, 9]] and y is [[1, 2], [1, 3], [3]], then y needs to be converted to:
y = [[1, 1, 0],
     [1, 0, 1],
     [0, 0, 1]]
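That conversion is exactly what the MultiLabelBinarizer from the original post does, e.g. (a quick sketch):
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
y = mlb.fit_transform([[1, 2], [1, 3], [3]])
# mlb.classes_ is [1, 2, 3], so each column corresponds to one label:
# [[1, 1, 0],
#  [1, 0, 1],
#  [0, 0, 1]]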
@ralston3 Thanks for sharing. Do you also use predict_proba?
What about predict? Thanks.
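If it helps: in the Keras versions of that era, Sequential.predict_proba was, to my understanding, essentially a thin wrapper around predict, so with a sigmoid output layer both return the same per-class probabilities (a quick sketch under that assumption):
proba = model.predict_proba(X_test)  # per-class probabilities in [0, 1]
preds = model.predict(X_test)        # raw network outputs, identical here
# predict_classes, by contrast, collapses each sample to a single hard
# class, which throws away the multi-label structure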