Keras: Error when Keras model is used in grid search/cross_val_score with a custom scoring for a multilabel problem?

Created on 8 Feb 2018 · 12 comments · Source: keras-team/keras

Hi all,

I understand that KerasClassifier can be used to wrap a Keras model and pass it to GridSearchCV/cross_val_score in scikit-learn. But I can't find a way to make this work for a multi-label problem, where the label for a sample is a binary list [1,0,1,1,0...].

Note that y has to be in (n_samples, n_labels) form for a multi-label problem. For a multi-class problem, I am able to convert one-hot encoded y into a 1D array of class values, but not for multi-label. When I train a Keras model and compute the f1_micro score for this multi-label problem, I get the following error.

x.shape
(10004, 170)
y.shape
(10004, 15)
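
For a self-contained reproduction, synthetic multilabel data with the same shapes can stand in for x and y (illustrative only; the original data is not shown in the issue):

from sklearn.datasets import make_multilabel_classification

# Synthetic stand-in with the same shapes as above
x, y = make_multilabel_classification(n_samples=10004, n_features=170,
                                      n_classes=15, random_state=0)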

from keras.layers import Dense, Activation, Dropout
from keras.optimizers import Adam
from keras.models import Sequential
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def create_dnn():
    model = Sequential([Dense(2048, input_shape=(170,)), Activation('relu'), Dropout(0.75),
                        Dense(1024), Activation('relu'), Dropout(0.75),
                        Dense(2048), Activation('relu'),
                        Dense(15), Activation('sigmoid')])

    model.compile(optimizer=Adam(), loss='binary_crossentropy', metrics=['accuracy'])

    return model

dnn_model = KerasClassifier(build_fn=create_dnn, batch_size=256, epochs=1, verbose=1)
cross_val_score(dnn_model, x, y, cv=2, scoring='f1_micro')
Epoch 1/1
5002/5002 [==============================] - 5s - loss: 0.2030 - acc: 0.9277     
4864/5002 [============================>.] - ETA: 0s
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-62-33f9d3b67676> in <module>()
----> 1 cross_val_score(dnn_model, X=x, y=y, cv=2, scoring='f1_micro')

~/anaconda3/lib/python3.6/site-packages/sklearn/model_selection/_validation.py in cross_val_score(estimator, X, y, groups, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch)
    319                                 n_jobs=n_jobs, verbose=verbose,
    320                                 fit_params=fit_params,
--> 321                                 pre_dispatch=pre_dispatch)
    322     return cv_results['test_score']
    323 

~/anaconda3/lib/python3.6/site-packages/sklearn/model_selection/_validation.py in cross_validate(estimator, X, y, groups, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch, return_train_score)
    193             fit_params, return_train_score=return_train_score,
    194             return_times=True)
--> 195         for train, test in cv.split(X, y, groups))
    196 
    197     if return_train_score:

~/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in __call__(self, iterable)
    777             # was dispatched. In particular this covers the edge
    778             # case of Parallel used with an exhausted iterator.
--> 779             while self.dispatch_one_batch(iterator):
    780                 self._iterating = True
    781             else:

~/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in dispatch_one_batch(self, iterator)
    623                 return False
    624             else:
--> 625                 self._dispatch(tasks)
    626                 return True
    627 

~/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in _dispatch(self, batch)
    586         dispatch_timestamp = time.time()
    587         cb = BatchCompletionCallBack(dispatch_timestamp, len(batch), self)
--> 588         job = self._backend.apply_async(batch, callback=cb)
    589         self._jobs.append(job)
    590 

~/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py in apply_async(self, func, callback)
    109     def apply_async(self, func, callback=None):
    110         """Schedule a func to be run"""
--> 111         result = ImmediateResult(func)
    112         if callback:
    113             callback(result)

~/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py in __init__(self, batch)
    330         # Don't delay the application, to avoid keeping the input
    331         # arguments in memory
--> 332         self.results = batch()
    333 
    334     def get(self):

~/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in __call__(self)
    129 
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
    132 
    133     def __len__(self):

~/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in <listcomp>(.0)
    129 
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
    132 
    133     def __len__(self):

~/anaconda3/lib/python3.6/site-packages/sklearn/model_selection/_validation.py in _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters, return_n_test_samples, return_times, error_score)
    465         fit_time = time.time() - start_time
    466         # _score will return dict if is_multimetric is True
--> 467         test_scores = _score(estimator, X_test, y_test, scorer, is_multimetric)
    468         score_time = time.time() - start_time - fit_time
    469         if return_train_score:

~/anaconda3/lib/python3.6/site-packages/sklearn/model_selection/_validation.py in _score(estimator, X_test, y_test, scorer, is_multimetric)
    500     """
    501     if is_multimetric:
--> 502         return _multimetric_score(estimator, X_test, y_test, scorer)
    503     else:
    504         if y_test is None:

~/anaconda3/lib/python3.6/site-packages/sklearn/model_selection/_validation.py in _multimetric_score(estimator, X_test, y_test, scorers)
    530             score = scorer(estimator, X_test)
    531         else:
--> 532             score = scorer(estimator, X_test, y_test)
    533 
    534         if hasattr(score, 'item'):

~/anaconda3/lib/python3.6/site-packages/sklearn/metrics/scorer.py in __call__(self, estimator, X, y_true, sample_weight)
    106         else:
    107             return self._sign * self._score_func(y_true, y_pred,
--> 108                                                  **self._kwargs)
    109 
    110 

~/anaconda3/lib/python3.6/site-packages/sklearn/metrics/classification.py in f1_score(y_true, y_pred, labels, pos_label, average, sample_weight)
    712     return fbeta_score(y_true, y_pred, 1, labels=labels,
    713                        pos_label=pos_label, average=average,
--> 714                        sample_weight=sample_weight)
    715 
    716 

~/anaconda3/lib/python3.6/site-packages/sklearn/metrics/classification.py in fbeta_score(y_true, y_pred, beta, labels, pos_label, average, sample_weight)
    826                                                  average=average,
    827                                                  warn_for=('f-score',),
--> 828                                                  sample_weight=sample_weight)
    829     return f
    830 

~/anaconda3/lib/python3.6/site-packages/sklearn/metrics/classification.py in precision_recall_fscore_support(y_true, y_pred, beta, labels, pos_label, average, warn_for, sample_weight)
   1023         raise ValueError("beta should be >0 in the F-beta score")
   1024 
-> 1025     y_type, y_true, y_pred = _check_targets(y_true, y_pred)
   1026     present_labels = unique_labels(y_true, y_pred)
   1027 

~/anaconda3/lib/python3.6/site-packages/sklearn/metrics/classification.py in _check_targets(y_true, y_pred)
     79     if len(y_type) > 1:
     80         raise ValueError("Classification metrics can't handle a mix of {0} "
---> 81                          "and {1} targets".format(type_true, type_pred))
     82 
     83     # We can't have more than one value on y_type => The set is no more needed

ValueError: Classification metrics can't handle a mix of multilabel-indicator and multiclass targets

I understand that this is because y is in the form (n_samples, n_labels), while the y predicted by dnn_model.predict() is internally converted to a 1D array using a predict_classes() function.

dnn_model.model.predict(x).shape

(10004, 15)

In contrast,

dnn_model.predict(x).shape

(10004, )

But interestingly, if I don't use custom scoring, then the error is gone.

cross_val_score(dnn_model, X=x, y=y, cv=2)
Epoch 1/1
5002/5002 [==============================] - 6s - loss: 0.2296 - acc: 0.9110     
5002/5002 [==============================] - 2s     
Epoch 1/1
5002/5002 [==============================] - 6s - loss: 0.2104 - acc: 0.9221     
4864/5002 [============================>.] - ETA: 0s
Out[63]:
array([0.94524857, 0.93333329])



For comparison, DecisionTreeClassifier handles the same multilabel data with the same custom scoring without any error:

cross_val_score(DecisionTreeClassifier(max_depth=3), 
                X=x, y=y, scoring='f1_micro', cv=2)

array([0.63451777, 0.59030837])

That is because its predict() preserves the multilabel shape:

dt = DecisionTreeClassifier(max_depth=3)
dt.fit(x, y)
dt.predict(x).shape

(10004, 15)

So I think there should be a way to do grid search/cross_val_score with a custom scoring with KerasClassifier for a multi-label problem. Please advise. Thanks!
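
One possible workaround, sketched here under the assumption that the fitted Keras model stays reachable as estimator.model (as in the current wrapper): scikit-learn accepts any callable with the signature scorer(estimator, X, y) as scoring, so the wrapper's predict() can be bypassed and the sigmoid outputs thresholded directly:

from sklearn.metrics import f1_score

def multilabel_f1_micro(estimator, X, y):
    # Bypass KerasClassifier.predict(), which collapses multilabel
    # outputs to a 1D array, and threshold the probabilities instead
    proba = estimator.model.predict(X)
    y_pred = (proba > 0.5).astype('int32')
    return f1_score(y, y_pred, average='micro')

cross_val_score(dnn_model, x, y, cv=2, scoring=multilabel_f1_micro)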


All 12 comments

I have exactly the same issue.
Using custom scoring with multiclass outputs from a Keras model returns the same error for cross_val_score or GridSearchCV, as below (it's on Iris, so you can run it directly):

import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import to_categorical
from keras.wrappers.scikit_learn import KerasClassifier

iris = datasets.load_iris()
X= iris.data
Y = to_categorical(iris.target)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, random_state=1000)

def create_model(optimizer='rmsprop'):
    model = Sequential()
    model.add(Dense(8,activation='relu',input_shape = (4,)))
    model.add(Dense(3,activation='softmax'))
    model.compile(optimizer = optimizer,
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model


model = KerasClassifier(build_fn=create_model,
                        epochs=10, 
                        batch_size=5,
                        verbose=0)

#results = cross_val_score(model, X_train, Y_train, scoring='precision_macro')

param_grid = {'optimizer':('rmsprop','adam')}
grid = GridSearchCV(model,
                    param_grid=param_grid,
                    return_train_score=True,
                    scoring=['accuracy','precision_macro','recall_macro'],
                    refit='precision_macro')

grid_results = grid.fit(X_train,Y_train)

So I get this error

ValueError: Classification metrics can't handle a mix of multilabel-indicator and binary targets 

When I remove the scoring parameters, it works.

Is there any way to avoid that and enable an F1, precision, or any custom score?

Thanks for your help

Hi,

I've found how to resolve it.

First, this doc (http://scikit-learn.org/stable/modules/multiclass.html#multilabel-classification-format) shows that the one-hot representation used in Keras is interpreted as multilabel in scikit-learn.

Then, looking at scikit_learn.py, which implements the KerasClassifier class (https://github.com/keras-team/keras/blob/master/keras/wrappers/scikit_learn.py), the fit function in the BaseWrapper class includes this code:

if loss_name == 'categorical_crossentropy' and len(y.shape) != 2:
            y = to_categorical(y)

The wrapper does the categorical transformation by itself.

It seems that, to bridge the difference in multiclass representation with scikit-learn, Keras can take scikit-learn-style multiclass labels [0,1,2,1] and transform them into the categorical representation [[1,0,0],[0,1,0],[0,0,1],[0,1,0]] just for the NN model fit.
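
For reference, a quick check of what that conversion produces:

from keras.utils import to_categorical

to_categorical([0, 1, 2, 1])
# array([[1., 0., 0.],
#        [0., 1., 0.],
#        [0., 0., 1.],
#        [0., 1., 0.]], dtype=float32)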

So, I simply tried removing the categorical transformation when passing the model to the sklearn functions.

And it works now:

import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.utils import to_categorical
from keras.wrappers.scikit_learn import KerasClassifier

iris = datasets.load_iris()
X= iris.data
#Y = to_categorical(iris.target,3)
Y = iris.target

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, random_state=1000)

def create_model(optimizer='rmsprop'):
    model = Sequential()
    model.add(Dense(8,activation='relu',input_shape = (4,)))
    model.add(Dense(3,activation='softmax'))
    model.compile(optimizer = optimizer,
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model


model = KerasClassifier(build_fn=create_model,
                        epochs=10, 
                        batch_size=5,
                        verbose=0)

#results = cross_val_score(model, X_train, Y_train, scoring='precision_macro')

param_grid = {'optimizer':('rmsprop','adam')}
grid = GridSearchCV(model,
                    param_grid=param_grid,
                    return_train_score=True,
                    scoring=['precision_macro','recall_macro','f1_macro'],
                    refit='precision_macro')
grid_results = grid.fit(X_train,Y_train)
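
Once the fit completes, the usual GridSearchCV attributes are available (values below are illustrative):

print(grid_results.best_params_)   # e.g. {'optimizer': 'adam'}
print(grid_results.best_score_)    # mean precision_macro of the best setting
print(grid_results.cv_results_['mean_test_f1_macro'])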

Thanks, Dan. I will try it out in my code!

@moon412 Hi, moon. Have you solved the problem? I came across the same error as you: 'ValueError: Classification metrics can't handle a mix of multilabel-indicator and binary targets.' Please give me some help.

@moon412 Hi, moon. I am also working on a multi-label classification grid search problem, and the y predicted by model.predict() is internally converted to a 1D array. If you have solved this problem, please give me a reply. Thank you very much.

@moon412 Did @danbricedatascience's solution solve your problem? It looks like @danbricedatascience's fix only works for multi-class problems. I believe I have found the issue in the code for multi-label problems. From keras.wrappers.scikit_learn, in the KerasClassifier.predict method:

    def predict(self, x, **kwargs):
        """Returns the class predictions for the given test data.

        # Arguments
            x: array-like, shape `(n_samples, n_features)`
                Test samples where `n_samples` is the number of samples
                and `n_features` is the number of features.
            **kwargs: dictionary arguments
                Legal arguments are the arguments
                of `Sequential.predict_classes`.

        # Returns
            preds: array-like, shape `(n_samples,)`
                Class predictions.
        """
        kwargs = self.filter_sk_params(Sequential.predict_classes, kwargs)

        proba = self.model.predict(x, **kwargs)
        if proba.shape[-1] > 1:
            classes = proba.argmax(axis=-1)
        else:
            classes = (proba > 0.5).astype('int32')
        return self.classes_[classes]

It looks like this method does not check whether the loss is "categorical_crossentropy", as BaseWrapper.fit and KerasClassifier.score do when converting the multi-class problem to a 1D array to satisfy scikit-learn's format. This is what breaks the classification metric scoring, since the true y labels have shape (n_samples, n_classes) while the transformed predictions have shape (n_samples,). The solution that works for me is changing the above code to:

    def predict(self, x, **kwargs):
        """Returns the class predictions for the given test data.

        # Arguments
            x: array-like, shape `(n_samples, n_features)`
                Test samples where `n_samples` is the number of samples
                and `n_features` is the number of features.
            **kwargs: dictionary arguments
                Legal arguments are the arguments
                of `Sequential.predict_classes`.

        # Returns
            preds: array-like, shape `(n_samples,)`
                Class predictions.
        """
        kwargs = self.filter_sk_params(Sequential.predict_classes, kwargs)

        proba = self.model.predict(x, **kwargs)

        loss_name = self.model.loss
        if hasattr(loss_name, '__name__'):
            loss_name = loss_name.__name__

        if proba.shape[-1] > 1 and loss_name == 'categorical_crossentropy':
            classes = proba.argmax(axis=-1)
        else:
            classes = (proba > 0.5).astype('int32')
        return self.classes_[classes]
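
If you would rather not edit the installed package, the same fix can be applied as a subclass; this is a sketch that assumes a sigmoid/binary_crossentropy multilabel model, and MultilabelKerasClassifier is just an illustrative name:

from keras.wrappers.scikit_learn import KerasClassifier

class MultilabelKerasClassifier(KerasClassifier):
    # Keep the (n_samples, n_labels) shape that scikit-learn's
    # multilabel metrics expect, instead of collapsing via argmax
    def predict(self, x, **kwargs):
        proba = self.model.predict(x, **kwargs)
        return (proba > 0.5).astype('int32')

dnn_model = MultilabelKerasClassifier(build_fn=create_dnn, batch_size=256,
                                      epochs=1, verbose=1)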

Hi @rsmith49, thanks for sharing. I revised the code according to your comment, but it still raises the same error 'ValueError: Classification metrics can't handle a mix of multilabel-indicator and binary targets'.
How do I compile the revised 'scikit_learn.py' to 'scikit_learn.pyc'? In my setup, I installed Keras with 'conda install keras'. Would you please give me some help?

@JunTomyang Unfortunately, I only needed to run a quick local experiment and did not need to compile to .pyc files. I'm not sure how to go about that, but good luck!

This solved my problem as well! One question, though:
I am interested in F1 scores in the CV. Is it okay if I provide metrics=['accuracy'] in model.compile() while calling GridSearchCV() with scoring=['f1']?

Try removing the scoring parameter, in which case the estimator's default score would be used.

The suggestion from @kaushal1989 would work.

I don't use the Iris data, which is why the above solution didn't quite work for me, even though it is essentially right. When I tried to run this example on my data without one-hot encoding, I got ValueError: Error when checking target: expected dense_72 to have shape (5,) but got array with shape (1,).
So here is the solution that I found for my own independent dataset:

.................
x = df_train[["X","Y","Z"]].values
y= df_train["target"].values

X_train, X_test, Y_train, Y_test = train_test_split(x, y, train_size=0.8, random_state=1000)
.............
