Keras: Error when Keras model is used in grid search/cross_val_score with a custom scoring for a multilabel problem?

Created on 8 Feb 2018 · 12 comments · Source: keras-team/keras

Hi all,

I understand that KerasClassifier can be used to wrap a Keras model and pass it to GridSearchCV/cross_val_score in scikit-learn. But I can't find a way to make this work for a multi-label problem, where the label for a sample is a binary list [1,0,1,1,0...].

Note that y has to be in (n_samples, n_labels) form for a multi-label problem. For a multi-class problem, I am able to convert one-hot encoded y into a 1D array of class values, but not for multi-label. When I train a Keras model and compute the f1_micro score for this multi-label problem, I get the following error.

x.shape
(10004, 170)
y.shape
(10004, 15)
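
For a self-contained reproduction, synthetic multilabel data with the same shapes can stand in for x and y (illustrative only; the original data is not shown in the issue):

from sklearn.datasets import make_multilabel_classification

# Synthetic stand-in with the same shapes as above
x, y = make_multilabel_classification(n_samples=10004, n_features=170,
                                      n_classes=15, random_state=0)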

from keras.layers import Dense, Activation, Dropout
from keras.optimizers import Adam
from keras.models import Sequential
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def create_dnn():
    model = Sequential([Dense(2048, input_shape=(170,)), Activation('relu'), Dropout(0.75),
                        Dense(1024), Activation('relu'), Dropout(0.75),
                        Dense(2048), Activation('relu'),
                        Dense(15), Activation('sigmoid')])

    model.compile(optimizer=Adam(), loss='binary_crossentropy', metrics=['accuracy'])

    return model

dnn_model = KerasClassifier(build_fn=create_dnn, batch_size=256, epochs=1, verbose=1)
cross_val_score(dnn_model, x, y, cv=2, scoring='f1_micro')
Epoch 1/1
5002/5002 [==============================] - 5s - loss: 0.2030 - acc: 0.9277     
4864/5002 [============================>.] - ETA: 0s
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-62-33f9d3b67676> in <module>()
----> 1 cross_val_score(dnn_model, X=x, y=y, cv=2, scoring='f1_micro')

~/anaconda3/lib/python3.6/site-packages/sklearn/model_selection/_validation.py in cross_val_score(estimator, X, y, groups, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch)
    319                                 n_jobs=n_jobs, verbose=verbose,
    320                                 fit_params=fit_params,
--> 321                                 pre_dispatch=pre_dispatch)
    322     return cv_results['test_score']
    323 

~/anaconda3/lib/python3.6/site-packages/sklearn/model_selection/_validation.py in cross_validate(estimator, X, y, groups, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch, return_train_score)
    193             fit_params, return_train_score=return_train_score,
    194             return_times=True)
--> 195         for train, test in cv.split(X, y, groups))
    196 
    197     if return_train_score:

~/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in __call__(self, iterable)
    777             # was dispatched. In particular this covers the edge
    778             # case of Parallel used with an exhausted iterator.
--> 779             while self.dispatch_one_batch(iterator):
    780                 self._iterating = True
    781             else:

~/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in dispatch_one_batch(self, iterator)
    623                 return False
    624             else:
--> 625                 self._dispatch(tasks)
    626                 return True
    627 

~/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in _dispatch(self, batch)
    586         dispatch_timestamp = time.time()
    587         cb = BatchCompletionCallBack(dispatch_timestamp, len(batch), self)
--> 588         job = self._backend.apply_async(batch, callback=cb)
    589         self._jobs.append(job)
    590 

~/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py in apply_async(self, func, callback)
    109     def apply_async(self, func, callback=None):
    110         """Schedule a func to be run"""
--> 111         result = ImmediateResult(func)
    112         if callback:
    113             callback(result)

~/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py in __init__(self, batch)
    330         # Don't delay the application, to avoid keeping the input
    331         # arguments in memory
--> 332         self.results = batch()
    333 
    334     def get(self):

~/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in __call__(self)
    129 
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
    132 
    133     def __len__(self):

~/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in <listcomp>(.0)
    129 
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
    132 
    133     def __len__(self):

~/anaconda3/lib/python3.6/site-packages/sklearn/model_selection/_validation.py in _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters, return_n_test_samples, return_times, error_score)
    465         fit_time = time.time() - start_time
    466         # _score will return dict if is_multimetric is True
--> 467         test_scores = _score(estimator, X_test, y_test, scorer, is_multimetric)
    468         score_time = time.time() - start_time - fit_time
    469         if return_train_score:

~/anaconda3/lib/python3.6/site-packages/sklearn/model_selection/_validation.py in _score(estimator, X_test, y_test, scorer, is_multimetric)
    500     """
    501     if is_multimetric:
--> 502         return _multimetric_score(estimator, X_test, y_test, scorer)
    503     else:
    504         if y_test is None:

~/anaconda3/lib/python3.6/site-packages/sklearn/model_selection/_validation.py in _multimetric_score(estimator, X_test, y_test, scorers)
    530             score = scorer(estimator, X_test)
    531         else:
--> 532             score = scorer(estimator, X_test, y_test)
    533 
    534         if hasattr(score, 'item'):

~/anaconda3/lib/python3.6/site-packages/sklearn/metrics/scorer.py in __call__(self, estimator, X, y_true, sample_weight)
    106         else:
    107             return self._sign * self._score_func(y_true, y_pred,
--> 108                                                  **self._kwargs)
    109 
    110 

~/anaconda3/lib/python3.6/site-packages/sklearn/metrics/classification.py in f1_score(y_true, y_pred, labels, pos_label, average, sample_weight)
    712     return fbeta_score(y_true, y_pred, 1, labels=labels,
    713                        pos_label=pos_label, average=average,
--> 714                        sample_weight=sample_weight)
    715 
    716 

~/anaconda3/lib/python3.6/site-packages/sklearn/metrics/classification.py in fbeta_score(y_true, y_pred, beta, labels, pos_label, average, sample_weight)
    826                                                  average=average,
    827                                                  warn_for=('f-score',),
--> 828                                                  sample_weight=sample_weight)
    829     return f
    830 

~/anaconda3/lib/python3.6/site-packages/sklearn/metrics/classification.py in precision_recall_fscore_support(y_true, y_pred, beta, labels, pos_label, average, warn_for, sample_weight)
   1023         raise ValueError("beta should be >0 in the F-beta score")
   1024 
-> 1025     y_type, y_true, y_pred = _check_targets(y_true, y_pred)
   1026     present_labels = unique_labels(y_true, y_pred)
   1027 

~/anaconda3/lib/python3.6/site-packages/sklearn/metrics/classification.py in _check_targets(y_true, y_pred)
     79     if len(y_type) > 1:
     80         raise ValueError("Classification metrics can't handle a mix of {0} "
---> 81                          "and {1} targets".format(type_true, type_pred))
     82 
     83     # We can't have more than one value on y_type => The set is no more needed

ValueError: Classification metrics can't handle a mix of multilabel-indicator and multiclass targets

I understand that this is because y is in the form (n_samples, n_labels), while the y predicted by dnn_model.predict() is internally converted to a 1D array using a predict_classes() function.

dnn_model.model.predict(x).shape

(10004, 15)

In contrast,

dnn_model.predict(x).shape

(10004, )

But interestingly, if I don't use custom scoring, then the error is gone.

cross_val_score(dnn_model, X=x, y=y, cv=2)
Epoch 1/1
5002/5002 [==============================] - 6s - loss: 0.2296 - acc: 0.9110     
5002/5002 [==============================] - 2s     
Epoch 1/1
5002/5002 [==============================] - 6s - loss: 0.2104 - acc: 0.9221     
4864/5002 [============================>.] - ETA: 0s
Out[63]:
array([0.94524857, 0.93333329])



For comparison, DecisionTreeClassifier handles the same multilabel data with the same custom scoring without any error:

cross_val_score(DecisionTreeClassifier(max_depth=3), 
                X=x, y=y, scoring='f1_micro', cv=2)

array([0.63451777, 0.59030837])

That is because its predict() preserves the multilabel shape:

dt = DecisionTreeClassifier(max_depth=3)
dt.fit(x, y)
dt.predict(x).shape

(10004, 15)

So I think there should be a way to do grid search/cross_val_score with a custom scoring with KerasClassifier for a multi-label problem. Please advise. Thanks!
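
One possible workaround, sketched here under the assumption that the fitted Keras model stays reachable as estimator.model (as in the current wrapper): scikit-learn accepts any callable with the signature scorer(estimator, X, y) as scoring, so the wrapper's predict() can be bypassed and the sigmoid outputs thresholded directly:

from sklearn.metrics import f1_score

def multilabel_f1_micro(estimator, X, y):
    # Bypass KerasClassifier.predict(), which collapses multilabel
    # outputs to a 1D array, and threshold the probabilities instead
    proba = estimator.model.predict(X)
    y_pred = (proba > 0.5).astype('int32')
    return f1_score(y, y_pred, average='micro')

cross_val_score(dnn_model, x, y, cv=2, scoring=multilabel_f1_micro)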


All 12 comments

I have exactly the same issue.
Using custom scoring with multiclass outputs from a Keras model returns the same error for cross_val_score or GridSearchCV, as below (it's on Iris, so you can run it directly):

import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import to_categorical
from keras.wrappers.scikit_learn import KerasClassifier

iris = datasets.load_iris()
X= iris.data
Y = to_categorical(iris.target)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, random_state=1000)

def create_model(optimizer='rmsprop'):
    model = Sequential()
    model.add(Dense(8,activation='relu',input_shape = (4,)))
    model.add(Dense(3,activation='softmax'))
    model.compile(optimizer = optimizer,
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model


model = KerasClassifier(build_fn=create_model,
                        epochs=10, 
                        batch_size=5,
                        verbose=0)

#results = cross_val_score(model, X_train, Y_train, scoring='precision_macro')

param_grid = {'optimizer':('rmsprop','adam')}
grid = GridSearchCV(model,
                    param_grid=param_grid,
                    return_train_score=True,
                    scoring=['accuracy','precision_macro','recall_macro'],
                    refit='precision_macro')

grid_results = grid.fit(X_train,Y_train)

So I get this error

ValueError: Classification metrics can't handle a mix of multilabel-indicator and binary targets 

When I remove the scoring parameters, it works.

Is there any way to avoid that and enable an F1, precision, or any custom score?

Thanks for your help

Hi,

I've found how to resolve it.

First, this doc (http://scikit-learn.org/stable/modules/multiclass.html#multilabel-classification-format) shows that the one-hot representation used in Keras is interpreted as multilabel in scikit-learn.

Then, looking at scikit_learn.py, which implements the KerasClassifier class (https://github.com/keras-team/keras/blob/master/keras/wrappers/scikit_learn.py), the fit function in the BaseWrapper class includes this code:

if loss_name == 'categorical_crossentropy' and len(y.shape) != 2:
            y = to_categorical(y)

The wrapper does the categorical transformation by itself.

It seems that, to bridge the difference in multiclass representation with scikit-learn, Keras can take scikit-learn-style multiclass labels [0,1,2,1] and transform them into the categorical representation [[1,0,0],[0,1,0],[0,0,1],[0,1,0]] just for the NN model fit.
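
For reference, a quick check of what that conversion produces:

from keras.utils import to_categorical

to_categorical([0, 1, 2, 1])
# array([[1., 0., 0.],
#        [0., 1., 0.],
#        [0., 0., 1.],
#        [0., 1., 0.]], dtype=float32)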

So, I simply tried removing the categorical transformation when passing the model to the sklearn functions.

And it works now:

import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.utils import to_categorical
from keras.wrappers.scikit_learn import KerasClassifier

iris = datasets.load_iris()
X= iris.data
#Y = to_categorical(iris.target,3)
Y = iris.target

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, random_state=1000)

def create_model(optimizer='rmsprop'):
    model = Sequential()
    model.add(Dense(8,activation='relu',input_shape = (4,)))
    model.add(Dense(3,activation='softmax'))
    model.compile(optimizer = optimizer,
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model


model = KerasClassifier(build_fn=create_model,
                        epochs=10, 
                        batch_size=5,
                        verbose=0)

#results = cross_val_score(model, X_train, Y_train, scoring='precision_macro')

param_grid = {'optimizer':('rmsprop','adam')}
grid = GridSearchCV(model,
                    param_grid=param_grid,
                    return_train_score=True,
                    scoring=['precision_macro','recall_macro','f1_macro'],
                    refit='precision_macro')
grid_results = grid.fit(X_train,Y_train)
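
Once the fit completes, the usual GridSearchCV attributes are available (values below are illustrative):

print(grid_results.best_params_)   # e.g. {'optimizer': 'adam'}
print(grid_results.best_score_)    # mean precision_macro of the best setting
print(grid_results.cv_results_['mean_test_f1_macro'])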

Thanks, Dan. I will try it out in my code!

@moon412 Hi, moon. Have you solved the problem? I came across the same error as you: 'ValueError: Classification metrics can't handle a mix of multilabel-indicator and binary targets.' Please give me some help.

@moon412 Hi, moon. I am also working on a multi-label classification grid search problem, and the y predicted by model.predict() is internally converted to a 1D array. If you have solved this problem, please give me a reply. Thank you very much.

@moon412 Did @danbricedatascience's solution solve your problem? It looks like @danbricedatascience's fix only works for multi-class problems. I believe I have found the issue in the code for multi-label problems. From keras.wrappers.scikit_learn, in the KerasClassifier.predict method:

    def predict(self, x, **kwargs):
        """Returns the class predictions for the given test data.

        # Arguments
            x: array-like, shape `(n_samples, n_features)`
                Test samples where `n_samples` is the number of samples
                and `n_features` is the number of features.
            **kwargs: dictionary arguments
                Legal arguments are the arguments
                of `Sequential.predict_classes`.

        # Returns
            preds: array-like, shape `(n_samples,)`
                Class predictions.
        """
        kwargs = self.filter_sk_params(Sequential.predict_classes, kwargs)

        proba = self.model.predict(x, **kwargs)
        if proba.shape[-1] > 1:
            classes = proba.argmax(axis=-1)
        else:
            classes = (proba > 0.5).astype('int32')
        return self.classes_[classes]

It looks like this method does not check whether the loss is "categorical_crossentropy", as BaseWrapper.fit and KerasClassifier.score do when converting the multi-class problem to a 1D array to satisfy scikit-learn's format. This is what breaks the classification metric scoring, since the true y labels have shape (n_samples, n_classes) while the transformed predictions have shape (n_samples,). The solution that works for me is changing the above code to:

    def predict(self, x, **kwargs):
        """Returns the class predictions for the given test data.

        # Arguments
            x: array-like, shape `(n_samples, n_features)`
                Test samples where `n_samples` is the number of samples
                and `n_features` is the number of features.
            **kwargs: dictionary arguments
                Legal arguments are the arguments
                of `Sequential.predict_classes`.

        # Returns
            preds: array-like, shape `(n_samples,)`
                Class predictions.
        """
        kwargs = self.filter_sk_params(Sequential.predict_classes, kwargs)

        proba = self.model.predict(x, **kwargs)

        loss_name = self.model.loss
        if hasattr(loss_name, '__name__'):
            loss_name = loss_name.__name__

        if proba.shape[-1] > 1 and loss_name == 'categorical_crossentropy':
            classes = proba.argmax(axis=-1)
        else:
            classes = (proba > 0.5).astype('int32')
        return self.classes_[classes]
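
If you would rather not edit the installed package, the same fix can be applied as a subclass; this is a sketch that assumes a sigmoid/binary_crossentropy multilabel model, and MultilabelKerasClassifier is just an illustrative name:

from keras.wrappers.scikit_learn import KerasClassifier

class MultilabelKerasClassifier(KerasClassifier):
    # Keep the (n_samples, n_labels) shape that scikit-learn's
    # multilabel metrics expect, instead of collapsing via argmax
    def predict(self, x, **kwargs):
        proba = self.model.predict(x, **kwargs)
        return (proba > 0.5).astype('int32')

dnn_model = MultilabelKerasClassifier(build_fn=create_dnn, batch_size=256,
                                      epochs=1, verbose=1)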

Hi @rsmith49, thanks for sharing. I revised the code according to your comment, but it still raises the same error 'ValueError: Classification metrics can't handle a mix of multilabel-indicator and binary targets'.
How do I compile the revised 'scikit_learn.py' to 'scikit_learn.pyc'? In my setup, I installed Keras with 'conda install keras'. Would you please give me some help?

@JunTomyang Unfortunately, I only needed to run a quick local experiment and did not need to compile to .pyc files. I'm not sure how to go about that, but good luck!

This solved my problem as well! One question, though:
I am interested in F1 scores in the CV. Is it okay if I provide metrics=['accuracy'] in model.compile() while calling GridSearchCV() with scoring=['f1']?

Try removing the scoring parameter, in which case the estimator's default score would be used.

The suggestion from @kaushal1989 would work.

I don't use the Iris data, which is why the above solution didn't quite work for me, even though it is essentially right. When I tried to run this example on my data without one-hot encoding, I got ValueError: Error when checking target: expected dense_72 to have shape (5,) but got array with shape (1,).
So here is the solution that I found for my own independent dataset:

.................
x = df_train[["X","Y","Z"]].values
y= df_train["target"].values

X_train, X_test, Y_train, Y_test = train_test_split(x, y, train_size=0.8, random_state=1000)
.............
