Tpot: classification fails on multi-class data

Created on 5 Feb 2018 · 2 Comments · Source: EpistasisLab/tpot

I cannot get TPOT (v0.9.2, Python 2.7) to work on multi-class data. (If TPOT supports only binary classification, the manual should say so.)

Context of the issue

An example is provided below. It runs until 9% and then stops with the error shown under "Current result".
However, with n_classes=2 in the data, it runs fine.

Process to reproduce the issue

from sklearn.metrics import f1_score, make_scorer
from sklearn.datasets import make_classification
from tpot import TPOTClassifier

scorer = make_scorer(f1_score)
X, y = make_classification(n_samples=200, n_features=100,
                           n_informative=20, n_redundant=10,
                           n_classes=3, random_state=42)
tpot = TPOTClassifier(generations=10, population_size=20, verbosity=20, scoring=scorer)
tpot.fit(X, y)

Expected result

trained classifier

Current result

RuntimeError: There was an error in the TPOT optimization process. 
This could be because the data was not formatted properly, or because
data for a regression problem was provided to the TPOTClassifier 
object. Please make sure you passed the data to TPOT correctly.

Most helpful comment

The issue is caused by using the f1_score scorer on multi-class data; please check this link. By default that scorer computes binary F1, so for multi-class data you need to specify one of the multi-class F1 scorer names instead of f1. For example:

from sklearn.datasets import make_classification
from tpot import TPOTClassifier

X, y = make_classification(n_samples=200, n_features=100,
                           n_informative=20, n_redundant=10,
                           n_classes=3, random_state=42)
tpot = TPOTClassifier(generations=10, population_size=20, verbosity=20, scoring='f1_macro')
tpot.fit(X, y)

Thank you! Indeed, that was the problem.
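
For anyone who wants to keep a custom scorer object rather than the 'f1_macro' string, here is a minimal sketch of what seems to go wrong and one possible workaround. It assumes only scikit-learn; the toy labels are made up for illustration, and the TPOT call is left as a comment because it is an assumed usage that was not run here.

```python
from sklearn.metrics import f1_score, make_scorer

# Toy 3-class labels, made up purely for illustration.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]

# With the default average='binary', f1_score rejects multi-class targets.
binary_error = None
try:
    f1_score(y_true, y_pred)
except ValueError as exc:
    binary_error = str(exc)
print("default f1_score failed with:", binary_error)

# Choosing an averaging mode makes the metric defined for more than two classes.
macro_f1 = f1_score(y_true, y_pred, average='macro')
print("macro F1:", macro_f1)

# A custom scorer that should behave like scoring='f1_macro'.
macro_scorer = make_scorer(f1_score, average='macro')

# Assumed usage with TPOT (not executed in this sketch):
# tpot = TPOTClassifier(generations=10, population_size=20, scoring=macro_scorer)
```

The ValueError raised inside make_scorer(f1_score) is probably what TPOT surfaces as the generic RuntimeError above. Other averaging modes such as 'micro' or 'weighted' can be passed the same way if macro averaging does not suit the data.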
