I cannot get TPOT (v0.9.2, Python 2.7) to work on multiclass data. (If TPOT only supports binary classification, the manual should say so.)
An example is provided below. It runs until 9% and then fails with the error shown after the code.
With n_classes=2 in the data, however, it runs fine.
from sklearn.metrics import f1_score, make_scorer
from sklearn.datasets import make_classification
from tpot import TPOTClassifier
scorer = make_scorer(f1_score)
X, y = make_classification(n_samples=200, n_features=100,
                           n_informative=20, n_redundant=10,
                           n_classes=3, random_state=42)
tpot = TPOTClassifier(generations=10, population_size=20, verbosity=20, scoring=scorer)
tpot.fit(X, y)
trained classifier
RuntimeError: There was an error in the TPOT optimization process.
This could be because the data was not formatted properly, or because
data for a regression problem was provided to the TPOTClassifier
object. Please make sure you passed the data to TPOT correctly.
The problem is the f1_score scorer on multi-class data; please check this link. By default f1_score uses average='binary', which only works with two classes, so for multi-class data you need to specify a multi-class F1 scoring name (such as f1_macro) instead of f1. For example:
from sklearn.datasets import make_classification
from tpot import TPOTClassifier
X, y = make_classification(n_samples=200, n_features=100,
                           n_informative=20, n_redundant=10,
                           n_classes=3, random_state=42)
tpot = TPOTClassifier(generations=10, population_size=20, verbosity=20, scoring='f1_macro')
tpot.fit(X, y)
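If you would rather keep a custom scorer object than switch to a scoring string, a minimal sketch along these lines may help. It uses only scikit-learn: it first shows that plain f1_score rejects three-class targets, then builds a macro-averaged scorer with make_scorer. The names macro_f1_scorer and binary_f1_failed are just illustrative, LogisticRegression is a stand-in estimator so the scorer can be checked without TPOT, and the final TPOTClassifier line is left commented out because it assumes TPOT accepts this scorer the same way it accepted the one in the question.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=100,
                           n_informative=20, n_redundant=10,
                           n_classes=3, random_state=42)

# f1_score defaults to average='binary', which scikit-learn rejects
# for targets with more than two classes.
try:
    f1_score(y, y)
    binary_f1_failed = False
except ValueError:
    binary_f1_failed = True

# Extra keyword arguments to make_scorer are forwarded to f1_score,
# so this scorer computes the macro-averaged F1.
macro_f1_scorer = make_scorer(f1_score, average='macro')

# Sanity-check the scorer on a plain estimator (stand-in, not part of TPOT).
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=3, scoring=macro_f1_scorer)

# Assumption: the scorer can then be passed to TPOT like the one in the question.
# tpot = TPOTClassifier(generations=10, population_size=20, scoring=macro_f1_scorer)
```

This should behave the same as scoring='f1_macro'; the make_scorer form is mainly useful if you want a different average (for example 'weighted') or other f1_score arguments.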
Thank you! Indeed, that was the problem.