I am using tpot for regression. My code is:
from tpot import TPOTRegressor

tpot = TPOTRegressor(scoring="r2", generations=2, population_size=50, verbosity=2, n_jobs=-1)
tpot.fit(trainX, trainY)
print("Score is", tpot.score(testX, testY))
However I get:
Score is 122.641597476
The maximum should be 1.0 according to http://scikit-learn.org/stable/modules/generated/sklearn.metrics.r2_score.html .
A score that is at most 1.0
Best pipeline: LassoLarsCV(ExtraTreesRegressor(input_matrix, bootstrap=True, max_features=0.65, min_samples_leaf=9, min_samples_split=16, n_estimators=100), normalize=False)
Score is 122.641597476
Check whether you are using negative log loss. You probably are, and you are doing a maximization instead of a minimization.
@rspadim I don't fully understand sorry. The full code is at https://bpaste.net/show/cf0b0f75657f . I think you always want to maximize the r2 score but the max possible should be 1.0. I set the scoring function in the line
tpot = TPOTRegressor(scoring="r2", generations=2, population_size=50, verbosity=2, n_jobs=-1)
The mathematical formula for r2 is at http://scikit-learn.org/stable/modules/model_evaluation.html#r2-score .
I think the issue is that the r2 score in scikit-learn can be negative, and even less than -1, for very bad predictions. TPOT internally should maximize the r2 score, but the value printed to stdout is the absolute value of the score (I think this was mentioned in an earlier issue).
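To make this concrete, here is a minimal plain-Python sketch of the r2 definition from the scikit-learn docs linked above (1 - SS_res / SS_tot). The function name `r2` and the toy data are made up for illustration; this is not TPOT's or scikit-learn's actual code.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: how far the predictions are from the targets.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: how far the targets are from their own mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data, chosen only to show the sign behaviour.
y_true = [1.0, 2.0, 3.0]
perfect = r2(y_true, y_true)            # predictions equal the targets
bad = r2(y_true, [10.0, 10.0, 10.0])    # predictions far from the targets

print(perfect, bad, abs(bad))
```

Here the perfect predictions score 1.0 and the bad ones score -96.0. Taking the absolute value turns that into 96.0, which reads like an impossibly good r2 rather than a very poor one, and that would explain a printed score above 1.0.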
@weixuanfu So is 122.641597476 the absolute value of the r2 score? If so, does it make sense to output this as it doesn't seem informative. I mean -1 and 1 are very different r2 scores afaict.
Maybe we need to reopen #425 to refine the stdout, @rhiever. We have already seen a few related questions, like #612.
Yes, after we've experimented more with regression now, I think we need to remove the abs from the score function and simply allow the user to deal with the negative values as they need to.
OK PR #634 was posted.
I am closing this issue since the PR has been merged into the dev branch. Please feel free to re-open the issue (or comment further) if you have any more questions.