TPOT: What is the formula used for r2 scoring in TPOT?

Created on 24 Nov 2017 · 8 Comments · Source: EpistasisLab/tpot

I am using tpot for regression. My code is:

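from tpot import TPOTRegressor

# trainX, trainY, testX, testY are the train/test splits, defined elsewhere
tpot = TPOTRegressor(scoring="r2", generations=2, population_size=50, verbosity=2, n_jobs=-1)
tpot.fit(trainX, trainY)
print("Score is", tpot.score(testX, testY))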

However I get:

Score is 122.641597476

The maximum should be 1.0 according to http://scikit-learn.org/stable/modules/generated/sklearn.metrics.r2_score.html .

Expected result

A score that is at most 1.0

Current result

Best pipeline: LassoLarsCV(ExtraTreesRegressor(input_matrix, bootstrap=True, max_features=0.65, min_samples_leaf=9, min_samples_split=16, n_estimators=100), normalize=False)
Score is 122.641597476

All 8 comments

Check whether you are using negative log loss. You probably are, and you are doing a maximization instead of a minimization.

@rspadim Sorry, I don't fully understand. The full code is at https://bpaste.net/show/cf0b0f75657f . I think you always want to maximize the r2 score, but the maximum possible value should be 1.0. I set the scoring function in this line:

tpot = TPOTRegressor(scoring="r2", generations=2, population_size=50, verbosity=2, n_jobs=-1)

The mathematical formula for r2 is at http://scikit-learn.org/stable/modules/model_evaluation.html#r2-score .
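
For reference, the definition given on that scikit-learn page can be written as follows, where $y_i$ are the true values, $\hat{y}_i$ the predictions, and $\bar{y}$ the mean of the true values. Whether TPOT applies any further transformation to this value is not something the formula itself settles.

```latex
R^2(y, \hat{y}) = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i
```

The fraction is non-negative, so $R^2 \le 1$, but there is no lower bound: the worse the predictions, the more negative the score.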

I think the issue is that the r2 score in scikit-learn can be negative, and even less than -1, for very bad predictions. TPOT should maximize the r2 score internally, but the value printed to stdout is the absolute value of the score (I think this was mentioned in an earlier issue).
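
A minimal sketch of why an absolute r2 value can exceed 1.0. The helper `r2` below is a hand-written stand-in for the standard definition (1 minus residual sum of squares over total sum of squares), not TPOT's or scikit-learn's code, and the data are made up for illustration.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (hand-written sketch)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up targets, for illustration only.
y_true = [1.0, 2.0, 3.0]

perfect = r2(y_true, [1.0, 2.0, 3.0])     # exact predictions
baseline = r2(y_true, [2.0, 2.0, 2.0])    # always predict the mean
terrible = r2(y_true, [10.0, 2.0, -6.0])  # predictions far from the targets

print(perfect, baseline, terrible)  # 1.0 0.0 -80.0
print(abs(terrible))                # 80.0
```

Taking the absolute value turns -80.0 into 80.0, which looks like an impossible r2 score even though the underlying value is a valid, very poor one.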

@weixuanfu So is 122.641597476 the absolute value of the r2 score? If so, does it make sense to output this? It doesn't seem informative. I mean, -1 and 1 are very different r2 scores afaict.

Maybe we need to reopen #425 to refine the stdout, @rhiever. We have already seen a few related questions, like #612.

Yes, after we've experimented more with regression now, I think we need to remove the abs from the score function and simply allow the user to deal with the negative values as they need to.
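
To illustrate what "dealing with the negative values" can look like, here is a small sketch of the greater-is-better convention that scikit-learn uses for scorers such as "neg_mean_squared_error": the loss is negated so that a higher score is always better, and the caller flips the sign back when reporting. The function names, candidate names, and data are hypothetical, and this is not TPOT's implementation.

```python
def mean_squared_error(y_true, y_pred):
    """Plain MSE: lower is better."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def neg_mse_score(y_true, y_pred):
    """Negated MSE: higher is better, so it can be maximized."""
    return -mean_squared_error(y_true, y_pred)


# Hypothetical predictions from two candidate models.
y_true = [1.0, 2.0, 3.0]
candidates = {
    "close": [1.0, 2.0, 4.0],
    "far": [4.0, 5.0, 6.0],
}

scores = {name: neg_mse_score(y_true, pred) for name, pred in candidates.items()}

# Maximizing the signed score picks the better model.
best = max(scores, key=scores.get)

# Maximizing the absolute value would pick the worse one instead.
best_by_abs = max(scores, key=lambda name: abs(scores[name]))

print(best, best_by_abs)  # close far
print(-scores[best])      # flip the sign back to report the MSE itself
```

Keeping the sign preserves the ordering between models. The user can negate the value for display when the metric is a loss, or read it as-is when it is a score such as r2.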

OK, PR #634 was posted.

I am closing this issue since the PR has been merged into the dev branch. Please feel free to re-open the issue (or comment further) if you have any more questions.
