Tpot: Poor regression test performance

Created on 7 Oct 2020 · 7 Comments · Source: EpistasisLab/tpot

Howdy,
I get models with 0.97 test r squared, but when I test "accuracy" with the following formula, it's something like 50%.

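in_range = (y >= p - avg_mae) & (y <= p + avg_mae)  # y = actual targets, p = predictions (NumPy arrays)
accuracy = sum(in_range) / len(in_range)  # fraction of targets within +/- avg_mae of the prediction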

I didn't read this in the docs, but are we meant to shuffle the data beforehand? Also, what specifically does subsample do? I know it randomly selects a subset, but does it ignore the rest?

question

matthew-jurewicz

All 7 comments

I calculate average test mean absolute error with 10 fold cross validation, and the error standard deviation is very low.

matthew-jurewicz on 7 Oct 2020

I am not sure I understand this question/issue. It seems that you used r2 as the scoring function in TPOT and got a good test r2 score, but the custom "accuracy" score is poor. Why not use the custom "accuracy" as the scoring function in TPOT? Check the docs on scoring functions.
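
One way to follow this suggestion is to wrap the within-range check as a scorer callable. The sketch below assumes TPOT's `scoring` parameter accepts a callable with the scikit-learn signature `scorer(estimator, X, y)`. The names `within_mae_fraction` and `make_within_mae_scorer` are made up for illustration, and `avg_mae` is assumed to be a fixed number computed beforehand (for example, from cross-validation).

```python
def within_mae_fraction(y_true, y_pred, avg_mae):
    """Fraction of targets lying within +/- avg_mae of the prediction."""
    hits = [abs(y - p) <= avg_mae for y, p in zip(y_true, y_pred)]
    return sum(hits) / len(hits)


def make_within_mae_scorer(avg_mae):
    """Build a scorer(estimator, X, y) callable (hypothetical helper)."""
    def scorer(estimator, X, y):
        return within_mae_fraction(y, estimator.predict(X), avg_mae)
    return scorer


# Assumed usage with TPOT (not run here; avg_mae=2.0 is a placeholder value):
#   from tpot import TPOTRegressor
#   tpot = TPOTRegressor(scoring=make_within_mae_scorer(avg_mae=2.0))
#   tpot.fit(X_train, y_train)
```

Higher is better for this score, which matches how a scorer is normally interpreted. Note that the band width is fixed in advance here, so the optimizer is rewarded for placing more predictions inside that fixed band rather than for shrinking the band itself.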

> I didn't read this in the docs, but are we meant to shuffle the data beforehand? Also, what specifically does subsample do? I know it randomly selects a subset, but does it ignore the rest?

Yes, it should ignore the rest. For example, setting subsample=0.5 tells TPOT to use a random subsample of half of the training data. This subsample will remain the same during the entire pipeline optimization process.
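
The behaviour described above can be pictured with a small sketch. This is only an illustration of "pick a random half once and reuse it", not TPOT's internal code; `fixed_subsample` is a hypothetical helper and the seed value is arbitrary.

```python
import random


def fixed_subsample(n_rows, fraction, seed):
    """Pick a random subset of row indices once; the other rows are never used."""
    rng = random.Random(seed)
    k = int(n_rows * fraction)
    return sorted(rng.sample(range(n_rows), k))


rows = fixed_subsample(n_rows=10, fraction=0.5, seed=42)

# The same seed gives the same subset, so every pipeline evaluation
# during the optimization would see exactly these rows.
assert rows == fixed_subsample(n_rows=10, fraction=0.5, seed=42)

# Assumed TPOT equivalent (not run here):
#   TPOTRegressor(subsample=0.5, random_state=42)
```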

weixuanfu on 7 Oct 2020

Thanks, I will try that, but why would subsampling help? Surely more data is better?

matthew-jurewicz on 7 Oct 2020

Sometimes running on more data is too slow.

weixuanfu on 7 Oct 2020

Actually, my accuracy only depends on minimizing mean absolute error. Both optimizing for r squared and mean absolute error give the same mean absolute error, so I don't know that optimizing a custom scoring function would give better results.

matthew-jurewicz on 7 Oct 2020

I did not understand the equations/definition of this custom "accuracy" scoring function. Also, it is strange to use "accuracy" for a regression problem. Could you please explain a little more?

weixuanfu on 7 Oct 2020

Sure, in this case, accuracy is just a measure of how often the actual target falls within the predicted range. The predicted range runs from the predicted value minus the mean absolute error to the predicted value plus the mean absolute error.
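
A possible reason this measure sits near 50% despite a high r squared, offered as an assumption rather than a diagnosis of this dataset: if the residuals are roughly zero-mean and normally distributed, the share of targets inside a band of plus or minus the mean absolute error is about 57.5% no matter how small the errors are, because the band shrinks together with the errors. The sketch below checks this with simulated residuals; `simulated_fraction` and the sigma values are made up for illustration.

```python
import math
import random

# Closed form for zero-mean normal residuals with standard deviation sigma:
#   MAE = sigma * sqrt(2 / pi)
#   P(|e| <= MAE) = erf(MAE / (sigma * sqrt(2))) = erf(1 / sqrt(pi))
# sigma cancels out, so the result does not depend on the error size.
expected = math.erf(1 / math.sqrt(math.pi))  # about 0.575


def simulated_fraction(sigma, n=200_000, seed=0):
    """Share of simulated residuals that fall within +/- their own MAE."""
    rng = random.Random(seed)
    residuals = [rng.gauss(0.0, sigma) for _ in range(n)]
    mae = sum(abs(e) for e in residuals) / n
    return sum(abs(e) <= mae for e in residuals) / n


small = simulated_fraction(sigma=0.01)   # very accurate model
large = simulated_fraction(sigma=100.0)  # very inaccurate model
print(expected, small, large)
```

Under this assumption, both the accurate and the inaccurate model land near the same fraction, so the measure would say little about fit quality unless the band width is fixed independently of the model's own error.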

matthew-jurewicz on 7 Oct 2020
