TPOT: How can I use a custom regressor_config_dict properly?

Created on 31 Mar 2017 · 8 Comments · Source: EpistasisLab/tpot

I'd like to use my own set of algorithms, such as:

glmnet.ElasticNet
mlxtend.regressor.StackingRegressor
lightgbm

What is the best way to do it?
I just created a dictionary in my code:

regressor_config_dict = {
    'mlxtend.regressor.LinearRegression': {
    },
    'glmnet.ElasticNet': {
    },
}

and passed it to TPOTRegressor:

from sklearn.model_selection import TimeSeriesSplit
from tpot import TPOTRegressor

# mape is my custom scoring function, defined elsewhere
tpot = TPOTRegressor(generations=5,
                     population_size=20,
                     offspring_size=None,
                     mutation_rate=0.9,
                     crossover_rate=0.1,
                     scoring=mape,
                     cv=TimeSeriesSplit(n_splits=3),
                     n_jobs=-1,
                     max_time_mins=None,
                     max_eval_time_mins=5,
                     random_state=123,
                     config_dict=regressor_config_dict,
                     warm_start=False,
                     verbosity=2,
                     disable_update_check=False)

What is wrong here? Every time, I get the error: UnboundLocalError: local variable 'expr' referenced before assignment

TPOT: 0.7
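For context on the dictionary format used above: each key is the import path of an operator, and each value maps constructor parameter names to a list of candidate values; an empty dict means the operator is only tried with its defaults. Below is a minimal stdlib-only sketch that expands one entry into the concrete parameter settings it describes. The helper name expand_params and the sklearn entry in the example are hypothetical illustrations; as I understand it, TPOT samples from these lists during its evolutionary search rather than enumerating them like this, and nested operator dicts are not handled here.

```python
from itertools import product


def expand_params(param_grid):
    """Yield every combination of the candidate values in one config entry."""
    names = list(param_grid)
    for values in product(*(param_grid[name] for name in names)):
        yield dict(zip(names, values))


config = {
    'glmnet.ElasticNet': {},  # no parameters listed: defaults only
    'sklearn.linear_model.ElasticNet': {  # hypothetical example entry
        'alpha': [0.1, 1.0],
        'l1_ratio': [0.25, 0.5, 0.75],
    },
}

for path, grid in config.items():
    settings = list(expand_params(grid))
    print(path, len(settings))
# glmnet.ElasticNet 1
# sklearn.linear_model.ElasticNet 6
```

An empty entry still yields exactly one setting (the defaults), because product() over zero iterables produces a single empty tuple.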

sashml

All 8 comments

The format of the dictionary is right, but the operators in it do not inherit from sklearn.base.RegressorMixin, and TPOT cannot build a pipeline without a RegressorMixin-based operator at the root of the pipeline.

The code below checks whether the operators inherit from RegressorMixin:

from sklearn.base import RegressorMixin

from glmnet import ElasticNet
from mlxtend.regressor import LinearRegression

# Check the two operators from the question
print('mlxtend.regressor.LinearRegression', issubclass(LinearRegression, RegressorMixin))
print('glmnet.ElasticNet', issubclass(ElasticNet, RegressorMixin))

# xgboost works
# from xgboost import XGBRegressor
# print('xgboost.XGBRegressor', issubclass(XGBRegressor, RegressorMixin))

# sklearn.ensemble.ExtraTreesRegressor also works
from sklearn.ensemble import ExtraTreesRegressor
print('sklearn.ensemble.ExtraTreesRegressor', issubclass(ExtraTreesRegressor, RegressorMixin))
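Rather than importing each operator by hand, the same check can be run over every key of a config dict. This is a minimal sketch, assuming the keys follow the "module.ClassName" pattern used above; the helper names resolve_operator and find_non_subclasses are hypothetical, not TPOT functions. To keep it runnable without extra packages, the demo uses standard-library classes with dict as the base; with TPOT you would pass sklearn.base.RegressorMixin instead. TPOT's default config also contains transformers, which are expected to be flagged; the point is that at least one entry must be able to serve as the regressor at the root.

```python
import importlib


def resolve_operator(path):
    """Import the class named by a dotted key such as 'glmnet.ElasticNet'."""
    module_name, _, class_name = path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def find_non_subclasses(config_dict, base):
    """Return the config keys whose classes do not inherit from `base`."""
    flagged = []
    for path in config_dict:
        if not issubclass(resolve_operator(path), base):
            flagged.append(path)
    return flagged


# Stdlib stand-ins so the sketch runs anywhere; with TPOT, use
# find_non_subclasses(my_config, RegressorMixin) instead.
demo_config = {
    "collections.OrderedDict": {},
    "collections.Counter": {},
    "fractions.Fraction": {},
}
print(find_non_subclasses(demo_config, dict))  # ['fractions.Fraction']
```

If every key of a regressor config is flagged, as with the two operators in the question, there is nothing TPOT can place at the root of a pipeline.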

weixuanfu on 31 Mar 2017

So, does that mean the following approach is possible?

Build my own adapter for each third-party algorithm
Pass the new classes to TPOT

Thanks

sashml on 31 Mar 2017

Yes, below is an example with xgboost:

Import

Build sklearn-based class
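The general shape of such an adapter can be sketched as follows. This is not the xgboost code linked above, just a minimal sketch assuming scikit-learn and NumPy: ThirdPartyRidge is a hypothetical stand-in for a library with its own train/infer API, and ThirdPartyRidgeRegressor is a hypothetical wrapper that inherits RegressorMixin and BaseEstimator and exposes fit/predict. To use it in a config_dict, the wrapper would presumably need to live in an importable module so its dotted path (for example the hypothetical 'mymodels.ThirdPartyRidgeRegressor') can serve as the key.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin


class ThirdPartyRidge:
    """Hypothetical third-party model with a non-sklearn API."""

    def __init__(self, penalty=1.0):
        self.penalty = penalty

    def train(self, X, y):
        # Closed-form ridge regression without an intercept
        n_features = X.shape[1]
        A = X.T @ X + self.penalty * np.eye(n_features)
        self.coef = np.linalg.solve(A, X.T @ y)
        return self

    def infer(self, X):
        return X @ self.coef


class ThirdPartyRidgeRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical sklearn-compatible wrapper around ThirdPartyRidge."""

    def __init__(self, penalty=1.0):
        # BaseEstimator.get_params() reads constructor arguments back from
        # attributes of the same name, so store them unchanged here.
        self.penalty = penalty

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        self.model_ = ThirdPartyRidge(penalty=self.penalty).train(X, y)
        return self

    def predict(self, X):
        return self.model_.infer(np.asarray(X, dtype=float))


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5])
reg = ThirdPartyRidgeRegressor(penalty=0.1).fit(X, y)
print(issubclass(ThirdPartyRidgeRegressor, RegressorMixin))  # True
print(reg.get_params())  # {'penalty': 0.1}
print(reg.score(X, y))   # R^2 from RegressorMixin.score
```

The wrapper gets score() from RegressorMixin and get_params()/set_params() from BaseEstimator, which is what lets sklearn-based tooling such as TPOT clone and tune it.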

weixuanfu on 31 Mar 2017

Hi @sashml, were @weixuanfu2016's examples helpful for getting you started with a custom configuration in TPOT?

rhiever on 7 Apr 2017

Hi guys - I know this issue is closed, but is the method for adding estimators still the same in 0.9.2? I'm somewhat experienced with out-of-the-box sklearn, but relatively new to the class structures involved, and I found the xgboost example above a little confusing. Is only line 351 in the "Build sklearn-based class" file relevant, or would that entire file need to be built for each custom estimator? Forgive my ignorance. For what it's worth, I'm interested in adding almost exactly the same set of estimators as the original poster - are there any plans to add more estimators to the base TPOT configuration, or a simplified "extender" interface?

Also, I should mention that TPOT is simply awesome, and as a first-time participant here, I'd just like to say thanks for building this great library!

CanML on 24 Jan 2018

Yes, the method for implementing additional estimators is still the same in 0.9.2.

I think these links are out of date. Please check the related issue #602; I just added a comment there with updated links.

weixuanfu on 24 Jan 2018

You can also use this hacky method to enable certain estimators:

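from sklearn.base import RegressorMixin
from glmnet import ElasticNet
from catboost import CatBoostRegressor
from tpot import TPOTRegressor
from tpot.config.regressor import regressor_config_dict

# Hack: add RegressorMixin to each class's bases so TPOT accepts it as a
# regressor (this modifies the classes for the whole Python process)
CatBoostRegressor.__bases__ += (RegressorMixin,)
ElasticNet.__bases__ += (RegressorMixin,)

# Copy the default config instead of modifying TPOT's module-level dict
config_dict = dict(regressor_config_dict)
config_dict['catboost.CatBoostRegressor'] = {
    'logging_level': ['Silent'],
    'thread_count': [8]
}

config_dict['glmnet.ElasticNet'] = {
    # Elastic net mixing parameter: 0 = ridge (L2), 1 = lasso (L1)
    'alpha': [0, 0.25, 0.5, 0.75, 1],
    'n_jobs': [8]
}

tpot = TPOTRegressor(generations=5, population_size=20,
                     config_dict=config_dict, verbosity=2)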
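What the __bases__ line does can be seen in isolation. This stdlib-only sketch uses hypothetical stand-in classes (RegressorMixinStandIn plays the role of sklearn.base.RegressorMixin, VendorBase and VendorRegressor play a third-party class hierarchy such as CatBoost's). Appending to __bases__ after the class is defined changes its method resolution order, so issubclass() checks and mixin methods start to apply. Because it mutates the class in place, every user of that class in the process is affected, and CPython may refuse the assignment for some classes (for example ones implemented in C).

```python
class RegressorMixinStandIn:
    """Hypothetical stand-in for sklearn.base.RegressorMixin."""

    def describe(self):
        return "regressor"


class VendorBase:
    """Hypothetical stand-in for a third-party base class."""


class VendorRegressor(VendorBase):
    """Hypothetical third-party regressor that lacks the mixin."""


print(issubclass(VendorRegressor, RegressorMixinStandIn))  # False

# The hack: append the mixin to the existing bases after definition
VendorRegressor.__bases__ += (RegressorMixinStandIn,)

print(issubclass(VendorRegressor, RegressorMixinStandIn))  # True
print([c.__name__ for c in VendorRegressor.__mro__])
# ['VendorRegressor', 'VendorBase', 'RegressorMixinStandIn', 'object']
print(VendorRegressor().describe())  # regressor
```

Writing a proper wrapper class avoids the global side effect; the hack trades that safety for not having to write any adapter code.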

joseph-jnl on 24 Jan 2018

Thanks guys, I will take a look at those and see if I can make it work. Really appreciate the suggestions.

CanML on 25 Jan 2018
