Tpot: How can I use custom regressor_config_dict properly?

Created on 31 Mar 2017 · 8 Comments · Source: EpistasisLab/tpot

I'd like to use my own set of algorithms like:

  • glmnet.ElasticNet
  • mlxtend.regressor.StackingRegressor
  • lightgbm

What is the best way to do it?
I just created a dictionary in my code:

regressor_config_dict = {
    'mlxtend.regressor.LinearRegression': {
    },
    'glmnet.ElasticNet': {
    },
}

and passed it to TPOTRegressor:

TPOTRegressor(generations=5,
              population_size=20,
              offspring_size=None,
              mutation_rate=0.9,
              crossover_rate=0.1,
              scoring=mape,
              cv=TimeSeriesSplit(n_splits=3),
              n_jobs=-1,
              max_time_mins=None,
              max_eval_time_mins=5,
              random_state=123,
              config_dict=regressor_config_dict,
              warm_start=False,
              verbosity=2,
              disable_update_check=False)

What is wrong here? Every time I run it, I get the error UnboundLocalError: local variable 'expr' referenced before assignment.

TPOT: 0.7

The format of the dictionary is right, but the operators in it do not inherit from sklearn.base.RegressorMixin, and TPOT cannot build a pipeline unless the root of the pipeline is a RegressorMixin-based operator.

The code below checks whether the operators inherit from RegressorMixin:

from sklearn.base import RegressorMixin

from glmnet import ElasticNet
from mlxtend.regressor import LinearRegression

print('mlxtend.regressor.LinearRegression', issubclass(LinearRegression, RegressorMixin))
print('glmnet.ElasticNet', issubclass(ElasticNet, RegressorMixin)) 

# xgboost works
#from xgboost import XGBRegressor
#print('xgboost.XGBRegressor', issubclass(XGBRegressor, RegressorMixin))
# sklearn.ensemble.ExtraTreesRegressor also works
from sklearn.ensemble import ExtraTreesRegressor
print('sklearn.ensemble.ExtraTreesRegressor', issubclass(ExtraTreesRegressor, RegressorMixin))
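The same check can be applied to every key of a config dictionary before handing it to TPOT. The following is only a sketch: the helper names is_regressor and find_regressors are made up for illustration and are not part of TPOT, and the example dictionary uses scikit-learn classes purely so that it runs without any third-party package. A dictionary for which find_regressors returns an empty list has no operator that can serve as the root of a regression pipeline.

```python
import importlib

from sklearn.base import RegressorMixin


def is_regressor(import_path):
    """Return True if import_path names a class derived from RegressorMixin.

    Hypothetical helper (not a TPOT function): it resolves a dotted path
    such as 'sklearn.linear_model.Ridge' and runs the issubclass check.
    """
    module_name, _, class_name = import_path.rpartition('.')
    try:
        cls = getattr(importlib.import_module(module_name), class_name)
    except (ImportError, AttributeError, ValueError):
        # The module or the class could not be found.
        return False
    return isinstance(cls, type) and issubclass(cls, RegressorMixin)


def find_regressors(config_dict):
    """List the keys of config_dict that resolve to regressors."""
    return [path for path in config_dict if is_regressor(path)]


# Example dictionary: one regressor and one transformer.
example_config = {
    'sklearn.linear_model.Ridge': {'alpha': [0.1, 1.0, 10.0]},
    'sklearn.preprocessing.StandardScaler': {},
}

print(find_regressors(example_config))
```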

So that means my options are as follows:

  1. Build my own adapter for each third-party algorithm
  2. Pass the new classes to TPOT

Thanks

Yes, below is an example using xgboost:

Import

Build sklearn-based class
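For readers who cannot follow the links above, here is a generic sketch of the adapter idea. It is not the linked xgboost code: ThirdPartyModel is a made-up stand-in for a library whose estimator lacks the scikit-learn interface, and ThirdPartyRegressor and the module name my_adapters are likewise hypothetical. The adapter inherits from RegressorMixin and BaseEstimator, stores its hyperparameters in __init__, and delegates fit and predict to the wrapped model.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin


class ThirdPartyModel:
    """Hypothetical stand-in for a non-sklearn regressor (not a real library)."""

    def __init__(self, shrinkage):
        self.shrinkage = shrinkage

    def train(self, X, y):
        # Ridge-style least squares without an intercept.
        gram = X.T @ X + self.shrinkage * np.eye(X.shape[1])
        self.weights = np.linalg.solve(gram, X.T @ y)

    def infer(self, X):
        return X @ self.weights


class ThirdPartyRegressor(RegressorMixin, BaseEstimator):
    """Adapter exposing ThirdPartyModel through the sklearn regressor API."""

    def __init__(self, shrinkage=1.0):
        # Only store hyperparameters here so get_params/clone keep working.
        self.shrinkage = shrinkage

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        self.model_ = ThirdPartyModel(self.shrinkage)
        self.model_.train(X, y)
        return self

    def predict(self, X):
        return self.model_.infer(np.asarray(X, dtype=float))


# The config key is an import path, so the adapter has to live in an
# importable module; 'my_adapters' is a hypothetical module name.
custom_config = {
    'my_adapters.ThirdPartyRegressor': {'shrinkage': [0.1, 1.0, 10.0]},
}
```

Because the adapter derives from RegressorMixin, it passes the issubclass check shown earlier in the thread, and it also gets the mixin's default score method.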

Hi @sashml, were @weixuanfu2016's examples helpful for getting you started with a custom configuration in TPOT?

Hi guys - I know this issue is closed, but is the method still the same to implement additional estimators in 0.9.2? I'm somewhat experienced with out-of-the-box sklearn, but relatively new to the class structures involved and found the xgboost example above a little bit confusing. Is line 351 in the "Build sklearn-based class" uniquely relevant, or is that entire file something that would need to be built for each custom estimator? Forgive my ignorance. For what it's worth, I happen to be interested in adding almost the exact same set of estimators as the original poster in this topic - are there any plans to add additional estimators to the base TPOT configuration, or a simplified "extender" interface?

Also, I should mention that TPOT is simply awesome, and as a first-time participant here, I'd just like to say thanks for building this great library!

Yes, the method for implementing additional estimators is still the same in 0.9.2.

I think those links are out of date. Please check the related issue #602, where I just added a comment with updated links.

You can also use this hacky method to enable certain estimators:

from sklearn.base import RegressorMixin
from glmnet import ElasticNet
from catboost import CatBoostRegressor
from tpot.config.regressor import regressor_config_dict

# Append RegressorMixin to the base classes so that the
# issubclass(..., RegressorMixin) check passes for these estimators.
CatBoostRegressor.__bases__ += (RegressorMixin,)
ElasticNet.__bases__ += (RegressorMixin,)

# Note: this adds entries to TPOT's default regressor config in place.
config_dict = regressor_config_dict
config_dict['catboost.CatBoostRegressor'] = {
    'logging_level': ['Silent'],
    'thread_count': [8]
}

config_dict['glmnet.ElasticNet'] = {
    # Elastic net mixing parameter: 0 = ridge (L2), 1 = lasso (L1)
    'alpha': [0, 0.25, 0.5, 0.75, 1],
    'n_jobs': [8]
}

Thanks guys, I will take a look at those and see if I can make it work. Really appreciate the suggestions.
