Tpot: Question about usage of template

Created on 22 May 2019 · 14 comments · Source: EpistasisLab/tpot

Hi,

I've started using TPOT, specifically TPOTClassifier. I've noticed that the default value for template is RandomTree. My questions are:

1. Does this mean that only Random Forest trees will be explored, so that no feature selection step before the Random Forest will be explored?
2. From what I understand, when I use 'Selector-Transformer-Classifier' as the template, TPOT will necessarily explore pipelines that have both a selector and a transformer step. Is it possible to explore all three steps while also allowing no selector or no transformer? In other words, can it also explore pipelines whose template is 'Selector-Classifier', 'Classifier', 'Transformer-Classifier', and so on?
3. If the template includes Classifier, will XGBoost also be explored as an option?

I hope this is the right place to put this question. Thanks for the attention!



tjiagoM

Most helpful comment

RandomTree is the default behavior from previous versions of TPOT (before the template option existed). In this setting, TPOT randomly generates pipelines with both tree-like and linear structures, using random operators from config_dict. We will change the default to None to avoid this confusion.

weixuanfu on 19 Jun 2019


All 14 comments

From the examples I've seen online, it looks like before this template option existed, some feature selection operators were explored, as well as the option of no feature selection. However, when I tried it myself, the default template did not seem to explore a feature selection step. Or did I miss something?

tjiagoM on 22 May 2019

1. RandomTree means a random tree-based pipeline, which is how previous versions of TPOT worked before the template option was added.
2. For now, the pipeline structure defined by a template is fixed, so TPOT will not skip the selector or transformer step.
3. Yes, XGBoost will be explored if it is installed in your environment.
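Since a template's structure is fixed, one possible workaround (a sketch, not something suggested in this thread) is to run a separate search for each template variant. The hypothetical helper `template_variants` below only builds the strings; it assumes TPOT accepts shorter templates such as 'Classifier' or 'Selector-Classifier', which is worth verifying against the TPOT documentation for your version.

```python
from itertools import combinations


def template_variants(optional_steps=("Selector", "Transformer"), final_step="Classifier"):
    """Build TPOT-style template strings (hypothetical helper).

    Optional steps keep their relative order, but any subset of them may be
    dropped; the final estimator step is always present.
    """
    variants = []
    for k in range(len(optional_steps) + 1):
        # combinations() yields subsets in the original order of optional_steps.
        for subset in combinations(optional_steps, k):
            variants.append("-".join(subset + (final_step,)))
    return variants


print(template_variants())
```

Each string could then be passed as `template=...` to its own TPOTClassifier run, and the best scores from the runs compared.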

weixuanfu on 22 May 2019


Hmm, the default template should explore feature selection operators randomly. Did you check the evaluated_individuals_ attribute to see whether TPOT evaluated pipelines with feature selection?
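A minimal sketch of that check, assuming `evaluated_individuals_` is a dict keyed by pipeline strings such as `GaussianNB(SelectFwe(input_matrix, SelectFwe__alpha=0.01))`. The helper name `selector_usage`, the selector list (assumed to match TPOT's default classifier config; check `tpot.config` in your installed version) and the sample dict are illustrative assumptions, not real TPOT output.

```python
import re
from collections import Counter

# Assumption: the feature-selection operators in TPOT's default classifier
# config; check tpot.config.classifier_config_dict for your installed version.
SELECTORS = ("SelectFwe", "SelectPercentile", "VarianceThreshold", "RFE", "SelectFromModel")

# Operator names appear in TPOT pipeline strings as "Name(".
OPERATOR_RE = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\(")


def selector_usage(evaluated_individuals, selectors=SELECTORS):
    """Count selectors across evaluated pipelines (hypothetical helper).

    Returns (Counter of selector name -> number of pipelines using it,
             number of pipelines that use no selector at all).
    """
    counts = Counter()
    without_selector = 0
    for pipeline_str in evaluated_individuals:  # iterating a dict yields its keys
        used = set(OPERATOR_RE.findall(pipeline_str))
        found = [name for name in selectors if name in used]
        counts.update(found)
        if not found:
            without_selector += 1
    return counts, without_selector


# Illustrative, made-up keys in the assumed pipeline-string format.
sample = {
    "GaussianNB(SelectFwe(input_matrix, SelectFwe__alpha=0.01))": {},
    "LogisticRegression(input_matrix, LogisticRegression__C=1.0)": {},
    "BernoulliNB(SelectPercentile(SelectFwe(input_matrix, SelectFwe__alpha=0.05), "
    "SelectPercentile__percentile=50))": {},
}
counts, without = selector_usage(sample)
print(counts, without)
```

On a fitted model the call would be `selector_usage(tpot.evaluated_individuals_)`.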

weixuanfu on 22 May 2019

OK, I didn't know about evaluated_individuals_. It looks like I was wrong and some feature selection was performed: when I searched for "select" in the output, I only found SelectPercentile and SelectFwe (not SelectFromModel), though that may be because of the smaller generations/population size I used to make the run faster.
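On the run-size question: the TPOT documentation states that a run evaluates `population_size + generations × offspring_size` pipelines in total, with `offspring_size` defaulting to `population_size`. The hypothetical helper below just does that arithmetic (defaults of 100 and 100 assumed, as in TPOT 0.10). Treat the result as an upper bound, since pipelines TPOT has already seen should be reused rather than re-scored; a quick run samples far fewer pipelines, so rarer operators such as SelectFromModel may simply never come up.

```python
def tpot_pipeline_budget(generations=100, population_size=100, offspring_size=None):
    """Upper bound on pipelines a TPOT run evaluates (hypothetical helper).

    Follows the documented formula population_size + generations * offspring_size,
    where offspring_size defaults to population_size. The 100/100 defaults are
    assumed from TPOT 0.10; duplicate pipelines are reused, so the number of
    distinct evaluations can be lower.
    """
    if offspring_size is None:
        offspring_size = population_size
    return population_size + generations * offspring_size


print(tpot_pipeline_budget())                                   # default-sized run
print(tpot_pipeline_budget(generations=5, population_size=20))  # a quick test run
```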

So Selector, Transformer, Classifier and Regressor are steps you can use in a template either individually or all together, but RandomTree is a special case that explores pipelines with and without a Selector (and no Transformer)?

tjiagoM on 23 May 2019

RandomTree should explore all the operators in the TPOT configuration, so some pipelines may include a selector, a transformer and a classifier for TPOTClassifier (or a regressor for TPOTRegressor).

weixuanfu on 24 May 2019

@weixuan Should we change the template default from RandomTree to None (no template) to avoid this confusion? I have run into this confusion before as well!

trang1618 on 25 May 2019

Ok, sounds good.

weixuanfu on 25 May 2019

Hi,

I am also confused about how to specify an alternative template. Is there a way to have TPOT start from a template such as an SVM? If so, how would one do this? I saw in the documentation (and the comment above) that templates need to be of the form Selector-Transformer-Classifier: would one need to specify an initial template for each step?

Thanks for any assistance!

collinskatie on 19 Jun 2019

Hi @collinskatie, I'm not sure whether you would like to allow other feature preprocessing steps in your analysis (e.g. PCA), but if you really only want to tune the hyperparameters of your SVM (assuming a classification problem), then one way is to specify template='LinearSVC'.

Another way is to customize your config_dict, for example:

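from tpot import TPOTClassifier

# Restrict the search space to LinearSVC only, so TPOT tunes just these
# hyperparameters. Some combinations (e.g. penalty='l1' with loss='hinge')
# are rejected by scikit-learn; TPOT should skip pipelines that fail to fit.
config_dict = {
    'sklearn.svm.LinearSVC': {
        'penalty': ["l1", "l2"],
        'loss': ["hinge", "squared_hinge"],
        'dual': [True, False],
        'tol': [1e-5, 1e-4, 1e-3, 1e-2, 1e-1],
        'C': [1e-4, 1e-3, 1e-2, 1e-1, 0.5, 1., 5., 10., 15., 20., 25.]
    }
}

tpot = TPOTClassifier(config_dict=config_dict, verbosity=2)
# tpot.fit(X_train, y_train)  # X_train, y_train: your own data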
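To get a feel for how large this search space is, the sketch below counts the grid. It assumes scikit-learn's LinearSVC (liblinear backend) supports only the four (penalty, loss, dual) combinations listed in `SUPPORTED`; under that assumption, half of the 440 combinations cannot be fitted at all. `grid_size` and `valid_linearsvc_grid` are hypothetical helpers, not part of TPOT.

```python
from itertools import product
from math import prod

# Same LinearSVC grid as the config_dict above.
linearsvc_params = {
    "penalty": ["l1", "l2"],
    "loss": ["hinge", "squared_hinge"],
    "dual": [True, False],
    "tol": [1e-5, 1e-4, 1e-3, 1e-2, 1e-1],
    "C": [1e-4, 1e-3, 1e-2, 1e-1, 0.5, 1., 5., 10., 15., 20., 25.],
}

# Assumption: the (penalty, loss, dual) combinations liblinear supports for LinearSVC.
SUPPORTED = {
    ("l2", "hinge", True),
    ("l2", "squared_hinge", True),
    ("l2", "squared_hinge", False),
    ("l1", "squared_hinge", False),
}


def grid_size(params):
    """Number of hyperparameter combinations in a TPOT-style parameter grid."""
    return prod(len(values) for values in params.values())


def valid_linearsvc_grid(params):
    """Keep only combinations whose (penalty, loss, dual) is in SUPPORTED."""
    keys = list(params)
    valid = []
    for values in product(*params.values()):
        combo = dict(zip(keys, values))
        if (combo["penalty"], combo["loss"], combo["dual"]) in SUPPORTED:
            valid.append(combo)
    return valid


print(grid_size(linearsvc_params), len(valid_linearsvc_grid(linearsvc_params)))
```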
