TPOT: Question about usage of template

Created on 22 May 2019 · 14 Comments · Source: EpistasisLab/tpot

Hi,

I've started using TPOT, specifically TPOTClassifier. I've noticed that the default value for template is RandomTree. My questions are:

  1. Does this mean that only Random Forest trees will be explored? So, no feature selection before calling a Random Forest tree will be explored?

  2. From what I understand, when I use Selector-Transformer-Classifier as the template, TPOT will only explore pipelines that have both a selector and a transformer step. Is it possible to explore these 3 steps while also treating "no selector" or "no transformer" as valid options? In other words, also explore pipelines whose template is 'Selector-Classifier', 'Classifier', 'Transformer-Classifier', and so on.

  3. In the case of having Classifier in the template, is XGBoost also going to be explored as an option?

I hope this is the right place to put this question. Thanks for the attention!

From the examples I see online, it looks like before this template option existed, some feature selection options were explored, as well as no feature selection. However, when I tried it myself, the default template did not seem to explore a feature selection step. Or did I miss something?

  1. RandomTree means a randomly generated tree-based pipeline, which is how previous versions of TPOT worked before the template option was added.
  2. For now, the pipeline structure defined by template is fixed, so TPOT will not skip the selector or transformer step.
  3. Yes, XGBoost will be explored if it is installed in your environment.
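Since the template structure is fixed, one possible workaround for question 2 is to run TPOT once per candidate structure and compare the results. The sketch below is a hypothetical helper (not part of TPOT) that lists every template keeping the final Classifier step while optionally dropping the Selector and/or Transformer step. The commented usage assumes TPOTClassifier accepts each of these strings through its template parameter, including a bare 'Classifier'.

```python
from itertools import product


def template_variants(optional_steps=("Selector", "Transformer"), final_step="Classifier"):
    """Hypothetical helper, not part of TPOT: list every template that keeps
    `final_step` and includes any subset of `optional_steps`, in their original order."""
    variants = []
    for keep in product((True, False), repeat=len(optional_steps)):
        steps = [step for step, k in zip(optional_steps, keep) if k]
        variants.append("-".join(steps + [final_step]))
    return variants


print(template_variants())

# Possible usage (assumes tpot is installed and X_train, y_train exist):
# from tpot import TPOTClassifier
# for t in template_variants():
#     tpot = TPOTClassifier(template=t, generations=5, population_size=20)
#     tpot.fit(X_train, y_train)
```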

Hmm, the default template should explore feature selection operators randomly. Did you check the evaluated_individuals_ attribute to see whether TPOT evaluated pipelines with feature selection?

Ok, I didn't know about evaluated_individuals_. It looks like I was wrong and some feature selection was performed: when I searched for "select" in the output, I only found SelectPercentile and SelectFwe (not SelectFromModel), though I'm not sure whether that's because of the smaller generations/population size I used to make it run faster.
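Rather than searching printed output for "select", the check can be done programmatically. This sketch assumes evaluated_individuals_ is a dict keyed by pipeline strings such as 'LogisticRegression(SelectPercentile(input_matrix, ...), ...)'. The selector names are an assumption based on TPOT's default classifier configuration, and the sample dict is made up purely for illustration.

```python
from collections import Counter

# Assumed selector names from TPOT's default classifier configuration.
SELECTORS = ("SelectFwe", "SelectPercentile", "VarianceThreshold", "RFE", "SelectFromModel")


def count_selectors(evaluated_individuals):
    """Count how many evaluated pipelines contain each selector operator.

    `evaluated_individuals` is assumed to be a dict keyed by TPOT pipeline
    strings, like a fitted TPOTClassifier's evaluated_individuals_ attribute.
    A pipeline using the same selector twice is counted once for that selector.
    """
    counts = Counter()
    for pipeline_str in evaluated_individuals:
        for name in SELECTORS:
            if name + "(" in pipeline_str:
                counts[name] += 1
    return counts


# Made-up example keys; real usage would be count_selectors(tpot.evaluated_individuals_)
example = {
    "LogisticRegression(SelectPercentile(input_matrix, SelectPercentile__percentile=20), LogisticRegression__C=1.0)": {},
    "GaussianNB(input_matrix)": {},
    "BernoulliNB(SelectFwe(input_matrix, SelectFwe__alpha=0.01), BernoulliNB__alpha=1.0)": {},
}
print(count_selectors(example))
```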

So Selector/Transformer/Classifier/Regressor are steps you can use in a template either individually or all together, but RandomTree is a special case that explores options with and without a Selector (and no Transformer)?

RandomTree should explore all the operators in the TPOT configuration, so some pipelines could include a selector, a transformer and a classifier for TPOTClassifier (a regressor for TPOTRegressor).

@weixuan Should we change the template default to None for no template instead of RandomTree to avoid this confusion? I have experienced this confusion before as well!

Ok, sounds good.

Hi,

I am also confused about how to specify an alternative template. Is there a way to have TPOT begin with a template like an SVM? If so, how would one do this? I saw in the documentation (and the comment above) that alternative templates need to be of the form Selector-Transformer-Classifier: would one need to specify an initial template for each step?

Thanks for any assistance!

Hi @collinskatie, I'm not sure whether you want to allow other feature preprocessing steps in your analysis (e.g. PCA), but if you only want to tune the hyperparameters of your SVM (assuming a classification problem), one way is to specify template='LinearSVC'.

Another way is to customize your config_dict, for example:

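```python
from tpot import TPOTClassifier

# Restrict TPOT's search space to LinearSVC and this hyperparameter grid
config_dict = {
    'sklearn.svm.LinearSVC': {
        'penalty': ["l1", "l2"],
        'loss': ["hinge", "squared_hinge"],
        'dual': [True, False],
        'tol': [1e-5, 1e-4, 1e-3, 1e-2, 1e-1],
        'C': [1e-4, 1e-3, 1e-2, 1e-1, 0.5, 1., 5., 10., 15., 20., 25.]
    }
}

tpot = TPOTClassifier(config_dict=config_dict)
```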

Read more in the configuration section of the TPOT documentation.
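For a rough sense of the search space, the LinearSVC config_dict in this thread defines a discrete grid of 2 × 2 × 2 × 5 × 11 = 440 settings. Not all of them are valid: scikit-learn's LinearSVC (liblinear) rejects some penalty/loss/dual combinations. The sketch below counts both, with the supported-combination rules written by hand in a hypothetical is_supported helper; how TPOT handles the invalid combinations during its search is not modeled here.

```python
from itertools import product
from math import prod

# Same grid as the LinearSVC config_dict in this thread
linear_svc_grid = {
    "penalty": ["l1", "l2"],
    "loss": ["hinge", "squared_hinge"],
    "dual": [True, False],
    "tol": [1e-5, 1e-4, 1e-3, 1e-2, 1e-1],
    "C": [1e-4, 1e-3, 1e-2, 1e-1, 0.5, 1., 5., 10., 15., 20., 25.],
}


def is_supported(penalty, loss, dual):
    """Hypothetical helper encoding liblinear's LinearSVC rules:
    l1 + hinge is never supported, l1 + squared_hinge needs dual=False,
    and l2 + hinge needs dual=True."""
    if penalty == "l1":
        return loss == "squared_hinge" and not dual
    if loss == "hinge":
        return dual
    return True  # l2 + squared_hinge works with either dual setting


total = prod(len(values) for values in linear_svc_grid.values())
valid = sum(
    is_supported(p, l, d)
    for p, l, d, _tol, _c in product(*linear_svc_grid.values())
)
print(total, valid)  # 440 220
```

Only 4 of the 8 penalty/loss/dual combinations are supported, so half of the grid (220 settings) is usable.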

If you want to restrict the classifier but allow other steps, you can write, say, template='Selector-Transformer-LinearSVC'.
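A template mixes two kinds of steps: generic operator types (Selector, Transformer, Classifier, Regressor) and specific operator names such as LinearSVC. The hypothetical helper below is a pre-flight sanity check, not TPOT's own validation logic. It assumes steps are separated by '-' and that a specific step must match the last dotted component of a config_dict key (e.g. 'LinearSVC' for 'sklearn.svm.LinearSVC').

```python
GENERIC_STEPS = {"Selector", "Transformer", "Classifier", "Regressor"}


def unknown_template_steps(template, config_dict):
    """Hypothetical helper: return the steps of `template` that are neither a
    generic step type nor the short name of an operator in `config_dict`.
    It does not check that config_dict has operators for each generic step."""
    known_ops = {key.rsplit(".", 1)[-1] for key in config_dict}
    return [step for step in template.split("-")
            if step not in GENERIC_STEPS and step not in known_ops]


# Toy config for illustration; a real one for this template would also need
# selector and transformer operators.
config_dict = {"sklearn.svm.LinearSVC": {"C": [0.1, 1.0, 10.0]}}

print(unknown_template_steps("Selector-Transformer-LinearSVC", config_dict))  # []
print(unknown_template_steps("Selector-Transformer-SVC", config_dict))  # ['SVC']
```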

Hope that helps!

A TPOT template can also begin with an SVM, e.g. LinearSVC-Selector-Transformer-Classifier. In that case, the generated pipelines stack the LinearSVC predictions onto X in the first step and then pass the result to the selector in the second step.
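The stacking idea can be illustrated outside TPOT with plain scikit-learn. The sketch below uses a hypothetical PredictionStacker transformer, a simplified stand-in for the stacking TPOT performs. It prepends only the predicted labels; TPOT's own stacking may add more columns, e.g. class probabilities for estimators that provide them. The pipeline mirrors LinearSVC-Selector-Transformer-Classifier, and the operators chosen for each step are arbitrary.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC


class PredictionStacker(BaseEstimator, TransformerMixin):
    """Hypothetical, simplified stacking step (not TPOT's own class):
    fits `estimator` and prepends its predicted labels to X as a new column."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y):
        self.estimator_ = clone(self.estimator).fit(X, y)
        return self

    def transform(self, X):
        preds = self.estimator_.predict(X).reshape(-1, 1)
        return np.hstack((preds, X))


X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Step 1: LinearSVC predictions stacked onto X; steps 2-4: Selector, Transformer, Classifier
pipe = make_pipeline(
    PredictionStacker(LinearSVC(dual=False)),
    SelectPercentile(f_classif, percentile=50),
    StandardScaler(),
    LogisticRegression(),
)
pipe.fit(X, y)
print(pipe.score(X, y))
```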

Hi @trang1618 and @weixuanfu, thank you so much for your responses - they are very helpful.

I guess I am a bit confused, then, about what the default template, 'RandomTree', leads to. Does it bias the search towards tree-like architectures for the final pipeline? Or is the classifier component selected randomly from the config_dict?

Thanks!

RandomTree is the default behavior of previous versions of TPOT (before the template option existed). In this setting, TPOT randomly generates pipelines with both tree-like and linear structures, using random operators from config_dict. We will change the default to None to avoid this confusion.
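One way to see which structures RandomTree actually produced is to inspect the pipeline strings TPOT evaluated. The heuristic below assumes each leaf of a TPOT pipeline string is written as input_matrix, so a pipeline with more than one input_matrix has branches that are merged later (assumed here to appear as CombineDFs), while exactly one means a linear chain. The helper name and sample strings are hypothetical.

```python
def pipeline_shape(pipeline_str):
    """Hypothetical heuristic: classify a TPOT pipeline string as 'linear'
    or 'tree' by counting its input_matrix leaves."""
    leaves = pipeline_str.count("input_matrix")
    return "tree" if leaves > 1 else "linear"


# Made-up pipeline strings in the assumed TPOT format
linear = "LinearSVC(SelectPercentile(input_matrix, SelectPercentile__percentile=50), LinearSVC__C=1.0)"
tree = ("LogisticRegression(CombineDFs(PCA(input_matrix, PCA__iterated_power=3), "
        "SelectFwe(input_matrix, SelectFwe__alpha=0.01)), LogisticRegression__C=1.0)")

print(pipeline_shape(linear), pipeline_shape(tree))  # linear tree

# Possible usage on a fitted TPOT object (assumption):
# from collections import Counter
# Counter(pipeline_shape(p) for p in tpot.evaluated_individuals_)
```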

Okay super! Thanks so much.

This was fixed in the latest version of TPOT (0.10.2), so I am closing this issue. Please feel free to reopen it if you have any other issues or questions.
