I ran a short regression test with a small data set. Here is the TPOT input:
tpot_optimizer = TPOTRegressor(generations=5, population_size=20, scoring='neg_median_absolute_error', cv=5, random_state=42, verbosity=2)
Here is the best pipeline output:
Best pipeline: ExtraTreesRegressor(XGBRegressor(LassoLarsCV(PolynomialFeatures(RidgeCV(input_matrix), degree=2, include_bias=False, interaction_only=False), normalize=True), learning_rate=0.1, max_depth=2, min_child_weight=4, n_estimators=100, nthread=1, subsample=0.5), bootstrap=True, max_features=0.45, min_samples_leaf=6, min_samples_split=15, n_estimators=100)
Here is the relevant part of the exported python file:
exported_pipeline = make_pipeline(
    StackingEstimator(estimator=RidgeCV()),
    PolynomialFeatures(degree=2, include_bias=False, interaction_only=False),
    StackingEstimator(estimator=LassoLarsCV(normalize=True)),
    StackingEstimator(estimator=XGBRegressor(learning_rate=0.1, max_depth=2, min_child_weight=4, n_estimators=100, nthread=1, subsample=0.5)),
    ExtraTreesRegressor(bootstrap=True, max_features=0.45, min_samples_leaf=6, min_samples_split=15, n_estimators=100)
)
Question 1: Is the following interpretation of the order of steps used correct?
Question 2: Is it possible to turn off stacking?
For Question 1. The steps are:
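As a rough illustration of how data moves through a pipeline like the exported one, here is a minimal sketch. It assumes that StackingEstimator fits the wrapped estimator and adds its predictions as one extra feature column next to the incoming features. The PredictionAsFeature class, the synthetic data, and the random_state on the final estimator are hypothetical stand-ins written with scikit-learn only; this is not TPOT's actual code, and the LassoLarsCV and XGBRegressor steps are left out to keep it short.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures


class PredictionAsFeature(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for StackingEstimator (illustration only)."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y=None):
        # Fit a fresh copy of the wrapped estimator on the incoming features.
        self.estimator_ = clone(self.estimator).fit(X, y)
        return self

    def transform(self, X):
        # Put the estimator's predictions in one extra column next to X.
        X = np.asarray(X)
        preds = self.estimator_.predict(X).reshape(-1, 1)
        return np.hstack((preds, X))


# Small synthetic regression problem (assumption: 60 rows, 3 features).
rng = np.random.RandomState(42)
X = rng.rand(60, 3)
y = X[:, 0] + 2.0 * X[:, 1] + 0.1 * rng.randn(60)

# One stacking step alone: 3 input features become 4 columns.
stacked = PredictionAsFeature(RidgeCV()).fit(X, y).transform(X)
print(stacked.shape)  # (60, 4)

# Shortened version of the exported pipeline: each step feeds the next,
# and only the last step produces the final predictions.
sketch = make_pipeline(
    PredictionAsFeature(RidgeCV()),
    PolynomialFeatures(degree=2, include_bias=False, interaction_only=False),
    ExtraTreesRegressor(bootstrap=True, max_features=0.45, min_samples_leaf=6,
                        min_samples_split=15, n_estimators=100, random_state=42),
)
sketch.fit(X, y)
predictions = sketch.predict(X)
```

Under this assumption, the nested "Best pipeline" string reads from the inside out: the innermost operator (RidgeCV) runs first on input_matrix, and the outermost one (ExtraTreesRegressor) is the final predictor.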
For Question 2.
For now, TPOT does not provide this option. However, one of my dev branches of TPOT, noCDF_noStacking, has an option named simple_pipeline, which disables both StackingEstimator and CombineDFs if simple_pipeline=True (e.g. TPOTClassifier(simple_pipeline=True)). Note that this dev branch is not fully tested yet. If you want to try TPOT without StackingEstimator and FeatureUnion, you can install this branch in your test environment with the command below:
pip install --upgrade --no-deps --force-reinstall git+https://github.com/weixuanfu/tpot.git@noCDF_noStacking
Please check #152 for more details. We are working on a more advanced pipeline configuration option.
Weixuanfu, thank you for your prompt answer.
You may want to add this explanation to the documentation. Also, here is something to add to what I am sure is a long "to do" list: use Graphviz to print a tree-structure image of the best pipeline. This would make it easier for users to understand the data flow in the pipeline.