Tpot: Question: How to get the value of the fitted pipeline's best internal CV score from training?

Created on 25 Apr 2020 · 5 Comments · Source: EpistasisLab/tpot

As TPOT trains, it displays the current best CV score, but is there a way to pull that value into a variable? I have a for loop that runs 5 different TPOT sessions, each using a different combination of features in the training set. I would like to create a sort of "ensemble'd" prediction by taking the predictions of each session's best pipeline and weighting them by the training (CV) score that pipeline received. Thanks!
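
A minimal sketch of one way to read that score back, assuming the fitted object exposes `evaluated_individuals_` as a dict that maps each pipeline string to a stats dict with an `internal_cv_score` key (the shape used by the code later in this thread). The `evaluated` dict below is made-up stand-in data, not real TPOT output, and `best_internal_cv_score` is a hypothetical helper name:

```python
import math

def best_internal_cv_score(evaluated):
    """Return (pipeline_string, score) for the highest finite internal CV score."""
    scored = [
        (name, info["internal_cv_score"])
        for name, info in evaluated.items()
        # Skip entries whose score is missing or not finite (e.g. -inf)
        if isinstance(info.get("internal_cv_score"), (int, float))
        and math.isfinite(info["internal_cv_score"])
    ]
    if not scored:
        raise ValueError("no pipeline has a finite internal_cv_score")
    return max(scored, key=lambda pair: pair[1])

# Stand-in for tpot_model.evaluated_individuals_ (illustrative values only)
evaluated = {
    "GaussianNB(input_matrix)": {"internal_cv_score": 0.93},
    "LogisticRegression(input_matrix, C=1.0)": {"internal_cv_score": 0.95},
    "BrokenPipeline(input_matrix)": {"internal_cv_score": float("-inf")},
}

best_pipeline, best_score = best_internal_cv_score(evaluated)
print(best_pipeline, best_score)
```

With a real session, the call would presumably be `best_internal_cv_score(tpot_model.evaluated_individuals_)` after `fit()` has finished.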

question

Most helpful comment

I may have a sort of solution that I will work on.

I noticed that someone in #703 created a way to take the current session's evaluated_individuals_ and sort the entries by their CV scores. This produces a dataframe that holds the string value of each pipeline. One could then use the solution shown in #516 to convert those strings into actual pipelines and call .predict on them later. I'm going to take a stab at it and see how I do.

Theoretically, what I'd do is:

  1. Sort all the pipelines in evaluated_individuals_ from highest to lowest CV score, like what's shown in #703.
  2. Remove duplicates based on the CV column. I don't want a bunch of pipelines showing the same score, for fear that they would all predict the same values; I'd want a mix of predictions from different models among the top 5 pipelines.
  3. Store the string values of each of the top 5 model architectures, and their CV scores, in a list (or two).
  4. Iterate over those lists, converting each model architecture string back to an actual pipeline, like what's shown in #516.
  5. Make predictions with each pipeline and weight them according to their CV scores in the list.
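
Steps 1 to 3 can be sketched in plain Python as follows. This is only an illustration under assumptions: `top_unique_pipelines` is a hypothetical helper, the input has the same dict shape as `evaluated_individuals_`, and the pipeline strings and scores are invented. It keeps the first pipeline seen for each distinct score, which is one possible reading of "remove duplicates".

```python
def top_unique_pipelines(evaluated, n=5):
    """Sort by CV score (best first) and keep one pipeline per distinct score."""
    ranked = sorted(
        ((name, info["internal_cv_score"]) for name, info in evaluated.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    seen_scores = set()
    unique = []
    for name, score in ranked:
        if score in seen_scores:
            continue  # same score as a pipeline already kept: skip it
        seen_scores.add(score)
        unique.append((name, score))
    return unique[:n]

# Invented stand-in data in the shape of evaluated_individuals_
evaluated = {
    "A(input_matrix)": {"internal_cv_score": 0.95},
    "B(input_matrix)": {"internal_cv_score": 0.95},
    "C(input_matrix)": {"internal_cv_score": 0.93},
    "D(input_matrix)": {"internal_cv_score": 0.90},
}

top = top_unique_pipelines(evaluated, n=2)
print(top)
```

Note that pandas `drop_duplicates(subset="cv_score", keep=False)` behaves differently: it drops every row whose score is shared, so in this toy data both A and B would disappear.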

I am using a binary (1, 0) classification example so the last step will be easy. Not sure how one would do it for multi-class as most of the work I've done to date has been with binary classifications.
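
For the multi-class case, one common option is a weighted vote: each pipeline adds its weight to the class it predicts, and the class with the largest total wins. The sketch below assumes positive CV scores and hard class labels; `weighted_vote` and the sample predictions are hypothetical, not something from TPOT.

```python
from collections import defaultdict

def weighted_vote(predictions, cv_scores):
    """Combine per-pipeline class predictions into one label per row.

    predictions: list of lists, one inner list of labels per pipeline
    cv_scores:   one positive CV score per pipeline, used as its vote weight
    """
    total = sum(cv_scores)
    weights = [score / total for score in cv_scores]
    n_rows = len(predictions[0])
    final = []
    for row in range(n_rows):
        votes = defaultdict(float)
        for pipeline_preds, weight in zip(predictions, weights):
            votes[pipeline_preds[row]] += weight
        # The label with the largest accumulated weight wins
        final.append(max(votes, key=votes.get))
    return final

predictions = [
    [0, 2, 1],  # pipeline with CV score 0.95
    [0, 1, 1],  # pipeline with CV score 0.90
    [2, 1, 0],  # pipeline with CV score 0.60
]
combined = weighted_vote(predictions, [0.95, 0.90, 0.60])
print(combined)
```

In the second row the two lower-scored pipelines agree on class 1 and together outweigh the top pipeline's vote for class 2.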

If there's interest I will post my progress here; likely having something to show within the week, unless someone beats me to it, or someone spots something that might give me grief in the process. Thanks!

All 5 comments

Your solution looks good. Note that the demo in #516 is out of date, but I think only one line, tpot._set_param_recursive(sklearn_pipeline.steps, 'random_state', 42), needs to be commented out or deleted, because we moved it into the tpot._toolbox.compile() function.

Alright I'm very close to a solution but need two more things to help complete it.

@weixuanfu What I'm doing is running tpot.fit() and exporting a dataframe of the pipelines' CV scores and the string values of those pipelines. I then refer only to that dataframe (not the TPOT session itself) to convert the pipeline strings to actual sklearn pipelines. I'm getting an error at this line:

deap_pipeline = creator.Individual.from_string(pipeline_string, tpot._pset)

because tpot._pset should reference the TPOT session that was used for the initial fitting, but I have that session named tpot_model instead. My goal is to be able to run the second half of this script at any time, without having to fit pipelines again first. So, theoretically, a user could have a pre-saved .csv file of the top 5 model strings and re-run the script from that point. This poses two problems that I need some guidance on:

  1. How do I get tpot._pset to work without prior fitting? (I noticed in #516 that we should run tpot._fit_init() first, but I'm not sure what to do there. You'll see it's in the code but commented out with a bunch of ??????'s beside it.)

  2. I still want to call set_param_recursive after a pipeline object has been created from a string. Normally the exported pipeline .py file sets random_state if the user defined one during training, so I'm trying to emulate that with set_param_recursive, because here we are simulating that we don't have access to the initial TPOT .fit() session where it was defined. Am I correct in doing this? I want the imported pipeline strings to use the random_state, provided the user already knows it. Does that make sense?
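
As a side note on the pre-saved .csv idea: pipeline strings contain commas, so they need to be quoted on disk. A small sketch with the standard `csv` module shows that the strings and scores survive a round trip. The pipeline strings here are invented placeholders and `io.StringIO` stands in for a real file:

```python
import csv
import io

# Invented placeholder pipeline strings and scores
top_models = [
    ("LogisticRegression(input_matrix, LogisticRegression__C=1.0)", 0.95),
    ("GaussianNB(StandardScaler(input_matrix))", 0.93),
]

# Write: the csv module quotes fields that contain commas
buffer = io.StringIO()  # stand-in for open('top_models.csv', 'w', newline='')
writer = csv.writer(buffer)
writer.writerow(["model", "cv_score"])
writer.writerows(top_models)

# Read back later, with no TPOT session in memory
buffer.seek(0)
reader = csv.DictReader(buffer)
restored = [(row["model"], float(row["cv_score"])) for row in reader]
print(restored)
```

pandas `to_csv` and `read_csv` handle the quoting the same way, so the dataframe export used in the script should round-trip as well.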

Thanks! Code is below:

from tpot import TPOTClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
import pandas as pd

# Load breast cancer (binary classification) dataset and split
breast_cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(breast_cancer.data, breast_cancer.target,
                                                    train_size=0.75, test_size=0.25, random_state=42)

# Define TPOT classifier
tpot_model = TPOTClassifier(generations=24, population_size=2, verbosity=2, random_state=42)

# Fit/start training
tpot_model.fit(X_train, y_train)
print('Done training/fitting TPOT session.')

# Get TPOT's score on test set (default metric is 'accuracy'; define something else in TPOT classifier if needed)
print('TPOTs score on test set is...')
print(tpot_model.score(X_test, y_test))

# Export the best pipeline
tpot_model.export('tpot_breast_cancer_pipeline.py')


# An attempt at making an ensemble prediction using the top 5 unique CV scored pipelines

# Create sorted by CV (highest to lowest) dataframe 
# (taken from: https://github.com/EpistasisLab/tpot/issues/703)
# evaluated_individuals_ maps each pipeline string to a dict of stats for that pipeline.
# Collect one row per pipeline and build the dataframe in one call
# (DataFrame.append was removed in pandas 2.0)
rows = []
for model_name, model_info in tpot_model.evaluated_individuals_.items():
    rows.append({'cv_score': model_info.get('internal_cv_score'),  # Pull out cv_score as a column (i.e., sortable)
                 'model': model_name})
    # Add 'model_info': model_info to the dict above if the pipeline's stats matter to you
model_scores = pd.DataFrame(rows)
# Sort by best CV score to worst (top to bottom)
model_scores = model_scores.sort_values('cv_score', ascending=False)
print('Model Scores dataframe is...')
print(model_scores)

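# Remove duplicate CV score rows and keep top X pipelines (to get best, 'unique' pipelines)
# Note: keep=False drops EVERY row whose score is shared, including the first one;
# use keep='first' if you want to retain one pipeline per score instead
model_scores = model_scores.drop_duplicates(subset="cv_score", keep=False)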
model_scores = model_scores.head(5)
# Export to .csv for inspection if desired
model_scores.to_csv('./top_models.csv', index=False)

# Get the sum of the top 5 CV scores for weighting the 1's later
sum_of_cv_scores = model_scores['cv_score'].sum()

# Generate pipeline objects from model strings in above dataframe 
# (taken from: https://github.com/EpistasisLab/tpot/issues/516)
import numpy as np
import tpot
from deap import creator
from sklearn.model_selection import cross_val_score
from tpot.export_utils import generate_pipeline_code, expr_to_tree
from sklearn.metrics import accuracy_score
from tpot.export_utils import set_param_recursive

# Before we start, create a list full of 0's to append to, and apply addition to, each pipeline's weighted predictions
total_weighted_predictions_list = [0] * len(y_test) # Must match y_test length

# A quick note on the weighted average of the predicted class (i.e. the 1's in the y/labels)
# What I'm doing here is calculating a multiplier that I can apply to the predicted 1's in the following predictions.
# This multiplier is based on each pipeline's CV score from the first training run. The multiplier is:
# current pipeline's CV score from training / sum of all CV scores
# So basically, the better the CV score it produced during training, the higher the multiplier will be for that pipeline.

# Then, every time a pipeline predicts a 1, this (as an example) 0.25 multiplier will be applied to that 1 (resulting
# in a prediction of 0.25 for that row). This will then be added to all the other pipelines' predictions, resulting
# in a final list of summed numbers that range between 0 and 1. Anything at or above 0.5 is considered a 1.

# I do it this way because in this simple example, I don't really care about the 0's as much as I do about the 1's.
# So we need enough pipelines, or rather, enough of the good pipelines, to have predicted a 1 to sway the ensembled
# prediction closer to 1 than to 0.

# As an example, let's say the top 2 pipelines have the multipliers 0.30 and 0.30, and the bottom 3 pipelines have the
# multipliers 0.13, 0.13 and 0.13. If the top 2 pipelines predict a 1 and the bottom 3 predict a 0, the ensembled
# prediction is 0.30 + 0.30 = 0.6, making it a final prediction of 1. So the weight of each model is taken into account
# when predicting a 1. This creates a basic "weighted average" prediction that gives more importance to the better
# performing models, rather than a straight average that treats a poor pipeline's predictions as importantly as a good one's.

# Now that we have a list of the top 5 unique models, iterate over them, make predictions and weight those predictions
for i in range(0, len(model_scores), 1):

    # Get the first pipeline in string value from the dataframe
    pipeline_string = model_scores['model'].iloc[i]
    print('pipeline_string is...')
    print(pipeline_string)

    # Get the prediction weight multiplier, based on its ratio of CV Score to Sum of CV Scores
    prediction_weight_multiplier = float(model_scores['cv_score'].iloc[i]) / float(sum_of_cv_scores)
    print('prediction_weight_multiplier is...')
    print(prediction_weight_multiplier)

    # Convert pipeline string to actual scikit-learn pipeline object
    # tpot._fit_init() # ??????
    deap_pipeline = creator.Individual.from_string(pipeline_string, tpot._pset)
    sklearn_pipeline = tpot._toolbox.compile(expr=deap_pipeline)

    # # print sklearn pipeline string (could comment this section out if you wanted)
    # sklearn_pipeline_str = generate_pipeline_code(expr_to_tree(deap_pipeline, tpot._pset), tpot.operators)
    # print('sklearn_pipeline_str is...')
    # print(sklearn_pipeline_str)

    # Set the same random state that was used during training. This will also help if you run this script without
    # having used the .fit() function first.
    set_param_recursive(sklearn_pipeline.steps, 'random_state', 42)

    # Now : 
    # 1. Re-fit the pipeline to the training data
    # 2. Make a prediction on test data
    # 3. Apply the weight_multiplier to its predictions
    # 4. And combine them to the ensembled_predictions_list, using addition of each element when combining
    sklearn_pipeline.fit(X_train, y_train)
    results = sklearn_pipeline.predict(X_test)
    # Use a separate name (pred) so the comprehension does not reuse the loop index name i
    weighted_results = [pred * prediction_weight_multiplier for pred in results]
    total_weighted_predictions_list = [x + y for x, y in zip(total_weighted_predictions_list, weighted_results)]

    # Move on to the next pipeline string in the dataframe


# When all top 5 pipelines have made predictions, and we have the weighted average'd predictions,
# create and export a dataframe
ensembled_results_df = pd.DataFrame(y_test, columns=['Actuals'])
ensembled_results_df['Raw_Ensembled_Preds'] = pd.DataFrame(total_weighted_predictions_list)
ensembled_results_df['Converted_Ensembled_Preds'] = np.where(ensembled_results_df['Raw_Ensembled_Preds'] >= 0.5, 1, 0)
ensembled_results_df.to_csv('./ensembled_results.csv', index=False)

print('-----------------------------------------------------------------------')
print('Ensembled Accuracy Score is...')
print(accuracy_score(ensembled_results_df['Converted_Ensembled_Preds'], y_test))
print('-----------------------------------------------------------------------')
print('Compare this with the printed table above (scroll up).')

Hi, you need to call tpot._fit_init() so that a new TPOT object (like tpot = TPOTClassifier(random_state=42)) initializes tpot._toolbox and tpot._pset. You can add set_param_recursive to your code to reset random_state, even though the tpot._toolbox.compile function already does that if random_state is set in TPOTClassifier (check these lines). set_param_recursive can be imported from tpot.export_utils.

Thanks @weixuanfu I see what you meant now with the random_state being included in the new TPOT classifier, so I took out the set_param_recursive.

Here is the full code, working on my end. It produces an ensembled, weighted average of the predicted 1's from the top 5 pipelines, weighted by the CV score each pipeline achieved during training, so the better pipelines get more weight in the final prediction. This is for a binary classification problem where the 1's are more important than the 0's, which is why I wrote it the way I did. As discussed before, I also wrote it so that one could have a .csv file of 5 previously saved pipelines and still use them without having to call .fit() on a TPOT session first. I left the default scoring metric, 'accuracy', but others may want to change that. Here is an example of the output I got:

-----------------------------------------------------------------------
Ensembled Accuracy Score is...
0.972027972027972
-----------------------------------------------------------------------
Again, the Individual models scored...
    cv_score  ...                                         model_info
8   0.950670  ...  {'generation': 'INVALID', 'mutation_count': 2,...
1   0.950643  ...  {'generation': 0, 'mutation_count': 0, 'crosso...
6   0.945964  ...  {'generation': 'INVALID', 'mutation_count': 3,...
12  0.945910  ...  {'generation': 'INVALID', 'mutation_count': 3,...
0   0.934282  ...  {'generation': 0, 'mutation_count': 0, 'crosso...

Your individual results may vary. Let me know if any issues. Thanks! This was fun!

from tpot import TPOTClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
import pandas as pd

# Set a random seed
random_state = 42

# Load breast cancer (binary classification) dataset and split
breast_cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(breast_cancer.data, breast_cancer.target,
                                                    train_size=0.75, test_size=0.25, random_state=random_state)

# Define TPOT classifier
tpot_model = TPOTClassifier(generations=9, population_size=2, verbosity=2, random_state=random_state)

# Fit/start training
tpot_model.fit(X_train, y_train)
print('Done training/fitting TPOT session.')

# Get TPOT's score on test set (default metric is 'accuracy'; define something else in TPOT classifier if needed)
print('TPOTs score on test set is...')
print(tpot_model.score(X_test, y_test))

# Export the best pipeline
tpot_model.export('tpot_breast_cancer_pipeline.py')


# An attempt at making an ensemble prediction using the top 5 unique CV scored pipelines

# Create sorted by CV (highest to lowest) dataframe 
# (taken from: https://github.com/EpistasisLab/tpot/issues/703)
# evaluated_individuals_ maps each pipeline string to a dict of stats for that pipeline.
# Collect one row per pipeline and build the dataframe in one call
# (DataFrame.append was removed in pandas 2.0)
rows = []
for model_name, model_info in tpot_model.evaluated_individuals_.items():
    rows.append({'cv_score': model_info.get('internal_cv_score'),  # Pull out cv_score as a column (i.e., sortable)
                 'model': model_name,
                 'model_info': model_info})  # You could take this out if the values of the pipeline aren't important to you
model_scores = pd.DataFrame(rows)
# Sort by best CV score to worst (top to bottom)
model_scores = model_scores.sort_values('cv_score', ascending=False)
print('Model Scores dataframe is...')
print(model_scores)

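# Remove duplicate CV score rows and keep top X pipelines (to get best, 'unique' pipelines)
# Note: keep=False drops EVERY row whose score is shared, including the first one;
# use keep='first' if you want to retain one pipeline per score instead
model_scores = model_scores.drop_duplicates(subset="cv_score", keep=False)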
model_scores = model_scores.head(5)
# Export to .csv for inspection if desired
model_scores.to_csv('./top_models.csv', index=False)

# Get the sum of the top 5 CV scores for weighting the 1's later
sum_of_cv_scores = model_scores['cv_score'].sum()

# Generate pipeline objects from model strings in above dataframe 
# (taken from: https://github.com/EpistasisLab/tpot/issues/516)
import numpy as np
from deap import creator
from sklearn.metrics import accuracy_score
# Only needed if you uncomment the optional sections inside the loop below
from tpot.export_utils import generate_pipeline_code, expr_to_tree, set_param_recursive

# Before we start, create a list full of 0's to append to, and apply addition to, each pipeline's weighted predictions
total_weighted_predictions_list = [0] * len(y_test) # Must match y_test length

# A quick note on the weighted average of the predicted class (i.e. the 1's in the y/labels)
# What I'm doing here is calculating a multiplier that I can apply to the predicted 1's in the following predictions.
# This multiplier is based on each pipeline's CV score from the first training run. The multiplier is:
# current pipeline's CV score from training / sum of all CV scores
# So basically, the better the CV score it produced during training, the higher the multiplier will be for that pipeline.

# Then, every time a pipeline predicts a 1, this (as an example) 0.25 multiplier will be applied to that 1 (resulting
# in a prediction of 0.25 for that row). This will then be added to all the other pipelines' predictions, resulting
# in a final list of summed numbers that range between 0 and 1. Anything at or above 0.5 is considered a 1.

# I do it this way because in this simple example, I don't really care about the 0's as much as I do about the 1's.
# So we need enough pipelines, or rather, enough of the good pipelines, to have predicted a 1 to sway the ensembled
# prediction closer to 1 than to 0.

# As an example, let's say the top 2 pipelines have the multipliers 0.30 and 0.30, and the bottom 3 pipelines have the
# multipliers 0.13, 0.13 and 0.13. If the top 2 pipelines predict a 1 and the bottom 3 predict a 0, the ensembled
# prediction is 0.30 + 0.30 = 0.6, making it a final prediction of 1. So the weight of each model is taken into account
# when predicting a 1. This creates a basic "weighted average" prediction that gives more importance to the better
# performing models, rather than a straight average that treats a poor pipeline's predictions as importantly as a good one's.

# Now that we have a list of the top 5 unique models, iterate over them, make predictions and weight those predictions
for i in range(0, len(model_scores), 1):

    # Get the first pipeline in string value from the dataframe
    pipeline_string = model_scores['model'].iloc[i]
    print('pipeline_string is...')
    print(pipeline_string)

    # Get the prediction weight multiplier, based on its ratio of CV Score to Sum of CV Scores
    prediction_weight_multiplier = float(model_scores['cv_score'].iloc[i]) / float(sum_of_cv_scores)
    print('prediction_weight_multiplier is...')
    print(prediction_weight_multiplier)

    # Convert pipeline string to actual scikit-learn pipeline object
    tpot = TPOTClassifier(random_state=random_state)
    tpot._fit_init()
    deap_pipeline = creator.Individual.from_string(pipeline_string, tpot._pset)
    sklearn_pipeline = tpot._toolbox.compile(expr=deap_pipeline)

    # # print sklearn pipeline string (could comment this section out if you wanted)
    # sklearn_pipeline_str = generate_pipeline_code(expr_to_tree(deap_pipeline, tpot._pset), tpot.operators)
    # print('sklearn_pipeline_str is...')
    # print(sklearn_pipeline_str)

    # Took this section out in favour of the TPOT line above
    # set_param_recursive(sklearn_pipeline.steps, 'random_state', 42)

    # Now : 
    # 1. Re-fit the pipeline to the training data
    # 2. Make a prediction on test data
    # 3. Apply the weight_multiplier to its predictions
    # 4. And combine them to the ensembled_predictions_list, using addition of each element when combining
    sklearn_pipeline.fit(X_train, y_train)
    results = sklearn_pipeline.predict(X_test)
    # Use a separate name (pred) so the comprehension does not reuse the loop index name i
    weighted_results = [pred * prediction_weight_multiplier for pred in results]
    total_weighted_predictions_list = [x + y for x, y in zip(total_weighted_predictions_list, weighted_results)]

    # Move on to the next pipeline string in the dataframe


# When all top 5 pipelines have made predictions, and we have the weighted average'd predictions,
# create and export a dataframe
ensembled_results_df = pd.DataFrame(y_test, columns=['Actuals'])
ensembled_results_df['Raw_Ensembled_Preds'] = pd.DataFrame(total_weighted_predictions_list)
ensembled_results_df['Converted_Ensembled_Preds'] = np.where(ensembled_results_df['Raw_Ensembled_Preds'] >= 0.5, 1, 0)
ensembled_results_df.to_csv('./ensembled_results.csv', index=False)

print('-----------------------------------------------------------------------')
print('Ensembled Accuracy Score is...')
print(accuracy_score(ensembled_results_df['Converted_Ensembled_Preds'], y_test))
print('-----------------------------------------------------------------------')
print('Again, the Individual models scored...')
print(model_scores.head(5))
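
To make the weighting arithmetic concrete, here is a small stand-alone sketch of the same scheme (multiplier = CV score / sum of CV scores, threshold at 0.5). It uses the five CV scores from the example output above; the 0/1 predictions are invented, and `ensemble_binary` is a hypothetical helper rather than part of the script:

```python
def ensemble_binary(predictions, cv_scores, threshold=0.5):
    """Weighted sum of 0/1 predictions; rows at or above the threshold become 1.

    predictions: list of lists, one inner list of 0/1 labels per pipeline
    cv_scores:   one CV score per pipeline
    """
    total = sum(cv_scores)
    multipliers = [score / total for score in cv_scores]
    n_rows = len(predictions[0])
    raw = [
        sum(m * preds[row] for m, preds in zip(multipliers, predictions))
        for row in range(n_rows)
    ]
    labels = [1 if value >= threshold else 0 for value in raw]
    return raw, labels

# CV scores taken from the example output in this thread
cv_scores = [0.950670, 0.950643, 0.945964, 0.945910, 0.934282]

# Invented predictions: row 0 has only the top 2 pipelines voting 1,
# row 1 has the top 3 pipelines voting 1
predictions = [
    [1, 1],
    [1, 1],
    [0, 1],
    [0, 0],
    [0, 0],
]

raw, labels = ensemble_binary(predictions, cv_scores)
print(raw, labels)
```

Because these CV scores are so close together, each multiplier comes out near 0.2, so the top two pipelines alone stay below 0.5 and the result behaves almost like a plain majority vote. Multipliers as uneven as the 0.30 / 0.13 example in the comments would need CV scores that differ much more.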