Tpot: OPTIMIZATION PROCESS STOPS : stopit.utils.TimeoutException

Created on 24 Jun 2020 · 5 Comments · Source: EpistasisLab/tpot

Hi! We are trying to run TPOT with 100 generations, but the optimization process stops and we get the following error message:

File "/home/anaconda3/lib/python3.7/_weakrefset.py", line 38, in _remove
def _remove(item, selfref=ref(self)):
stopit.utils.TimeoutException

Does anyone know how we can solve this problem? Thanks in advance.

All 5 comments

Hmm, I have not seen this error message before. Could you please provide more details (such as the versions of all TPOT dependencies and a demo) so we can reproduce this error?
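One way to gather the requested versions is a short script like the sketch below. The package list is an assumption about what a TPOT install typically depends on, so adjust it to your environment. It uses importlib.metadata, which needs Python 3.8 or later; on Python 3.7 the importlib_metadata backport offers the same calls.

```python
from importlib import metadata

# Assumed list of relevant packages; edit to match your environment.
PACKAGES = [
    "tpot", "numpy", "scipy", "scikit-learn", "pandas",
    "deap", "stopit", "joblib", "tqdm", "update-checker",
]


def collect_versions(packages=PACKAGES):
    """Return {package: version}, marking packages that are not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


# Print in a pip-style format that can be pasted into the issue.
for name, version in collect_versions().items():
    print(f"{name}=={version}")
```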

I think this is related to the max_eval_time_mins parameter, which controls the maximum run time of each pipeline evaluation. Your dataset may be very large, in which case increasing the value of this parameter may help.

As far as I understand, max_eval_time_mins only causes a specific pipeline to be skipped if its evaluation takes more than x minutes; it should not stop the optimization process as a whole. We already set this parameter to 60 minutes and the error still occurs. I have read elsewhere (https://github.com/glenfant/stopit/issues/16) that this type of error message is related to the communicate() method of Popen, but I still can't resolve it.
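To make the expected behaviour concrete, here is a minimal sketch of a per-candidate time limit that skips a slow candidate and keeps the loop going. It is not TPOT's actual mechanism (the traceback suggests TPOT relies on stopit); it uses a standard-library thread pool instead, and evaluate_all and the candidate functions are hypothetical names for illustration only.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def evaluate_all(candidates, max_eval_secs):
    """Score every candidate; a timeout is recorded as None, not re-raised."""
    results = {}
    # One worker per candidate, so a timed-out task cannot block the next one.
    with ThreadPoolExecutor(max_workers=max(1, len(candidates))) as pool:
        for name, score_fn in candidates.items():
            future = pool.submit(score_fn)
            try:
                results[name] = future.result(timeout=max_eval_secs)
            except FutureTimeout:
                # Skip this candidate only; the loop continues.
                results[name] = None
    return results


def fast_candidate():
    return 0.9


def slow_candidate():
    time.sleep(0.5)  # stands in for a pipeline that is too slow
    return 0.1


scores = evaluate_all({"fast": fast_candidate, "slow": slow_candidate}, 0.1)
print(scores)
```

With this design the slow candidate is marked as skipped and the fast one still gets its score, which is the behaviour one would expect from a per-pipeline limit. An exception that escapes the per-candidate handler, as in the report above, would end the whole loop instead.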

Here is the code we are running:

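import json
import os

import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold
from tpot import TPOTClassifier

def run_TPOT_auto_ML(data_path,target,sep='\t',exclude=(),generations=1,population_size=100,cv=5,fold=0,rseed=42,results_path='./'):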

    if results_path != './' and not os.path.exists(results_path): os.makedirs(results_path)

    # Loading dictionary of pipelines to use
    config_file = np.load('TPOT_config_file.npy',allow_pickle=True).item()

    # Loading
    data = pd.read_csv(data_path,sep=sep)
    feats = [c for c in data.columns if c != target and c not in exclude]
    X,y = data[feats].values,data[target].values

    # Split
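    # random_state only takes effect when shuffle=True (newer scikit-learn raises an error otherwise)
    idx = [tr for tr,_ in StratifiedKFold(n_splits=cv,shuffle=True,random_state=rseed).split(X,y)][fold]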
    with open(os.path.join(results_path,'TPOT_train_idx_{}_{}.json'.format(fold,rseed)), 'w') as outfile:
        json.dump({
            'data_path':data_path,
            'target':target,
            'cv':cv, 'fold':fold,
            'idx':idx.tolist(),
        }, outfile)

    CLF = TPOTClassifier(generations=generations,population_size=population_size,config_dict=config_file,verbosity=2,n_jobs=1,max_eval_time_mins=60)
    CLF.fit(X[idx],y[idx])

    # Export the best pipeline
    CLF.export(output_file_name=os.path.join(results_path,'TPOT_best_pipeline_{}.py'.format(rseed)),data_file_path=data_path)

    RESULTS = pd.DataFrame([{
            'Generation':CLF.evaluated_individuals_[pipe]['generation'],
            'Model':pipe,
            'Internal_cv_score':CLF.evaluated_individuals_[pipe]['internal_cv_score'],
            'Mutation_count':CLF.evaluated_individuals_[pipe]['mutation_count'],
            'Crossover_count':CLF.evaluated_individuals_[pipe]['crossover_count'],
            'Predecessor':CLF.evaluated_individuals_[pipe]['predecessor'],
            'Operator_count':CLF.evaluated_individuals_[pipe]['operator_count']
    } for pipe in CLF.evaluated_individuals_])

    RESULTS.to_csv(os.path.join(results_path,'auto_ML_results_{}.csv'.format(rseed)),sep=sep,index=False)

Hmm, that is strange. Is this error reproducible with a small benchmark, like Iris dataset? If so, please let us know the versions of TPOT and its dependencies as well as the config_file.
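A small Iris benchmark along these lines could look like the sketch below. It assumes the classic TPOT API discussed in this thread (0.11.x) and uses the default configuration rather than the custom config_file. The settings and the function name run_iris_smoke_test are illustrative choices, not values suggested by the maintainers.

```python
# Deliberately small settings so the run stays short.
SMOKE_TEST_KWARGS = {
    "generations": 2,
    "population_size": 10,
    "max_eval_time_mins": 1,
    "n_jobs": 1,
    "verbosity": 2,
    "random_state": 42,
}


def run_iris_smoke_test(kwargs=SMOKE_TEST_KWARGS):
    """Fit TPOT on Iris and return the accuracy on a held-out split."""
    # Imported lazily so the settings can be inspected without TPOT installed.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from tpot import TPOTClassifier

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, stratify=y, random_state=kwargs["random_state"]
    )
    clf = TPOTClassifier(**kwargs)
    clf.fit(X_train, y_train)
    return clf.score(X_test, y_test)
```

Calling run_iris_smoke_test() in the affected environment shows whether the TimeoutException also appears on a tiny dataset. If it does not, the custom config_file or the size of the original data becomes the more likely suspect.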

I second that question / issue (tpot version: 0.11.5)

Here is a workaround that may work as a pointer:

from tpot import decorators
decorators.MAX_EVAL_SECS = 100
tpot_obj.fit(train_X, train_y)

Otherwise, one may get a stopit.utils.TimeoutException with a message like:

... utils.py:82] Code block execution exceeded 2 seconds timeout. (MAX_EVAL_SECS is 2 by default)

Hmm, @vlaskinvlad, how about changing decorators.MAX_EVAL_SECS to 5 or 10? I think 100 is too large to effectively limit the time of the pipeline pretest, which runs on a small subset of the data (max sample size = 50). How many features are in the dataset that causes this issue?
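If the override should apply to a single fit only, one option is to restore the original value afterwards. The sketch below uses a small context manager for that; override_attr is a hypothetical helper, and a SimpleNamespace stands in for the tpot.decorators module so the example runs without TPOT installed.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def override_attr(obj, name, value):
    """Temporarily set obj.name to value, restoring the old value on exit."""
    original = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield obj
    finally:
        setattr(obj, name, original)


# Stand-in for the tpot.decorators module (default of 2 as quoted above).
fake_decorators = SimpleNamespace(MAX_EVAL_SECS=2)

with override_attr(fake_decorators, "MAX_EVAL_SECS", 10):
    inside = fake_decorators.MAX_EVAL_SECS  # overridden while the block runs
after = fake_decorators.MAX_EVAL_SECS  # restored once the block exits

print(inside, after)
```

With TPOT installed, the same pattern would presumably be used as `with override_attr(decorators, "MAX_EVAL_SECS", 10): tpot_obj.fit(train_X, train_y)`, following the workaround quoted earlier.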
