Warm start does not appear to work as expected
Hi there,
What I was hoping to do was generate a testing curve of TPOT's performance for comparison in a paper, by checking its performance on the test set every X minutes.
_Side note: I'm aware this shouldn't be used for model selection or training; it's purely for comparison's sake._
To do this, I was trying to use the warm_start parameter. As a minimal example, I was aiming to do something like:
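```python
import matplotlib.pyplot as plt
from tpot import TPOTClassifier

# train_x, train_y, test_x, test_y are assumed to be an existing train/test split.
testing_frequency = 5  # minutes of search between test-set checks
total_test_runs = 30

tpot = TPOTClassifier(
    max_time_mins=testing_frequency,
    warm_start=True,
)

scores = []
for _ in range(total_test_runs):
    # With warm_start=True, each fit() should resume from the previous run.
    tpot.fit(train_x, train_y)
    score = tpot.score(test_x, test_y)
    scores.append(score)

plt.plot(scores)
plt.show()
```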
I could then plot these scores against the comparison methods.
However, it seems that if TPOT times out (a KeyboardInterrupt is thrown), self._pop is not updated, and subsequent calls to fit() reinitialise the population randomly.
On subsequent calls to fit(), I would expect self._pop to start as the last population from eaMuPlusLambda in the previous call. Currently this only happens if eaMuPlusLambda finishes "gracefully" (i.e. the maximum number of generations is reached): https://github.com/EpistasisLab/tpot/blob/master/tpot/base.py#L756
If max_time_mins is used rather than generations, self._pop is None on subsequent calls to fit(), so TPOT starts the population again from scratch (I think the cache, Pareto front, etc. are kept, but the population itself is lost).
The result won't get worse than where it was before resuming (since the Pareto front is maintained), but it's definitely not what I expected from a warm start, and losing the population seems detrimental to evolution. Shouldn't the population persist from where it left off?
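A minimal sketch of why the population disappears, assuming a simplified stand-in for TPOT's loop: the hypothetical `evolve` method returns the new population, but when the time budget interrupts it, the caller's assignment never runs, so the stored population stays `None`. `Estimator`, `evolve`, and the use of `TimeoutError` in place of KeyboardInterrupt are illustrative only, not TPOT's code.

```python
class Estimator:
    """Toy stand-in for TPOT's warm-start bookkeeping (hypothetical)."""

    def __init__(self):
        self._pop = None

    def evolve(self, population, generations, budget):
        # Hypothetical evolution loop: each generation costs one unit of budget.
        for gen in range(generations):
            if gen >= budget:
                raise TimeoutError("max_time_mins reached")
            population = [ind + 1 for ind in population]  # stand-in for variation/selection
        return population

    def fit(self, generations, budget):
        pop = self._pop if self._pop else [0, 0, 0]
        try:
            # This assignment only happens if evolve() returns normally.
            self._pop = self.evolve(pop, generations, budget)
        except TimeoutError:
            pass  # the partially evolved population is silently discarded


est = Estimator()
est.fit(generations=5, budget=10)  # finishes normally
print(est._pop)  # [5, 5, 5]

est2 = Estimator()
est2.fit(generations=5, budget=2)  # times out after 2 generations
print(est2._pop)  # None
```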
One option: in _evaluate_individuals, whenever self._pareto_front.update(population) is called, self._pop could also be set if warm_start is true. However, _evaluate_individuals is currently only ever given the offspring, so this wouldn't work directly; it would need to be passed both the previous population and the offspring to do this correctly, which isn't ideal.
Alternatively, self._pop could be modified directly in eaMuPlusLambda. Then self._pop would be set even if warm_start=False, so some of the checks on self._pop would also need to check the warm_start parameter.
I fixed this locally by making eaMuPlusLambda modify self._pop directly, because that's what the original DEAP code does anyway. But I'm not sure whether this is the approach you want to take.
It's kind of "half" there already, in that an in-place update of population is done here, but the first assignment to population prevents this from updating the actual self._pop list that gets passed in. In the original eaMuPlusLambda, this first assignment doesn't exist.
Thank you for reporting this issue.
The first assignment exists on this line in the original eaMuPlusLambda.
I think self._pareto_front should be updated after reaching the max_time_mins limit (see this line).
But the bug is caused by those lines. The population update should be moved under except... in this case.
I will fix this bug soon.
Hi @weixuanfu,
> The first assignment exists on this line in the original eaMuPlusLambda.
The difference is that fitnesses is a local variable created in eaMuPlusLambda. When population is later modified in the original code, it isn't rebound to a new local variable; the original list passed into eaMuPlusLambda is updated directly (population[:] = ...). In TPOT, however, the first assignment (population = ...) creates a new local variable, so the population passed in is never modified. This does not happen in the original eaMuPlusLambda.
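The rebinding-versus-slice-assignment distinction can be shown with plain Python lists. This is a minimal sketch; the functions are illustrative and not taken from TPOT or DEAP. The last case also shows why in-place updates survive a timeout: the caller's list is already modified before the exception propagates.

```python
def evolve_rebind(population):
    # Rebinding: `population` now names a new list; the caller's list is untouched.
    population = [ind * 2 for ind in population]
    return population


def evolve_in_place(population):
    # Slice assignment: replaces the contents of the caller's list object.
    population[:] = [ind * 2 for ind in population]
    return population


def evolve_then_timeout(population):
    population[:] = [ind * 2 for ind in population]
    raise TimeoutError  # simulated max_time_mins interruption


pop_a = [1, 2, 3]
evolve_rebind(pop_a)
print(pop_a)  # [1, 2, 3]

pop_b = [1, 2, 3]
evolve_in_place(pop_b)
print(pop_b)  # [2, 4, 6]

pop_c = [1, 2, 3]
try:
    evolve_then_timeout(pop_c)
except TimeoutError:
    pass
print(pop_c)  # [2, 4, 6]: progress kept despite the interruption
```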
> I think the self._pareto_front should be updated after reaching the max_time_mins limit (see this line)
Yup, the Pareto front and cache work as expected.
> But the bug is caused by those lines. The population update should be moved under except... in this case.
This is what I initially thought too, but _evaluate_individuals is not always given the population; in the general case it's only given the offspring (despite the variable name). So the fix stores the offspring as the population, rather than the result of select(population + offspring), meaning the results will not be quite as intended.
To verify this, you can check the calls to toolbox.evaluate. It works in the first case here because the population is given, but in the general case here only the offspring is given. So the population will be saved as the offspring, not the result of select(population + offspring, mu).
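A toy (mu + lambda) step with a made-up scalar fitness illustrates the difference. Here the hypothetical `evaluate` function plays the role of _evaluate_individuals and records whatever it was given, mimicking a fix that stores self._pop inside the evaluation step. This is a sketch of the selection logic only, not TPOT's or DEAP's implementation.

```python
last_evaluated = []


def evaluate(individuals):
    # Hypothetical stand-in for _evaluate_individuals. It records its input,
    # mimicking a fix that stores self._pop inside the evaluation step.
    last_evaluated[:] = individuals
    return {ind: float(ind) for ind in individuals}  # toy fitness: the value itself


def mu_plus_lambda_step(population, offspring, mu):
    # The current population already has fitnesses; only offspring are evaluated.
    fitness = {ind: float(ind) for ind in population}
    fitness.update(evaluate(offspring))
    # The next population is selected from population + offspring.
    pool = population + offspring
    return sorted(pool, key=fitness.__getitem__, reverse=True)[:mu]


population = [10, 9, 8]
offspring = [1, 2, 11]
next_population = mu_plus_lambda_step(population, offspring, mu=3)
print(last_evaluated)   # [1, 2, 11]: what an evaluate-side fix would store
print(next_population)  # [11, 10, 9]: what self._pop should become
```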
My suggestion would be to change, in eaMuPlusLambda,
```python
population = toolbox.evaluate(population)
```
to
```python
population[:] = toolbox.evaluate(population)
```
So you update the actual population passed in.
Then in base.py change
```python
if self._pop:
    pop = self._pop
else:
    pop = self._toolbox.population(n=self.population_size)
```
to
```python
if not self.warm_start or not self._pop:
    self._pop = self._toolbox.population(n=self.population_size)
```
And update the call to eaMuPlusLambda:
```python
self._pop, _ = eaMuPlusLambda(
    population=self._pop,
    # ... rest of the arguments stay the same
)
```
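A minimal sketch of the effect of the proposed change, assuming a toy estimator whose hypothetical `_evolve` method stands in for eaMuPlusLambda and always runs out of time after `budget` generations. Because it updates the list in place (as with `population[:] = ...`), a warm-started estimator resumes from where the previous fit() stopped, while a cold start reinitialises every time. None of these names are TPOT's actual API.

```python
class WarmStartEstimator:
    """Toy model of the proposed warm-start fix (hypothetical names)."""

    def __init__(self, warm_start, population_size=3):
        self.warm_start = warm_start
        self.population_size = population_size
        self._pop = []

    def _evolve(self, population, budget):
        # Mirrors `population[:] = ...`: update in place, so progress survives
        # even though every call here ends by hitting the time budget.
        for _ in range(budget):
            population[:] = [ind + 1 for ind in population]
        raise TimeoutError("max_time_mins reached")

    def fit(self, budget):
        # Only reinitialise when not warm starting or nothing to resume from.
        if not self.warm_start or not self._pop:
            self._pop = [0] * self.population_size
        try:
            self._evolve(self._pop, budget)
        except TimeoutError:
            pass
        return self


warm = WarmStartEstimator(warm_start=True)
warm.fit(budget=2).fit(budget=2)
print(warm._pop)  # [4, 4, 4]: the second fit() resumed from [2, 2, 2]

cold = WarmStartEstimator(warm_start=False)
cold.fit(budget=2).fit(budget=2)
print(cold._pop)  # [2, 2, 2]: the second fit() started from scratch
```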
Thank you for debugging. I added some changes in PR #949 (merged to dev) and #952 (I will merge it to dev soon). I think self._pop should be reset to [] if warm_start=False, since this object may take up too much memory.
@weixuanfu thanks! Seems to work as intended now.
Only a couple of minor points:
- Line 704 should be self._pop, _ = ... (or else just don't save the return values of eaMuPlusLambda). Functionality won't change, but the local variable pop is now unused.
- The change previously added in _evaluate_individuals to update self._pop should now be removed, since it can set self._pop to the offspring (at least temporarily, until it's updated from eaMuPlusLambda).
I agree with the reset in the case warm_start = False
Feel free to close the issue
OK, the changes were pushed to the dev branch. We will release a version with these fixes soon.
The issue was fixed in new version of TPOT (v0.11.0). Please feel free to reopen it if you have any questions or suggestions.
Thanks for the fix @weixuanfu. Another, somewhat related, follow-up problem: fit_init is called on every call to fit, so with warm start it runs several times. There is a check for this in fit_init, if not self.warm_start or not hasattr(self, '_pareto_front'):, which only resets some values if it's not a warm-start run.
However, some values still get reset. For example, self.evaluated_individuals_ = {}, which doesn't seem like it should happen. Even for a warm start, this cache should still be kept to avoid recomputing/regenerating duplicates, right?
Does fit_init even need to be called more than once for a warm start?
@ben-ix I think it is a good idea to keep evaluated_individuals_ when warm_start=True to avoid recomputing/regenerating duplicates. Maybe fit() just doesn't need to call fit_init() if warm_start=True and hasattr(self, '_pareto_front'). So I'm reopening this issue to fix it in a future version of TPOT.
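A minimal sketch of skipping re-initialisation on warm-start refits, assuming a toy estimator where pipelines are plain strings and the "score" is made up. The guard is the same condition as the check quoted from fit_init, but applied in fit(), so the evaluation cache survives between calls. `CachedEstimator` and `_fit_init` are hypothetical names, not TPOT's code.

```python
class CachedEstimator:
    """Toy model of skipping fit_init on warm-start refits (hypothetical)."""

    def __init__(self, warm_start):
        self.warm_start = warm_start

    def _fit_init(self):
        # Resets per-run state, including the evaluation cache.
        self.evaluated_individuals_ = {}
        self._pareto_front = []

    def fit(self, pipelines):
        # Equivalent to: not self.warm_start or not hasattr(self, "_pareto_front")
        if not (self.warm_start and hasattr(self, "_pareto_front")):
            self._fit_init()
        new = [p for p in pipelines if p not in self.evaluated_individuals_]
        for p in new:
            self.evaluated_individuals_[p] = len(p)  # made-up "score"
        return new  # pipelines that actually had to be evaluated


warm = CachedEstimator(warm_start=True)
warm.fit(["a", "bb"])
print(warm.fit(["bb", "ccc"]))  # ['ccc']: "bb" is served from the cache

cold = CachedEstimator(warm_start=False)
cold.fit(["a", "bb"])
print(cold.fit(["bb", "ccc"]))  # ['bb', 'ccc']: cache was reset
```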
Another related problem: in _evaluate_individuals, the check for invalid fitnesses is:
```python
individuals = [ind for ind in population if not ind.fitness.valid]
```
The fitness.valid check only checks whether a fitness has been assigned. However, TPOT seems to use (5000, -inf) as an "invalid" fitness when an individual times out, and it actually assigns this to the individual. So shouldn't this check also capture individuals with a fitness of (5000, -inf)? Something like:
```python
[ind for ind in population
 if not ind.fitness.valid or ind.fitness.values == (5000.0, -float("inf"))]
```
@ben-ix In _evaluate_individuals, the fitness.valid check is there to get the list of individuals newly generated via crossover and mutation, whose fitness values have been deleted. TPOT assigns a (5000, -inf) fitness to invalid pipelines that cannot be processed normally or are too time-consuming to finish within the max_eval_time_mins time budget.
Hi @weixuanfu,
> TPOT assign (5000, -inf) fitness valid to invalid pipelines that cannot process normally or are too time-consuming to finish in time budget of max_eval_time_mins.
Those are two cases, but there's also a third: when max_time_mins (not max_eval_time_mins) is hit, so not every individual in the population gets evaluated. In this case, they are also assigned a bad fitness.
Say we have 1,000 individuals in the population, and we hit max_time_mins after evaluating 300 of them. With warm start, on the next call to fit I would expect the remaining 700 to be evaluated before continuing with the evolution. Instead, they were assigned the invalid fitness on the first call, so they never get evaluated and we may lose good solutions, since we effectively continue with only the 300 from that generation.
I guess the easiest way to check this is to use a large population size and a small max_time_mins.
If you check the size of individuals in evaluate, the number will be len(population) in subsequent calls to fit, but really it should be len(population) - num_eval_ind, where num_eval_ind is the number completed in the last generation.
Perhaps a way is needed to distinguish between the cases you mentioned (the expected behaviour) and this edge case of valid pipelines that simply weren't reached before the timeout.
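One way to make that distinction is a separate "not reached" marker alongside the sentinel fitness. The sketch below is an assumption, not TPOT's fix: `Individual`, `skipped_by_timeout`, `mark_unreached`, and `needs_evaluation` are hypothetical, fitnesses are plain tuples rather than DEAP objects, and the scores are made up. It reproduces the 1,000 / 300 example: only the 700 unreached pipelines are re-queued, whereas matching on the sentinel alone would also re-run a pipeline that genuinely failed.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# The "bad pipeline" fitness described in this thread.
INVALID_FITNESS = (5000.0, -float("inf"))


@dataclass
class Individual:
    name: str
    fitness: Optional[Tuple[float, float]] = None  # None means "not yet evaluated"
    skipped_by_timeout: bool = False  # hypothetical marker, not a TPOT attribute


def mark_unreached(population, num_evaluated):
    # Simulate max_time_mins firing after `num_evaluated` evaluations: the rest
    # receive the bad fitness but are flagged as merely unreached.
    for ind in population[num_evaluated:]:
        ind.fitness = INVALID_FITNESS
        ind.skipped_by_timeout = True


def needs_evaluation(ind):
    # Evaluate new individuals and those the global timeout never reached,
    # but not pipelines that genuinely failed or exceeded max_eval_time_mins.
    return ind.fitness is None or ind.skipped_by_timeout


population = [Individual(f"p{i}") for i in range(1000)]
for ind in population[:300]:
    ind.fitness = (2.0, 0.9)  # made-up scores for the 300 evaluated pipelines
population[1].fitness = INVALID_FITNESS  # one pipeline that genuinely failed
mark_unreached(population, num_evaluated=300)

to_evaluate = [ind for ind in population if needs_evaluation(ind)]
naive = [ind for ind in population
         if ind.fitness is None or ind.fitness == INVALID_FITNESS]
print(len(to_evaluate))  # 700
print(len(naive))        # 701 (also re-runs the genuinely failed pipeline)
```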
@ben-ix You are right. In the case of max_time_mins and warm_start=True, TPOT should evaluate those remaining pipelines. Thank you for catching this bug. The possible fix that you proposed should work. Could you please submit a PR for this issue?
This issue should be fixed in TPOT v0.11.1. Please feel free to reopen this issue if there is an unsolved issue related to this one.