Are we able to pickle TPOT objects (e.g., TPOTRegressor())?
I prefer to work with the TPOT object instead of the exported pipeline, since I am accustomed to scikit-learn (e.g., calling predict() on the fitted TPOT object). I also do not store my data in CSV files, so it is cumbersome to modify exported pipelines to fit my needs.
As such, I would love to be able to save the object for later use. I do not necessarily need it pickled; I simply need a way to easily save and load the object. I suspect that https://github.com/rhiever/tpot/issues/152 is addressing this, but I am unsure of the timelines associated with that issue. And for me, that type of solution seems like it would be killing a fly with a bazooka.
I was just about to ask this same question:
The "warm start" feature is really awesome, but it would be significantly better if there were also some way to save/serialize your current TPOT object as-is and then be able to load it back into memory later.
When I tried pickling my TPOT object that had already processed a few generations, I received this error:
cannot serialize '_io.TextIOWrapper' object
I'm not sure which portion of the TPOT object is "unpickle-able", but this functionality would be very valuable.
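The thread doesn't say which TPOT attribute holds the open file, so the sketch below is only a generic Python illustration of that error, not TPOT's internals. An object that keeps an open text file (an `_io.TextIOWrapper`) cannot be pickled as-is, but it can leave the handle out of its state with `__getstate__` and reopen it in `__setstate__`. The classes `NaiveRun` and `PicklableRun` are hypothetical, and `os.devnull` stands in for a real log file.

```python
import os
import pickle


class NaiveRun:
    """Hypothetical long-running search object that keeps a log file open."""

    def __init__(self):
        self.results = {"generations_done": 3}
        self.log = open(os.devnull, "w")  # an _io.TextIOWrapper instance


class PicklableRun(NaiveRun):
    """Same object, but it leaves the file handle out of its pickled state."""

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["log"]  # open file handles cannot be pickled
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.log = open(os.devnull, "w")  # reopen a fresh handle after loading


naive = NaiveRun()
try:
    pickle.dumps(naive)
    naive_failed = False
except TypeError:
    # Older Python versions word this as "cannot serialize '_io.TextIOWrapper' object";
    # newer ones say "cannot pickle '_io.TextIOWrapper' object".
    naive_failed = True
finally:
    naive.log.close()

run = PicklableRun()
restored = pickle.loads(pickle.dumps(run))  # round-trip works without the handle
run.log.close()
restored.log.close()
```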
The next thing I might try is:
1) run the fit method until I process a few generations
2) pickle the dictionary that holds the previously attempted pipelines and results (I forget the attribute name)
- (some process that removes the TPOT object from memory to mimic a computer shut-down)
3) instantiate a new TPOT object and assign it the previously pickled dictionary of results
4) call the "fit" method on the new TPOT object, which now has access to the previous results
I might be overlooking a simpler solution, however.
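The four steps above can be sketched without TPOT at all, to show the caching idea on its own. Everything here is hypothetical: `ToySearch`, its `evaluated_` dictionary and `_score` method are illustrative names, not TPOT's API (the real attribute name isn't given in this thread), and the "expensive" scoring is a toy function.

```python
import os
import pickle
import tempfile


class ToySearch:
    """Hypothetical search object that caches scores of evaluated candidates."""

    def __init__(self, evaluated=None):
        self.evaluated_ = dict(evaluated or {})  # candidate -> score
        self.n_new_evaluations = 0

    def _score(self, candidate):
        # Stand-in for an expensive cross-validation run.
        return -(candidate - 3) ** 2

    def fit(self, candidates):
        for candidate in candidates:
            if candidate not in self.evaluated_:  # skip anything already scored
                self.evaluated_[candidate] = self._score(candidate)
                self.n_new_evaluations += 1
        self.best_ = max(self.evaluated_, key=self.evaluated_.get)
        return self


# 1) Run a first search.
first = ToySearch().fit(range(5))

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "evaluated.pkl")
    # 2) Pickle only the dictionary of results.
    with open(path, "wb") as f:
        pickle.dump(first.evaluated_, f)
    del first  # mimic the object disappearing (e.g. a shut-down)
    with open(path, "rb") as f:
        previous = pickle.load(f)

# 3) Seed a new object with the previous results.
second = ToySearch(evaluated=previous)
# 4) Continue; only candidates 5, 6 and 7 are actually evaluated.
second.fit(range(8))
```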
Sorry for the slow response on this issue. First, to answer @ktran9891's question: if all you want to do is pickle the best fitted pipeline at the end of the run, you can pickle just the best pipeline stored in the fitted_pipeline_ attribute instead of the entire TPOT object. That attribute stores the best fitted pipeline from the run, which is what TPOT uses to make predictions when you call predict. See the API docs for more info.
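A minimal sketch of that approach, assuming scikit-learn is installed. To keep it runnable without a long TPOT search, a small hand-built scikit-learn pipeline stands in for `tpot.fitted_pipeline_` (which holds a fitted scikit-learn `Pipeline`); the dataset and the file name `best_pipeline.pkl` are arbitrary choices for illustration.

```python
import os
import pickle
import tempfile

from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)

# Stand-in for the result of a TPOT run. With TPOT this would be roughly:
#   tpot = TPOTRegressor(generations=5, population_size=20)
#   tpot.fit(X, y)
#   best_pipeline = tpot.fitted_pipeline_
best_pipeline = make_pipeline(StandardScaler(), Ridge(alpha=1.0)).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "best_pipeline.pkl")

    # Save only the fitted pipeline, not the whole TPOT object.
    with open(path, "wb") as f:
        pickle.dump(best_pipeline, f)

    # Later (e.g. in a new session): load it and predict without retraining.
    with open(path, "rb") as f:
        loaded_pipeline = pickle.load(f)

predictions = loaded_pipeline.predict(X[:5])
```

Unpickling needs compatible versions of every library the pipeline uses, which may include TPOT itself if the best pipeline contains one of TPOT's own helper transformers.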
@tjvananne, it does indeed seem to be difficult to pickle (or even dill) the TPOT object because of how we're generating classes on-the-fly for TPOT's internal pipeline representation. We've had a long-standing issue to implement a serialization/checkpointing feature (#79), which I agree would be useful for some use cases, but no one has taken it on.
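A generic Python sketch (not TPOT's actual code) of why classes generated on the fly get in the way: `pickle` saves an instance's class by reference, recording its module and qualified name, and looks the class up again by that name. A class built at runtime with `type()` that is not bound under its own name at module level cannot be found, so pickling fails. The names `make_operator_class`, `GeneratedOperator` and `DynamicOp` are hypothetical.

```python
import pickle


def make_operator_class(class_name):
    """Create a class at runtime (hypothetical stand-in for generated operator classes)."""
    return type(class_name, (object,), {"param": 1})


# The class is named "GeneratedOperator" but only bound to the variable DynamicOp,
# so looking up "GeneratedOperator" in its module fails.
DynamicOp = make_operator_class("GeneratedOperator")
instance = DynamicOp()

error_message = ""
try:
    pickle.dumps(instance)
    pickling_failed = False
except (pickle.PicklingError, AttributeError) as exc:
    # CPython normally raises PicklingError here; AttributeError is caught
    # as well in case the lookup failure surfaces directly.
    pickling_failed = True
    error_message = str(exc)
```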
That said, the dev branch has a new feature that regularly outputs the best pipeline every (configurable) number of generations to a (configurable) directory. That's one step toward better serialization.
@rhiever: For my specific case, your suggestion works perfectly, thank you. I defer the closure (or non-closure) of this issue to you.
Thanks!
OK. Please feel free to re-open the issue (or comment further) if you have any more questions about pickling TPOT.
I'm able to save the state by pickling fitted_pipeline_, but how do I use the saved pickle in scikit-learn? I would like to use it directly with the exported script without re-training it.
Yes, you can directly use the fitted model from a pickle file. Please check the example in this link.