Hi,
I'm currently working on a prediction task that involves incremental learning (i.e. having to update the model using new data that is received each day). Since having to retrain the model on a large amount of data is quite expensive, I'm looking to use the refit parameter, described in the Parameters description page as:
[a mechanism] for refitting existing models with new data.
Unfortunately, I can't seem to find any working examples or additional documentation for refit.
Based on my understanding of refit, I've created a configuration file, refit.conf, containing:
task = refit
input_model = <existing pre-trained model.txt>
data = <current day's data in csv format>
From what I understand, running the above configuration file should give me a more up-to-date model and produce better accuracy scores. However, I seem to be getting worse results when I use refit than when I retrain the model, indicating that the model may not be updating properly.
Is my understanding of refit correct? Can anyone elaborate on how to use refit and how to properly set the parameters for refit, or direct me to an example that uses refit for incremental learning?
refit keeps the existing tree structure and uses the new data only to update the leaf outputs.
It does not guarantee better accuracy in streaming training. It is just faster than retraining, because the tree structure does not have to be searched again.
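To make the "same structure, new leaf outputs" idea concrete, here is a minimal toy sketch in plain Python. It is not LightGBM's implementation: the `Stump` class, the `refit_leaves` helper, and the use of the per-leaf mean as the new output are all assumptions made for illustration. The blending formula follows the documented behaviour of the `refit_decay_rate` parameter (old output weighted by the decay rate, new output by one minus it), as far as I understand it.

```python
from dataclasses import dataclass


@dataclass
class Stump:
    """Toy one-split tree: the threshold is the 'structure', the leaves are the 'outputs'."""
    threshold: float
    left_value: float
    right_value: float

    def predict(self, x: float) -> float:
        return self.left_value if x <= self.threshold else self.right_value


def refit_leaves(tree: Stump, xs, ys, decay_rate: float = 0.9) -> Stump:
    """Keep the split as-is; re-estimate each leaf from new data and blend with the old value."""
    left = [y for x, y in zip(xs, ys) if x <= tree.threshold]
    right = [y for x, y in zip(xs, ys) if x > tree.threshold]

    def blend(old: float, targets) -> float:
        if not targets:  # no new rows reached this leaf, so keep the old output
            return old
        new = sum(targets) / len(targets)  # toy estimate: mean target in the leaf
        return decay_rate * old + (1.0 - decay_rate) * new

    return Stump(
        tree.threshold,  # the split is never re-searched
        blend(tree.left_value, left),
        blend(tree.right_value, right),
    )


# Hypothetical "pre-trained" tree and one day's worth of new data.
old_tree = Stump(threshold=5.0, left_value=1.0, right_value=10.0)
new_xs = [1.0, 2.0, 8.0, 9.0]
new_ys = [3.0, 5.0, 20.0, 22.0]
refit_tree = refit_leaves(old_tree, new_xs, new_ys, decay_rate=0.5)
```

The split at 5.0 stays fixed even if the new data would be better separated somewhere else, which is one reason refitting can score worse than a full retrain. With a decay rate close to 1 the leaves also move only slightly toward the new data.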