Question
Training ran out of patience under my initial configuration file, and the best model was saved. I am not satisfied with the result, so I want to continue training the model. I changed "patience" in the configuration file to a larger number, copied the new configuration file into the serialization_dir, and ran "allennlp train path_to_conf_file.json -s path_to_serialization_dir --recover" to continue training. However, training stops after only one more epoch, with a message saying it ran out of patience. I think the reason is that training_state["metric_tracker"] is saved in training_state_epoch_XX.th and is not updated from the new configuration file. Is there an easy way to solve this?
Yes, it looks like the metric tracker saves the patience as part of its state, so this approach won't work: https://github.com/allenai/allennlp/blob/7d34ca3b8f723eca603b3a012e9c17da809dc6d2/allennlp/training/metric_tracker.py#L89
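To illustrate why the new config value gets ignored, here is a toy sketch. `ToyTracker` is a hypothetical, heavily simplified stand-in written for this example; it is not AllenNLP's actual `MetricTracker`, and it only assumes (as the linked line suggests) that patience is stored in, and restored from, the saved state.

```python
class ToyTracker:
    """Hypothetical, simplified tracker (NOT AllenNLP's MetricTracker)."""

    def __init__(self, patience):
        self.patience = patience
        self.best = None
        self.epochs_with_no_improvement = 0

    def add_metric(self, value):
        # Higher is better in this toy example.
        if self.best is None or value > self.best:
            self.best = value
            self.epochs_with_no_improvement = 0
        else:
            self.epochs_with_no_improvement += 1

    def should_stop_early(self):
        return self.epochs_with_no_improvement >= self.patience

    def state_dict(self):
        return {
            "patience": self.patience,
            "best": self.best,
            "epochs_with_no_improvement": self.epochs_with_no_improvement,
        }

    def load_state_dict(self, state):
        # Restoring the checkpoint overwrites whatever the config supplied.
        self.patience = state["patience"]
        self.best = state["best"]
        self.epochs_with_no_improvement = state["epochs_with_no_improvement"]


# First run: patience 2, the metric stops improving, training halts.
first = ToyTracker(patience=2)
for metric in (0.8, 0.7, 0.6):
    first.add_metric(metric)
saved = first.state_dict()

# "Recovered" run: the config now says 20, but the checkpoint wins.
resumed = ToyTracker(patience=20)
resumed.load_state_dict(saved)
stops_without_patch = resumed.should_stop_early()

# Editing the saved state before loading is what changes the outcome.
patched = dict(saved, patience=20)
resumed_patched = ToyTracker(patience=20)
resumed_patched.load_state_dict(patched)
stops_with_patch = resumed_patched.should_stop_early()
```

In this toy, the resumed tracker still carries the old patience and the old no-improvement count, which matches the reported symptom of stopping right after one more epoch.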
You could try using the fine-tune command instead.
It's not ideal, but you could just manually modify the training state:
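import torch
# Load the saved training state for the checkpoint you want to resume from.
state_dict = torch.load('training_state_epoch_2.th')
# Overwrite the stored patience with the new, larger value.
state_dict['metric_tracker']['patience'] = 20
# Write the modified state back to the same file.
torch.save(state_dict, 'training_state_epoch_2.th')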
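Since this overwrites the checkpoint in place, a slightly more cautious variant might keep a backup first. This is only a sketch: `set_patience` and the `.bak` suffix are names made up for this example, and it assumes the state file has the same `['metric_tracker']['patience']` layout as in the snippet above.

```python
import shutil

import torch


def set_patience(state_path, new_patience):
    """Hypothetical helper: back up a training-state file, then raise its patience.

    Returns the patience value that was stored before the change.
    """
    # Keep an untouched copy next to the original in case something goes wrong.
    shutil.copyfile(state_path, state_path + ".bak")

    state = torch.load(state_path)
    old_patience = state["metric_tracker"]["patience"]
    state["metric_tracker"]["patience"] = new_patience
    torch.save(state, state_path)
    return old_patience
```

For example, `set_patience('training_state_epoch_2.th', 20)` would patch the file and leave `training_state_epoch_2.th.bak` behind; after that, rerun the same `allennlp train ... --recover` command.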
@joelgrus, not if you want to just do allennlp train. But yeah, if you're writing your own entry point, you could definitely do something like that.
@matt-gardner @joelgrus Thank you so much for your answers. They are very helpful.
I tried Joel's approach and it works!