Yes, you can resume training by specifying the model you'd like to resume from using `--restore-file <path to checkpoint>`.
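For example, a minimal sketch of resuming a run (the data-bin path, save directory, and optimizer settings are assumptions, not from the thread):

```bash
# Resume training from an existing checkpoint; all paths are hypothetical.
fairseq-train data-bin/dataset1 \
  --arch lstm \
  --restore-file checkpoints/run1/checkpoint_last.pt \
  --save-dir checkpoints/run2 \
  --optimizer adam --lr 1e-3 \
  --max-tokens 4000
```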
_Originally posted by @lematt1991 in https://github.com/pytorch/fairseq/issues/1182#issuecomment-535507612_
The first model was trained with the LSTM architecture, and the second was also LSTM, launched with the `--restore-file` option. The two were trained on separate data files (same language pair).
Error: Architecture mismatch.

Your vocabulary size probably changed; see the "size mismatch" in the error.
Yes, the vocab size is different because they are two different datasets.
It's not going to be possible to restore from a checkpoint where the vocabulary size is different: the input/output embedding matrices will be the wrong size. This is not a bug in the code. You need to decide how you want to handle this; most likely you want to re-process your second dataset with the same dictionary as the first.
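A minimal sketch of that re-processing step, assuming a de-en pair and hypothetical paths; `--srcdict` and `--tgtdict` are the `fairseq-preprocess` flags for reusing existing dictionaries instead of building new ones:

```bash
# Binarize dataset 2 reusing the dictionaries produced for dataset 1,
# so the embedding matrices keep the same shape across both runs.
fairseq-preprocess \
  --source-lang de --target-lang en \
  --trainpref data2/train --validpref data2/valid \
  --srcdict data-bin/dataset1/dict.de.txt \
  --tgtdict data-bin/dataset1/dict.en.txt \
  --destdir data-bin/dataset2
```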
Hi @huihuifan
Do you suggest modifying the preprocess code right here to load a different dictionary (in this case, the dictionary from the first dataset)?
@aastha19 what did you end up doing?
Hi @echan00
I made a common dictionary for both datasets and then used it to train the models separately.
Thanks @aastha19, would you mind showing me how you used the same dictionary in two separate trainings?
@echan00
The bin files for the two datasets were created separately, and a common dictionary was also built. This common dictionary was then placed in the data-bin directory of each dataset, replacing the dictionary created at pre-processing time.
Somehow it worked for me!
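For reference, a minimal sketch of the swap described above, again with a hypothetical de-en pair and hypothetical paths:

```bash
# Replace each dataset's dictionaries with the shared one (dict.shared.txt).
for d in data-bin/dataset1 data-bin/dataset2; do
  cp dict.shared.txt "$d/dict.de.txt"
  cp dict.shared.txt "$d/dict.en.txt"
done
# Caveat: the binarized .bin/.idx files keep the token IDs assigned when
# fairseq-preprocess ran, so this only behaves correctly if the shared
# dictionary preserves those IDs (e.g. it extends the original dictionary).
# Re-running fairseq-preprocess with --srcdict/--tgtdict (see above) is safer.
```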