Yes, you can resume training by specifying the model you'd like to resume from using `--restore-file <path to checkpoint>`.
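For example, a minimal sketch of resuming a run (the data-bin path, save directory, and optimizer settings are assumptions, not from the thread):

```bash
# Resume training from an existing checkpoint; all paths are hypothetical.
fairseq-train data-bin/dataset1 \
  --arch lstm \
  --restore-file checkpoints/run1/checkpoint_last.pt \
  --save-dir checkpoints/run2 \
  --optimizer adam --lr 1e-3 \
  --max-tokens 4000
```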
_Originally posted by @lematt1991 in https://github.com/pytorch/fairseq/issues/1182#issuecomment-535507612_
The first model was trained with the LSTM architecture, and the second was also LSTM, launched with the `--restore-file` option. The two were trained on separate data files (same language pair).
Error: Architecture mismatch.

Your vocabulary size probably changed; see the "size mismatch" in the error.
Yes, the vocab size is different because they are two different datasets.
It's not going to be possible to restore from a checkpoint where the vocabulary size is different: the input/output embedding matrices will be the wrong size. This is not a bug in the code. You need to decide how you want to handle this; most likely you want to re-process your second dataset with the same dictionary as the first.
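A minimal sketch of that re-processing step, assuming a de-en pair and hypothetical paths; `--srcdict` and `--tgtdict` are the `fairseq-preprocess` flags for reusing existing dictionaries instead of building new ones:

```bash
# Binarize dataset 2 reusing the dictionaries produced for dataset 1,
# so the embedding matrices keep the same shape across both runs.
fairseq-preprocess \
  --source-lang de --target-lang en \
  --trainpref data2/train --validpref data2/valid \
  --srcdict data-bin/dataset1/dict.de.txt \
  --tgtdict data-bin/dataset1/dict.en.txt \
  --destdir data-bin/dataset2
```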
Hi @huihuifan
Do you suggest modifying the preprocess code right here to load a different dictionary (in this case, the dictionary from the first dataset)?
@aastha19 what did you end up doing?
Hi @echan00
I made a common dictionary for both datasets and then used it to train the models separately.
Thanks @aastha19, would you mind showing me how you used the same dictionary in two separate trainings?
@echan00
The bin files for the two datasets were created separately, and a common dictionary was also built. This common dictionary was then placed in the data-bin directory of each dataset, replacing the dictionary created at pre-processing time.
Somehow it worked for me!
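For reference, a minimal sketch of the swap described above, again with a hypothetical de-en pair and hypothetical paths:

```bash
# Replace each dataset's dictionaries with the shared one (dict.shared.txt).
for d in data-bin/dataset1 data-bin/dataset2; do
  cp dict.shared.txt "$d/dict.de.txt"
  cp dict.shared.txt "$d/dict.en.txt"
done
# Caveat: the binarized .bin/.idx files keep the token IDs assigned when
# fairseq-preprocess ran, so this only behaves correctly if the shared
# dictionary preserves those IDs (e.g. it extends the original dictionary).
# Re-running fairseq-preprocess with --srcdict/--tgtdict (see above) is safer.
```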