Fairseq: Question About Pre-trained Models

Created on 23 Aug 2018 · 2 comments · Source: pytorch/fairseq

When I tried to use the pre-trained model (convs2s, wmt14 en-de), I encountered the following error:

    RuntimeError: Error(s) in loading state_dict for FConvModel:
        size mismatch for encoder.embed_tokens.weight: copying a param of torch.Size([40472, 768]) from checkpoint, where the shape is torch.Size([40358, 768]) in current model.
        size mismatch for decoder.embed_tokens.weight: copying a param of torch.Size([42720, 768]) from checkpoint, where the shape is torch.Size([42714, 768]) in current model.
        size mismatch for decoder.fc3.bias: copying a param of torch.Size([42720]) from checkpoint, where the shape is torch.Size([42714]) in current model.
        size mismatch for decoder.fc3.weight_g: copying a param of torch.Size([42720, 1]) from checkpoint, where the shape is torch.Size([42714, 1]) in current model.
        size mismatch for decoder.fc3.weight_v: copying a param of torch.Size([42720, 512]) from checkpoint, where the shape is torch.Size([42714, 512]) in current model.

My generate script is:

    CUDA_VISIBLE_DEVICES=3 python generate.py wmt14_en_de --path model.pt --beam 5 --remove-bpe

Is the pre-trained model out of date relative to the current code? I confirmed that I didn't make any changes to the source code.
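The error indicates that the vocabulary-sized tensors in the checkpoint disagree with the dictionary the current code builds. A minimal sketch of how one might pinpoint exactly which parameters disagree, using plain shape tuples as stand-ins for the real tensors (with an actual fairseq checkpoint you would read the shapes out of `torch.load(...)` instead):

```python
# Compare the parameter shapes recorded in a checkpoint against the shapes
# the freshly built model expects, and list every mismatched entry.
# Plain tuples stand in for torch.Size here so the sketch is self-contained.

def find_shape_mismatches(ckpt_shapes, model_shapes):
    """Return the parameter names whose shapes differ between the two dicts."""
    return sorted(
        name
        for name in ckpt_shapes
        if name in model_shapes and ckpt_shapes[name] != model_shapes[name]
    )

# Shapes taken from the error message above.
ckpt = {
    "encoder.embed_tokens.weight": (40472, 768),
    "decoder.embed_tokens.weight": (42720, 768),
    "decoder.fc3.bias": (42720,),
    "decoder.fc1.weight": (512, 768),  # hypothetical matching entry
}
model = {
    "encoder.embed_tokens.weight": (40358, 768),
    "decoder.embed_tokens.weight": (42714, 768),
    "decoder.fc3.bias": (42714,),
    "decoder.fc1.weight": (512, 768),
}

print(find_shape_mismatches(ckpt, model))
```

Since every mismatched dimension here is a vocabulary size, this points at a dictionary built from different data (or a different BPE version) than the one the checkpoint was trained with.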

All 2 comments

Just wondering: how did you fix this?

Hi, this is still an issue :/
I solved it by manually removing the extra weights and overwriting the checkpoint. This doesn't seem like it should be happening, though. Are those supposed to be hidden buffers that shouldn't be saved?
Using fairseq 0.9.0.
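The workaround described above can be sketched as follows, again on plain dicts of shape tuples rather than real tensors. With an actual checkpoint one would `torch.load()` the file, filter its state dict (fairseq checkpoints typically keep the weights under a `"model"` key, though that detail is an assumption here), and `torch.save()` the result:

```python
# Sketch of the commenter's workaround: drop the checkpoint entries whose
# shapes disagree with the current model, then overwrite the checkpoint.
# Shape tuples stand in for the real tensors so the sketch is runnable.

def strip_mismatched(ckpt_state, model_shapes):
    """Keep only the checkpoint params whose shape matches the current model."""
    return {
        name: shape
        for name, shape in ckpt_state.items()
        if model_shapes.get(name) == shape
    }

ckpt_state = {
    "decoder.fc3.bias": (42720,),      # mismatched -> dropped
    "decoder.fc1.weight": (512, 768),  # hypothetical matching entry -> kept
}
model_shapes = {
    "decoder.fc3.bias": (42714,),
    "decoder.fc1.weight": (512, 768),
}

print(strip_mismatched(ckpt_state, model_shapes))
```

Note that after dropping entries this way, the remaining state dict no longer covers every model parameter, so one would have to load it with `strict=False` (leaving the dropped parameters randomly initialized), which likely degrades translation quality for those vocabulary entries.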
