Espnet: RuntimeError: Error(s) in loading state_dict for Transformer: size mismatch for encoder.embed.0.weight: copying a param with shape torch.Size([43, 384]) from checkpoint, the shape in current model is torch.Size([37, 384]).

Created on 18 Mar 2020 · 4 Comments · Source: espnet/espnet

Hi,
I tried transfer learning with my own dataset, but it didn't work. I used a pretrained model that I had trained with the jsut recipe.

First I ran

    ./run.sh --backend pytorch --ngpu 1 --stage 0 --stop_stage 2

and then

    tts_train.py --backend pytorch --ngpu 1 --minibatches 0 \
        --outdir exp/phn_train_no_dev_pytorch_train_pytorch_transformer/results \
        --tensorboard-dir tensorboard/phn_train_no_dev_pytorch_train_pytorch_transformer \
        --verbose 1 --seed 1 \
        --train-json dump/phn_train_no_dev/data.json \
        --valid-json dump/phn_dev/data.json \
        --config conf/train_pytorch_transformer.yaml \
        --pretrained-model /content/espnet/snapshot.ep.940

(I ran this directly instead of through the recipe because I wanted to see the log on Colab.)

I got the error below:

2020-03-18 03:05:23,670 (tts_train:183) INFO: ngpu: 1
2020-03-18 03:05:23,670 (tts_train:186) INFO: random seed = 1
2020-03-18 03:05:24,215 (deterministic_utils:24) INFO: torch type check is disabled
2020-03-18 03:05:24,294 (tts:279) INFO: #input dims : 37
2020-03-18 03:05:24,294 (tts:280) INFO: #output dims: 80
2020-03-18 03:05:24,295 (tts:297) INFO: writing a model config file toexp/phn_train_no_dev_pytorch_train_pytorch_transformer/results/model.json
2020-03-18 03:05:24,295 (tts:301) INFO: ARGS: accum_grad: 1
2020-03-18 03:05:24,295 (tts:301) INFO: ARGS: adim: 384
2020-03-18 03:05:24,295 (tts:301) INFO: ARGS: aheads: 4
2020-03-18 03:05:24,295 (tts:301) INFO: ARGS: backend: pytorch
2020-03-18 03:05:24,295 (tts:301) INFO: ARGS: batch_bins: 4554000
2020-03-18 03:05:24,295 (tts:301) INFO: ARGS: batch_count: auto
2020-03-18 03:05:24,295 (tts:301) INFO: ARGS: batch_frames_in: 0
2020-03-18 03:05:24,295 (tts:301) INFO: ARGS: batch_frames_inout: 0
2020-03-18 03:05:24,295 (tts:301) INFO: ARGS: batch_frames_out: 0
2020-03-18 03:05:24,295 (tts:301) INFO: ARGS: batch_size: 0
2020-03-18 03:05:24,295 (tts:301) INFO: ARGS: batch_sort_key: output
2020-03-18 03:05:24,295 (tts:301) INFO: ARGS: bce_pos_weight: 5.0
2020-03-18 03:05:24,295 (tts:301) INFO: ARGS: config: conf/train_pytorch_transformer.yaml
2020-03-18 03:05:24,295 (tts:301) INFO: ARGS: config2: None
2020-03-18 03:05:24,295 (tts:301) INFO: ARGS: config3: None
2020-03-18 03:05:24,295 (tts:301) INFO: ARGS: debugmode: 1
2020-03-18 03:05:24,295 (tts:301) INFO: ARGS: dec_init: None
2020-03-18 03:05:24,295 (tts:301) INFO: ARGS: dec_init_mods: ['dec.']
2020-03-18 03:05:24,295 (tts:301) INFO: ARGS: decoder_concat_after: False
2020-03-18 03:05:24,295 (tts:301) INFO: ARGS: decoder_normalize_before: False
2020-03-18 03:05:24,295 (tts:301) INFO: ARGS: dlayers: 3
2020-03-18 03:05:24,295 (tts:301) INFO: ARGS: dprenet_dropout_rate: 0.5
2020-03-18 03:05:24,295 (tts:301) INFO: ARGS: dprenet_layers: 2
2020-03-18 03:05:24,295 (tts:301) INFO: ARGS: dprenet_units: 256
2020-03-18 03:05:24,296 (tts:301) INFO: ARGS: dunits: 1536
2020-03-18 03:05:24,296 (tts:301) INFO: ARGS: early_stop_criterion: validation/main/loss
2020-03-18 03:05:24,296 (tts:301) INFO: ARGS: elayers: 3
2020-03-18 03:05:24,296 (tts:301) INFO: ARGS: embed_dim: 0
2020-03-18 03:05:24,296 (tts:301) INFO: ARGS: enc_init: None
2020-03-18 03:05:24,296 (tts:301) INFO: ARGS: enc_init_mods: ['enc.']
2020-03-18 03:05:24,296 (tts:301) INFO: ARGS: encoder_concat_after: False
2020-03-18 03:05:24,296 (tts:301) INFO: ARGS: encoder_normalize_before: False
2020-03-18 03:05:24,296 (tts:301) INFO: ARGS: epochs: 1000
2020-03-18 03:05:24,296 (tts:301) INFO: ARGS: eprenet_conv_chans: 0
2020-03-18 03:05:24,296 (tts:301) INFO: ARGS: eprenet_conv_filts: 0
2020-03-18 03:05:24,296 (tts:301) INFO: ARGS: eprenet_conv_layers: 0
2020-03-18 03:05:24,296 (tts:301) INFO: ARGS: eprenet_dropout_rate: 0.0
2020-03-18 03:05:24,296 (tts:301) INFO: ARGS: eps: 1e-06
2020-03-18 03:05:24,296 (tts:301) INFO: ARGS: eunits: 1536
2020-03-18 03:05:24,296 (tts:301) INFO: ARGS: eval_interval_epochs: 1
2020-03-18 03:05:24,296 (tts:301) INFO: ARGS: freeze_mods: None
2020-03-18 03:05:24,296 (tts:301) INFO: ARGS: grad_clip: 1.0
2020-03-18 03:05:24,296 (tts:301) INFO: ARGS: guided_attn_loss_lambda: 1.0
2020-03-18 03:05:24,296 (tts:301) INFO: ARGS: guided_attn_loss_sigma: 0.4
2020-03-18 03:05:24,296 (tts:301) INFO: ARGS: initial_decoder_alpha: 1.0
2020-03-18 03:05:24,296 (tts:301) INFO: ARGS: initial_encoder_alpha: 1.0
2020-03-18 03:05:24,296 (tts:301) INFO: ARGS: keep_all_data_on_mem: False
2020-03-18 03:05:24,296 (tts:301) INFO: ARGS: loss_type: L1
2020-03-18 03:05:24,296 (tts:301) INFO: ARGS: lr: 0.001
2020-03-18 03:05:24,296 (tts:301) INFO: ARGS: maxlen_in: 100
2020-03-18 03:05:24,296 (tts:301) INFO: ARGS: maxlen_out: 200
2020-03-18 03:05:24,296 (tts:301) INFO: ARGS: minibatches: 0
2020-03-18 03:05:24,296 (tts:301) INFO: ARGS: model_module: espnet.nets.pytorch_backend.e2e_tts_transformer:Transformer
2020-03-18 03:05:24,296 (tts:301) INFO: ARGS: modules_applied_guided_attn: ['encoder-decoder']
2020-03-18 03:05:24,296 (tts:301) INFO: ARGS: ngpu: 1
2020-03-18 03:05:24,296 (tts:301) INFO: ARGS: num_heads_applied_guided_attn: 2
2020-03-18 03:05:24,296 (tts:301) INFO: ARGS: num_iter_processes: 0
2020-03-18 03:05:24,296 (tts:301) INFO: ARGS: num_layers_applied_guided_attn: 2
2020-03-18 03:05:24,296 (tts:301) INFO: ARGS: num_save_attention: 5
2020-03-18 03:05:24,296 (tts:301) INFO: ARGS: opt: noam
2020-03-18 03:05:24,296 (tts:301) INFO: ARGS: outdir: exp/phn_train_no_dev_pytorch_train_pytorch_transformer/results
2020-03-18 03:05:24,296 (tts:301) INFO: ARGS: patience: 0
2020-03-18 03:05:24,296 (tts:301) INFO: ARGS: positionwise_conv_kernel_size: 1
2020-03-18 03:05:24,296 (tts:301) INFO: ARGS: positionwise_layer_type: linear
2020-03-18 03:05:24,297 (tts:301) INFO: ARGS: postnet_chans: 256
2020-03-18 03:05:24,297 (tts:301) INFO: ARGS: postnet_dropout_rate: 0.5
2020-03-18 03:05:24,297 (tts:301) INFO: ARGS: postnet_filts: 5
2020-03-18 03:05:24,297 (tts:301) INFO: ARGS: postnet_layers: 5
2020-03-18 03:05:24,297 (tts:301) INFO: ARGS: preprocess_conf: None
2020-03-18 03:05:24,297 (tts:301) INFO: ARGS: pretrained_model: /content/espnet/snapshot.ep.940
2020-03-18 03:05:24,297 (tts:301) INFO: ARGS: reduction_factor: 3
2020-03-18 03:05:24,297 (tts:301) INFO: ARGS: report_interval_iters: 100
2020-03-18 03:05:24,297 (tts:301) INFO: ARGS: resume:
2020-03-18 03:05:24,297 (tts:301) INFO: ARGS: save_interval_epochs: 10
2020-03-18 03:05:24,297 (tts:301) INFO: ARGS: seed: 1
2020-03-18 03:05:24,297 (tts:301) INFO: ARGS: sortagrad: 0
2020-03-18 03:05:24,297 (tts:301) INFO: ARGS: spc_dim: None
2020-03-18 03:05:24,297 (tts:301) INFO: ARGS: spk_embed_dim: None
2020-03-18 03:05:24,297 (tts:301) INFO: ARGS: spk_embed_integration_type: add
2020-03-18 03:05:24,297 (tts:301) INFO: ARGS: tensorboard_dir: tensorboard/phn_train_no_dev_pytorch_train_pytorch_transformer
2020-03-18 03:05:24,297 (tts:301) INFO: ARGS: train_json: dump/phn_train_no_dev/data.json
2020-03-18 03:05:24,297 (tts:301) INFO: ARGS: transformer_dec_attn_dropout_rate: 0.1
2020-03-18 03:05:24,297 (tts:301) INFO: ARGS: transformer_dec_dropout_rate: 0.1
2020-03-18 03:05:24,297 (tts:301) INFO: ARGS: transformer_dec_positional_dropout_rate: 0.1
2020-03-18 03:05:24,297 (tts:301) INFO: ARGS: transformer_enc_attn_dropout_rate: 0.1
2020-03-18 03:05:24,297 (tts:301) INFO: ARGS: transformer_enc_dec_attn_dropout_rate: 0.1
2020-03-18 03:05:24,297 (tts:301) INFO: ARGS: transformer_enc_dropout_rate: 0.1
2020-03-18 03:05:24,297 (tts:301) INFO: ARGS: transformer_enc_positional_dropout_rate: 0.1
2020-03-18 03:05:24,297 (tts:301) INFO: ARGS: transformer_init: pytorch
2020-03-18 03:05:24,297 (tts:301) INFO: ARGS: transformer_lr: 1.0
2020-03-18 03:05:24,297 (tts:301) INFO: ARGS: transformer_warmup_steps: 4000
2020-03-18 03:05:24,297 (tts:301) INFO: ARGS: use_batch_norm: True
2020-03-18 03:05:24,297 (tts:301) INFO: ARGS: use_guided_attn_loss: True
2020-03-18 03:05:24,297 (tts:301) INFO: ARGS: use_masking: True
2020-03-18 03:05:24,297 (tts:301) INFO: ARGS: use_scaled_pos_enc: True
2020-03-18 03:05:24,297 (tts:301) INFO: ARGS: use_second_target: False
2020-03-18 03:05:24,297 (tts:301) INFO: ARGS: use_speaker_embedding: False
2020-03-18 03:05:24,297 (tts:301) INFO: ARGS: use_weighted_masking: False
2020-03-18 03:05:24,297 (tts:301) INFO: ARGS: valid_json: dump/phn_dev/data.json
2020-03-18 03:05:24,297 (tts:301) INFO: ARGS: verbose: 1
2020-03-18 03:05:24,297 (tts:301) INFO: ARGS: weight_decay: 0.0
Traceback (most recent call last):
File "/content/espnet/espnet/bin/tts_train.py", line 198, in <module>
main(sys.argv[1:])
File "/content/espnet/espnet/bin/tts_train.py", line 192, in main
train(args)
File "/content/espnet/espnet/tts/pytorch_backend/tts.py", line 308, in train
model = model_class(idim, odim, args)
File "/content/espnet/espnet/nets/pytorch_backend/e2e_tts_transformer.py", line 436, in __init__
self.load_pretrained_model(args.pretrained_model)
File "/content/espnet/espnet/nets/tts_interface.py", line 65, in load_pretrained_model
torch_load(model_path, self)
File "/content/espnet/espnet/asr/asr_utils.py", line 517, in torch_load
model.load_state_dict(model_state_dict)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 830, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Transformer:
size mismatch for encoder.embed.0.weight: copying a param with shape torch.Size([43, 384]) from checkpoint, the shape in current model is torch.Size([37, 384]).

So I changed

    # reverse input and output dimension
    idim = int(valid_json[utts[0]]['output'][0]['shape'][1])

to

    idim = 43

in espnet/tts/pytorch_backend/tts.py.

The error disappeared, but I don't think this is the correct way to do it.
How can I do transfer learning when idim differs from that of the pretrained model?

Thanks in advance.
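
For reference, the line being overwritten above takes idim from the first utterance in the validation data.json. The following is a minimal sketch of that lookup for checking the value before training. It assumes a top-level "utts" mapping in data.json and that the text side is stored under "output" (as the "reverse input and output dimension" comment suggests); the function name and the sample utterance are made up for illustration.

```python
import json
from typing import Any, Dict


def input_dim_from_data_json(data: Dict[str, Any]) -> int:
    """Return the TTS input dimension from a parsed data.json dictionary.

    Assumption: the file has a top-level "utts" mapping, and each entry
    has an "output" list whose first item holds "shape": [length, dim].
    """
    utts = data["utts"]
    first_utt = next(iter(utts))
    return int(utts[first_utt]["output"][0]["shape"][1])


# Made-up stand-in for the contents of dump/phn_dev/data.json.
example = json.loads(
    '{"utts": {"utt_0001": {"output": [{"name": "target1", "shape": [12, 37]}]}}}'
)
print(input_dim_from_data_json(example))  # prints 37
```

If this prints a value other than the first dimension of encoder.embed.0.weight in the checkpoint (43 here), loading the full state_dict will fail as in the traceback above.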

thrfdth

Most helpful comment

You can use the following options to select which pretrained parameters are loaded.
https://github.com/espnet/espnet/blob/a360dbb5f21923bdea867c2c3ed002020163cfeb/espnet/bin/tts_train.py#L125-L135
In your case, you can load all of the parameters except for encoder.embed.0.
Then encoder.embed.0 is initialized with random values, while the other parameters are loaded from the pretrained model.

kan-bayashi on 18 Mar 2020
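
The idea of loading everything except the mismatched embedding can also be sketched in plain Python, independent of the ESPnet options linked above. This is only an illustration, not ESPnet's implementation: `select_loadable_params` is a hypothetical helper, the parameter name `some.other.weight` is made up, and `SimpleNamespace` objects stand in for tensors so the snippet runs without PyTorch.

```python
from types import SimpleNamespace
from typing import Any, Dict, List, Mapping, Tuple


def select_loadable_params(
    pretrained: Mapping[str, Any], current: Mapping[str, Any]
) -> Tuple[Dict[str, Any], List[str]]:
    """Split pretrained parameters into loadable ones and skipped names.

    A parameter is loadable when the current model has a parameter with
    the same name and the same shape. Values only need a ``.shape``
    attribute, so tensors taken from ``state_dict()`` work as well.
    """
    loadable: Dict[str, Any] = {}
    skipped: List[str] = []
    for name, value in pretrained.items():
        if name in current and tuple(current[name].shape) == tuple(value.shape):
            loadable[name] = value
        else:
            skipped.append(name)
    return loadable, skipped


# Stand-ins for the checkpoint and the freshly built model (shapes from the error).
pretrained = {
    "encoder.embed.0.weight": SimpleNamespace(shape=(43, 384)),
    "some.other.weight": SimpleNamespace(shape=(384,)),  # hypothetical name
}
current = {
    "encoder.embed.0.weight": SimpleNamespace(shape=(37, 384)),
    "some.other.weight": SimpleNamespace(shape=(384,)),
}
loadable, skipped = select_loadable_params(pretrained, current)
print(skipped)  # prints ['encoder.embed.0.weight']
```

With real PyTorch objects, the filtered dictionary could then be passed to `model.load_state_dict(loadable, strict=False)`. The shape filter is needed because `strict=False` only tolerates missing or unexpected keys and still raises on a size mismatch.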

All 4 comments

In my opinion (I ran into the same error in the past), the cause is a difference in the number of language features used, e.g. the stop token or the set of characters, depending on your language.
You can check them in

'data/lang_1char/*_dev_units.txt'

The error message says that the pretrained model was built with 43 cleaned features, but your data has only 37.

    size mismatch for encoder.embed.0.weight: copying a param with shape torch.Size([43, 384]) from checkpoint, the shape in current model is torch.Size([37, 384]).

So if you want to continue training from the pretrained model, you may need to revise the text-cleaning stage of your local recipe so that it produces the same set of features.
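
To see which features differ between the two setups, the units files can be compared directly. The sketch below assumes each line of a units file has the form `<token> <index>`; the helper names and the sample tokens are made up, and in practice the lines would be read from the pretrained recipe's units file and from your own.

```python
from typing import Iterable, Set, Tuple


def read_tokens(lines: Iterable[str]) -> Set[str]:
    """Collect the tokens from lines of the assumed form '<token> <index>'."""
    tokens: Set[str] = set()
    for line in lines:
        parts = line.split()
        if parts:  # skip blank lines
            tokens.add(parts[0])
    return tokens


def diff_tokens(
    pretrained_lines: Iterable[str], new_lines: Iterable[str]
) -> Tuple[Set[str], Set[str]]:
    """Return (tokens only in the pretrained set, tokens only in the new set)."""
    old, new = read_tokens(pretrained_lines), read_tokens(new_lines)
    return old - new, new - old


# Made-up sample data in place of the real units files.
pretrained_units = ["<unk> 1", "a 2", "b 3", "ky 4"]
new_units = ["<unk> 1", "a 2", "b 3"]

only_pretrained, only_new = diff_tokens(pretrained_units, new_units)
print(sorted(only_pretrained))  # prints ['ky']
print(sorted(only_new))         # prints []
```

Tokens that appear only on the pretrained side are candidates for what the local text cleaning drops or never produces.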