Bug
Model: xlm-mlm-ende-1024
Language: English
Using: the official example script run_squad.py

To Reproduce
Steps to reproduce the behavior:
- Install dependencies and download the SQuAD v1.1 data; pull and install transformers from GitHub master.
- Run run_squad.py with the following args: --model_type xlm --model_name_or_path xlm-mlm-ende-1024 --do_train --do_eval --train_file ./squad_data/train-v1.1.json --predict_file ./squad_data/dev-v1.1.json --per_gpu_train_batch_size 12 --learning_rate 3e-5 --num_train_epochs 2.0 --max_seq_length 384 --doc_stride 128 --output_dir ./debug_xlm

Ultimate error:
size mismatch for transformer.embeddings.weight: copying a param with shape torch.Size([64699, 1024]) from checkpoint, the shape in current model is torch.Size([30145, 1024]).

Expected behavior: fine-tune XLM for SQuAD.

Environment
- OS: OpenSuse 15.0
- Python version: 3.6
- PyTorch version: torch.__version__ = '1.3.1+cpu'
- PyTorch Transformers version (or branch): (just transformers now?) 2.2.2
- Using GPU? No
- Distributed or parallel setup? N/A

Relates to previous issues (possibly) I've had with XLM: a similar-looking error.

Full input/output below:
```
python ./transformers/examples/run_squad.py --model_type xlm --model_name_or_path xlm-mlm-ende-1024 --do_train --do_eval --train_file ./squad_data/train-v1.1.json --predict_file ./squad_data/dev-v1.1.json --per_gpu_train_batch_size 12 --learning_rate 3e-5 --num_train_epochs 2.0 --max_seq_length 384 --doc_stride 128 --output_dir ./debug_xlm
12/18/2019 15:47:23 - WARNING - __main__ - Process rank: -1, device: cpu, n_gpu: 0, distributed training: False, 16-bits training: False
12/18/2019 15:47:23 - INFO - transformers.configuration_utils - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/xlm-mlm-ende-1024-config.json from cache at /HOME/.cache/torch/transformers/8f689e7cdf34bbebea67ad44ad6a142c9c5144e5c19d989839139e0d47d1ed74.0038e5c2b48fc777632fc95c3d3422203693750b1d0845a511b3bb84ad6d8c29
12/18/2019 15:47:23 - INFO - transformers.configuration_utils - Model config {
"asm": false,
"attention_dropout": 0.1,
"bos_index": 0,
"causal": false,
"dropout": 0.1,
"emb_dim": 1024,
"embed_init_std": 0.02209708691207961,
"end_n_top": 5,
"eos_index": 1,
"finetuning_task": null,
"gelu_activation": true,
"id2label": {
"0": "LABEL_0",
"1": "LABEL_1"
},
"id2lang": {
"0": "de",
"1": "en"
},
"init_std": 0.02,
"is_decoder": false,
"is_encoder": true,
"label2id": {
"LABEL_0": 0,
"LABEL_1": 1
},
"lang2id": {
"de": 0,
"en": 1
},
"layer_norm_eps": 1e-12,
"mask_index": 5,
"max_position_embeddings": 512,
"max_vocab": -1,
"min_count": 0,
"n_heads": 8,
"n_langs": 2,
"n_layers": 6,
"num_labels": 2,
"output_attentions": false,
"output_hidden_states": false,
"output_past": true,
"pad_index": 2,
"pruned_heads": {},
"same_enc_dec": true,
"share_inout_emb": true,
"sinusoidal_embeddings": false,
"start_n_top": 5,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "first",
"summary_use_proj": true,
"torchscript": false,
"unk_index": 3,
"use_bfloat16": false,
"use_lang_emb": true,
"vocab_size": 30145
}
12/18/2019 15:47:24 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/xlm-mlm-ende-1024-vocab.json from cache at /HOME/.cache/torch/transformers/6771b710c1daf9d51643260fdf576f6353369c3563bf0fb12176c692778dca3f.2c29a4b393decdd458e6a9744fa1d6b533212e4003a4012731d2bc2261dc35f3
12/18/2019 15:47:24 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/xlm-mlm-ende-1024-merges.txt from cache at /HOME/.cache/torch/transformers/85d878ffb1bc2c3395b785d10ce7fc91452780316140d7a26201d7a912483e44.42fa32826c068642fdcf24adbf3ef8158b3b81e210a3d03f3102cf5a899f92a0
12/18/2019 15:47:25 - INFO - transformers.modeling_utils - loading weights file https://s3.amazonaws.com/models.huggingface.co/bert/xlm-mlm-ende-1024-pytorch_model.bin from cache at /HOME/.cache/torch/transformers/ea4c0bbee310b490decb2b608a4dbc8ed9f2e4a103dd729ce183770b0fef698b.119d74257b953e5d50d73555a430ced11b1c149a7c17583219935ec1bd37d948
12/18/2019 15:47:28 - INFO - transformers.modeling_utils - Weights of XLMForQuestionAnswering not initialized from pretrained model: ['qa_outputs.start_logits.dense.weight', 'qa_outputs.start_logits.dense.bias', 'qa_outputs.end_logits.dense_0.weight', 'qa_outputs.end_logits.dense_0.bias', 'qa_outputs.end_logits.LayerNorm.weight', 'qa_outputs.end_logits.LayerNorm.bias', 'qa_outputs.end_logits.dense_1.weight', 'qa_outputs.end_logits.dense_1.bias', 'qa_outputs.answer_class.dense_0.weight', 'qa_outputs.answer_class.dense_0.bias', 'qa_outputs.answer_class.dense_1.weight']
12/18/2019 15:47:28 - INFO - transformers.modeling_utils - Weights from pretrained model not used in XLMForQuestionAnswering: ['pred_layer.proj.weight', 'pred_layer.proj.bias']
Traceback (most recent call last):
File "./transformers/examples/run_squad.py", line 614, in
main()
File "./transformers/examples/run_squad.py", line 532, in main
cache_dir=args.cache_dir if args.cache_dir else None)
File "/HOME/sandpit/transformers/transformers/modeling_utils.py", line 486, in from_pretrained
model.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for XLMForQuestionAnswering:
size mismatch for transformer.embeddings.weight: copying a param with shape torch.Size([64699, 1024]) from checkpoint, the shape in current model is torch.Size([30145, 1024]).
```
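For context (not part of the original report), a minimal sketch of what the size mismatch means: the published config declares vocab_size = 30145, while the checkpoint's embedding matrix has 64699 rows, so from_pretrained cannot copy the weights into the freshly built model. The checkpoint path below is hypothetical; it stands for the cached xlm-mlm-ende-1024 pytorch_model.bin file.

```python
# Sketch: compare the config's vocabulary size with the embedding shape stored in
# the downloaded checkpoint. The values in the comments come from the log above.
import torch
from transformers import XLMConfig

config = XLMConfig.from_pretrained("xlm-mlm-ende-1024")
print(config.vocab_size)  # 30145 per the config dump above

# Hypothetical path to the cached weights file:
state_dict = torch.load("/path/to/cache/xlm-mlm-ende-1024-pytorch_model.bin", map_location="cpu")
for name, tensor in state_dict.items():
    if "embeddings.weight" in name:
        print(name, tuple(tensor.shape))  # reported as (64699, 1024) in the error
```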
I've tried different XLM* models and I've obtained the same error as you. I suspect something is broken in the implementation of the XLM* models or in the .bin files uploaded to AWS S3.
N.B.: I tried to load xlm-mlm-17-1280 with the usual procedure (i.e. the from_pretrained method), which worked as expected in #2043 (about 15 days ago), but now it fails with the same error. Therefore, something is definitely broken.
N.B.: it's not a download problem per se; I've also tried with the force_download=True parameter.
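A minimal sketch of that cache-bypass check (only the force_download flag mentioned above is assumed; the equivalent call, with the same failure, appears later in this thread):

```python
# Minimal sketch: force a fresh download of the checkpoint instead of reusing the
# local cache, to rule out a corrupted or stale cached file. In this issue the
# same size-mismatch error is raised even with a fresh download.
from transformers import XLMForQuestionAnswering

model = XLMForQuestionAnswering.from_pretrained("xlm-mlm-ende-1024", force_download=True)
```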
The stack trace is the following:
Python 3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 19:07:31)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import transformers
/home/<user>/anaconda3/envs/huggingface/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/<user>/anaconda3/envs/huggingface/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/<user>/anaconda3/envs/huggingface/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/<user>/anaconda3/envs/huggingface/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/<user>/anaconda3/envs/huggingface/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/<user>/anaconda3/envs/huggingface/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
>>> from transformers import XLMTokenizer, XLMWithLMHeadModel
>>> tokenizer = XLMTokenizer.from_pretrained('xlm-mlm-ende-1024')
Downloading: 100%|██████████| 1.44M/1.44M [00:00<00:00, 2.06MB/s]
Downloading: 100%|██████████| 1.00M/1.00M [00:00<00:00, 1.71MB/s]
>>> model = XLMWithLMHeadModel.from_pretrained('xlm-mlm-ende-1024')
Downloading: 100%|██████████| 396/396 [00:00<00:00, 177kB/s]
Downloading: 100%|██████████| 835M/835M [01:13<00:00, 11.3MB/s]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/<user>/Desktop/transformers/transformers/transformers/modeling_utils.py", line 486, in from_pretrained
model.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for XLMWithLMHeadModel:
size mismatch for transformer.embeddings.weight: copying a param with shape torch.Size([64699, 1024]) from checkpoint, the shape in current model is torch.Size([30145, 1024]).
size mismatch for pred_layer.proj.weight: copying a param with shape torch.Size([64699, 1024]) from checkpoint, the shape in current model is torch.Size([30145, 1024]).
size mismatch for pred_layer.proj.bias: copying a param with shape torch.Size([64699]) from checkpoint, the shape in current model is torch.Size([30145]).
>>> from transformers import XLMTokenizer, XLMForQuestionAnswering
>>> model = XLMForQuestionAnswering.from_pretrained('xlm-mlm-ende-1024')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/<user>/Desktop/transformers/transformers/transformers/modeling_utils.py", line 486, in from_pretrained
model.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for XLMForQuestionAnswering:
size mismatch for transformer.embeddings.weight: copying a param with shape torch.Size([64699, 1024]) from checkpoint, the shape in current model is torch.Size([30145, 1024]).
>>> model = XLMForQuestionAnswering.from_pretrained('xlm-clm-ende-1024')
Downloading: 100%|██████████| 396/396 [00:00<00:00, 164kB/s]
Downloading: 100%|██████████| 835M/835M [01:11<00:00, 11.7MB/s]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/<user>/Desktop/transformers/transformers/transformers/modeling_utils.py", line 486, in from_pretrained
model.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for XLMForQuestionAnswering:
size mismatch for transformer.embeddings.weight: copying a param with shape torch.Size([64699, 1024]) from checkpoint, the shape in current model is torch.Size([30145, 1024]).
Indeed, there seems to be an error that was introduced by #2164. I'm looking into it now. Thanks for raising an issue!
Please let me know if 8efc6dd fixes this issue!
I've just installed Transformers from source (master branch) with pip install git+https://github.com/huggingface/transformers.git, but I still get the same bug; the stack trace is the same as before. Is that possible? @LysandreJik
Hmm could you post a short snippet to reproduce? Running your initial script in my environment doesn't raise any error:
from transformers import XLMWithLMHeadModel
XLMWithLMHeadModel.from_pretrained("xlm-mlm-17-1280")
The error seems to be fixed on my side
I'm trying to use the XLMForQuestionAnswering model; is that the right one for run_squad.py?
>>> import transformers
>>> from transformers import XLMForQuestionAnswering
>>> model = XLMForQuestionAnswering.from_pretrained('xlm-mlm-ende-1024', force_download=True)
Downloading: 100%|██████████| 396/396 [00:00<00:00, 146kB/s]
Downloading: 100%|██████████| 835M/835M [01:16<00:00, 10.9MB/s]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/vidiemme/Desktop/transformers/transformers/transformers/modeling_utils.py", line 486, in from_pretrained
model.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for XLMForQuestionAnswering:
size mismatch for transformer.embeddings.weight: copying a param with shape torch.Size([64699, 1024]) from checkpoint, the shape in current model is torch.Size([30145, 1024]).
N.B.: I've also tried your snippet in my environment, but it doesn't work (same bug as before). How is that possible? I'm using Python 3.6.9, Ubuntu 16.04, PyTorch 1.3.1 and TensorFlow 2.0.
Indeed, it doesn't fail on my side either. Could you check in your environment, following (according to your error trace) the path:
/home/vidiemme/Desktop/transformers/transformers/transformers/configuration_xlm.py
and tell me whether the following lines:
```python
if "n_words" in kwargs:
    self.n_words = kwargs["n_words"]
```
are on lines 147-148? Just to make sure the install from source worked correctly. Thank you @TheEdoardo93
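For what it's worth, a minimal sketch (not from the thread) of one way to check which copy of transformers Python actually imports, and to inspect the config code being discussed:

```python
# Sketch: print the version and on-disk location of the imported transformers
# package, then dump XLMConfig.__init__ to look for the n_words handling above.
import inspect
import transformers

print(transformers.__version__)
print(transformers.__file__)  # shows whether a local clone or a pip-installed copy is used
print(inspect.getsource(transformers.XLMConfig.__init__))
```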
Hmm okay, I'm looking into it.
In the file you pointed me to, at lines 147-148 I have the following lines instead:
```python
@property
def n_words(self):  # For backward compatibility
    return self.vocab_size
```
I don't have the lines you posted above, so it seems I haven't installed the Transformers library correctly. How can I install from master with your fix included? Usually I do the following: pip install git+https://github.com/huggingface/transformers.git
Hmm, it seems your install from source didn't work. I don't know exactly how your environment is set up, but it looks like you've cloned the repository and the code is running from that clone rather than from the library installed in your environment/virtual environment.
If you did clone it in /home/vidiemme/Desktop/transformers/, I would just do a git pull to update it.
Now it works as expected! Your fix resolves the bug. Great work! You can close this issue for me ;)
Now we can load both XLMForQuestionAnswering.from_pretrained('xlm-mlm-ende-1024') and XLMWithLMHeadModel.from_pretrained("xlm-mlm-17-1280") correctly.
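For completeness, a short sketch restating the two loads from this thread that now succeed after the fix:

```python
# Sketch: both checkpoints discussed in this issue now load without the
# size-mismatch error on transformer.embeddings.weight.
from transformers import XLMForQuestionAnswering, XLMWithLMHeadModel

qa_model = XLMForQuestionAnswering.from_pretrained("xlm-mlm-ende-1024")
lm_model = XLMWithLMHeadModel.from_pretrained("xlm-mlm-17-1280")
```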
Glad to hear that @TheEdoardo93 !
A bit late to the party, but I can provide a second confirmation that this error no longer appears.
Thanks!
PS: I don't know where the most useful place to put this is, but for anyone training XLM on SQuAD: the command above now runs to completion. Its score is underwhelming, but it demonstrates that some training has been achieved:
Results: {'exact': 56.9441816461684, 'f1': 67.90690126118979, 'total': 10570, 'HasAns_exact': 56.9441816461684, 'HasAns_f1': 67.90690126118979, 'HasAns_total': 10570, 'best_exact': 56.9441816461684, 'best_exact_thresh': 0.0, 'best_f1': 67.90690126118979, 'best_f1_thresh': 0.0}