Transformers: Issue with XLNet using xlnet-base-cased

Created on 12 May 2020 · 6 comments · Source: huggingface/transformers

🐛 Bug

Information

Model I am using (Bert, XLNet ...): XLNet

Language I am using the model on (English, Chinese ...): English

The problem arises when using:

  • [x] the official example scripts: (give details below)
  • [ ] my own modified scripts: (give details below)

The task I am working on is:

  • [x] an official GLUE/SQuAD task: SQuAD v1.1
  • [ ] my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

  1. Clone the repository:
!git clone https://github.com/huggingface/transformers.git

  2. Run the SQuAD example script:

!python ./transformers/examples/question-answering/run_squad.py \
    --model_type xlnet \
    --model_name_or_path xlnet-base-cased \
    --do_train \
    --do_eval \
    --train_file $SQuAD_Dir/train-v1.1.json \
    --predict_file $SQuAD_Dir/dev-v1.1.json \
    --learning_rate 3e-5 \
    --num_train_epochs 2 \
    --max_seq_length 384 \
    --doc_stride 128 \
    --output_dir ./model_output \
    --per_gpu_eval_batch_size=4  \
    --per_gpu_train_batch_size=4   \
    --save_steps 5000
  3. The run fails with the following error:
Epoch:   0% 0/2 [00:00<?, ?it/s]
Iteration:   0% 0/15852 [00:00<?, ?it/s]Traceback (most recent call last):
  File "./transformers/examples/question-answering/run_squad.py", line 830, in <module>
    main()
  File "./transformers/examples/question-answering/run_squad.py", line 769, in main
    global_step, tr_loss = train(args, train_dataset, model, tokenizer)
  File "./transformers/examples/question-answering/run_squad.py", line 204, in train
    outputs = model(**inputs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
TypeError: forward() got an unexpected keyword argument 'cls_index'
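
This failure mode can be reproduced in isolation: `run_squad.py` passes `cls_index` as a keyword argument, but the dispatched model's `forward()` signature does not declare it. The sketch below is a minimal stand-in (the class name and signature are illustrative, not the actual library code), assuming the simple QA head only accepts `input_ids` and `attention_mask`:

```python
# Stand-in for XLNetForQuestionAnsweringSimple: its forward() does not
# declare cls_index, so passing it as a keyword raises TypeError.
class SimpleQAHead:
    def forward(self, input_ids=None, attention_mask=None):
        # Real forward logic omitted; only the signature matters here.
        return input_ids


head = SimpleQAHead()
try:
    # Mirrors the call in run_squad.py's train loop: outputs = model(**inputs)
    head.forward(input_ids=[1, 2], cls_index=0)
except TypeError as err:
    print(err)  # unexpected keyword argument 'cls_index'
```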

Expected behavior

The model should start training from the first epoch.

Environment info

  • transformers version: 2.9.0
  • Platform: Google Colab Pro
  • Python version: 3.6.9
  • PyTorch version (GPU?): 1.5.0+cu101 (Yes)
  • Tensorflow version (GPU?): 2.2.0 (Yes)
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No
Labels: wontfix

All 6 comments

I also have the same problem @alexandrenriq

I think there's a mistake in the code that maps AutoModelForQuestionAnswering to XLNetForQuestionAnsweringSimple. You can run the script by substituting XLNetForQuestionAnswering for AutoModelForQuestionAnswering (or by removing cls_index and p_mask from the batches so that XLNetForQuestionAnsweringSimple works).
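
The second workaround can be sketched as a small filter applied to the batch dict before the model call in `run_squad.py`. This is a hypothetical helper (`filter_inputs` is not part of the library), and the plain-list values below stand in for the real tensors:

```python
# Hypothetical helper: drop the keys XLNetForQuestionAnsweringSimple's
# forward() does not accept before calling model(**inputs).
def filter_inputs(inputs, unsupported=("cls_index", "p_mask")):
    """Return a copy of the batch dict without unsupported keyword args."""
    return {k: v for k, v in inputs.items() if k not in unsupported}


# Stand-in batch; in the real script these values are tensors.
batch = {
    "input_ids": [0, 1, 2],
    "attention_mask": [1, 1, 1],
    "cls_index": 0,
    "p_mask": [0, 0, 0],
}
print(sorted(filter_inputs(batch)))  # → ['attention_mask', 'input_ids']
```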

Nevertheless, after training finishes, I can't reproduce the reported scores (Results: {'exact': 0.03784295175023652, 'f1': 0.6317807409886281, ...}).

I also found the same issue, and I believe it has existed for a very long time.

see also #3535

@brettkoonce I saw it, it didn't work.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
