I'm getting a KeyError when using RoBERTa in examples/run_glue.py: the data preprocessing tries to access 'token_type_ids', which fails. Maybe this comes from the recent commit removing 'token_type_ids' from RoBERTa (and DistilBERT)?
I get the error when fine-tuning RoBERTa on CoLA and RTE. I haven't tried other tasks, but I think you'd get the same error.
I don't get the error when fine-tuning XLNet (presumably, since XLNet does use 'token_type_ids'), and I don't get the error when I do pip install transformers instead of pip install . (which I think means the issue is coming from a recent commit).
Here's the full error message:
03/17/2020 11:53:58 - INFO - transformers.data.processors.glue - Writing example 0/13997
Traceback (most recent call last):
  File "examples/run_glue.py", line 731, in <module>
    main()
  File "examples/run_glue.py", line 679, in main
    train_dataset = load_and_cache_examples(args, args.task_name, tokenizer, evaluate=False)
  File "examples/run_glue.py", line 419, in load_and_cache_examples
    pad_token_segment_id=4 if args.model_type in ["xlnet"] else 0,
  File "/home/ejp416/cmv/transformers/src/transformers/data/processors/glue.py", line 94, in glue_convert_examples_to_features
    input_ids, token_type_ids = inputs["input_ids"], inputs["token_type_ids"]
KeyError: 'token_type_ids'
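For anyone hitting this before a fix lands, a minimal sketch of a defensive workaround for the failing line in glue.py: fall back to all-zero segment ids when the tokenizer doesn't emit 'token_type_ids'. The `inputs` dict below is a hypothetical stand-in for what RoBERTa's tokenizer returns after that commit; the values are illustrative, not real tokenizer output.

```python
# Stand-in for the dict RoBERTa's encode_plus returns after the commit
# that dropped 'token_type_ids' (values here are made up for illustration).
inputs = {"input_ids": [0, 31414, 232, 2], "attention_mask": [1, 1, 1, 1]}

# Original pattern in glue.py line 94 -- raises KeyError for RoBERTa/DistilBERT:
# input_ids, token_type_ids = inputs["input_ids"], inputs["token_type_ids"]

# Defensive version: default to all-zero segment ids when the key is absent.
input_ids = inputs["input_ids"]
token_type_ids = inputs.get("token_type_ids", [0] * len(input_ids))
```

Zero segment ids match what these single-segment models effectively use anyway, so this should be safe for RoBERTa, but treat it as a stopgap rather than the official fix.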
Model I am using (Bert, XLNet ...): RoBERTa. I think DistilBERT may run into the same issue as well.
Language I am using the model on (English, Chinese ...): English
The problem arises when using:
I've made slight modifications to the training loop in the official examples/run_glue.py, but I did not touch the data pre-processing, which is where the error occurs (before any training).
The tasks I am working on is:
I've run into the error on CoLA and RTE, though I think the error should happen on all GLUE tasks.
Steps to reproduce the behavior:
1. Install transformers from the latest clone (use pip install . not pip install transformers)
2. Download the data to data/RTE using the GLUE download scripts in this repo
3. Run: python examples/run_glue.py --model_type roberta --model_name_or_path roberta-base --output_dir models/debug --task_name rte --do_train --evaluate_during_training --data_dir data/RTE --max_seq_length 32 --max_grad_norm inf --adam_epsilon 1e-6 --adam_beta_2 0.98 --weight_decay 0.1 --logging_steps 874 --save_steps 874 --num_train_epochs 10 --warmup_steps 874 --per_gpu_train_batch_size 1 --per_gpu_eval_batch_size 2 --learning_rate 1e-5 --seed 12 --gradient_accumulation_steps 16 --overwrite_output_dir
load_and_cache_examples (and specifically, the call to convert_examples_to_features) in examples/run_glue.py should run without error, to load, preprocess, and tokenize the dataset.
transformers version: 2.5.1
I get the same error when I try to fine-tune on SQuAD.
Tagging @LysandreJik
I also have this issue when i run run_multiple_choice.py in RACE data with RoBERTA.
Same here. Any solution?
@nielingyun @orena1 @Onur90 maybe try pulling the latest version of the repo again and see if it works? The error went away after I pulled recently; I'm not sure if that fixed it or if it was something else I did. Let me know if that works.
@ethanjperez by latest version, do you mean the latest commit or the latest release (v2.6.0)? It is still not working with the latest commit.