transformers version: 3.3.1
Model I am using (Bert, XLNet ...): XLNet-base-cased
The problem arises when using: the official example script (run_glue.py)
The task I am working on is: an official GLUE task (SST-2)
Steps to reproduce the behavior:
Install transformers from master and download SST-2 data using download_glue_data.py
Create the following script:
GLUE_DIR=~/glue
CUDA_VISIBLE_DEVICES=0
TASK_NAME=SST-2
python3 ~/applications/transformers/examples/text-classification/run_glue.py \
--model_name_or_path ~/xlnet \
--task_name $TASK_NAME \
--do_eval \
--data_dir $GLUE_DIR/$TASK_NAME \
--max_seq_length 64 \
--per_device_train_batch_size 32 \
--per_device_eval_batch_size 64 \
--learning_rate 2e-5 \
--num_train_epochs 3.0 \
--output_dir ~/result/$TASK_NAME/ \
--overwrite_output_dir \
--eval_steps 100
The Trainer should return appropriate evaluation results. Here are the logs when evaluating bert-base with the hyperparameters given above.
10/05/2020 22:28:47 - INFO - filelock - Lock 140392033291808 acquired on /data/home/liusishun/glue/SST-2/cached_dev_BertTokenizer_64_sst-2.lock
10/05/2020 22:28:47 - INFO - filelock - Lock 140392033291808 released on /data/home/liusishun/glue/SST-2/cached_dev_BertTokenizer_64_sst-2.lock
10/05/2020 22:28:50 - INFO - __main__ - *** Evaluate ***
Evaluation: 100%|██████████| 14/14 [00:01<00:00, 7.22it/s]
{'eval_loss': 0.6916399122378148, 'eval_acc': 0.49770642201834864, 'step': 0}
/data/home/liusishun/.conda/envs/myenv/lib/python3.8/site-packages/transformers/trainer.py:1168: FutureWarning: This method is deprecated, use `Trainer.is_world_process_zero()` instead.
warnings.warn("This method is deprecated, use `Trainer.is_world_process_zero()` instead.", FutureWarning)
10/05/2020 22:28:52 - INFO - __main__ - ***** Eval results sst-2 *****
10/05/2020 22:28:52 - INFO - __main__ - eval_loss = 0.6916399122378148
10/05/2020 22:28:52 - INFO - __main__ - eval_acc = 0.49770642201834864
10/05/2020 22:30:05 - INFO - filelock - Lock 139928226197216 acquired on /data/home/liusishun/glue/SST-2/cached_dev_XLNetTokenizer_64_sst-2.lock
10/05/2020 22:30:05 - INFO - filelock - Lock 139928226197216 released on /data/home/liusishun/glue/SST-2/cached_dev_XLNetTokenizer_64_sst-2.lock
10/05/2020 22:30:09 - INFO - __main__ - *** Evaluate ***
Evaluation: 93%|█████████▎| 13/14 [00:02<00:00, 4.44it/s]
Traceback (most recent call last):
File "/data/home/liusishun/applications/transformers/examples/text-classification/run_glue.py", line 247, in <module>
main()
File "/data/home/liusishun/applications/transformers/examples/text-classification/run_glue.py", line 197, in main
eval_result = trainer.evaluate(eval_dataset=eval_dataset)
File "/data/home/liusishun/.conda/envs/myenv/lib/python3.8/site-packages/transformers/trainer.py", line 1297, in evaluate
output = self.prediction_loop(eval_dataloader, description="Evaluation")
File "/data/home/liusishun/.conda/envs/myenv/lib/python3.8/site-packages/transformers/trainer.py", line 1382, in prediction_loop
preds = logits if preds is None else nested_concat(preds, logits, dim=0)
File "/data/home/liusishun/.conda/envs/myenv/lib/python3.8/site-packages/transformers/trainer_utils.py", line 151, in nested_concat
return type(tensors)(nested_concat(t, n, dim) for t, n in zip(tensors, new_tensors))
File "/data/home/liusishun/.conda/envs/myenv/lib/python3.8/site-packages/transformers/trainer_utils.py", line 151, in <genexpr>
return type(tensors)(nested_concat(t, n, dim) for t, n in zip(tensors, new_tensors))
File "/data/home/liusishun/.conda/envs/myenv/lib/python3.8/site-packages/transformers/trainer_utils.py", line 151, in nested_concat
return type(tensors)(nested_concat(t, n, dim) for t, n in zip(tensors, new_tensors))
File "/data/home/liusishun/.conda/envs/myenv/lib/python3.8/site-packages/transformers/trainer_utils.py", line 151, in <genexpr>
return type(tensors)(nested_concat(t, n, dim) for t, n in zip(tensors, new_tensors))
File "/data/home/liusishun/.conda/envs/myenv/lib/python3.8/site-packages/transformers/trainer_utils.py", line 152, in nested_concat
return torch.cat((tensors, new_tensors), dim=dim)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 40 and 64 in dimension 1 at /opt/conda/conda-bld/pytorch_1579061855666/work/aten/src/THC/generic/THCTensorMath.cu:71
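As far as I can tell, the failure boils down to torch.cat's shape rule: all dimensions except the concatenation dimension must match. The logits concatenate fine across batches, but the mems from the last (partial) batch have a different size in dimension 1. A minimal sketch in plain Python (no torch needed) that mimics the shape check, with shapes assumed from the error message:

```python
# Sketch of torch.cat's shape check for dim=0: every dimension except
# dim 0 must match, or it raises the RuntimeError seen above.
def cat_dim0(shape_a, shape_b):
    """Return the concatenated shape, or raise like torch.cat does."""
    if shape_a[1:] != shape_b[1:]:
        raise RuntimeError(
            f"Sizes of tensors must match except in dimension 0. "
            f"Got {shape_b[1]} and {shape_a[1]} in dimension 1"
        )
    return (shape_a[0] + shape_b[0],) + shape_a[1:]

# Logits accumulate fine across batches: (832, 2) + (40, 2) -> (872, 2)
print(cat_dim0((832, 2), (40, 2)))

# The mems do not (illustrative shapes): a full batch's mems vs. the
# partial batch's mems differ in dimension 1 (64 vs. 40) -> RuntimeError
try:
    cat_dim0((64, 64, 768), (64, 40, 768))
except RuntimeError as e:
    print(e)
```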
The XLNet model outputs some past states called mems at index 2. Those can't be concatenated together because they have a sequence length that varies. You should pass along --past_index 2 to your script so that the mems are used. We will have something easier to use in the future, but for now it should work around your problem.
Thanks for your fast reply. Unfortunately --past_index 2 doesn't work for me.
New error logs
10/05/2020 22:55:40 - INFO - filelock - Lock 140417916796544 acquired on /data/home/liusishun/glue/SST-2/cached_dev_XLNetTokenizer_64_sst-2.lock
10/05/2020 22:55:41 - INFO - filelock - Lock 140417916796544 released on /data/home/liusishun/glue/SST-2/cached_dev_XLNetTokenizer_64_sst-2.lock
10/05/2020 22:55:44 - INFO - __main__ - *** Evaluate ***
Evaluation: 93%|█████████▎| 13/14 [00:09<00:00, 1.41it/s]
Traceback (most recent call last):
File "/data/home/liusishun/applications/transformers/examples/text-classification/run_glue.py", line 247, in <module>
main()
File "/data/home/liusishun/applications/transformers/examples/text-classification/run_glue.py", line 197, in main
eval_result = trainer.evaluate(eval_dataset=eval_dataset)
File "/data/home/liusishun/.conda/envs/myenv/lib/python3.8/site-packages/transformers/trainer.py", line 1297, in evaluate
output = self.prediction_loop(eval_dataloader, description="Evaluation")
File "/data/home/liusishun/.conda/envs/myenv/lib/python3.8/site-packages/transformers/trainer.py", line 1377, in prediction_loop
loss, logits, labels = self.prediction_step(model, inputs, prediction_loss_only)
File "/data/home/liusishun/.conda/envs/myenv/lib/python3.8/site-packages/transformers/trainer.py", line 1459, in prediction_step
outputs = model(**inputs)
File "/data/home/liusishun/.conda/envs/myenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/data/home/liusishun/.conda/envs/myenv/lib/python3.8/site-packages/transformers/modeling_xlnet.py", line 1499, in forward
transformer_outputs = self.transformer(
File "/data/home/liusishun/.conda/envs/myenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/data/home/liusishun/.conda/envs/myenv/lib/python3.8/site-packages/transformers/modeling_xlnet.py", line 1226, in forward
new_mems = new_mems + (self.cache_mem(output_h, mems[i]),)
File "/data/home/liusishun/.conda/envs/myenv/lib/python3.8/site-packages/transformers/modeling_xlnet.py", line 1011, in cache_mem
new_mem = torch.cat([prev_mem, curr_out], dim=0)[cutoff:]
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 40 and 64 in dimension 1 at /opt/conda/conda-bld/pytorch_1579061855666/work/aten/src/THC/generic/THCTensorMath.cu:71
current script
GLUE_DIR=~/glue
CUDA_VISIBLE_DEVICES=0
TASK_NAME=SST-2
python3 ~/applications/transformers/examples/text-classification/run_glue.py \
--model_name_or_path ~/xlnet \
--task_name $TASK_NAME \
--do_eval \
--data_dir $GLUE_DIR/$TASK_NAME \
--max_seq_length 64 \
--per_device_train_batch_size 32 \
--per_device_eval_batch_size 64 \
--past_index 2 \
--learning_rate 2e-5 \
--num_train_epochs 3.0 \
--output_dir ~/result/$TASK_NAME/ \
--overwrite_output_dir \
--eval_steps 100
Any idea?
Asking the XLNet specialists on our internal Slack. I think the main problem is that the model returns those mems, which can't be used for anything (and can't be concatenated). The fact that you get an error with --past_index shows they can't really be used to speed up sequence classification.
Thanks for your response. Do you have any temporary workarounds or suggestions for further action on this problem?
Use another model...
Hi @StepinSilence and @sgugger ! Any updates on this issue?
@StepinSilence were you able to find a workaround to use XLNet?
Hi, @adhithyaarun. I remember that this issue occurred when the batch size didn't divide the dataset size evenly, so setting the batch size to a factor of your dataset size may work. However, I can't confirm this right now because our server's data disk died several days ago.
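That observation lines up with the numbers in the traceback. Assuming the standard GLUE SST-2 dev split of 872 examples, a quick check shows where the mismatched batch comes from:

```python
# With --per_device_eval_batch_size 64 on an 872-example dev set
# (assumed standard SST-2 split), the loader yields 13 full batches
# plus one partial batch.
full_batches, last_batch = divmod(872, 64)
print(full_batches, last_batch)  # 13 full batches, last batch of 40
```

That matches both the "13/14" in the progress bar and the "Got 40 and 64 in dimension 1" error: the mems from the partial batch carry a batch dimension of 40 and can't be concatenated with the earlier 64s, which is also why a batch size that divides the dataset evenly avoids the crash.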
Hello. I encountered the same problem using a CamemBERT model with transformers 3.4.0. This issue seems to arise when using dynamic padding. Any workaround for this other than padding to max length?
You should update to 3.5.0, which contains a fix for this in Trainer, to be able to do evaluation with dynamic padding.
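Roughly speaking, the idea behind that fix is to pad each batch's outputs to a common length with a sentinel value before concatenating, so tensors of different sequence lengths can be accumulated. A simplified plain-Python sketch of the approach (the sentinel value and exact mechanics are illustrative, not the library's actual implementation):

```python
# Illustrative sketch: pad per-batch rows to a common length with a
# sentinel (-100 here), then concatenate. This sidesteps the shape
# mismatch that plain concatenation hits with dynamic padding.
PAD = -100

def pad_rows(rows, length):
    """Right-pad each row with the sentinel up to the target length."""
    return [row + [PAD] * (length - len(row)) for row in rows]

def concat_padded(batch_a, batch_b):
    """Concatenate two batches after padding both to the longer length."""
    target = max(max(len(r) for r in batch_a), max(len(r) for r in batch_b))
    return pad_rows(batch_a, target) + pad_rows(batch_b, target)

a = [[1, 2, 3, 4]]          # batch dynamically padded to length 4
b = [[5, 6]]                # batch dynamically padded to length 2
print(concat_padded(a, b))  # [[1, 2, 3, 4], [5, 6, -100, -100]]
```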
From reading the paper (especially the experiment section about SQuAD, RACE, ...) I originally thought that the cached memory was also used during fine-tuning and not just during pre-training, but from this description here: https://github.com/zihangdai/xlnet/issues/41#issuecomment-505102587 it seems like the cached memory is actually not used during fine-tuning. So I'd suggest that we disable it for all models except XLNetLMHeadModel, where it obviously makes sense to use it. I'll add a PR to fix it.
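In spirit, the proposed change amounts to only building the mems cache when a memory length is actually configured, as in language modelling; for fine-tuning heads the cache is skipped entirely, so nothing of varying size ever reaches the Trainer's concatenation. A plain-Python sketch of that control flow (function and parameter names here are hypothetical, not the library's API):

```python
# Sketch of the proposed behavior: skip the mems cache unless a memory
# length is configured (as it would be for XLNetLMHeadModel).
def run_layers(hidden_states, mem_len=None):
    """mem_len is None/0 for classification fine-tuning (assumption)."""
    use_mems = bool(mem_len)
    new_mems = [] if use_mems else None
    for h in hidden_states:
        if use_mems:
            # cache only the last mem_len positions of each layer's output
            new_mems.append(h[-mem_len:])
    return new_mems

print(run_layers([[1, 2, 3]], mem_len=None))  # None: no cache returned
print(run_layers([[1, 2, 3]], mem_len=2))     # [[2, 3]]
```

With no mems in the model outputs, the --past_index workaround becomes unnecessary for sequence classification.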
Really thank all of you for fixing this issue!