Transformers: can't load checkpoint file from examples/run_language_modeling.py

Created on 13 May 2020 · 3 comments · Source: huggingface/transformers

🐛 Bug

Information

Model I am using (Bert, XLNet ...):
GPT2
Language I am using the model on (English, Chinese ...):
English
The problem arises when using:

  • [x] the official example scripts: (give details below)
    There seems to be no supported way of continuing training or evaluating a previously saved model checkpoint.
  • [ ] my own modified scripts: (give details below)

The task I am working on is:

  • [ ] an official GLUE/SQUaD task: (give the name)
  • [x] my own task or dataset: (give details below)
    just trying to train/eval on the wikitext-2 dataset

To reproduce

Steps to reproduce the behavior:

  1. python ../examples/language-modeling/run_language_modeling.py ^
    --output_dir=output ^
    --overwrite_output_dir ^
    --tokenizer=gpt2 ^
    --model_type=gpt2 ^
    --model_name_or_path=output/pytorch.pytorch_model.bin ^
    --do_eval ^
    --per_gpu_eval_batch_size=1 ^
    --eval_data_file=%userprofile%/.data/wikitext-2/wikitext-2/wiki.test.tokens

This gives an error because "model_name_or_path" is assumed to point to pretrained model info (a JSON config file), not to a saved checkpoint file. The error occurs while trying to load the config file associated with a pretrained model.
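
For context, from_pretrained() treats its first argument as either a known model identifier or a path to a directory containing config.json and pytorch_model.bin. A minimal sketch of the intended usage (the "output" directory name is taken from the command above and is an assumption about where training wrote its files):

    from transformers import AutoModelWithLMHead, AutoTokenizer

    # Point from_pretrained() at the directory the training run wrote to
    # (it must contain config.json and pytorch_model.bin), not at the .bin file itself.
    model = AutoModelWithLMHead.from_pretrained("output")
    tokenizer = AutoTokenizer.from_pretrained("gpt2")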

I also tried creating a new "model_checkpoint" argument that I pass into AutoModelWithLMHead.from_pretrained(), but that ends up with a model/checkpoint mismatch (the hidden size in the checkpoint file appears to be 256, while the current model's is 768). I never changed the hidden size - I just ran with the "do_train" option and it saved my checkpoints to the output directory. Now I am simply trying to verify that I can evaluate a checkpoint, and then also continue training from one.
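
As a quick way to confirm where the 256 vs. 768 mismatch comes from, the raw checkpoint can be inspected directly with PyTorch; a minimal sketch, assuming the weights file is output/pytorch_model.bin:

    import torch

    # Load the raw state dict and print each tensor's shape, so the hidden
    # size stored in the checkpoint can be compared against the model config.
    state_dict = torch.load("output/pytorch_model.bin", map_location="cpu")
    for name, tensor in state_dict.items():
        print(name, tuple(tensor.shape))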

Expected behavior

I expected to be able to specify a checkpoint_path argument to run_language_modeling.py that would load the checkpoint and let me continue training from it and/or evaluate it.

Environment info

  • transformers version: 2.9.0
  • Platform: Windows-10-10.0.19041-SP0
  • Python version: 3.6.9
  • PyTorch version (GPU?): 1.4.0 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?: no

All 3 comments

--model_name_or_path should be a folder, so you should use just ./output instead.
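
For reference, the eval command from the report would then look something like this (only --model_name_or_path changed; all other flags kept exactly as in the report):

    python ../examples/language-modeling/run_language_modeling.py ^
    --output_dir=output ^
    --overwrite_output_dir ^
    --tokenizer=gpt2 ^
    --model_type=gpt2 ^
    --model_name_or_path=output ^
    --do_eval ^
    --per_gpu_eval_batch_size=1 ^
    --eval_data_file=%userprofile%/.data/wikitext-2/wikitext-2/wiki.test.tokens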

Thanks. Verified - that fixed it. Please add a note in the README.md to explain this.

Hi, may I ask how you got these checkpoint files? I tried to specify the path to a checkpoint generated by the script during training (containing _config.json_, _optimizer.pt_, _pytorch_model.bin_, _scheduler.pt_, _training_args.bin_), but I got a traceback like this

Traceback (most recent call last):
  File "run_language_modeling.py", line 277, in <module>
    main()
  File "run_language_modeling.py", line 186, in main
    tokenizer = AutoTokenizer.from_pretrained(model_args.model_name_or_path, cache_dir=model_args.cache_dir)
  File "H:\Anaconda3\envs\env_name\lib\site-packages\transformers\tokenization_auto.py", line 203, in from_pretrained
    return tokenizer_class_py.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "H:\Anaconda3\envs\env_name\lib\site-packages\transformers\tokenization_utils.py", line 902, in from_pretrained
    return cls._from_pretrained(*inputs, **kwargs)
  File "H:\Anaconda3\envs\env_name\lib\site-packages\transformers\tokenization_utils.py", line 1007, in _from_pretrained
    list(cls.vocab_files_names.values()),
OSError: Model name 'C:\\path-to-ckpt\\checkpoint-17500' was not found in tokenizers model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased, bert-base-finnish-cased-v1, bert-base-finnish-uncased-v1, bert-base-dutch-cased). We assumed 'C:\\path-to-ckpt\\checkpoint-17500' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.txt'] but couldn't find such vocabulary files at this path or url.

which technically says that the checkpoint folder is missing some other files. I wonder where this mismatch comes from, since I used the same script to train.
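
A likely explanation (to be confirmed) is that the intermediate checkpoint-* folders contain the model files listed above but not the tokenizer's vocabulary files, which are only written alongside the final model. A hedged workaround sketch: load the model from the checkpoint folder and the tokenizer from the original base model identifier (both the path and the "bert-base-uncased" name below are placeholders, not taken from this report):

    from transformers import AutoModelWithLMHead, AutoTokenizer

    # config.json and pytorch_model.bin are in the checkpoint folder, so the
    # model loads; the tokenizer files are not, so load the tokenizer from the
    # base model that was fine-tuned (placeholder identifier below).
    model = AutoModelWithLMHead.from_pretrained(r"C:\path-to-ckpt\checkpoint-17500")
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # placeholder base model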
