Transformers: Unable to download community models

Created on 3 Jan 2020 · 6 Comments · Source: huggingface/transformers

🐛 Bug

Model I am using (Bert, XLNet....): bert-base-cased-finetuned-conll03-english

Language I am using the model on (English, Chinese....): English

The problem arises when using:

[x] the official example scripts: running a small snippet from docs (see below)
[ ] my own modified scripts: (give details)

The task I am working on is:

[ ] an official GLUE/SQuAD task: (give the name)
[x] my own task or dataset: just trying to load the model at this stage

To Reproduce

Steps to reproduce the behavior:

I'm following the instructions at https://huggingface.co/bert-large-cased-finetuned-conll03-english but failing at the first hurdle. This is the snippet from the docs that I've run:

from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-large-cased-finetuned-conll03-english")
model = AutoModel.from_pretrained("bert-large-cased-finetuned-conll03-english")

It fails with this message:

OSError: Model name 'bert-base-cased-finetuned-conll03-english' was not found in model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased, bert-base-japanese, bert-base-japanese-whole-word-masking, bert-base-japanese-char, bert-base-japanese-char-whole-word-masking, bert-base-finnish-cased-v1, bert-base-finnish-uncased-v1). We assumed 'https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-finetuned-conll03-english/config.json' was a path or url to a configuration file named config.json or a directory containing such a file but couldn't find any such file at this path or url.

The message mentions looking at https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-finetuned-conll03-english/config.json and finding nothing.

I also tried the CLI (transformers-cli download bert-base-cased-finetuned-conll03-english), but that failed with a similar message. However, both methods work for namespaced models, e.g. dbmdz/bert-base-italian-cased.

Expected behavior

The community model should download. :)

Environment

OS: openSUSE Tumbleweed 20200101
Python version: 3.7
PyTorch version: 1.3.1
PyTorch Transformers version (or branch): 2.3.0
Using GPU ? n/a
Distributed or parallel setup ? n/a
Any other relevant information:

Additional context

I browsed https://s3.amazonaws.com/models.huggingface.co/ and see that the model is there, but paths are like:

https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-finetuned-conll03-english-config.json

rather than:

https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-finetuned-conll03-english/config.json

(note -config.json vs /config.json)

If I download the files manually and rename them, the model loads, so it looks like just a naming problem.
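The manual workaround can be sketched roughly as follows. This is only an illustration, not library code: the helper names flat_url and fetch_into_directory are made up for this example, and the presence of a flat-named vocab.txt in the bucket is an assumption (only the config and weights files are named in this thread).

```python
import os
import urllib.request

S3_PREFIX = "https://s3.amazonaws.com/models.huggingface.co/bert"

# File names expected inside a local model directory.
# vocab.txt is an assumption; this thread only confirms the config and weights.
FILENAMES = ("config.json", "pytorch_model.bin", "vocab.txt")


def flat_url(model_name, filename):
    """Build the '<model>-<file>' style URL that actually exists in the bucket."""
    return f"{S3_PREFIX}/{model_name}-{filename}"


def fetch_into_directory(model_name, target_dir):
    """Download each flat-named file and save it under its plain name.

    Hypothetical helper: it performs network requests and will raise
    if any of the assumed files is missing from the bucket.
    """
    os.makedirs(target_dir, exist_ok=True)
    for filename in FILENAMES:
        destination = os.path.join(target_dir, filename)
        urllib.request.urlretrieve(flat_url(model_name, filename), destination)
    return target_dir
```

The resulting directory can then be passed to from_pretrained in place of the model name, e.g. AutoModel.from_pretrained(fetch_into_directory("bert-base-cased-finetuned-conll03-english", "./conll03-model")).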

Source

cbowdon

All 6 comments

I can confirm what you're seeing. In the current master code, bert-large-cased-finetuned-conll03-english has no mapping in the tokenizers or models, so it can't be found the way bert-base-uncased is, for example.

But it works if you target the files directly:

AutoTokenizer.from_pretrained("https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-finetuned-conll03-english-config.json")

AutoModel.from_pretrained("https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-finetuned-conll03-english-pytorch_model.bin")

mandubian on 3 Jan 2020

Hmm, I think I see the issue. @stefan-it @mfuntowicz we could either:

move bert-large-cased-finetuned-conll03-english to dbmdz/bert-large-cased-finetuned-conll03-english
or add shortcut model names inside the codebase (config, model, tokenizer)

What do you think?

(also kinda related to #2281)
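
To make the second option concrete, here is a simplified sketch of how a shortcut-name lookup with a fallback could behave. It is not the library's actual code: SHORTCUT_CONFIGS and resolve_config_url are hypothetical names, and the real codebase keeps separate maps for config, model and tokenizer.

```python
S3_PREFIX = "https://s3.amazonaws.com/models.huggingface.co/bert"

# Hypothetical shortcut table mapping short model names to flat-named files.
SHORTCUT_CONFIGS = {
    "bert-base-uncased": f"{S3_PREFIX}/bert-base-uncased-config.json",
}


def resolve_config_url(name, shortcuts=SHORTCUT_CONFIGS):
    """Return the config URL for a model name.

    Known shortcut names map to their registered URL; anything else is
    treated as an identifier and resolved to '<prefix>/<name>/config.json'.
    """
    if name in shortcuts:
        return shortcuts[name]
    return f"{S3_PREFIX}/{name}/config.json"


name = "bert-base-cased-finetuned-conll03-english"

# Without a shortcut entry, the fallback yields the directory-style URL
# quoted in the error message above.
missing = resolve_config_url(name)

# Registering a shortcut points the same name at the flat-named file.
patched = dict(SHORTCUT_CONFIGS)
patched[name] = f"{S3_PREFIX}/{name}-config.json"
found = resolve_config_url(name, patched)
```

Under this sketch a namespaced identifier such as dbmdz/bert-base-italian-cased needs no shortcut entry, because the fallback already produces a directory-style URL, which is consistent with those models loading fine.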