Transformers: Unable to download community models

Created on 3 Jan 2020 · 6 Comments · Source: huggingface/transformers

🐛 Bug

Model I am using (Bert, XLNet....): bert-base-cased-finetuned-conll03-english

Language I am using the model on (English, Chinese....): English

The problem arises when using:

  • [x] the official example scripts: running a small snippet from docs (see below)
  • [ ] my own modified scripts: (give details)

The task I am working on is:

  • [ ] an official GLUE/SQUaD task: (give the name)
  • [x] my own task or dataset: just trying to load the model at this stage

To Reproduce

Steps to reproduce the behavior:

I'm following the instructions at https://huggingface.co/bert-large-cased-finetuned-conll03-english but failing at the first hurdle. This is the snippet from the docs that I've run:

from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-large-cased-finetuned-conll03-english")
model = AutoModel.from_pretrained("bert-large-cased-finetuned-conll03-english")

It fails with this message:

OSError: Model name 'bert-base-cased-finetuned-conll03-english' was not found in model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased, bert-base-japanese, bert-base-japanese-whole-word-masking, bert-base-japanese-char, bert-base-japanese-char-whole-word-masking, bert-base-finnish-cased-v1, bert-base-finnish-uncased-v1). We assumed 'https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-finetuned-conll03-english/config.json' was a path or url to a configuration file named config.json or a directory containing such a file but couldn't find any such file at this path or url.

The message mentions looking at https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-finetuned-conll03-english/config.json and finding nothing.

I also tried the CLI, transformers-cli download bert-base-cased-finetuned-conll03-english, but that failed with a similar message. However, both methods work for namespaced models, e.g. dbmdz/bert-base-italian-cased.

Expected behavior

The community model should download. :)

Environment

  • OS: openSUSE Tumbleweed 20200101
  • Python version: 3.7
  • PyTorch version: 1.3.1
  • PyTorch Transformers version (or branch): 2.3.0
  • Using GPU? n/a
  • Distributed or parallel setup? n/a
  • Any other relevant information:

Additional context

I browsed https://s3.amazonaws.com/models.huggingface.co/ and see that the model is there, but paths are like:

https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-finetuned-conll03-english-config.json

rather than:

https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-finetuned-conll03-english/config.json

(note -config.json vs /config.json)

If I download the files manually and rename, the model loads. So it looks like just a naming problem.
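For anyone wanting to reproduce the manual workaround, here is a minimal sketch of the download-and-rename step. It assumes the flat `<model>-<file>` naming seen in the bucket listing above; the helper names (`flat_url`, `fetch_model`) are made up for illustration, and the presence of a flat `-vocab.txt` file is an assumption that the listing above does not confirm.

```python
import os
import urllib.request

# Bucket prefix taken from the URLs quoted in this issue.
BUCKET = "https://s3.amazonaws.com/models.huggingface.co/bert"
MODEL = "bert-base-cased-finetuned-conll03-english"

# File names expected inside a local model directory.
# Assumption: vocab.txt is stored under the same flat naming scheme.
FILES = ["config.json", "pytorch_model.bin", "vocab.txt"]


def flat_url(model, filename):
    """Build the URL as the files are actually stored: <model>-<filename>."""
    return f"{BUCKET}/{model}-{filename}"


def fetch_model(model=MODEL, target_dir=None):
    """Download each flat-named file and save it under its plain name.

    Hypothetical helper: the resulting directory mirrors the
    <model>/<filename> layout that the loader was looking for.
    """
    target_dir = target_dir or model
    os.makedirs(target_dir, exist_ok=True)
    for filename in FILES:
        destination = os.path.join(target_dir, filename)
        urllib.request.urlretrieve(flat_url(model, filename), destination)
    return target_dir
```

After `fetch_model()` has run, passing the returned directory path to `from_pretrained` should load the model from disk, which is in line with the observation above that renaming the files is enough.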

All 6 comments

I can confirm what you see. In the current master code, bert-large-cased-finetuned-conll03-english has no mapping in the tokenizers or models, so it can't be found the same way as, for example, bert-base-uncased.

But it works if you target the files directly:

AutoTokenizer.from_pretrained("https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-finetuned-conll03-english-config.json")

AutoModel.from_pretrained("https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-finetuned-conll03-english-pytorch_model.bin")
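The lookup behaviour described in this comment can be sketched roughly as follows. This is a simplified illustration inferred from the error message in the report, not the library's actual code: the table `SHORTCUT_CONFIGS` and the function `resolve_config_url` are hypothetical names, and the table is abbreviated to two entries.

```python
S3_PREFIX = "https://s3.amazonaws.com/models.huggingface.co/bert"

# Hypothetical, abbreviated shortcut table: official models map to
# flat file names of the form <name>-config.json.
SHORTCUT_CONFIGS = {
    "bert-base-uncased": f"{S3_PREFIX}/bert-base-uncased-config.json",
    "bert-base-cased": f"{S3_PREFIX}/bert-base-cased-config.json",
}


def resolve_config_url(name):
    """Return the config URL a loader of this shape would try for `name`."""
    if name in SHORTCUT_CONFIGS:
        # Known shortcut name: use the hard-coded flat URL.
        return SHORTCUT_CONFIGS[name]
    # Unknown name: treat it as a directory-style identifier.
    return f"{S3_PREFIX}/{name}/config.json"


print(resolve_config_url("bert-base-cased-finetuned-conll03-english"))
print(resolve_config_url("dbmdz/bert-base-italian-cased"))
```

Under this sketch, a model that is stored with flat file names but has no entry in the shortcut table falls through to the directory-style URL, which is the path the error message reports as missing. Namespaced models such as dbmdz/bert-base-italian-cased are stored in the directory layout, so the same fallback finds them.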

Hmm, I think I see the issue. @stefan-it @mfuntowicz we could either:

  • move bert-large-cased-finetuned-conll03-english to dbmdz/bert-large-cased-finetuned-conll03-english
  • or add shortcut model names inside the codebase (config, model, tokenizer)

What do you think?

(also kinda related to #2281)

@julien-c I think it would be better to move the model under the dbmdz namespace, as it is not an "official" model!

@julien-c moving to dbmdz is fine. We need to update the default NER pipeline's model provider to reflect the new path.

Model now lives at https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english

Let me know if everything works correctly!
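For reference, the original snippet adapted to the new location might look like the sketch below. It only swaps in the namespaced identifier from the link above; the wrapper `load_ner_model` is a hypothetical convenience function, and calling it needs the transformers package and network access.

```python
# Namespaced identifier taken from the new model URL.
MODEL_ID = "dbmdz/bert-large-cased-finetuned-conll03-english"


def load_ner_model(model_id=MODEL_ID):
    """Load the tokenizer and model from the namespaced location."""
    # Imported lazily so this file can be read or imported without
    # transformers installed; the download happens only on call.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    return tokenizer, model
```

Usage would be `tokenizer, model = load_ner_model()`.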

Works perfectly now, thanks!
