Transformers: Specify a model from a specific directory for extract_features.py

Created on 28 Nov 2018 · 4 Comments · Source: huggingface/transformers

I have downloaded the model and vocab files into a specific location, using their original file names, so my directory for bert-base-cased contains:

bert-base-cased-vocab.txt
bert_config.json
pytorch_model.bin

But when I pass the directory that contains these files as the --bert_model parameter of extract_features.py, I get the following error:

ValueError: Can't find a vocabulary file at path <THEDIRECTORYPATHISPECIFIED> ...

When I specify an existing regular file instead, the error messages suggest that the program expects an archive that it can untar and decompress.

Is there no way to simply specify a directory that contains the vocab, config, and model files?

johann-petrak

Most helpful comment

The last update broke this, but you can fix it in tokenization.py by adding the following right after the line vocab_file = pretrained_model_name:

if os.path.isdir(vocab_file):
    vocab_file = os.path.join(vocab_file, "vocab.txt")

artemisart on 29 Nov 2018

👍2
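
For readers who want to try the idea outside the library, here is a minimal standalone sketch of the same directory check. The function name resolve_vocab_path and its vocab_name parameter are hypothetical and not part of tokenization.py; only the os.path.isdir / os.path.join logic and the "vocab.txt" file name come from the comment above.

```python
import os


def resolve_vocab_path(path, vocab_name="vocab.txt"):
    """Return the path of a vocabulary file (hypothetical helper).

    If `path` is a directory, look for `vocab_name` inside it, mirroring
    the fix suggested for tokenization.py. Otherwise `path` is treated as
    the vocabulary file itself.
    """
    if os.path.isdir(path):
        path = os.path.join(path, vocab_name)
    if not os.path.isfile(path):
        # Same wording as the error quoted in the question.
        raise ValueError("Can't find a vocabulary file at path {}".format(path))
    return path
```

Note that this check looks for a file named vocab.txt, so a directory holding bert-base-cased-vocab.txt, as in the question, would presumably need that file renamed (or, in this sketch, the name passed as vocab_name).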

Thank you. Is it fair to assume that this will be accepted as an issue and fixed in a future release?

johann-petrak on 30 Nov 2018

Yes :-) There is a new release planned for tonight that will fix this (among other things, basically all the other open issues).

thomwolf on 30 Nov 2018

👍1

Ok, this is now included in the new release 0.3.0 (by #73).

thomwolf on 30 Nov 2018

👍1
