Transformers: Specify a model from a specific directory for extract_features.py

Created on 28 Nov 2018 · 4 comments · Source: huggingface/transformers

I have downloaded the model and vocab files into a specific location, using their original file names, so my directory for bert-base-cased contains:

bert-base-cased-vocab.txt
bert_config.json
pytorch_model.bin

But when I try to pass the directory that contains these files to the --bert_model parameter of extract_features.py, I get the following error:

ValueError: Can't find a vocabulary file at path <THEDIRECTORYPATHISPECIFIED> ...

When I instead point it at an existing regular file, the error messages indicate that the program expects an archive it can uncompress and untar.

Is there no way to just specify a specific directory that contains the vocab, config, and model files?
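
For reference, here is a minimal sketch (not from the issue) of the kind of directory-based loading being asked for. It assumes the pytorch_pretrained_bert package of that era exposes BertTokenizer and BertModel at the top level, and that the files are renamed to the names the loader looks for (vocab.txt, bert_config.json, pytorch_model.bin); the path is illustrative:

from pytorch_pretrained_bert import BertTokenizer, BertModel

model_dir = "/path/to/bert-base-cased"  # local directory holding the three files above
tokenizer = BertTokenizer.from_pretrained(model_dir)
model = BertModel.from_pretrained(model_dir)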

All 4 comments

The last update broke this, but you can fix it in tokenization.py: add the following right after the line vocab_file = pretrained_model_name:

if os.path.isdir(vocab_file):
    vocab_file = os.path.join(vocab_file, "vocab.txt")
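
Note that this patch resolves a directory path to a file literally named vocab.txt inside it, so with the layout from the question the vocabulary file would need to be renamed from bert-base-cased-vocab.txt to vocab.txt. A self-contained sketch of what the added lines do (the path is illustrative):

import os

vocab_file = "/path/to/bert-base-cased"  # the value passed via --bert_model
if os.path.isdir(vocab_file):
    # a directory path is rewritten to point at the vocab file inside it
    vocab_file = os.path.join(vocab_file, "vocab.txt")
# vocab_file is now "/path/to/bert-base-cased/vocab.txt"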

Thank you. Is it fair to assume that this will be accepted as an issue and fixed in a future update/release?

Yes :-) There is a new release planned for tonight that will fix this (among other things, basically all the other open issues).

Ok, this is now included in the new release 0.3.0 (by #73).

