I have downloaded the model and vocab files into a specific location, using their original file names, so my directory for bert-base-cased contains:
bert-base-cased-vocab.txt
bert_config.json
pytorch_model.bin
But when I pass the directory that contains these files as the --bert_model parameter of extract_features.py, I get the following error:
ValueError: Can't find a vocabulary file at path <THEDIRECTORYPATHISPECIFIED> ...
When I instead point it at a file that exists, the error messages seem to indicate that the program wants to untar and decompress it.
Is there no way to just specify a specific directory that contains the vocab, config, and model files?
The last update broke this, but you can fix it in tokenization.py by adding the following right after the line vocab_file = pretrained_model_name:
    if os.path.isdir(vocab_file):
        vocab_file = os.path.join(vocab_file, "vocab.txt")
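Here is a minimal, self-contained sketch of the lookup that patch adds, useful for checking the behavior outside the library. The helper names resolve_vocab_file and load_vocab_lines are hypothetical and are not part of tokenization.py, and the error text only imitates the start of the real message. Because the patch joins the directory with "vocab.txt", a directory holding only bert-base-cased-vocab.txt would presumably still fail, so the vocabulary file likely needs to be renamed (or copied) to vocab.txt.

```python
import os

# File name the patch looks for inside a directory.
VOCAB_NAME = "vocab.txt"


def resolve_vocab_file(pretrained_model_name):
    """Hypothetical helper mirroring the patch.

    If the argument is a directory, look for vocab.txt inside it;
    otherwise treat the argument itself as the vocabulary file path.
    """
    vocab_file = pretrained_model_name
    if os.path.isdir(vocab_file):
        vocab_file = os.path.join(vocab_file, VOCAB_NAME)
    return vocab_file


def load_vocab_lines(pretrained_model_name):
    """Hypothetical helper: read one token per line from the resolved file."""
    vocab_file = resolve_vocab_file(pretrained_model_name)
    if not os.path.isfile(vocab_file):
        # Illustrative message only, not the library's exact wording.
        raise ValueError("Can't find a vocabulary file at path '{}'".format(vocab_file))
    with open(vocab_file, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]
```

With this logic, passing the bert-base-cased directory resolves to <directory>/vocab.txt, while passing a file path is left unchanged.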
Thank you. Is it fair to assume that this will be accepted as an issue and fixed in a future update/release?
Yes :-) There is a new release planned for tonight that will fix this (among other things, basically all the other open issues).
Ok, this is now included in the new release 0.3.0 (by #73).