Transformers: cannot access pretrained vocab file on S3

Created on 30 Nov 2018 · 4 comments · Source: huggingface/transformers

Hi, thanks for developing this well-made PyTorch version of BERT.
Unfortunately, the pretrained vocab files are not reachable.

The error traceback is below.

File "/usr/local/lib/python3.6/dist-packages/pytorch_pretrained_bert/tokenization.py", line 124, in from_pretrained
resolved_vocab_file = cached_path(vocab_file)
File "/usr/local/lib/python3.6/dist-packages/pytorch_pretrained_bert/file_utils.py", line 88, in cached_path
return get_from_cache(url_or_filename, cache_dir)
File "/usr/local/lib/python3.6/dist-packages/pytorch_pretrained_bert/file_utils.py", line 178, in get_from_cache
.format(url, response.status_code))
OSError: HEAD request failed for url https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt with status code 404
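For reference, a minimal sketch of a call that produces this traceback (assuming pytorch_pretrained_bert is installed; the 'bert-base-uncased' shortcut is the standard model name):

from pytorch_pretrained_bert import BertTokenizer

# Resolving the 'bert-base-uncased' shortcut issues a HEAD request against the
# S3 vocab URL; when that file is missing, the call fails with the OSError above.
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')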


All 4 comments

I have the same issue.

OSError: HEAD request failed for url https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-chinese-vocab.txt with status code 404

It would be nice to be able to cache the vocab files as well as the model weights out of the box.

I found a temporary solution for this issue.
The BertTokenizer.from_pretrained method accepts a local file path instead of a model name, e.g. BertTokenizer.from_pretrained('/dir/to/vocab/bert-base-uncased-vocab.txt')

The vocab txt file can be downloaded from the Google BERT repo.
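A minimal sketch of this workaround, assuming the vocab file has already been downloaded from the Google BERT repo (the local path is a placeholder):

from pytorch_pretrained_bert import BertTokenizer

# Point from_pretrained at the local vocab file instead of a model shortcut,
# so no request to S3 is made. The path below is a placeholder.
tokenizer = BertTokenizer.from_pretrained('/dir/to/vocab/bert-base-uncased-vocab.txt')
tokens = tokenizer.tokenize("Hello, world!")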

The files are back. Sorry, that was a mistake made while adding the new models.

I found a temporary solution for this issue.
The BertTokenizer.from_pretrained method accepts a local file path instead of a model name, e.g. BertTokenizer.from_pretrained('/dir/to/vocab/bert-base-uncased-vocab.txt')

Well, this solution doesn't seem to be working now; I get

OSError: Model name 'path/to/model/vocab.txt' was not found in tokenizers model name list (bart-large, bart-large-mnli, bart-large-cnn, bart-large-xsum). We assumed 'path/to/model/vocab.txt' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.json', 'merges.txt'] but couldn't find such vocabulary files at this path or url.
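In newer transformers versions, the message above indicates that a local path should be a directory containing all of the tokenizer's vocabulary files, not a single txt file. One way to set that up, sketched under the assumption that the hub is reachable at least once (the model identifier matches the bart-large-cnn checkpoint from the error message; the directory name is a placeholder):

from transformers import AutoTokenizer

# Download the tokenizer once and save all of its files (vocab.json, merges.txt,
# tokenizer config, ...) into a local directory.
tokenizer = AutoTokenizer.from_pretrained('facebook/bart-large-cnn')
tokenizer.save_pretrained('/path/to/local_tokenizer')

# Later, load directly from that directory; no remote lookup is performed.
tokenizer = AutoTokenizer.from_pretrained('/path/to/local_tokenizer')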
