Transformers: Cannot Load bert-base-japanese tokenizer

Created on 29 Apr 2020 · 3 Comments · Source: huggingface/transformers

🐛 Bug

Information

Model I am using: BertJapaneseTokenizer

Language I am using the model on: Japanese

The problem arises when using:

[x] the official example scripts: (give details below)
[ ] my own modified scripts: (give details below)

The task I am working on is: just loading the tokenizer

To reproduce

>>> from transformers import BertJapaneseTokenizer
>>> tokenizer = BertJapaneseTokenizer.from_pretrained('bert-base-japanese')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/bayartsogtyadamsuren/DDAM-Projects/isid/myenv/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 393, in from_pretrained
    return cls._from_pretrained(*inputs, **kwargs)
  File "/Users/bayartsogtyadamsuren/DDAM-Projects/isid/myenv/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 496, in _from_pretrained
    list(cls.vocab_files_names.values()),
OSError: Model name 'bert-base-japanese' was not found in tokenizers model name list (bert-base-japanese, bert-base-japanese-whole-word-masking, bert-base-japanese-char, bert-base-japanese-char-whole-word-masking). We assumed 'bert-base-japanese' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.txt'] but couldn't find such vocabulary files at this path or url.

Expected behavior

The tokenizer should load without an error.

Environment info

transformers version: 2.7.0
Platform: Darwin-18.7.0-x86_64-i386-64bit
Python version: 3.7.4
PyTorch version (GPU?): 1.3.1
Tensorflow version (GPU?): not installed (NA)
Using GPU in script?: no
Using distributed or parallel set-up in script?: no

Source

bayartsogt-ya

Most helpful comment

Hi, I had the same issue and I solved it by downloading the required files locally with the steps below.

Download vocab.txt, config.json, and pytorch_model.bin from the source URL.

Pass the path of the folder containing the three files to the from_pretrained method, e.g.

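from transformers import BertConfig, BertJapaneseTokenizer, BertModel
# Load each component from the local folder instead of the hub name
config = BertConfig.from_pretrained('./models/bert-base-japanese/')
model = BertModel.from_pretrained('./models/bert-base-japanese/')
tokenizer = BertJapaneseTokenizer.from_pretrained('./models/bert-base-japanese/')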

where

─ models
   └- bert-base-japanese
      ├- vocab.txt
      ├- config.json
      └- pytorch_model.bin
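
If it helps, the folder layout above can be checked before loading. This is only a minimal sketch: `missing_files` and `load_local` are hypothetical helper names, not part of transformers, and the sketch assumes the three file names listed above are all that is needed and that the MeCab bindings required by BertJapaneseTokenizer are installed.

```python
from pathlib import Path

# File names taken from the folder layout above.
REQUIRED_FILES = ("vocab.txt", "config.json", "pytorch_model.bin")


def missing_files(model_dir):
    """Return the required file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


def load_local(model_dir):
    """Hypothetical helper: load config, tokenizer and model from a local folder."""
    missing = missing_files(model_dir)
    if missing:
        raise FileNotFoundError(f"{model_dir} is missing: {', '.join(missing)}")
    # Imported lazily so the file check also works without transformers installed.
    from transformers import BertConfig, BertJapaneseTokenizer, BertModel

    config = BertConfig.from_pretrained(model_dir)
    tokenizer = BertJapaneseTokenizer.from_pretrained(model_dir)
    model = BertModel.from_pretrained(model_dir, config=config)
    return config, tokenizer, model
```

Calling `load_local('./models/bert-base-japanese/')` would then fail early with a readable message if one of the downloads is missing, rather than with the OSError shown in the issue.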

I think the problem is probably caused by a change to the file paths on S3 made in the commit below. The transformers release installed by pip predates that change, so it may still be pointing to the old path.
https://github.com/huggingface/transformers/commit/455c6390938a5c737fa63e78396cedae41e4e87e

Reinstall with the latest version of transformers and it should work.

git clone git@github.com:huggingface/transformers.git
pip install ./transformers
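
For code that has to run both before and after the reinstall, one option is to try the model name first and fall back to the local folder. This is a rough sketch, not something from the library: `load_with_fallback` is a hypothetical helper, and it assumes a failed lookup raises OSError as in the traceback in the issue.

```python
def load_with_fallback(loader, name, local_dir):
    """Try loader(name); if it raises OSError, retry with loader(local_dir)."""
    try:
        return loader(name)
    except OSError:
        # The traceback in the issue shows from_pretrained raising OSError
        # when the vocabulary files cannot be found for the given name.
        return loader(local_dir)


# Usage (assumes transformers and the MeCab bindings are installed and the
# local folder from the earlier steps exists):
# from transformers import BertJapaneseTokenizer
# tokenizer = load_with_fallback(
#     BertJapaneseTokenizer.from_pretrained,
#     "bert-base-japanese",
#     "./models/bert-base-japanese/",
# )
```

Passing the loader as an argument keeps the helper independent of any particular tokenizer or model class.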

reo11 on 29 Apr 2020

👍5

Other comments

I apologize, it's my fault. I moved files around instead of copying them as we usually do, so I broke backward compatibility for the bert-base-japanese models.

As @reo11 said, you'll need to install from source for now. You can also do:
pip install git+https://github.com/huggingface/transformers.git

Sorry about that.

julien-c on 1 May 2020

👍4

@reo11 Thank you so much!
@julien-c Thank you for your response. Since a lot of us are trying to use transformers in production too, please consider keeping a stable workflow. (Anyway, you guys are doing great!)

bayartsogt-ya on 1 May 2020

👍1
