tokenizer = BertTokenizer.from_pretrained('bert-base-german-cased')
Output:
Model name 'bert-base-german-cased' was not found in model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese). We assumed 'bert-base-german-cased' was a path or url but couldn't find any file associated to this path or url.
Hi @laifi,
I cannot reproduce this issue. Are you sure you are running the latest code from the master branch? It looks suspicious to me that tokenizer = BertTokenizer.from_pretrained('bert-base-german-cased') doesn't find the model.
Can you please check whether you have the corresponding line in your PRETRAINED_VOCAB_ARCHIVE_MAP?
For your second approach with downloaded files:
from_pretrained expects a model name or a path, not a .bin file. You should try: BertTokenizer.from_pretrained('YOUR_PATH_TO/bert-base-german-cased'). Hope that helps!
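The distinction between a shortcut model name and a local path can be sketched like this (the PRETRAINED_VOCAB_ARCHIVE_MAP entries, URLs, and the resolve_vocab helper below are simplified illustrations of the idea, not the library's actual code):

```python
import os

# Illustrative stand-in for the library's shortcut-name map; the real map
# and URLs live inside the tokenizer module, so these are placeholders.
PRETRAINED_VOCAB_ARCHIVE_MAP = {
    "bert-base-uncased": "https://example.invalid/bert-base-uncased-vocab.txt",
    "bert-base-german-cased": "https://example.invalid/bert-base-german-cased-vocab.txt",
}

def resolve_vocab(name_or_path):
    """Mimic, in simplified form, how from_pretrained interprets its argument."""
    if name_or_path in PRETRAINED_VOCAB_ARCHIVE_MAP:
        # Known shortcut name: the vocab would be fetched from the mapped URL.
        return PRETRAINED_VOCAB_ARCHIVE_MAP[name_or_path]
    vocab_file = os.path.join(name_or_path, "vocab.txt")
    if os.path.isfile(vocab_file):
        # A local directory containing the tokenizer files also works.
        return vocab_file
    raise ValueError(
        f"'{name_or_path}' is neither a known model name nor a directory"
    )
```

This is why passing a path to a .bin file fails: it is neither a shortcut name in the map nor a directory containing the vocab file.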
Thank you @tholor, I installed the package with pip and I cannot find 'bert-base-german-cased' in PRETRAINED_VOCAB_ARCHIVE_MAP.
Now I have reinstalled the package from source and it's working.
@laifi I keep getting the same error as the one you got:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
I also tried reinstalling. How did you fix it?
@shaked571, I just uninstalled the pip package and installed it again from source (try not to keep any cached files for the package).
PS: the issue is fixed in the latest migration from pytorch-pretrained-bert to pytorch-transformers.
Hi,
I also ran into the same issue when I tried this piece of code in Google Colab.
tokenizer = BertTokenizer.from_pretrained('bert-base-german-cased')
Hi,
I also have the same issue. Using
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-german-cased")
solves the problem for me.