Transformers: BERT Tokenizer not working! Failed to load the bert-base-uncased model.

Created on 21 Jun 2019 · 5 Comments · Source: huggingface/transformers

The sentence being tokenized is: "Weather: Summer's Finally Here. So Where Is It?"
But it gives the following error:

Error message:
AttributeError                            Traceback (most recent call last)
in
----> 1 correct_pairs = convert_sentence_pair(df_full.title.tolist(), df_full.desc.tolist(), max_seq_length=200, tokenizer=tokenizer)
      2
      3

in convert_sentence_pair(titles, descs, max_seq_length, tokenizer)
      3 for (ex_index, (title, desc)) in enumerate(zip(titles, descs)):
      4     print(title)
----> 5     tokens_a = tokenizer.tokenize(title)
      6
      7     tokens_b = None

AttributeError: 'NoneType' object has no attribute 'tokenize'
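The traceback points at `tokenizer` being `None` rather than at the tokenizer itself: in the old `pytorch_pretrained_bert` package, a failed `from_pretrained` call logs an error and returns `None` instead of raising, so the failure only surfaces later as this `AttributeError`. A small fail-fast guard makes the real problem visible at load time; `require` below is a hypothetical helper for illustration, not part of the library:

```python
# Sketch of a fail-fast guard, assuming the behavior described in this thread:
# pytorch_pretrained_bert's from_pretrained returns None when the download
# fails instead of raising. `require` is a hypothetical helper, not library API.
def require(obj, name):
    """Raise immediately if a from_pretrained call returned None."""
    if obj is None:
        raise RuntimeError(
            f"{name} failed to load; check your internet connection or cache_dir"
        )
    return obj

# Usage with the call from the report (assuming BertTokenizer is importable):
# tokenizer = require(
#     BertTokenizer.from_pretrained("bert-base-uncased", do_lower_case=True),
#     "bert-base-uncased tokenizer",
# )
```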

When I tried to load the model manually, I got the following issue:
>>> tokenizer = BertTokenizer.from_pretrained(
...     "bert-base-uncased", do_lower_case=True,
...     cache_dir=PYTORCH_PRETRAINED_BERT_CACHE)
Model name 'bert-base-uncased' was not found in model name list (bert-base-cased, bert-large-uncased, bert-large-cased, bert-base-multilingual-cased, bert-base-chinese, bert-base-uncased, bert-base-multilingual-uncased). We assumed 'https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt' was a path or url but couldn't find any file associated to this path or url.

Can anyone please help?

wontfix

Most helpful comment

If you are using Kaggle, make sure the Internet toggle on the right-hand side is switched on.

All 5 comments

Do you have a good internet connection? The error messages will be improved in the coming release, but usually this comes from the library not being able to reach the AWS S3 servers to download the pretrained weights.
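One way to rule connectivity in or out is to probe the exact URL from the error message with only the standard library; `can_reach` is an illustrative helper, not something the library provides:

```python
import urllib.request

# Vocab URL taken verbatim from the error message in this issue.
VOCAB_URL = ("https://s3.amazonaws.com/models.huggingface.co/bert/"
             "bert-base-uncased-vocab.txt")

def can_reach(url, timeout=5):
    """Return True if an HTTP GET of `url` succeeds within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.getcode() == 200
    except OSError:  # URLError, timeouts, DNS failures all derive from OSError
        return False

if __name__ == "__main__":
    print("S3 reachable:", can_reach(VOCAB_URL))
```

If this prints `False`, the `from_pretrained` download cannot succeed either, and the fix is on the network side (proxy, firewall, or the Kaggle Internet toggle mentioned below).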

@thomwolf Thank you so much for your quick response! Following the advice you gave people in other threads who couldn't load the model, I tried downloading and testing the model from a Python shell, and it worked (session below).

What I still don't understand is why I have to import the BERT packages manually in a Python shell when I already installed them with pip3.

>>> from pytorch_pretrained_bert.modeling import BertForNextSentencePrediction
KeyboardInterrupt
>>> model = BertForNextSentencePrediction.from_pretrained(
...     "bert-base-uncased"
... ).to(device)
100%|████████████████████████| 407873900/407873900 [00:08<00:00, 48525133.57B/s]
Traceback (most recent call last):
  File "", line 3, in
NameError: name 'device' is not defined


I fixed the device issue, and below is the proper output.

>>> from pytorch_pretrained_bert.modeling import BertForNextSentencePrediction
>>> model = BertForNextSentencePrediction.from_pretrained(
...     "bert-base-uncased"
... ).to(device)

I solved the problem by removing cache_dir=PYTORCH_PRETRAINED_BERT_CACHE. The function tries to find the downloaded model in your cache_dir, so if you haven't downloaded anything there yet, you should remove this argument.
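The fix above can be made conditional: pass `cache_dir` only when that directory already contains downloaded files, and otherwise pass `None` so the library falls back to its default cache and downloads fresh. The helper below is an illustrative sketch, not library code:

```python
import os

def usable_cache_dir(cache_dir):
    """Return cache_dir only if it exists and is non-empty; otherwise None.

    Passing cache_dir=None lets from_pretrained use the library's default
    cache location and download the files fresh.
    """
    if cache_dir and os.path.isdir(cache_dir) and os.listdir(cache_dir):
        return cache_dir
    return None

# With pytorch_pretrained_bert installed, the corrected call would then be:
# tokenizer = BertTokenizer.from_pretrained(
#     "bert-base-uncased", do_lower_case=True,
#     cache_dir=usable_cache_dir(PYTORCH_PRETRAINED_BERT_CACHE))
```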

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

If you are using Kaggle, make sure the Internet toggle on the right-hand side is switched on.
