Flair: BertEmbeddings('bert-base-multilingual-uncased') not found

Created on 25 Apr 2019 · 16 Comments · Source: flairNLP/flair

Not sure whether it's a bug, therefore tagged as question. I want to load the Bert embeddings by calling

from flair.embeddings import BertEmbeddings
bert_embeddings = BertEmbeddings('bert-base-multilingual-uncased')

It gives the following error:

Model name 'bert-base-multilingual-uncased' was not found in model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese). We assumed 'https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-uncased-vocab.txt' was a path or url but couldn't find any file associated to this path or url.

What I do not understand:
1) The string I pass as argument clearly IS in the list.
2) When I open the link, the text file seems to contain lots of weird tokens and special characters.

Why is that?
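
A possible way to narrow this down, as a hedged sketch: as far as I can tell, pytorch-pretrained-bert logs this message whenever fetching the vocabulary file fails, so it can appear even for a listed model name if the download itself is the problem (for example behind a proxy or firewall). The file is a plain WordPiece vocabulary with one token per line, so entries such as `[unused1]` or `##ing` are expected. The snippet below only rebuilds the URL from the error message and offers a reachability check; `vocab_url` and `is_reachable` are hypothetical helper names, not part of flair or pytorch-pretrained-bert, and the 2019 URL may no longer exist.

```python
import urllib.error
import urllib.request

# Base URL taken from the error message above (assumption: same pattern for all models).
BASE_URL = "https://s3.amazonaws.com/models.huggingface.co/bert"


def vocab_url(model_name):
    """Build the vocabulary URL in the form shown in the error message."""
    return f"{BASE_URL}/{model_name}-vocab.txt"


def is_reachable(url, timeout=10.0):
    """Return True if a HEAD request to the URL answers with HTTP 200."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers DNS failures, refused connections, timeouts and HTTP errors.
        return False


url = vocab_url("bert-base-multilingual-uncased")
print(url)
# On the failing machine, run: print(is_reachable(url))
```

If `is_reachable(url)` returns False on the machine that shows the error but the link opens in a browser elsewhere, the cause is more likely the network setup than the model name.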

Janinanu

All 16 comments

I think you need to make sure that you're using a recent version of pytorch-pretrained-bert, so try pip install --upgrade pytorch-pretrained-bert :)

stefan-it on 25 Apr 2019

Thanks. I did that, the error persists though. It is still showing me the same error message...

Janinanu on 26 Apr 2019

Hm, that is strange. I just ran the code on a fresh Colab notebook and it works. Did you install from pip or are you working on the master branch?

alanakbik on 26 Apr 2019

I used

pip install flair
pip install --upgrade pytorch-pretrained-bert

with Python 3.7 and PyTorch 1.0.1

Janinanu on 26 Apr 2019

Maybe there's an older version of flair installed. Could you try running pip install --upgrade flair?

stefan-it on 26 Apr 2019

No, pip install --upgrade flair tells me that all requirements are up-to-date...

Janinanu on 26 Apr 2019

This is strange. Here's what I tried in order to reproduce it:

$ python3.7 -m venv /tmp/flair-venv
$ source /tmp/flair-venv/bin/activate
(flair-venv) $ pip install --upgrade flair
(flair-venv) $ pip install --upgrade pytorch-pretrained-bert
(flair-venv) $ python
Python 3.7.1 (default, Oct 22 2018, 11:21:55) 
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 
>>> from flair.embeddings import BertEmbeddings
>>> bert_embeddings = BertEmbeddings('bert-base-multilingual-uncased')
>>>

stefan-it on 26 Apr 2019

Could you give us more information about your Python environment? :)

stefan-it on 26 Apr 2019

I am using Python 3.7 via a remote interpreter on Ubuntu 18.04 with conda version 4.6.14.
Please let me know if any other specific information is relevant.

Is there a link where I can directly download the Bert embeddings as used by Flair?

Janinanu on 26 Apr 2019

Hi, what version of pytorch-pretrained-bert do you have?

import pytorch_pretrained_bert
pytorch_pretrained_bert.__version__

thomwolf on 26 Apr 2019

>>> import pytorch_pretrained_bert
>>> pytorch_pretrained_bert.__version__
'0.6.2'

Janinanu on 27 Apr 2019

This might be somehow related to a bug in pytorch_pretrained_bert v0.6.2.

I cannot reproduce OP's error with

Python 3.7.2
flair v0.4.1
pytorch_pretrained_bert v0.6.2
BertEmbeddings('bert-base-multilingual-uncased')

but get an AttributeError (which, I must admit, would be a different issue) when embedding a sentence:

  File "flair/models/sequence_tagger_model.py", line 300, in predict
    tags, _ = self.forward_labels_and_loss(batch, sort=False)
  File "flair/models/sequence_tagger_model.py", line 268, in forward_labels_and_loss
    feature, lengths, tags = self.forward(sentences, sort=sort)
  File "flair/models/sequence_tagger_model.py", line 315, in forward
    self.embeddings.embed(sentences)
  File "flair/embeddings.py", line 130, in embed
    embedding.embed(sentences)
  File "flair/embeddings.py", line 63, in embed
    self._add_embeddings_internal(sentences)
  File "flair/embeddings.py", line 1143, in _add_embeddings_internal
    max([self.tokenizer.tokenize(sentence.to_tokenized_string()) for sentence in sentences], key=len))
  File "flair/embeddings.py", line 1143, in <listcomp>
    max([self.tokenizer.tokenize(sentence.to_tokenized_string()) for sentence in sentences], key=len))
  File "pytorch_pretrained_bert/tokenization.py", line 109, in tokenize
    if self.do_basic_tokenize:
AttributeError: 'BertTokenizer' object has no attribute 'do_basic_tokenize'

However, everything works fine with pytorch_pretrained_bert v0.6.1. So I guess the whole thing might resolve itself with v0.6.3?

BTW: I don't know what's going wrong here, because the BertTokenizer _does_ have an attribute do_basic_tokenize, but this is the wrong place to start discussing that anyway.

severinsimmler on 29 Apr 2019

@Janinanu Is there any version information found for flair when you execute:

import flair
print(flair.__version__)

in your virtual environment?
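
To report all relevant versions in one go, something like the following sketch might help. It assumes Python 3.8 or newer, because it relies on `importlib.metadata`; on Python 3.7, as used in this thread, the `__version__` attributes shown above are the simpler route. `installed_version` and `environment_report` are hypothetical helper names, and the package list is only a guess at what matters here.

```python
import sys
from importlib import metadata  # standard library from Python 3.8 on


def installed_version(package):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def environment_report(packages=("flair", "pytorch-pretrained-bert", "torch")):
    """Collect the interpreter version and the versions of the given packages."""
    report = {"python": sys.version.split()[0]}
    for name in packages:
        report[name] = installed_version(name)
    return report


# Print one line per entry so the output can be pasted into an issue.
for name, version in environment_report().items():
    print(f"{name}: {version}")
```

A value of `None` means the package is not installed in the interpreter that ran the script, which can also reveal that a different environment is active than the one pip upgraded.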

@severinsimmler Could you provide a full code snippet for that error? I would really like to reproduce it (maybe we can add some nice unit tests for those cases) :)

stefan-it on 29 Apr 2019

@stefan-it, I think I just found a fix for my bug in the flair code, will make a PR with some more details :)

severinsimmler on 29 Apr 2019

Sorry, false alarm... I definitely can't reproduce OP's error, and the following example works out just fine with the versions I mentioned above:

>>> from flair.data import Sentence
>>> from flair.embeddings import BertEmbeddings
>>> sentence = Sentence("This is a sentence.")
>>> embedding = BertEmbeddings("bert-base-multilingual-cased")
>>> embedding.embed(sentence)

My use case was loading a sequence tagger model _trained_ with pytorch_pretrained_bert v0.6.1, but _predicting_ with v0.6.2:

>>> from flair.data import Sentence
>>> from flair.models import SequenceTagger
>>> tagger = SequenceTagger.load_from_file("model.pt")
>>> sentence = Sentence("This is a sentence.")
>>> tagger.predict(sentence)
AttributeError: 'BertTokenizer' object has no attribute 'do_basic_tokenize'

The AttributeError is expected, because the BertTokenizer in v0.6.1 (= the one loaded from model.pt) indeed had no do_basic_tokenize attribute, whereas the object in v0.6.2 does have one.
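
The mechanism can be reproduced without flair in a minimal sketch. It assumes the saved model restores the tokenizer the way default unpickling does, that is, by creating the object without calling `__init__` and then restoring the saved attributes. `OldTokenizer` and `NewTokenizer` are hypothetical stand-ins, not the real `BertTokenizer`, and the `hasattr` patch at the end is only one conceivable workaround, not what flair or pytorch-pretrained-bert actually do.

```python
class OldTokenizer:
    """Stand-in for the tokenizer as saved by the older release."""

    def __init__(self):
        self.vocab = {"this": 0, "is": 1}


class NewTokenizer:
    """Stand-in for the newer release, which sets an extra attribute."""

    def __init__(self):
        self.vocab = {"this": 0, "is": 1}
        self.do_basic_tokenize = True

    def tokenize(self, text):
        if self.do_basic_tokenize:
            return text.lower().split()
        return [text]


# Attributes as they were stored by the older code.
saved_state = dict(OldTokenizer().__dict__)

# Mimic default unpickling: no __init__ call, only the saved attributes return.
restored = NewTokenizer.__new__(NewTokenizer)
restored.__dict__.update(saved_state)

try:
    restored.tokenize("This is a sentence.")
    error = None
except AttributeError as exc:
    error = str(exc)
print(error)

# Hypothetical workaround: supply the missing attribute after loading.
if not hasattr(restored, "do_basic_tokenize"):
    restored.do_basic_tokenize = True
tokens = restored.tokenize("This is a sentence.")
print(tokens)
```

Retraining or re-saving the model with the same library version used for prediction avoids the mismatch altogether.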

severinsimmler on 30 Apr 2019

For some reason, it now works. I don't know how or why, though. Thanks everyone :)

Janinanu on 3 May 2019
