I initialized the tokenizer and the model like this:
import torch
from transformers import BertModel, BertTokenizer

def load_bert_score_model(bert="bert-base-multilingual-cased", num_layers=8):
    assert bert in bert_types  # bert_types: list of supported checkpoints, defined elsewhere
    tokenizer = BertTokenizer.from_pretrained(bert, do_lower_case=True)
    model = BertModel.from_pretrained(bert)
    model.eval()
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    model.to(device)
    # drop unused encoder layers, keeping only the first num_layers
    model.encoder.layer = torch.nn.ModuleList([layer for layer in model.encoder.layer[:num_layers]])
    return model, tokenizer
So I'm passing do_lower_case=True, but I get this warning:
The pre-trained model you are loading is a cased model but you have not set `do_lower_case` to False. We are setting `do_lower_case=False` for you but you may want to check this behavior.
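For reference, I call it roughly like this (a minimal sketch; the example sentence is arbitrary):

model, tokenizer = load_bert_score_model()   # the warning is printed here, during from_pretrained
print(tokenizer.tokenize("Héllo Wörld"))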
Hi! You seem to be loading a cased model (such as bert-base-multilingual-cased), but you're passing do_lower_case=True to your tokenizer, which lowercases every character and strips accents.
The model you specified was trained with uppercase and lowercase characters as well as accent marks, so you should feed it such characters too. If you only want lowercase characters, it would be better to use an uncased model (such as bert-base-multilingual-uncased).
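Concretely, the two consistent setups would look something like this (a rough sketch, not part of the original bert_score code; the example text is arbitrary):

from transformers import BertTokenizer

# Option 1: keep the cased checkpoint and leave do_lower_case at its default (False),
# so case and accents are preserved, matching how the model was trained.
cased_tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")

# Option 2: if lowercased, accent-stripped input is what you want, use the uncased checkpoint.
uncased_tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-uncased", do_lower_case=True)

# The difference on accented text:
text = "Héllo Wörld"
print(cased_tokenizer.tokenize(text))    # keeps case and accents
print(uncased_tokenizer.tokenize(text))  # lowercases and strips accents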
@LysandreJik that is correct, thank you.