Transformers: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

Created on 24 Oct 2019 · 7 comments · Source: huggingface/transformers

When I load the pretrained model from a local bin file, I get a decoding error.

All 7 comments

Hi, could you provide more information, e.g. by following the issue template? Please tell us which model, which bin file, and which command you used.

tokenizer = BertTokenizer.from_pretrained("/home/liping/liping/bert/bert-base-cased-pytorch_model.bin")

XLNetModel.from_pretrained("/data2/liping/xlnet/xlnet-base-cased-pytorch_model.bin")
These two commands trigger the error.

@lipingbj With the latest versions of transformers you need to pass the path to the directory containing the PyTorch-compatible model, so in your example use:

tokenizer = BertTokenizer.from_pretrained("/home/liping/liping/bert/")

The following files must be located in that folder:

  • vocab.txt - vocabulary file
  • pytorch_model.bin - the PyTorch-compatible (and converted) model
  • config.json - json-based model configuration

Please make sure these files exist and, for example, rename bert-base-cased-pytorch_model.bin to pytorch_model.bin; see the sketch below.

That should work :)
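
For reference, a minimal sketch of loading from such a local directory (this is only a sketch: the path is the one from this thread, so adjust it to your setup, and it assumes the three files listed above are already in place):

from transformers import BertModel, BertTokenizer

model_dir = "/home/liping/liping/bert/"  # pass the directory, not the .bin file itself

tokenizer = BertTokenizer.from_pretrained(model_dir)  # reads vocab.txt
model = BertModel.from_pretrained(model_dir)          # reads config.json + pytorch_model.bin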

encoder_model = BertModel.from_pretrained("/home/liping/liping/bert/pytorch-bert-model")
tokenizer = BertTokenizer.from_pretrained("/home/liping/liping/bert/pytorch-bert-model")

vocab.txt, pytorch_model.bin, and config.json are all included in the directory bert/pytorch-bert-model.

OSError: Model name '/home/liping/liping/bert/pytorch-bert-model' was not found in model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased). We assumed '/home/liping/liping/bert/pytorch-bert-model/config.json' was a path or url to a configuration file named config.json or a directory containing such a file but couldn't find any such file at this path or url.

As the error says, "We assumed '/home/liping/liping/bert/pytorch-bert-model/config.json' was a path or url to a configuration file named config.json or a directory containing such a file but couldn't find any such file at this path or url."

Your data does not seem to be in "/home/liping/liping/bert/pytorch-bert-model"
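
A quick sanity check (just a sketch; the directory path is taken from the error message above) to confirm the three required files are really where from_pretrained is looking:

import os

model_dir = "/home/liping/liping/bert/pytorch-bert-model"
for name in ("config.json", "vocab.txt", "pytorch_model.bin"):
    path = os.path.join(model_dir, name)
    print(name, "->", "found" if os.path.isfile(path) else "MISSING", path)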

Hello,

I'm trying to load BioBERT into PyTorch and I'm seeing a different error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

any hints? @LysandreJik

Can you show the code that you are running to load from pre-trained weights?
For example

model = BertForSequenceClassification.from_pretrained('/path/to/directory/containing/model_artifacts/')

As stefan-it mentioned above, the directory must contain the 3 required files.
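
A fuller sketch of that call (assuming the BioBERT checkpoint has already been converted to a PyTorch-compatible pytorch_model.bin and sits next to vocab.txt and config.json; the directory path below is hypothetical):

from transformers import BertForSequenceClassification, BertTokenizer

model_dir = "/path/to/biobert-pytorch/"  # hypothetical local directory with the 3 files
tokenizer = BertTokenizer.from_pretrained(model_dir)
model = BertForSequenceClassification.from_pretrained(model_dir)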
