Model I am using (Bert, XLNet....): Bert
Language I am using the model on (English, Chinese....): English
The problem arises when using:
Steps to reproduce the behavior:
1. Install the dependencies:

```
!pip install tensorflow-gpu
!pip install torch
!pip install transformers
```
2. Run the example:

```python
import tensorflow as tf
import tensorflow_datasets
from transformers import *

# Load dataset, tokenizer, model from pretrained model/vocabulary
tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
model = TFBertForSequenceClassification.from_pretrained('bert-base-cased')
data = tensorflow_datasets.load('glue/mrpc')

# Prepare the GLUE MRPC data as tf.data.Dataset instances
train_dataset = glue_convert_examples_to_features(data['train'], tokenizer, max_length=128, task='mrpc')
valid_dataset = glue_convert_examples_to_features(data['validation'], tokenizer, max_length=128, task='mrpc')
train_dataset = train_dataset.shuffle(100).batch(32).repeat(2)
valid_dataset = valid_dataset.batch(64)

# Compile the tf.keras model with optimizer, loss and metric
optimizer = tf.keras.optimizers.Adam(learning_rate=3e-5, epsilon=1e-08, clipnorm=1.0)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
metric = tf.keras.metrics.SparseCategoricalAccuracy('accuracy')
model.compile(optimizer=optimizer, loss=loss, metrics=[metric])

# Train and evaluate using tf.keras.Model.fit()
history = model.fit(train_dataset, epochs=2, steps_per_epoch=115,
                    validation_data=valid_dataset, validation_steps=7)

# Save the TF model and reload it in PyTorch
model.save_pretrained('./save/')
pytorch_model = BertForSequenceClassification.from_pretrained('./save/', from_tf=True)

# Test a few predictions - MRPC is a paraphrasing task
sentence_0 = "This research was consistent with his findings."
sentence_1 = "His findings were compatible with this research."
sentence_2 = "His findings were not compatible with this research."
inputs_1 = tokenizer.encode_plus(sentence_0, sentence_1, add_special_tokens=True, return_tensors='pt')
inputs_2 = tokenizer.encode_plus(sentence_0, sentence_2, add_special_tokens=True, return_tensors='pt')
pred_1 = pytorch_model(**inputs_1)[0].argmax().item()
pred_2 = pytorch_model(**inputs_2)[0].argmax().item()
print("sentence_1 is", "a paraphrase" if pred_1 else "not a paraphrase", "of sentence_0")
print("sentence_2 is", "a paraphrase" if pred_2 else "not a paraphrase", "of sentence_0")
```
3. Get the error:

```
      5 # Load dataset, tokenizer, model from pretrained model/vocabulary
      6 tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
----> 7 model = TFBertForSequenceClassification.from_pretrained('bert-base-cased')
      8 data = tensorflow_datasets.load('glue/mrpc')
      9 

NameError: name 'TFBertForSequenceClassification' is not defined
```
Environment: Google Colab

I get the same error when trying to use the TF version of any of the transformers models.
Can you run the following and report back? It might be that you have a namespace conflict.

```
! pip list | grep "tensorflow"    # Check tensorflow==2.0.0, tensorflow-gpu==2.0.0
! pip list | grep "transformers"  # Check transformers>=2.0.0
```
Cleaning the environment fixed the issue. You are right, there was a namespace conflict.
@tylerjthomas9 - I'm having the same problem. Can you elaborate on what you did to fix the namespace conflict?
@GrahamboJangles If you have issues importing the TensorFlow models in a blank Colab notebook, please make sure you have the correct TensorFlow version installed in your Colab environment (2.0+). You can do so by overriding the already-installed TensorFlow with the following command:

```
!pip install tensorflow==2.0.0
```
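After reinstalling, restart the Colab runtime so the new version is picked up; a minimal sketch to verify it took effect:

```python
# Minimal sketch: after forcing tensorflow==2.0.0 and restarting the Colab runtime,
# confirm the interpreter now sees TF 2.x and that the TF class imports cleanly.
import tensorflow as tf

assert tf.__version__.startswith("2."), "still on TF %s - restart the runtime" % tf.__version__

from transformers import TFBertForSequenceClassification  # should no longer raise an error
```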
@LysandreJik - I made sure I had TensorFlow 2.0.0 and I still get the same error:
```
[... download progress bars for model weights, vocabularies and configs omitted ...]
ftfy or spacy is not installed using BERT BasicTokenizer instead of SpaCy & ftfy.
This tokenizer does not make use of special tokens. Input is returned with no modification.
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/transformers/configuration_utils.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
132 try:
--> 133 resolved_config_file = cached_path(config_file, cache_dir=cache_dir, force_download=force_download, proxies=proxies)
134 except EnvironmentError:
3 frames
OSError: file roberta-base not found
During handling of the above exception, another exception occurred:
OSError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/transformers/configuration_utils.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
143 ', '.join(cls.pretrained_config_archive_map.keys()),
144 config_file, CONFIG_NAME)
--> 145 raise EnvironmentError(msg)
146
147 if resolved_config_file == config_file:
OSError: Model name 'roberta-base' was not found in model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased). We assumed 'roberta-base' was a path or url to a configuration file named config.json or a directory containing such a file but couldn't find any such file at this path or url.
```
@GrahamboJangles this does not seem to be the same error. It seems to me that you're trying to load a RoBERTa checkpoint in a BERT model/tokenizer.
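To illustrate the mismatch (a minimal sketch, assuming transformers >= 2.0 where RoBERTa is supported): the 'roberta-base' checkpoint has to be paired with the RoBERTa classes, since it is not one of the bert-* shortcut names listed in the error above:

```python
# Sketch: load the RoBERTa checkpoint with the matching RoBERTa classes.
# BertTokenizer/BertConfig only resolve the bert-* shortcut names, hence the OSError.
from transformers import RobertaTokenizer, RobertaForSequenceClassification

tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
model = RobertaForSequenceClassification.from_pretrained('roberta-base')
```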
@LysandreJik - Maybe that is the problem, but I call `tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')`, so I don't see why it would be trying to use a RoBERTa checkpoint unless there's something I'm missing. Also, when I try with the RobertaModel I get the same error.
Could you provide a script so that we can try and reproduce the error on our side?
@LysandreJik - Here's my Colab notebook.