Model I am using: TFBertForSequenceClassification
Language I am using the model on: English
The problem arises when using:
The task I am working on is:
Steps to reproduce the behavior:
import tensorflow as tf
from transformers import BertTokenizer, TFBertForSequenceClassification
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased')
input_ids = tf.constant(tokenizer.encode("Hello, my dog is cute"))[None, :] # Batch size 1
labels = tf.reshape(tf.constant(1), (-1, 1)) # Batch size 1
outputs = model(input_ids, labels=labels)
loss, logits = outputs[:2]
transformers version: 2.11.0
You should try this:
pip install git+https://github.com/huggingface/transformers
not
pip install transformers
because the latest version isn't available in any release
It is available in v3.0.0, which was released this morning :)
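A quick sanity check (not from the original report) to confirm which build actually ended up in your environment:
import transformers
print(transformers.__version__)  # `labels` support in the TF models needs 3.0.0 or newer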
I'm having the same issue with v3.0.2, following is the error msg:
TypeError: tf__call() got an unexpected keyword argument 'labels'
I would like to elaborate more on this issue. I carefully checked the source code, and the error is that TFDistilBertModel receives the labels keyword argument and throws this error. I have ensured that the data is in the desired form as a tf.data.Dataset. The code I wrote is effectively identical to run_tf_glue.py, and that example code runs correctly.
Note: I pulled down the GitHub repo and installed the dependencies as described here. Is this possibly related to the issue?
I also tried the following commands to reinstall transformers, but it still doesn't work:
pip uninstall transformers
pip install git+https://github.com/huggingface/transformers
This has gotten to the point where I'm extremely frustrated; it would be really appreciated if someone could point me in the right direction.
Did you want to do a classification task?
If so, you may need to use TFDistilBertForSequenceClassification or TFDistilBertForTokenClassification.
The TFDistilBertModel class doesn't contain a classification layer, so there is no 'labels' argument.
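For illustration, a minimal sketch of that difference (the model name 'distilbert-base-uncased' is an assumption here, and this reflects transformers v3.0.x behaviour): only the ...ForSequenceClassification variant accepts labels and returns a loss.

import tensorflow as tf
from transformers import DistilBertTokenizer, TFDistilBertModel, TFDistilBertForSequenceClassification

tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
input_ids = tf.constant(tokenizer.encode("Hello, my dog is cute"))[None, :]  # Batch size 1

# Plain base model: no classification head, so no `labels` argument
base_model = TFDistilBertModel.from_pretrained('distilbert-base-uncased')
last_hidden_state = base_model(input_ids)[0]

# Classification model: has a head, accepts `labels`, and returns (loss, logits, ...)
clf_model = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')
loss, logits = clf_model(input_ids, labels=tf.constant([1]))[:2]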
Post your code, as installing the latest transformers fixed the issue for me as recommended here.
@afogarty85 Hi, please see the code below.
Regarding the training data, it consists of around 5000 text-label pairs. I created this tf.data.Dataset inspired by glue_convert_examples_to_features.py.
import os
import json
import re
from pprint import pprint
from dataclasses import dataclass, field
from dotenv import load_dotenv
import numpy as np
import pandas as pd
import tensorflow as tf
from transformers import (
    AutoConfig,
    AutoTokenizer,
    TFAutoModel,
    TFTrainer,
    TFTrainingArguments,
)
from sklearn.metrics import precision_recall_fscore_support
from tc_data import TopCoder
load_dotenv()
def build_dataset(tokenizer):
    """ Build tf.data.Dataset out of text and prize range."""
    # Load TopCoder data
    tc = TopCoder()
    tc_req = tc.get_filtered_requirements()
    tc_meta = tc.get_filtered_challenge_info()

    # Convert float prize into categorical prize range
    interval = np.linspace(0, 3000, 31)[:-1]
    tc_prz_range = tc_meta['total_prize'].apply(lambda prz: np.searchsorted(interval, prz, side='right') - 1)
    tc_prz_range.name = 'prize_cat'
    req_prz_df = pd.concat([tc_req['requirement'], tc_prz_range], axis=1)  # use this df to ensure the index of text and label is aligned
    dataset_size = len(req_prz_df)

    # Batch-encode the strings into `input_ids` and `attention_mask`
    batched_encoded = tokenizer(req_prz_df['requirement'].to_list(), padding='max_length', truncation=True)

    # Features are tuples of ({'input_ids': [...], 'attention_mask': [...]}, prize range label)
    features = [({k: batched_encoded[k][i] for k in batched_encoded}, req_prz_df['prize_cat'].iloc[i]) for i in range(len(req_prz_df))]
    input_names = tuple(batched_encoded.keys())

    def gen():
        """ Generator used in `tf.data.Dataset.from_generator`."""
        for encoded_str, label in features:
            yield encoded_str, label

    return (
        tf.data.Dataset.from_generator(
            gen,
            ({k: tf.int32 for k in batched_encoded}, tf.int32),
            ({k: tf.TensorShape([512]) for k in batched_encoded}, tf.TensorShape([]))
        ),
        dataset_size
    )
def compute_metrics(pred):
    """ Compute eval metrics
    reference: https://huggingface.co/transformers/training.html#tensorflow
    """
    labels = pred.label_ids
    preds = pred.predictions.argmax(-1)
    precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average='binary')
    acc = (preds == labels).mean()
    return {
        'accuracy': acc,
        'f1': f1,
        'precision': precision,
        'recall': recall
    }
def finetune_with_tftrainer():
    """ Fine-tune with TFTrainer"""
    config = AutoConfig.from_pretrained(os.getenv('MODEL_NAME'), cache_dir=os.getenv('OUTPUT_DIR'), num_labels=30)
    tokenizer = AutoTokenizer.from_pretrained(os.getenv('MODEL_NAME'), cache_dir=os.getenv('OUTPUT_DIR'))
    training_args = TFTrainingArguments(
        output_dir=os.getenv('OUTPUT_DIR'),
        logging_dir=os.getenv('OUTPUT_DIR'),
        overwrite_output_dir=True,
        do_train=True,
        do_eval=True,
        learning_rate=2e-5,
    )
    with training_args.strategy.scope():
        model = TFAutoModel.from_pretrained(os.getenv('MODEL_NAME'), config=config, cache_dir=os.getenv('OUTPUT_DIR'))

    # Get data for fine-tuning
    dataset, dataset_size = build_dataset(tokenizer)

    # Shuffle and split train/test sets manually
    dataset = dataset.shuffle(dataset_size)
    train_size, test_size = int(dataset_size * (4 / 5)), dataset_size - int(dataset_size * (4 / 5))  # 8-2 split
    train_data, test_data = dataset.take(train_size), dataset.skip(train_size)

    trainer = TFTrainer(
        model=model,
        args=training_args,
        train_dataset=train_data,
        eval_dataset=test_data,
        compute_metrics=compute_metrics
    )

    # Train the model
    trainer.train()
    trainer.save_model()
    tokenizer.save_pretrained(os.getenv('OUTPUT_DIR'))

    # Evaluate the model
    result = trainer.evaluate()
    pprint(result)
    with open(os.path.join(os.getenv('OUTPUT_DIR'), 'eval_results.json'), 'w') as fwrite:
        json.dump(result, fwrite, indent=4)

if __name__ == "__main__":
    finetune_with_tftrainer()
@BenjiTheC
Try replacing TFAutoModel with TFAutoModelForSequenceClassification.
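A minimal sketch of what that swap looks like in the script above (the `config` object and the MODEL_NAME / OUTPUT_DIR environment variables are assumed to be the same ones already defined there):

import os
from transformers import TFAutoModelForSequenceClassification

model = TFAutoModelForSequenceClassification.from_pretrained(
    os.getenv('MODEL_NAME'), config=config, cache_dir=os.getenv('OUTPUT_DIR')
)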
@QixinLi My purpose for fine-tuning the model is to use the last hidden layer state of BERT, combined with some other features, to continue forward in a bigger NN. Will using TFAutoModelForSequenceClassification compromise this goal?
Thanks!
@QixinLi It's working after the replacement, but I'm still wondering if this will impact my goal for fine-tuning. Thanks a lot!
TFAutoModelForSequenceClassification is a ready-made BERT model for text classification. Take TFDistilBertForSequenceClassification, which you seem to be using, as an example:
class TFDistilBertForSequenceClassification(TFDistilBertPreTrainedModel, TFSequenceClassificationLoss):
    def __init__(self, config, *inputs, **kwargs):
        super().__init__(config, *inputs, **kwargs)
        self.num_labels = config.num_labels
        self.distilbert = TFDistilBertMainLayer(config, name="distilbert")
        self.pre_classifier = tf.keras.layers.Dense(
            config.dim,
            kernel_initializer=get_initializer(config.initializer_range),
            activation="relu",
            name="pre_classifier",
        )
        self.classifier = tf.keras.layers.Dense(
            config.num_labels, kernel_initializer=get_initializer(config.initializer_range), name="classifier"
        )
        self.dropout = tf.keras.layers.Dropout(config.seq_classif_dropout)
It has two dense (linear) layers and one dropout layer.
However, it is packaged as-is in the library. If you want to add your own custom network structure on top of BERT's last-layer output, you may need to define your own model class that inherits from TFDistilBertPreTrainedModel.
If you just want to do text classification, TFAutoModelForSequenceClassification should meet your needs.
I do not think so. People have tended to find that extracting the CLS token from the hidden layer is what you want, instead of the embeddings for all your tokens. Some discussion on that is here: https://github.com/huggingface/transformers/issues/1950
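Concretely, "extracting the CLS token" just means taking the first position of the last hidden state; a tiny sketch, assuming `base_model` is a plain TFDistilBertModel and `input_ids` is a batch of token ids:

last_hidden_state = base_model(input_ids)[0]  # shape: (batch, seq_len, hidden_size)
cls_embedding = last_hidden_state[:, 0, :]    # the [CLS] position only, shape: (batch, hidden_size)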
Can you refer me to a specific comment? My purpose is exactly the one described in this comment, but it seems like ...ForSequenceClassification adds a specific task type. Thanks!
@QixinLi Hi! My goal is to feed the text into the fine-tuned BERT, take the last-layer output, append some other features to it, and then continue through a neural network to do the classification. I'm currently debugging with DistilBERT on my local machine; once it runs I'll move it to the cloud and run something like bert-large. In that case, which class should I use? Or can I get the last hidden state from ...SequenceClassification via output[1]? Thanks!
Are you classifying something? If so, use ...ForSequenceClassification. This seems to be what you want to do given your text-label data. Extracting the embedding is slightly different when using ...ForSequenceClassification rather than the plain TFDistilBertModel.
But for your purpose, to classify something and to then get those embeddings, look toward this comment, as it illustrates the difference.
https://github.com/huggingface/transformers/issues/1950#issuecomment-558683444
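As a hedged sketch of that difference (assuming transformers v3.0.x behaviour and a placeholder model name): with the ...ForSequenceClassification model you can still get the hidden states back, alongside the logits, by turning on output_hidden_states in the config.

import tensorflow as tf
from transformers import AutoConfig, DistilBertTokenizer, TFDistilBertForSequenceClassification

config = AutoConfig.from_pretrained('distilbert-base-uncased', num_labels=30, output_hidden_states=True)
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
model = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased', config=config)

input_ids = tf.constant(tokenizer.encode("Hello, my dog is cute"))[None, :]
outputs = model(input_ids)
logits = outputs[0]                    # (batch, num_labels) -- after the classification head
hidden_states = outputs[-1]            # tuple: embedding output + one entry per transformer layer
last_hidden_state = hidden_states[-1]  # (batch, seq_len, hidden_size)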
If you want last_hidden_states, you can do it like this:
class MyOwnModel(TFDistilBertPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.distilbert = TFDistilBertMainLayer(config, name="distilbert")
        self.classifier = tf.keras.layers.Dense(config.num_labels)

    def call(self, inputs=None, mask=None, token_type_ids=None, labels=None):
        outputs = self.distilbert(inputs, attention_mask=mask, token_type_ids=token_type_ids)
        last_hidden_states = outputs[0]
        # do whatever you want
        processed_hidden_states = .........
        logits = self.classifier(processed_hidden_states)
        outputs = (logits,)
        if labels is not None:
            loss = self.compute_loss(labels, logits)
            outputs = (loss,) + outputs
        return outputs
model = MyOwnModel.from_pretrained(os.getenv('MODEL_NAME'), config=config, cache_dir=os.getenv('OUTPUT_DIR'))
input_ids = tf.constant(tokenizer.encode("Hello, my dog is cute"))[None, :] # Batch size 1
outputs = model(input_ids)
P.S. I just looked at the call() function of ...SequenceClassification, and its return value output[1] is the result after the classification layer, not the last_hidden_states you want. So it probably won't meet your needs.
@QixinLi Hi, your reply answered my question, thank you very much! I also have two follow-ups:
- DistilBERT and BERT have to be loaded through two different classes, but in my implementation it makes no difference whether I inherit from TFDistilBertModel or TFBertModel, right?
- ...PreTrainedModel seems to be an abstract class; should I instead inherit from the concrete BERT/DistilBERT classes? Thanks!
Thanks for the answer! I've found the inspiration I need from the issue you referred to. Much appreciated!
For BERT, change TFDistilBertPreTrainedModel to TFBertPreTrainedModel, and change TFDistilBertMainLayer to TFBertMainLayer as well. ...PreTrainedModel also inherits from tf.keras.Model, so inheriting from it directly is no problem. (That's how it's done inside transformers itself. My code above is only for reference; for a concrete implementation you can study the model code written by the huggingface maintainers.)
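In case it helps, a hedged sketch of what those swaps might look like for a BERT version of the class above (module path assumed for the v3.0.x layout; the CLS-slicing step is just a placeholder for whatever processing you add, not a definitive implementation):

import tensorflow as tf
from transformers.modeling_tf_bert import TFBertMainLayer, TFBertPreTrainedModel

class MyOwnBertModel(TFBertPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.bert = TFBertMainLayer(config, name="bert")
        self.classifier = tf.keras.layers.Dense(config.num_labels)

    def call(self, inputs=None, attention_mask=None, token_type_ids=None):
        outputs = self.bert(inputs, attention_mask=attention_mask, token_type_ids=token_type_ids)
        last_hidden_states = outputs[0]              # (batch, seq_len, hidden_size)
        cls_embedding = last_hidden_states[:, 0, :]  # combine with your extra features here if needed
        return self.classifier(cls_embedding)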