Model I am using: TFBertForSequenceClassification
Language I am using the model on: English
The problem arises when using:
The task I am working on is:
Steps to reproduce the behavior:
import tensorflow as tf
from transformers import BertTokenizer, TFBertForSequenceClassification
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased')
input_ids = tf.constant(tokenizer.encode("Hello, my dog is cute"))[None, :] # Batch size 1
labels = tf.reshape(tf.constant(1), (-1, 1)) # Batch size 1
outputs = model(input_ids, labels=labels)
loss, logits = outputs[:2]
transformers version: 2.11.0
You should try this:
pip install git+https://github.com/huggingface/transformers
not
pip install transformers
because the latest version isn't available in any release
It is available in v3.0.0, which was released this morning :)
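A quick sanity check (not from the original report) to confirm which build actually ended up in your environment:
import transformers
print(transformers.__version__)  # `labels` support in the TF models needs 3.0.0 or newer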
I'm having the same issue with v3.0.2, following is the error msg:
TypeError: tf__call() got an unexpected keyword argument 'labels'
I would like to elaborate more on this issue. I carefully checked the source code, and the error is that TFDistilBertModel receives the labels keyword argument and throws this error. I have ensured that the data is in the desired form as a tf.data.Dataset. The code I wrote is effectively identical to run_tf_glue.py, and that example code runs correctly.
Note: I pulled down the GitHub repo and installed the dependencies as described here. Is this possibly related to the issue?
I also tried the following commands to reinstall transformers, but it still doesn't work:
pip uninstall transformers
pip install git+https://github.com/huggingface/transformers
This has gotten to the point where I'm extremely frustrated; it would be really appreciated if someone could point me in the right direction.
Did you want to do a classification task?
If so, you may need to use TFDistilBertForSequenceClassification or TFDistilBertForTokenClassification.
The TFDistilBertModel class doesn't contain a classification layer, so there is no 'labels' argument.
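For illustration, a minimal sketch of that difference (the model name 'distilbert-base-uncased' is an assumption here, and this reflects transformers v3.0.x behaviour): only the ...ForSequenceClassification variant accepts labels and returns a loss.

import tensorflow as tf
from transformers import DistilBertTokenizer, TFDistilBertModel, TFDistilBertForSequenceClassification

tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
input_ids = tf.constant(tokenizer.encode("Hello, my dog is cute"))[None, :]  # Batch size 1

# Plain base model: no classification head, so no `labels` argument
base_model = TFDistilBertModel.from_pretrained('distilbert-base-uncased')
last_hidden_state = base_model(input_ids)[0]

# Classification model: has a head, accepts `labels`, and returns (loss, logits, ...)
clf_model = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')
loss, logits = clf_model(input_ids, labels=tf.constant([1]))[:2]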
Post your code, as installing the latest transformers fixed the issue for me as recommended here.
@afogarty85 Hi, please see the code below.
Regarding the training data, it consists of around 5000 text-label pairs. I created this tf.data.Dataset inspired by glue_convert_examples_to_features.py.
import os
import json
import re
from pprint import pprint
from dataclasses import dataclass, field
from dotenv import load_dotenv
import numpy as np
import pandas as pd
import tensorflow as tf
from transformers import (
    AutoConfig,
    AutoTokenizer,
    TFAutoModel,
    TFTrainer,
    TFTrainingArguments,
)
from sklearn.metrics import precision_recall_fscore_support
from tc_data import TopCoder
load_dotenv()
def build_dataset(tokenizer):
    """ Build tf.data.Dataset out of text and prize range."""
    # Load TopCoder data
    tc = TopCoder()
    tc_req = tc.get_filtered_requirements()
    tc_meta = tc.get_filtered_challenge_info()

    # Convert float prize into categorical prize range
    interval = np.linspace(0, 3000, 31)[:-1]
    tc_prz_range = tc_meta['total_prize'].apply(lambda prz: np.searchsorted(interval, prz, side='right') - 1)
    tc_prz_range.name = 'prize_cat'
    req_prz_df = pd.concat([tc_req['requirement'], tc_prz_range], axis=1)  # use this df to ensure the index of text and label is aligned
    dataset_size = len(req_prz_df)

    # Batch-encode the strings into `input_ids` and `attention_mask`
    batched_encoded = tokenizer(req_prz_df['requirement'].to_list(), padding='max_length', truncation=True)

    # Features are tuples of ({'input_ids': [...], 'attention_mask': [...]}, prize range label)
    features = [({k: batched_encoded[k][i] for k in batched_encoded}, req_prz_df['prize_cat'].iloc[i]) for i in range(len(req_prz_df))]
    input_names = tuple(batched_encoded.keys())

    def gen():
        """ Generator used in `tf.data.Dataset.from_generator`."""
        for encoded_str, label in features:
            yield encoded_str, label

    return (
        tf.data.Dataset.from_generator(
            gen,
            ({k: tf.int32 for k in batched_encoded}, tf.int32),
            ({k: tf.TensorShape([512]) for k in batched_encoded}, tf.TensorShape([]))
        ),
        dataset_size
    )
def compute_metrics(pred):
    """ Compute eval metrics
    reference: https://huggingface.co/transformers/training.html#tensorflow
    """
    labels = pred.label_ids
    preds = pred.predictions.argmax(-1)
    precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average='binary')
    acc = (preds == labels).mean()
    return {
        'accuracy': acc,
        'f1': f1,
        'precision': precision,
        'recall': recall
    }
def finetune_with_tftrainer():
    """ Fine-tune with TFTrainer"""
    config = AutoConfig.from_pretrained(os.getenv('MODEL_NAME'), cache_dir=os.getenv('OUTPUT_DIR'), num_labels=30)
    tokenizer = AutoTokenizer.from_pretrained(os.getenv('MODEL_NAME'), cache_dir=os.getenv('OUTPUT_DIR'))
    training_args = TFTrainingArguments(
        output_dir=os.getenv('OUTPUT_DIR'),
        logging_dir=os.getenv('OUTPUT_DIR'),
        overwrite_output_dir=True,
        do_train=True,
        do_eval=True,
        learning_rate=2e-5,
    )
    with training_args.strategy.scope():
        model = TFAutoModel.from_pretrained(os.getenv('MODEL_NAME'), config=config, cache_dir=os.getenv('OUTPUT_DIR'))

    # Get data for fine-tuning
    dataset, dataset_size = build_dataset(tokenizer)

    # Shuffle and split train/test sets manually
    dataset = dataset.shuffle(dataset_size)
    train_size, test_size = int(dataset_size * (4 / 5)), dataset_size - int(dataset_size * (4 / 5))  # 8-2 split
    train_data, test_data = dataset.take(train_size), dataset.skip(train_size)

    trainer = TFTrainer(
        model=model,
        args=training_args,
        train_dataset=train_data,
        eval_dataset=test_data,
        compute_metrics=compute_metrics
    )

    # Train the model
    trainer.train()
    trainer.save_model()
    tokenizer.save_pretrained(os.getenv('OUTPUT_DIR'))

    # Evaluate the model
    result = trainer.evaluate()
    pprint(result)
    with open(os.path.join(os.getenv('OUTPUT_DIR'), 'eval_results.json'), 'w') as fwrite:
        json.dump(result, fwrite, indent=4)

if __name__ == "__main__":
    finetune_with_tftrainer()
@BenjiTheC
Try replacing TFAutoModel with TFAutoModelForSequenceClassification.
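A minimal sketch of what that swap looks like in the script above (the `config` object and the MODEL_NAME / OUTPUT_DIR environment variables are assumed to be the same ones already defined there):

import os
from transformers import TFAutoModelForSequenceClassification

model = TFAutoModelForSequenceClassification.from_pretrained(
    os.getenv('MODEL_NAME'), config=config, cache_dir=os.getenv('OUTPUT_DIR')
)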
@QixinLi My purpose for fine-tuning the model is to use the last hidden layer state of BERT, combined with some other features, to continue forward in a bigger NN. Will using TFAutoModelForSequenceClassification compromise this goal?
Thanks!
@QixinLi It's working after the replacement, but I'm still wondering if this will impact my goal for fine-tuning. Thanks a lot!
TFAutoModelForSequenceClassification is a ready-made BERT model for text classification. Take TFDistilBertForSequenceClassification, which you seem to be using, as an example:
class TFDistilBertForSequenceClassification(TFDistilBertPreTrainedModel, TFSequenceClassificationLoss):
    def __init__(self, config, *inputs, **kwargs):
        super().__init__(config, *inputs, **kwargs)
        self.num_labels = config.num_labels
        self.distilbert = TFDistilBertMainLayer(config, name="distilbert")
        self.pre_classifier = tf.keras.layers.Dense(
            config.dim,
            kernel_initializer=get_initializer(config.initializer_range),
            activation="relu",
            name="pre_classifier",
        )
        self.classifier = tf.keras.layers.Dense(
            config.num_labels, kernel_initializer=get_initializer(config.initializer_range), name="classifier"
        )
        self.dropout = tf.keras.layers.Dropout(config.seq_classif_dropout)
It has two dense (linear) layers and one dropout layer.
However, it is packaged as-is in the library. If you want to add your own custom network structure on top of BERT's last-layer output, you may need to define your own model class that inherits from TFDistilBertPreTrainedModel.
If you just want to do text classification, TFAutoModelForSequenceClassification should meet your needs.
I do not think so. People have tended to find that extracting the CLS token from the hidden layer is what you want, instead of the embeddings for all your tokens. Some discussion on that is here: https://github.com/huggingface/transformers/issues/1950
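Concretely, "extracting the CLS token" just means taking the first position of the last hidden state; a tiny sketch, assuming `base_model` is a plain TFDistilBertModel and `input_ids` is a batch of token ids:

last_hidden_state = base_model(input_ids)[0]  # shape: (batch, seq_len, hidden_size)
cls_embedding = last_hidden_state[:, 0, :]    # the [CLS] position only, shape: (batch, hidden_size)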
Can you refer me to a specific comment? My purpose is exactly the one described in this comment, but it seems like ...ForSequenceClassification adds a specific task type. Thanks!
@QixinLi Hi! My goal is to feed the text into the fine-tuned BERT, take the last-layer output, append some other features to it, and then continue through a neural network to do the classification. I'm currently debugging with DistilBERT on my local machine; once it runs I'll move it to the cloud and run something like bert-large. In that case, which class should I use? Or can I get the last hidden state from ...SequenceClassification via output[1]? Thanks!
Are you classifying something? If so, use ...ForSequenceClassification. This seems to be what you want to do given your text-label data. Extracting the embedding is slightly different when using ...ForSequenceClassification rather than the plain TFDistilBertModel.
But for your purpose, to classify something and to then get those embeddings, look toward this comment, as it illustrates the difference.
https://github.com/huggingface/transformers/issues/1950#issuecomment-558683444
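As a hedged sketch of that difference (assuming transformers v3.0.x behaviour and a placeholder model name): with the ...ForSequenceClassification model you can still get the hidden states back, alongside the logits, by turning on output_hidden_states in the config.

import tensorflow as tf
from transformers import AutoConfig, DistilBertTokenizer, TFDistilBertForSequenceClassification

config = AutoConfig.from_pretrained('distilbert-base-uncased', num_labels=30, output_hidden_states=True)
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
model = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased', config=config)

input_ids = tf.constant(tokenizer.encode("Hello, my dog is cute"))[None, :]
outputs = model(input_ids)
logits = outputs[0]                    # (batch, num_labels) -- after the classification head
hidden_states = outputs[-1]            # tuple: embedding output + one entry per transformer layer
last_hidden_state = hidden_states[-1]  # (batch, seq_len, hidden_size)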
If you want last_hidden_states, you can do it like this:
class MyOwnModel(TFDistilBertPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.distilbert = TFDistilBertMainLayer(config, name="distilbert")
        self.classifier = tf.keras.layers.Dense(config.num_labels)

    def call(self, inputs=None, mask=None, token_type_ids=None, labels=None):
        outputs = self.distilbert(inputs, attention_mask=mask, token_type_ids=token_type_ids)
        last_hidden_states = outputs[0]
        # do whatever you want
        processed_hidden_states = .........
        logits = self.classifier(processed_hidden_states)
        outputs = (logits,)
        if labels is not None:
            loss = self.compute_loss(labels, logits)
            outputs = (loss,) + outputs
        return outputs
model = MyOwnModel.from_pretrained(os.getenv('MODEL_NAME'), config=config, cache_dir=os.getenv('OUTPUT_DIR'))
input_ids = tf.constant(tokenizer.encode("Hello, my dog is cute"))[None, :] # Batch size 1
outputs = model(input_ids)
P.S. I just looked at the call() function of ...SequenceClassification, and its return value output[1] is the result after the classification layer, not the last_hidden_states you want. So it probably won't meet your needs.
@QixinLi Hi, your reply answered my question, thank you very much! I also have two follow-ups:
- DistilBERT and BERT have to be loaded through two different classes, but in my implementation it makes no difference whether I inherit from TFDistilBertModel or TFBertModel, right?
- ...PreTrainedModel seems to be an abstract class; should I instead inherit from the concrete BERT/DistilBERT classes? Thanks!
Thanks for the answer! I've found the inspiration I need from the issue you referred to. Much appreciated!
For BERT, change TFDistilBertPreTrainedModel to TFBertPreTrainedModel, and change TFDistilBertMainLayer to TFBertMainLayer as well. ...PreTrainedModel also inherits from tf.keras.Model, so inheriting from it directly is no problem. (That's how it's done inside transformers itself. My code above is only for reference; for a concrete implementation you can study the model code written by the huggingface maintainers.)
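In case it helps, a hedged sketch of what those swaps might look like for a BERT version of the class above (module path assumed for the v3.0.x layout; the CLS-slicing step is just a placeholder for whatever processing you add, not a definitive implementation):

import tensorflow as tf
from transformers.modeling_tf_bert import TFBertMainLayer, TFBertPreTrainedModel

class MyOwnBertModel(TFBertPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.bert = TFBertMainLayer(config, name="bert")
        self.classifier = tf.keras.layers.Dense(config.num_labels)

    def call(self, inputs=None, attention_mask=None, token_type_ids=None):
        outputs = self.bert(inputs, attention_mask=attention_mask, token_type_ids=token_type_ids)
        last_hidden_states = outputs[0]              # (batch, seq_len, hidden_size)
        cls_embedding = last_hidden_states[:, 0, :]  # combine with your extra features here if needed
        return self.classifier(cls_embedding)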