Provide a parameter called class_weights when initializing a sequence classification model. The attribute would be used to compute a weighted loss, which is useful for classification on imbalanced datasets.
from transformers import DistilBertForSequenceClassification
# Note the additional class_weights attribute
model = DistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=5,
    class_weights=[5, 3, 2, 1, 1])
class_weights would provide the same functionality as the weight parameter of PyTorch losses like torch.nn.CrossEntropyLoss.
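For reference, a minimal sketch of how the weight argument behaves in plain PyTorch (the weights mirror the example above; the batch shapes are just illustrative):

import torch
from torch import nn

# Higher weight -> misclassifying that (rarer) class contributes more to the loss
weights = torch.tensor([5.0, 3.0, 2.0, 1.0, 1.0])
loss_fct = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(8, 5)           # (batch_size, num_labels)
labels = torch.randint(0, 5, (8,))   # ground-truth class indices
loss = loss_fct(logits, labels)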
Similar issues have been raised before about how to provide class weights for an imbalanced classification dataset; see #297 and #1755.
I ended up modifying the transformers code myself to apply the class weights (shown below), and it looks like an easy addition that could benefit many users.
This should be possible because currently the loss for sequence classification is initialized in the forward method like this:
loss_fct = nn.CrossEntropyLoss() # <- Defined without the weight parameter
loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
We could then pass the class_weights recorded during model initialization through PyTorch's weight argument:
loss_fct = nn.CrossEntropyLoss(weight=self.class_weights)
loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
I am happy to implement this and open a PR, although I am new to the transformers package and may need some iterative code reviews from senior contributors/members.
This would be a cool addition! This would need to be an attribute added to the configuration, similarly to what is done for num_labels or other configuration attributes. You can see the implementation in PretrainedConfig, here. Feel free to open a PR and ping me on it!
Thanks @LysandreJik . I'll work on the initial draft and create a PR.
After having thought about it with @sgugger, it's probably not such a great idea to allow changing this parameter directly in the model configuration. If we enable this change, then we'll eventually have to support more arguments to pass to the loss functions, while the goal of the config and model loss computation isn't to be a feature-complete loss computation system, but to provide the very basic and simple traditional loss for the use-case (classification in this case).
In specific cases like this one where you would like to tweak the loss parameters, it would be better to get the logits back and compute the loss yourself with the logits and labels (no need to modify the transformers code when doing so).
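For illustration, such a custom loss computation could look roughly like this (a sketch only: it assumes model, input_ids, attention_mask, and labels are already defined, that labels are not passed to the model so it skips its built-in loss, and that the logits are the first element of the output tuple; the weights are example values):

import torch
from torch import nn

outputs = model(input_ids=input_ids, attention_mask=attention_mask)  # no labels -> no built-in loss
logits = outputs[0]

loss_fct = nn.CrossEntropyLoss(weight=torch.tensor([5.0, 3.0, 2.0, 1.0, 1.0]))
loss = loss_fct(logits.view(-1, model.config.num_labels), labels.view(-1))
loss.backward()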
Ok. Understood. Thanks, @LysandreJik and @sgugger for your points. 👍
I'll work on a custom solution for the same.
@LysandreJik @sgugger
> After having thought about it with @sgugger, it's probably not such a great idea to allow changing this parameter directly in the model configuration. If we enable this change, then we'll eventually have to support more arguments to pass to the loss functions, while the goal of the config and model loss computation isn't to be a feature-complete loss computation system, but to provide the very basic and simple traditional loss for the use-case (classification in this case).
Well, what about allowing a dict to be passed to the prediction heads and then on to the CrossEntropyLoss function?
Like this:
cross_entropy_loss_params = {"weight": [0.8, 1.2, 0.97]}
loss_fct = CrossEntropyLoss(**cross_entropy_loss_params)
Here for example: https://github.com/huggingface/transformers/blob/76818cc4c6a1275a23ba261ca337b9f9070c397e/src/transformers/modeling_bert.py#L943
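To make the idea concrete, inside the classification head the change might look roughly like this (just an illustration of the proposal, not the actual code from #7057; loss_function_params is assumed to be stored on the model at init):

# Hypothetical sketch: self.loss_function_params would be stored during __init__
loss_params = getattr(self, "loss_function_params", None) or {}
loss_fct = CrossEntropyLoss(**loss_params)  # e.g. {"weight": class_weight_tensor}
loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))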
This way you would:
@LysandreJik @sgugger @nvs-abhilash what do you think?
I'm not sure where you would pass that dict. Could you clarify that part?
> I'm not sure where you would pass that dict. Could you clarify that part?
@sgugger I just opened a PR (which is only a demo in its current state) that explains how I would implement it.
See here: #7057
The code would look like this:
model_name = 'bert-base-german-dbmdz-uncased'
config = AutoConfig.from_pretrained(
    model_name,
    num_labels=3,
)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    config=config,
    loss_function_params={"weight": [0.8, 1.2, 0.97]}
)
I would be happy about feedback and to finish this PR.
@LysandreJik @nvs-abhilash what do you think?
Reopening the issue, since the discussion is going on.
> @LysandreJik @nvs-abhilash what do you think?
It looks good to me and I am happy to contribute but I guess it's up to @sgugger and @LysandreJik to provide more explanation on the feasibility and potential implications on the project.
Supporting the last comment made, we don't intend for PreTrainedModels to provide a feature-complete loss computation system. We expect them to provide the simplest loss that's traditionally used in most cases.
We would rather encourage users to retrieve the logits from the model and compute the loss themselves when their use case differs from the very basic approach, as is usually done with nn.Modules, like so:
logits = model(**input_dict)[0]  # the model returns a tuple; the logits are the first element
loss_fct = CrossEntropyLoss(weight=torch.tensor([0.8, 1.2, 0.97]))
loss = loss_fct(logits, labels)
Hi @LysandreJik ok...
Could you please briefly explain how those 3 lines of code would be used from a user's (API) perspective?
As a Hugging Face Transformers user: when I want to train a new text classifier with imbalanced classes and do model = AutoModelForSequenceClassification.from_pretrained(model_name, config=config), how do I get CrossEntropyLoss(weight=[0.8, 1.2, 0.97]) into that?
I could subclass BertForSequenceClassification, for example, and write the complete forward function from scratch again. But that would be 99% copy and paste, and IMO not the way a good, open API like _HF Transformers_ should be designed; it is not good from a usability point of view.
If I understand you right, @LysandreJik, you do not want to force developers of new model types to support the API I suggested in my PR #7057 because you think it would be too much work. But IMO that does not take the needs of the API user into account.
It's a bit hard to know how to guide you when you don't explain to us how you train your model. Are you using Trainer? Then you should subclass it and override the brand-new compute_loss method that I just added to make this use case super easy. There is an example in the docs (note that you will need an install from source for this).
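A minimal sketch of that approach, following the pattern from the docs (the class name and weights are made up here, and the compute_loss signature and tuple-vs-dict output format may differ between transformers versions):

import torch
from torch import nn
from transformers import Trainer

class WeightedLossTrainer(Trainer):
    def compute_loss(self, model, inputs):
        labels = inputs.pop("labels")    # keep the model from computing its own loss
        outputs = model(**inputs)
        logits = outputs[0]
        # Weighted cross-entropy for imbalanced classes (weights are illustrative)
        loss_fct = nn.CrossEntropyLoss(
            weight=torch.tensor([0.8, 1.2, 0.97], device=logits.device))
        return loss_fct(logits.view(-1, model.config.num_labels), labels.view(-1))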
Ok. Super easy. Thanks @sgugger! That's it! :-))
@nvs-abhilash I think the answer closes this issue - right?
> Then you should subclass it and override the brand-new compute_loss method that I just added to make this use case super easy
Thanks, @sgugger , this will definitely solve my problem as well!