Provide a parameter called class_weights when initializing a sequence classification model. The attribute would be used to compute a weighted loss, which is useful for classification on imbalanced datasets.
from transformers import DistilBertForSequenceClassification
# Note the additional class_weights attribute
model = DistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=5,
    class_weights=[5, 3, 2, 1, 1])
class_weights would provide the same functionality as the weight parameter of PyTorch losses like torch.nn.CrossEntropyLoss.
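For reference, a minimal sketch of how the weight argument behaves in plain PyTorch (the weights mirror the example above; the batch shapes are just illustrative):

import torch
from torch import nn

# Higher weight -> misclassifying that (rarer) class contributes more to the loss
weights = torch.tensor([5.0, 3.0, 2.0, 1.0, 1.0])
loss_fct = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(8, 5)           # (batch_size, num_labels)
labels = torch.randint(0, 5, (8,))   # ground-truth class indices
loss = loss_fct(logits, labels)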
Similar issues have been raised before about how to provide class weights for an imbalanced classification dataset; see #297 and #1755.
I ended up modifying the transformers code myself to apply the class weights (shown below), and it looks like an easy addition that could benefit many users.
This should be possible because currently the loss for sequence classification is initialized in the forward method like this:
loss_fct = nn.CrossEntropyLoss() # <- Defined without the weight parameter
loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
We could then pass the class_weights recorded during model initialization through PyTorch's weight argument:
loss_fct = nn.CrossEntropyLoss(weight=self.class_weights)
loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
I am happy to implement this and open a PR, although I am new to the transformers package and may need some iterative code reviews from senior contributors/members.
This would be a cool addition! This would need to be an attribute added to the configuration, similarly to what is done for num_labels or other configuration attributes. You can see the implementation in PretrainedConfig, here. Feel free to open a PR and ping me on it!
Thanks @LysandreJik . I'll work on the initial draft and create a PR.
After having thought about it with @sgugger, it's probably not such a great idea to allow changing this parameter directly in the model configuration. If we enable this change, then we'll eventually have to support more arguments to pass to the loss functions, while the goal of the config and model loss computation isn't to be a feature-complete loss computation system, but to provide the very basic and simple traditional loss for the use-case (classification in this case).
In specific cases like this one where you would like to tweak the loss parameters, it would be better to get the logits back and compute the loss yourself with the logits and labels (no need to modify the transformers code when doing so).
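For illustration, such a custom loss computation could look roughly like this (a sketch only: it assumes model, input_ids, attention_mask, and labels are already defined, that labels are not passed to the model so it skips its built-in loss, and that the logits are the first element of the output tuple; the weights are example values):

import torch
from torch import nn

outputs = model(input_ids=input_ids, attention_mask=attention_mask)  # no labels -> no built-in loss
logits = outputs[0]

loss_fct = nn.CrossEntropyLoss(weight=torch.tensor([5.0, 3.0, 2.0, 1.0, 1.0]))
loss = loss_fct(logits.view(-1, model.config.num_labels), labels.view(-1))
loss.backward()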
Ok. Understood. Thanks, @LysandreJik and @sgugger for your points. 👍
I'll work on a custom solution for the same.
@LysandreJik @sgugger
> After having thought about it with @sgugger, it's probably not such a great idea to allow changing this parameter directly in the model configuration. If we enable this change, then we'll eventually have to support more arguments to pass to the loss functions, while the goal of the config and model loss computation isn't to be a feature-complete loss computation system, but to provide the very basic and simple traditional loss for the use-case (classification in this case).
Well, what about allowing a dict to be passed to the prediction heads and then on to the CrossEntropyLoss function?
Like this:
cross_entropy_loss_params = {"weight": [0.8, 1.2, 0.97]}
loss_fct = CrossEntropyLoss(**cross_entropy_loss_params)
Here for example: https://github.com/huggingface/transformers/blob/76818cc4c6a1275a23ba261ca337b9f9070c397e/src/transformers/modeling_bert.py#L943
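To make the idea concrete, inside the classification head the change might look roughly like this (just an illustration of the proposal, not the actual code from #7057; loss_function_params is assumed to be stored on the model at init):

# Hypothetical sketch: self.loss_function_params would be stored during __init__
loss_params = getattr(self, "loss_function_params", None) or {}
loss_fct = CrossEntropyLoss(**loss_params)  # e.g. {"weight": class_weight_tensor}
loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))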
This way you would:
@LysandreJik @sgugger @nvs-abhilash what do you think?
I'm not sure where you would pass that dict. Could you clarify that part?
> I'm not sure where you would pass that dict. Could you clarify that part?
@sgugger I just opened a PR (which is only a demo in its current state) that explains how I would implement it.
See here: #7057
The code would look like this:
model_name = 'bert-base-german-dbmdz-uncased'
config = AutoConfig.from_pretrained(
    model_name,
    num_labels=3,
)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    config=config,
    loss_function_params={"weight": [0.8, 1.2, 0.97]}
)
I would be happy about feedback and to finish this PR.
@LysandreJik @nvs-abhilash what do you think?
Reopening the issue, since the discussion is going on.
> @LysandreJik @nvs-abhilash what do you think?
It looks good to me and I am happy to contribute but I guess it's up to @sgugger and @LysandreJik to provide more explanation on the feasibility and potential implications on the project.
Supporting the last comment made, we don't intend for PreTrainedModels to provide a feature-complete loss computation system. We expect them to provide the simplest loss that's traditionally used in most cases.
We would rather encourage users to retrieve the logits from the model and compute the loss themselves when their use case differs from the very basic approach, as is usually done with nn.Modules, like so:
logits = model(**input_dict)[0]  # the model returns a tuple; the logits are the first element
loss_fct = CrossEntropyLoss(weight=torch.tensor([0.8, 1.2, 0.97]))
loss = loss_fct(logits, labels)
Hi @LysandreJik ok...
Could you please briefly explain how those 3 lines of code would be used from a user's (API) perspective?
As a Hugging Face Transformers user: when I want to train a new text classifier with imbalanced classes and do model = AutoModelForSequenceClassification.from_pretrained(model_name, config=config), how do I get CrossEntropyLoss(weight=[0.8, 1.2, 0.97]) into that?
I could subclass BertForSequenceClassification, for example, and write the complete forward function from scratch again. But that would be 99% copy and paste, and IMO not the way a good, open API like _HF Transformers_ should be designed; it is not good from a usability point of view.
If I understand you right, @LysandreJik, you do not want to force developers of new model types to support the API I suggested in my PR #7057 because you think it would be too much work. But IMO that does not take the needs of the API user into account.
It's a bit hard to know how to guide you when you don't explain to us how you train your model. Are you using Trainer? Then you should subclass it and override the brand-new compute_loss method that I just added to make this use case super easy. There is an example in the docs (note that you will need an install from source for this).
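A minimal sketch of that approach, following the pattern from the docs (the class name and weights are made up here, and the compute_loss signature and tuple-vs-dict output format may differ between transformers versions):

import torch
from torch import nn
from transformers import Trainer

class WeightedLossTrainer(Trainer):
    def compute_loss(self, model, inputs):
        labels = inputs.pop("labels")    # keep the model from computing its own loss
        outputs = model(**inputs)
        logits = outputs[0]
        # Weighted cross-entropy for imbalanced classes (weights are illustrative)
        loss_fct = nn.CrossEntropyLoss(
            weight=torch.tensor([0.8, 1.2, 0.97], device=logits.device))
        return loss_fct(logits.view(-1, model.config.num_labels), labels.view(-1))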
Ok. Super easy. Thanks @sgugger! That's it! :-))
@nvs-abhilash I think the answer closes this issue - right?
> Then you should subclass it and override the brand-new compute_loss method that I just added to make this use case super easy
Thanks, @sgugger , this will definitely solve my problem as well!