I'm currently fine-tuning BERTForSequenceClassification model for a classification task and I wanted to know if there are ways to add additional layers before the final classification layer?
Hi @psureshmagadi17 , you can add additional layers easily. Take a look at the source code for BERTForSequenceClassification; you can take that code as it is and add the additional layers before the final classifier.
Hi @patil-suraj , thank you for your response. Did you mean that we can just alter the code in the main class? If yes, do you have an example?
Hi @psureshmagadi17, if your goal is to add layers to a pretrained model only for fine-tuning BERTForSequenceClassification, I think the best option is to modify the BertForSequenceClassification Module.
If you want to add attention layers, make sure to use the sequence_output of the BertModel Module and not the pooled_output in the forward function, then use a BertPooler layer before the classifier.
Hi @nassim-yagoub - thank you for the response! I'm fairly new to this process, i.e., modifying the network structure. Do you have an example or discussion that I can follow to help me through this process?
A small example:
import torch.nn as nn
from transformers import BertModel

class CustomBERTModel(nn.Module):
    def __init__(self):
        super(CustomBERTModel, self).__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        # add your additional layers here, for example a dropout layer followed by a linear classification head
        self.dropout = nn.Dropout(0.3)
        self.out = nn.Linear(768, 2)

    def forward(self, ids, mask, token_type_ids):
        # return_dict=False makes the model return a plain tuple we can unpack
        # (needed for recent transformers versions, which return a ModelOutput by default)
        sequence_output, pooled_output = self.bert(
            ids,
            attention_mask=mask,
            token_type_ids=token_type_ids,
            return_dict=False,
        )

        # we apply dropout to the sequence output, tensor has shape (batch_size, sequence_length, 768)
        sequence_output = self.dropout(sequence_output)

        # next, we apply the linear layer. The linear layer (which applies a linear transformation)
        # takes as input the hidden states of all tokens (so seq_len times a vector of size 768, each corresponding to
        # a single token in the input sequence) and outputs 2 numbers (scores, or logits) for every token
        # so the logits are of shape (batch_size, sequence_length, 2)
        logits = self.out(sequence_output)

        return logits
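And in case it helps, a quick usage sketch for running this custom model on a batch of sentences (this part is my own illustration, not from the snippet above; the sentences and tokenizer call are just placeholders):

import torch
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = CustomBERTModel()

# tokenize a small batch of example sentences
encoding = tokenizer(
    ["I love this movie", "This one was terrible"],
    padding=True,
    truncation=True,
    return_tensors="pt",
)

with torch.no_grad():
    logits = model(
        encoding["input_ids"],
        mask=encoding["attention_mask"],
        token_type_ids=encoding["token_type_ids"],
    )

print(logits.shape)  # torch.Size([2, sequence_length, 2])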
Thank you, @NielsRogge
For example, if you want to add the same kind of layers used in BERT, you may want to modify the Module this way (with new_layers_config being the same as the original config, except for the number of layers):
import torch.nn as nn
from torch.nn import CrossEntropyLoss, MSELoss
from transformers import BertModel, BertPreTrainedModel
# BertEncoder and BertPooler are not exported at the top level; in recent
# transformers versions they live in transformers.models.bert.modeling_bert
from transformers.models.bert.modeling_bert import BertEncoder, BertPooler

class BertForSequenceClassification(BertPreTrainedModel):
    def __init__(self, config, new_layers_config):
        super().__init__(config)
        self.num_labels = config.num_labels

        self.bert = BertModel(config)
        self.new_layers = BertEncoder(new_layers_config)
        self.pooler = BertPooler(config)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        self.classifier = nn.Linear(config.hidden_size, config.num_labels)

        self.init_weights()

    def forward(
        self,
        input_ids=None,
        attention_mask=None,
        token_type_ids=None,
        position_ids=None,
        head_mask=None,
        inputs_embeds=None,
        labels=None,
        output_attentions=None,
        output_hidden_states=None,
    ):
        outputs = self.bert(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
        )

        # take the per-token hidden states, run them through the extra encoder layers,
        # then pool and classify as in the original implementation
        sequence_output = outputs[0]
        new_layers_output = self.new_layers(sequence_output)[0]
        pooled_output = self.pooler(new_layers_output)

        pooled_output = self.dropout(pooled_output)
        logits = self.classifier(pooled_output)

        outputs = (logits,) + outputs[2:]  # add hidden states and attention if they are here

        if labels is not None:
            if self.num_labels == 1:
                # We are doing regression
                loss_fct = MSELoss()
                loss = loss_fct(logits.view(-1), labels.view(-1))
            else:
                loss_fct = CrossEntropyLoss()
                loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
            outputs = (loss,) + outputs

        return outputs  # (loss), logits, (hidden_states), (attentions)
We added a BertEncoder and a BertPooler to the base implementation.
You can also retrieve the hidden_states and attentions of the new layers if you want to; I did not do it here.
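If it helps, here is one possible way to instantiate this modified class (my own sketch, not from the code above): from_pretrained forwards extra positional arguments to __init__ after the config, so you can pass a config for the new layers, e.g. the pretrained BERT config with only two layers. The number of labels here is just an example.

from transformers import BertConfig

# assumption: the extra encoder reuses the pretrained architecture, but with only 2 layers
new_layers_config = BertConfig.from_pretrained("bert-base-uncased", num_hidden_layers=2)

# extra positional arguments to from_pretrained are forwarded to __init__ after the config,
# so new_layers_config ends up as the second argument of the modified class above
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    new_layers_config,
    num_labels=3,  # e.g. a 3-class task
)

Only the pretrained BERT weights are loaded; the new encoder, the pooler and the classifier start from random initialization, which is why fine-tuning them on your task is still required.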
Thanks @nassim-yagoub !
@nassim-yagoub - I had another question: are the weights of the BERTForSequenceClassification model's layers frozen by default?
The weights are not frozen by default when you load them, but you can manually freeze them by setting requires_grad = False on the parameters you want to keep fixed.
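A minimal sketch of what that looks like, assuming you want to freeze the whole pretrained backbone and only train the classification head (the attribute names follow BertForSequenceClassification, where the backbone is stored as model.bert):

from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# freeze every parameter of the pretrained BERT backbone...
for param in model.bert.parameters():
    param.requires_grad = False

# ...so only the classifier head (and anything you added on top) gets updated during fine-tuning
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)  # e.g. ['classifier.weight', 'classifier.bias']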
Thank you @nassim-yagoub!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.