Transformers: Get Attention Values for Pretrained Model

Created on 3 Jul 2019 · 5 comments · Source: huggingface/transformers

When using BertModel.from_pretrained, I am not able to have it also return the attention layers. Why does that not work? Am I doing something wrong?

wontfix

Most helpful comment

Hi, this will be in the next release (release date sometime next week).
There will be attention/hidden-state output options for all the models.

All 5 comments

You need to install the master version (not with pip or conda):

git clone https://github.com/huggingface/pytorch-pretrained-BERT.git
cd pytorch-pretrained-BERT
python setup.py install
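
Optionally, a quick sanity check (not from the thread) that the source install imports cleanly, run from a Python shell:

# Should import without errors after `python setup.py install`.
from pytorch_pretrained_bert import BertModel
print('pytorch_pretrained_bert import OK')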

Then you can use it like this:

from pytorch_pretrained_bert import BertModel

model = BertModel.from_pretrained('bert-base-uncased',
                                  output_attentions=True,
                                  keep_multihead_output=True)
model.eval()  # turn off dropout layers
# `tokens` is a LongTensor of input ids with shape (batch_size, seq_len);
# with output_attentions=True the per-layer attention tensors come first.
attn = model(tokens)[0]

Tell me if I'm misinterpreting your problem.

Thanks a lot for the help; I didn't expect this to only work with the master version.

However, while trying this I think I found a problem in the BERT encoder module:

def forward(self, hidden_states, attention_mask, output_all_encoded_layers=True, head_mask=None):
    all_encoder_layers = []
    all_attentions = []
    for i, layer_module in enumerate(self.layer):
        hidden_states = layer_module(hidden_states, attention_mask, head_mask[i])

The forward function defaults head_mask to None, but then indexes it as head_mask[i], which raises a TypeError when no mask is passed. I think it would be nice to handle this case, for example with a guard like the sketch below.
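
A guard along these lines would cover the default (a sketch of a possible fix, not the actual upstream patch):

# Possible guard (sketch, not the upstream fix): fall back to a list of
# None entries so that head_mask[i] is always indexable.
if head_mask is None:
    head_mask = [None] * len(self.layer)

for i, layer_module in enumerate(self.layer):
    hidden_states = layer_module(hidden_states, attention_mask, head_mask[i])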

Hi. I want to do something similar but with the BertForQuestionAnswering model.

BertModel is the general BERT model, used for example to classify whether one sentence follows another. I want to get the attention values for question answering while I pass a new paragraph and a question as inputs. I want to use the BertForQuestionAnswering model (which is pretrained on SQuAD, if I am not wrong) and get the self-attention values on the question words. Is it possible to achieve this in a similar way to the one mentioned above?

NOTE: I know the above method gives the attention values of the pre-trained model. I want the attention values produced when I feed a new input question to the model, similar to what can be done with BertViz (although I do not want to visualize the attention, just get the values); see the rough sketch below.

Thanks.
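
A rough sketch of how the approach from the first comment might extend to BertForQuestionAnswering, assuming the master branch also exposes output_attentions on that model (the question and paragraph strings are placeholders, and 'bert-base-uncased' is the base checkpoint, so point from_pretrained at SQuAD-finetuned weights if you have them):

import torch
from pytorch_pretrained_bert import BertTokenizer, BertForQuestionAnswering

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
# Assumption: output_attentions is accepted by BertForQuestionAnswering on master.
model = BertForQuestionAnswering.from_pretrained('bert-base-uncased',
                                                 output_attentions=True)
model.eval()

question = "Who wrote the report?"                                 # placeholder
paragraph = "The report was written by the survey team in 2018."   # placeholder

q_tokens = tokenizer.tokenize(question)
p_tokens = tokenizer.tokenize(paragraph)
tokens = ['[CLS]'] + q_tokens + ['[SEP]'] + p_tokens + ['[SEP]']

input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])
# Segment ids: 0 for the question part, 1 for the paragraph part.
segment_ids = torch.tensor([[0] * (len(q_tokens) + 2) + [1] * (len(p_tokens) + 1)])

with torch.no_grad():
    outputs = model(input_ids, token_type_ids=segment_ids)

# As in the BertModel example above, the attention values should come first:
# one tensor per layer; the question tokens sit at positions 1 .. len(q_tokens).
attentions = outputs[0]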

Hi, this will be in the next release (release date sometime next week).
There will be attention/hidden-state output options for all the models.
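
For reference, a sketch of what that option looks like with the follow-up release's API (pytorch-transformers style; the model name and sentence are placeholders, and the exact position of the extra outputs in the returned tuple is an assumption, so check the docstrings):

import torch
from pytorch_transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased',
                                  output_attentions=True,
                                  output_hidden_states=True)
model.eval()

tokens = ['[CLS]'] + tokenizer.tokenize("a placeholder sentence") + ['[SEP]']
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

with torch.no_grad():
    outputs = model(input_ids)

# Assumption: the hidden states and attentions are appended after the usual
# outputs; each attention entry has shape (batch, heads, seq_len, seq_len).
hidden_states, attentions = outputs[-2], outputs[-1]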

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
