Transformers: Get Attention Values for Pretrained Model

Created on 3 Jul 2019 · 5 comments · Source: huggingface/transformers

When using BertModel.from_pretrained, I am not able to have it also return the attention layers. Why does that not work? Am I doing something wrong?

wontfix

Most helpful comment

Hi, this will be in the next release (release date sometime next week).
There will be attention/hidden-state output options for all the models.

All 5 comments

You need to install the master version (not with pip or conda):

git clone https://github.com/huggingface/pytorch-pretrained-BERT.git
cd pytorch-pretrained-BERT
python setup.py install
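
Optionally, a quick sanity check (not from the thread) that the source install imports cleanly, run from a Python shell:

# Should import without errors after `python setup.py install`.
from pytorch_pretrained_bert import BertModel
print('pytorch_pretrained_bert import OK')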

Then you can use it like this:

from pytorch_pretrained_bert import BertModel

model = BertModel.from_pretrained('bert-base-uncased',
                                  output_attentions=True,
                                  keep_multihead_output=True)
model.eval()  # turn off dropout layers
# `tokens` is a LongTensor of input ids with shape (batch_size, seq_len);
# with output_attentions=True the per-layer attention tensors come first.
attn = model(tokens)[0]

Tell me if I'm misinterpreting your problem.

Thanks a lot for the help; I didn't expect this to only work with the master version.

However, while trying this I think I found a problem in the BERT encoder module:

def forward(self, hidden_states, attention_mask, output_all_encoded_layers=True, head_mask=None):
    all_encoder_layers = []
    all_attentions = []
    for i, layer_module in enumerate(self.layer):
        hidden_states = layer_module(hidden_states, attention_mask, head_mask[i])

The forward function defaults head_mask to None, but then indexes it as head_mask[i], which raises a TypeError when no mask is passed. I think it would be nice to handle this case, for example with a guard like the sketch below.
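
A guard along these lines would cover the default (a sketch of a possible fix, not the actual upstream patch):

# Possible guard (sketch, not the upstream fix): fall back to a list of
# None entries so that head_mask[i] is always indexable.
if head_mask is None:
    head_mask = [None] * len(self.layer)

for i, layer_module in enumerate(self.layer):
    hidden_states = layer_module(hidden_states, attention_mask, head_mask[i])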

Hi. I want to do something similar but with the BertForQuestionAnswering model.

BertModel is the general BERT model, used for example to classify whether one sentence follows another. I want to get the attention values for question answering while I pass a new paragraph and a question as inputs. I want to use the BertForQuestionAnswering model (which is pretrained on SQuAD, if I am not wrong) and get the self-attention values on the question words. Is it possible to achieve this in a similar way to the one mentioned above?

NOTE: I know the above method gives the attention values of the pre-trained model. I want the attention values produced when I feed a new input question to the model, similar to what can be done with BertViz (although I do not want to visualize the attention, just get the values); see the rough sketch below.

Thanks.
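
A rough sketch of how the approach from the first comment might extend to BertForQuestionAnswering, assuming the master branch also exposes output_attentions on that model (the question and paragraph strings are placeholders, and 'bert-base-uncased' is the base checkpoint, so point from_pretrained at SQuAD-finetuned weights if you have them):

import torch
from pytorch_pretrained_bert import BertTokenizer, BertForQuestionAnswering

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
# Assumption: output_attentions is accepted by BertForQuestionAnswering on master.
model = BertForQuestionAnswering.from_pretrained('bert-base-uncased',
                                                 output_attentions=True)
model.eval()

question = "Who wrote the report?"                                 # placeholder
paragraph = "The report was written by the survey team in 2018."   # placeholder

q_tokens = tokenizer.tokenize(question)
p_tokens = tokenizer.tokenize(paragraph)
tokens = ['[CLS]'] + q_tokens + ['[SEP]'] + p_tokens + ['[SEP]']

input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])
# Segment ids: 0 for the question part, 1 for the paragraph part.
segment_ids = torch.tensor([[0] * (len(q_tokens) + 2) + [1] * (len(p_tokens) + 1)])

with torch.no_grad():
    outputs = model(input_ids, token_type_ids=segment_ids)

# As in the BertModel example above, the attention values should come first:
# one tensor per layer; the question tokens sit at positions 1 .. len(q_tokens).
attentions = outputs[0]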

Hi, this will be in the next release (release date sometime next week).
There will be attention/hidden-state output options for all the models.
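
For reference, a sketch of what that option looks like with the follow-up release's API (pytorch-transformers style; the model name and sentence are placeholders, and the exact position of the extra outputs in the returned tuple is an assumption, so check the docstrings):

import torch
from pytorch_transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased',
                                  output_attentions=True,
                                  output_hidden_states=True)
model.eval()

tokens = ['[CLS]'] + tokenizer.tokenize("a placeholder sentence") + ['[SEP]']
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

with torch.no_grad():
    outputs = model(input_ids)

# Assumption: the hidden states and attentions are appended after the usual
# outputs; each attention entry has shape (batch, heads, seq_len, seq_len).
hidden_states, attentions = outputs[-2], outputs[-1]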

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
