Transformers: Why is `encoder_extended_attention_mask = None` when `config.is_decoder == False`

Created on 3 Jul 2020 · 5 comments · Source: huggingface/transformers

Potential Bug(?)

Reading the codebase, I see that the encoder attention mask is ignored for many of the pretrained model configs, such as 'bert-base-uncased': whenever `config.is_decoder` is `False`, `encoder_extended_attention_mask` is simply cleared out to `None`. Is this intentional?

from transformers import BertModel
config_path = 'bert-base-uncased'
config = BertModel.config_class.from_pretrained(config_path)
print(f'is_decoder: {config.is_decoder}')

This outputs `is_decoder: False`, so the mask ends up cleared.
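
For reference, the branch I am referring to in BertModel.forward has roughly this shape (a paraphrase for context, not the verbatim source):

# Paraphrased sketch of the check in BertModel.forward (not verbatim library code):
# the cross-attention mask is only built when the model is configured as a decoder.
if self.config.is_decoder and encoder_hidden_states is not None:
    # build the extended cross-attention mask from encoder_attention_mask
    # (an all-ones mask is assumed if none is passed)
    encoder_extended_attention_mask = self.invert_attention_mask(encoder_attention_mask)
else:
    encoder_extended_attention_mask = None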

Most helpful comment

This `encoder_attention_mask` is only relevant for a Bert EncoderDecoder model. It is not the same as the usual `attention_mask`.

All 5 comments

The `encoder_attention_mask` is only relevant if BERT is used as an Encoder-Decoder model via the EncoderDecoderModel wrapper class. In this case the decoder should be able to accept an `encoder_attention_mask` for its cross-attention layers.

In all other cases this mask is not relevant and should be set to None.
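
For illustration, here is a minimal sketch of that setup, assuming a recent transformers version; the checkpoint names and inputs are just placeholders. The `attention_mask` passed for the encoder inputs is what the decoder's cross-attention layers receive as `encoder_attention_mask`:

from transformers import BertTokenizer, EncoderDecoderModel

# Minimal BERT-to-BERT sketch via the EncoderDecoderModel wrapper.
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = EncoderDecoderModel.from_encoder_decoder_pretrained('bert-base-uncased', 'bert-base-uncased')

# Placeholder inputs; padding=True creates the padded positions the mask has to hide.
encoder_inputs = tokenizer(['a short source', 'a somewhat longer source sentence'], padding=True, return_tensors='pt')
decoder_inputs = tokenizer(['a target', 'another target'], padding=True, return_tensors='pt')

outputs = model(
    input_ids=encoder_inputs['input_ids'],
    attention_mask=encoder_inputs['attention_mask'],          # encoder padding mask, reused for cross-attention
    decoder_input_ids=decoder_inputs['input_ids'],
    decoder_attention_mask=decoder_inputs['attention_mask'],
)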

I agree that the check `if self.is_decoder` is probably not the best one here; it should rather be `if self.is_encoder_decoder and self.is_decoder`. I will update this soon.

Feel free to reopen if this does not answer your question

Hi Patrick, thanks for the swift response. I'm not sure if I understand: shouldn't we always want to mask the padded tokens, even in the encoder?

In fact, the canonical BERT implementation suggests this, since it applies the input mask without any such check: https://github.com/google-research/bert/blob/master/modeling.py#L200

@patrickvonplaten Sorry for the noise. I noticed you said to reopen the issue, but I think only maintainers have that permission :)

This `encoder_attention_mask` is only relevant for a Bert EncoderDecoder model. It is not the same as the usual `attention_mask`.

Ah, I see. Looking again at the code I definitely misunderstood that. Thanks a ton.
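
For anyone landing here later, a small sketch of the distinction (checkpoint and inputs are placeholders): the usual `attention_mask` is still passed and applied for a plain encoder; only the separate cross-attention mask is dropped when `config.is_decoder` is `False`.

from transformers import BertTokenizer, BertModel

# The usual padding mask (attention_mask) is still honoured by a plain BertModel,
# independent of config.is_decoder; only encoder_extended_attention_mask,
# which is a cross-attention-only mask, is set to None.
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

batch = tokenizer(['a short sentence', 'a much longer sentence that forces padding of the first one'], padding=True, return_tensors='pt')
outputs = model(input_ids=batch['input_ids'], attention_mask=batch['attention_mask'])
print(outputs[0].shape)  # (batch_size, seq_len, hidden_size); padded positions are masked in self-attention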
