Transformers: 🚀 Advice about changing the variable name from "attention_mask" to "adder"

Created on 4 May 2020  ·  3 comments  ·  Source: huggingface/transformers

🚀 Feature request

Motivation

I noticed that some users are pretty confused when reading the source code about the variable attention_mask, for example:
What is the meaning of Attention Mask #205
Clarifying attention mask #542
I referred to the original BERT repository, google-research/bert. Compared to the original, I find that in this repo the concepts of attention_mask and adder are sometimes mixed.

Referring to the original BERT: ./modeling.py#L707

attention_mask = tf.expand_dims(attention_mask, axis=[1])
adder = (1.0 - tf.cast(attention_mask, tf.float32)) * -10000.0
attention_scores += adder
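To make the transform concrete, here is a minimal NumPy sketch of the mask-to-adder conversion from the BERT snippet above (the mask values are made up for illustration): positions marked 1 are real tokens, positions marked 0 are padding, and the adder turns the latter into large negative offsets.

```python
import numpy as np

# Hypothetical mask for a batch of one sequence of length 4,
# where the last position is padding.
attention_mask = np.array([[1, 1, 1, 0]], dtype=np.float32)

# Same arithmetic as the BERT snippet: kept positions become 0.0,
# masked positions become -10000.0.
adder = (1.0 - attention_mask) * -10000.0
print(adder)
```

Adding `adder` to the raw attention scores leaves unmasked positions unchanged and pushes masked positions toward negative infinity before the softmax.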

But in this repo, taking src/transformers/modeling_tf_openai.py#L282 as an example:

attention_mask = attention_mask[:, tf.newaxis, tf.newaxis, :]
attention_mask = tf.cast(attention_mask, tf.float32)
attention_mask = (1.0 - attention_mask) * -10000.0

and inside the method TFAttention._attn() src/transformers/modeling_tf_openai.py#L112:

if attention_mask is not None:
  # Apply the attention mask
  w = w + attention_mask
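For context, a short NumPy sketch (with made-up scores, not the repo's actual code) of why adding this large negative term works: after the softmax that follows, masked positions receive essentially zero attention weight.

```python
import numpy as np

# Hypothetical raw attention scores for 4 positions; the last one is masked.
scores = np.array([2.0, 1.0, 0.5, 3.0], dtype=np.float32)
adder = np.array([0.0, 0.0, 0.0, -10000.0], dtype=np.float32)

# Apply the additive mask, then a numerically stable softmax.
masked = scores + adder
weights = np.exp(masked - masked.max())
weights /= weights.sum()
print(weights)  # the masked position's weight is effectively zero
```

This is exactly why the `adder` name is more accurate than `attention_mask` at this point in the code: the tensor being added is no longer a 0/1 mask but an additive bias on the scores.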

Your contribution

Maybe changing its name would be better? For example:

attention_mask = attention_mask[:, tf.newaxis, tf.newaxis, :]
attention_mask = tf.cast(attention_mask, tf.float32)
adder = (1.0 - attention_mask) * -10000.0

and then:

if adder is not None:
  # Apply the additive attention mask
  w = w + adder
Labels: Good First Issue, wontfix


All 3 comments

I agree! Do you want to open a PR to change the naming? :-)
When doing this we just have to be careful not to change the user-facing API, which means that ideally we should not rename any function arguments of high-level modules like BertModel.forward().

I've created PR #4566 for this issue.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

