Transformers: GPT2 model does not have attention mask

Created on 17 Jul 2019 · 5 comments · Source: huggingface/transformers

Hello, the docstring of the GPT2 model says there is an optional input called attention_mask to avoid computing attention on padding. But I cannot find the implementation, and there is no such argument either.


All 5 comments

Indeed, I will remove this docstring; there is no attention_mask on GPT-2.


But what should I do if I want to avoid computing attention on the padding in my input sequences?

@Saner3 @thomwolf I have the same question. Don't we need that for padding?

GPT-2 is a model with absolute position embeddings (like BERT), so you should always pad on the right to get the best performance from this model (I will add this information to the docstring).

Since it's a causal model (it only attends to the left context), this also means the model will not attend to the padding tokens (which are on the right) from any real token anyway.

So, in conclusion, there is no need to take special care to avoid attention on padding.

Just don't use the outputs at the padded positions for anything, as they don't contain any reliable information (which I hope is obvious).
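The claim above can be checked directly: with a causal mask, each real token only attends to positions at or before itself, so appending padding on the right cannot change the outputs at the real positions. Here is a minimal single-head attention sketch in PyTorch (not the actual GPT-2 implementation, just an illustration of the masking argument):

```python
import torch

torch.manual_seed(0)

def causal_attention(x):
    # Single-head scaled dot-product attention with a causal (left-only) mask.
    d = x.size(-1)
    T = x.size(-2)
    scores = x @ x.transpose(-2, -1) / d ** 0.5
    # Mask out all positions to the right of each query.
    mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ x

x = torch.randn(5, 8)                        # 5 real tokens
padded = torch.cat([x, torch.zeros(3, 8)])   # 3 right-padding rows appended

out = causal_attention(x)
out_padded = causal_attention(padded)

# The causal mask means real positions never attend to the right padding,
# so the first 5 output rows are identical with or without padding.
print(torch.allclose(out, out_padded[:5]))  # True
```

The outputs at the padded positions themselves do differ, which is exactly why they should not be used for anything downstream.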

@thomwolf thanks much, and great job!

