Hello, in the docstring of the GPT2 model, it says there is an optional input called attention_mask to avoid computing attention on padding. But I cannot actually find the implementation, and there is no such argument either.
Indeed, I will remove this docstring; there is no attention_mask on GPT-2.
But what should I do if I want to avoid computing attention on the padding in the input sequences?
@Saner3 @thomwolf I have the same question. Don't we need that for padding?
GPT-2 is a model with absolute position embeddings (like BERT), so you should always pad on the right to get the best performance with this model (I will add this information to the docstring).
As it's a causal model (it only attends to the left context), this also means that the model will never attend to the padding tokens (which are on the right) from any real token anyway.
So in conclusion, there is no need to take special care to avoid attention on padding.
Just don't use the outputs of the padded tokens for anything, as they don't contain any reliable information (which is obvious, I hope).
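To make that concrete, here is a minimal sketch of the recommended usage: right-pad a batch, run GPT2Model without any mask, and then read hidden states only at real token positions. It assumes a transformers version whose tokenizer can batch with `padding=True` and PyTorch; the variable names are illustrative, not from the library.

```python
import torch
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")

# GPT-2 has no pad token by default; reuse the EOS token just so batching works.
tokenizer.pad_token = tokenizer.eos_token

texts = ["Hello world", "A longer sentence that does not need any padding"]
# Pad on the right (the tokenizer default), as recommended above for a model
# with absolute position embeddings.
batch = tokenizer(texts, padding=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(input_ids=batch["input_ids"])  # no attention mask passed

hidden_states = outputs[0]  # shape: (batch, seq_len, hidden)

# Only read positions that hold real tokens; the states at padded positions
# carry no reliable information.
lengths = batch["attention_mask"].sum(dim=1)   # number of real tokens per row
batch_idx = torch.arange(hidden_states.size(0))
last_real_token_state = hidden_states[batch_idx, lengths - 1]
```

Note that `attention_mask` here comes from the tokenizer and is only used to locate the real tokens; per the explanation above, it never needs to be fed to the model, since a causal model with right padding cannot attend to the padding from any real token.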
@thomwolf thanks much, and great job!