Transformers: How to add input mask to GPT?

Created on 6 Mar 2019 · 1 comment · Source: huggingface/transformers

I use attention_mask when I call bert.forward(input, attention_mask). But with GPT, when I try to pass a batch of inputs to OpenAIGPTModel to extract a batch of features and the sentences in the batch have different lengths, I have no idea how to handle it. Or maybe no mask needs to be given? If so, is zero the padding index?

For reference, this is the code I use with BERT to extract embeddings.

all_encoder_layers, pooled_output = self.bert(inputs[:, :seq_max_len], token_type_ids=None,
                                              attention_mask=att_mask.to(device))
# Concatenate the hidden states of the last `bert_n_layers` layers along the feature dimension.
embeds = torch.cat(all_encoder_layers[-self.bert_n_layers:], -1)

Most helpful comment

GPT is a causal model, so each token only attends to the left context and masking is not really needed.
Just mask the output according to your lengths (and make sure that each input sample starts at the very first left token).
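Concretely, here is a minimal sketch of that approach (not from the thread; it assumes the current transformers OpenAIGPTModel/OpenAIGPTTokenizer API, the "openai-gpt" checkpoint name, and an arbitrary pad id of 0): left-align every sample, pad on the right, run the model without an attention mask, and zero out the padded positions in the output using the known lengths.

import torch
from transformers import OpenAIGPTTokenizer, OpenAIGPTModel

tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt")
model = OpenAIGPTModel.from_pretrained("openai-gpt")
model.eval()

sentences = ["a short sentence", "a somewhat longer example sentence"]
encoded = [tokenizer.encode(s) for s in sentences]
lengths = torch.tensor([len(ids) for ids in encoded])
max_len = int(lengths.max())

# Left-align every sample (it starts at the first position) and pad on the right.
# The pad id is arbitrary here: because attention is causal, real tokens never
# attend to the padding on their right, and the padded outputs are masked below.
pad_id = 0
input_ids = torch.full((len(encoded), max_len), pad_id, dtype=torch.long)
for i, ids in enumerate(encoded):
    input_ids[i, : len(ids)] = torch.tensor(ids)

with torch.no_grad():
    hidden = model(input_ids)[0]  # (batch, max_len, hidden_size)

# Mask the *output* according to the true lengths.
mask = (torch.arange(max_len)[None, :] < lengths[:, None]).unsqueeze(-1)
features = hidden * mask  # hidden states at padded positions are zeroed out

The same masking trick works for pooling (e.g. summing features and dividing by lengths) or for picking the last real token of each sample, since the representations of the real tokens are unaffected by the right padding.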

