Transformers: Image GPT

Created on 17 Jun 2020 · 5 comments · Source: huggingface/transformers

🌟 New model addition

Model description

OpenAI just announced Image GPT: https://openai.com/blog/image-gpt/

Although image rendering would be out of scope for Transformers, RGB generation would still be in scope, and it would be best to port the weights to a GPT2LMHeadModel.

However, it's not immediately clear how tokenization is implemented in the downloaded model (there is no separate vocab.json).
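For context, the Image-GPT blog post describes tokenizing images by clustering RGB values into a 512-entry color palette and mapping each pixel to its nearest cluster, which would explain the absence of a text-style vocab.json. A rough sketch of that nearest-cluster lookup (the `palette` here is a toy stand-in for the centroids shipped with the released checkpoints, and `color_quantize` is my name, not one from the released code):

```python
import numpy as np

def color_quantize(pixels, clusters):
    """Map each RGB pixel to the index of its nearest color cluster.

    pixels:   (n, 3) float array of RGB values
    clusters: (k, 3) float array of cluster centroids (512 in Image-GPT)
    Returns an (n,) array of integer token ids.
    """
    # Squared Euclidean distance from every pixel to every centroid.
    d = ((pixels[:, None, :] - clusters[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)

# Toy palette of 4 "clusters" instead of Image-GPT's 512.
palette = np.array(
    [[0, 0, 0], [255, 0, 0], [0, 255, 0], [255, 255, 255]], dtype=np.float64
)
img = np.array([[250, 5, 5], [10, 10, 10]], dtype=np.float64)
tokens = color_quantize(img, palette)  # array([1, 0])
```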

Open source status

Labels: New model, wontfix


All 5 comments

I'd like a Google Colab of it.

Hey @minimaxir! Here's a colab which loads the weights into a subclass of GPT2LMHeadModel and demonstrates unconditional image generation and conditional image completion.

Some differences I've found between Image-GPT and GPT-2, which are reflected in the subclass:

1) Image-GPT's layer normalization doesn't subtract off the mean.
2) Different activation functions are used in the MLP.
3) In Image-GPT, the input and output embeddings are not tied.
4) Image-GPT has an extra learned "sos" token embedding which is concatenated at the beginning of the sequence.
5) GPT-2's [n_embd, 3*n_embd] linear layer, c_attn, which produces the queries, keys, and values, is instead split into three separate linear layers in Image-GPT, each with dimension [n_head, n_embd/n_head, n_embd] (this only affects how the weights are loaded, not the actual model).
6) In Image-GPT, the conv1d module doesn't have a bias term.
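Point 1 above can be made concrete: dropping the mean subtraction means dividing by the root mean square of the activations and scaling, rather than first centering as standard layer norm does. A minimal NumPy sketch of the two variants (function names are mine, not from the released code):

```python
import numpy as np

def standard_layer_norm(x, g, b, eps=1e-5):
    """GPT-2 style: center by the mean, then scale by the std."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps) * g + b

def igpt_layer_norm(x, g, eps=1e-5):
    """Image-GPT style: no mean subtraction (and no bias) --
    divide by the root mean square of the activations, then scale."""
    rms = np.sqrt((x ** 2).mean(axis=-1, keepdims=True) + eps)
    return x / rms * g

x = np.array([[1.0, 2.0, 3.0, 4.0]])
g = np.ones(4)
b = np.zeros(4)
# The two outputs differ whenever the activations have a nonzero mean:
# the standard variant is zero-mean, the Image-GPT variant is not.
```

This is also why the weights load differently: the mean-free norm has only a gain parameter to restore, with no shift to re-center the output.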

So what's our next step to add this to the repo?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@apeguero1 we have an "Adding a new model" checklist at https://github.com/huggingface/transformers/tree/master/templates/adding_a_new_model

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

