Transformers: Sharing Microsoft's DialogRPT (new dialog ranking model)

Created on 1 Oct 2020 · 12 comments · Source: huggingface/transformers

🌟 New model addition

Model description

Thanks for the awesome work!

DialogRPT (Dialog Ranking Pretrained Transformers) is a set of GPT-2-based dialogue ranking models recently released with an EMNLP paper by Microsoft Research. It's a follow-up to DialoGPT (thanks for hosting it!).
The architecture is pretty simple: a GPT2Model followed by a torch.nn.Linear(n_embd, 1, bias=False), implemented based on a previous HuggingFace commit.
At first I tried to create a model card for it, but then realized that no existing model architecture in HuggingFace seems to be compatible with DialogRPT. I noticed a lot of BERT-based sequence classification models, but ours is GPT-2-based.
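
For reference, a minimal sketch of the head described above (my own illustration rather than the official DialogRPT code; the class name is hypothetical):

import torch
from transformers import GPT2Model

class GPT2RankerSketch(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.transformer = GPT2Model.from_pretrained("gpt2-medium")
        # a single-unit linear head on top of the hidden states, as described above
        self.score = torch.nn.Linear(self.transformer.config.n_embd, 1, bias=False)

    def forward(self, input_ids):
        hidden = self.transformer(input_ids).last_hidden_state
        # score the sequence from the last token's hidden state (one common choice)
        return self.score(hidden[:, -1, :])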

If there's a simple fix (or I missed something) please let me know!
If implementation in modeling_gpt2.py is necessary, I'm also glad to help!

Open source status

  • [x] the model implementation is available: (https://github.com/golsun/DialogRPT)
  • [x] the model weights are available: (https://github.com/golsun/DialogRPT)
  • [x] who are the authors: @golsun @dreasysnail
New model


All 12 comments

Hi @golsun! Thanks a lot for opening an issue and offering to contribute it!

Indeed, there is no GPT2ForSequenceClassification model in the library (yet!). I'm adding it right now with the goal of supporting DialogRPT. I'll get back to you in a bit.

Hi @golsun! GPT2ForSequenceClassification has been implemented in #7501, and I verified that I obtain the same results as those in your README using your examples.

You should now only need to upload your models to the model hub! Some helpers regarding the configuration (a Python sketch follows the list below):

  • You should upload a model configuration on the hub, for every model.
  • You can simply copy-paste the gpt2-medium configuration that you can find here.
  • You will need to add a num_labels=1 field to these configurations.
  • In the architectures field, you should put GPT2ForSequenceClassification.
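
For illustration, one way to make those configuration edits in Python instead of editing the JSON by hand (a sketch assuming the standard transformers API; the local directory name is made up):

from transformers import GPT2Config

# start from the gpt2-medium configuration, as suggested above
config = GPT2Config.from_pretrained("gpt2-medium")
config.num_labels = 1                                     # a single ranking score
config.architectures = ["GPT2ForSequenceClassification"]
config.save_pretrained("DialogRPT-updown")                # hypothetical local directory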

wow, super fast!!!
thank you @LysandreJik, I'll update my repo to reflect this once the pull request is merged.

The pull request is now merged @golsun!

Thank you so much @LysandreJik !
I just tried GPT2ForSequenceClassification and it works! 👍
Then I created this model card, but model = AutoModelForSequenceClassification.from_pretrained("microsoft/DialogRPT-updown") gives me the following error, which can be reproduced with this Notebook:

/content/transformers/src/transformers/modeling_auto.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
   1203                 config.__class__,
   1204                 cls.__name__,
-> 1205                 ", ".join(c.__name__ for c in MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING.keys()),
   1206             )
   1207         )

ValueError: Unrecognized configuration class <class 'transformers.configuration_gpt2.GPT2Config'> for this kind of AutoModel: AutoModelForSequenceClassification.
Model type should be one of DistilBertConfig, AlbertConfig, CamembertConfig, XLMRobertaConfig, BartConfig, LongformerConfig, RobertaConfig, SqueezeBertConfig, BertConfig, XLNetConfig, MobileBertConfig, FlaubertConfig, XLMConfig, ElectraConfig, FunnelConfig, DebertaConfig.

Indeed, this should be solved by #7630.

Thank you @LysandreJik, AutoModelForSequenceClassification works now.
The inference webpage still gives the Unrecognized configuration class error, but I guess it will sync with the latest code soon.
I'm going to introduce the model card in the original repo.
Thanks again for the help!
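
For anyone landing here later, a minimal usage sketch of the now-working path (the context/response pair is invented; the <|endoftext|> separator follows the convention in the DialogRPT README):

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialogRPT-updown")
model = AutoModelForSequenceClassification.from_pretrained("microsoft/DialogRPT-updown")

# context and response separated by <|endoftext|>
inputs = tokenizer("I love NLP!<|endoftext|>Me too!", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits   # shape (1, 1) since num_labels == 1
score = torch.sigmoid(logits)         # ranking score in [0, 1]
print(score.item())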

We just updated the API inference so that it uses the latest code. I've taken the liberty of adding a padding token to your models, in your configuration (pad_token_id: 50256) and in the special_tokens_map.json (pad_token: "<|endoftext|>"), as models need a padding token to run in the API inference.

I've taken these values from your code here and here.

Models should now work correctly in the inference webpage :)
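
For reference, a sketch of the equivalent change made locally with the standard transformers API (the file edits on the hub described above achieve the same thing):

from transformers import GPT2Config, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("microsoft/DialogRPT-updown")
tokenizer.pad_token = "<|endoftext|>"   # written to special_tokens_map.json by save_pretrained

config = GPT2Config.from_pretrained("microsoft/DialogRPT-updown")
config.pad_token_id = 50256             # id of <|endoftext|> in the GPT-2 vocabulary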

Great! Thank you for updating the config and special_tokens_map for us! :)
The inference webpage outputs a score of 1 no matter what the input is. I guess that's because it returns softmax(logits), which is always 1 if num_labels == 1. Maybe the following if-else would fix it?

if num_labels == 1:
    # a single logit: map it to a probability with the sigmoid
    return torch.sigmoid(logits)
else:
    return torch.softmax(logits, dim=-1)

The num_labels == 1 case follows the DialogRPT code here.

You're correct! Solving that in #7726.

Also @golsun, on the inference API you can have custom label names (instead of just LABEL_0 here) if you set your label names in your config.json.

See https://huggingface.co/roberta-large-mnli's config.json file for an example.
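
For instance, a sketch of setting a custom label name on the configuration (the "updown" label name is my assumption, for illustration only):

from transformers import GPT2Config

config = GPT2Config.from_pretrained("microsoft/DialogRPT-updown")
config.id2label = {0: "updown"}             # replaces the default LABEL_0
config.label2id = {"updown": 0}
config.save_pretrained("DialogRPT-updown")  # hypothetical local directory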

Awesome! thank you @LysandreJik @julien-c
