Transformers: Sharing Microsoft's DialogRPT (new dialog ranking model)

Created on 1 Oct 2020 · 12 comments · Source: huggingface/transformers

🌟 New model addition

Model description

Thanks for the awesome work!

DialogRPT (Dialog Ranking Pretrained Transformers) is a set of GPT-2-based dialogue ranking models recently released with an EMNLP paper by Microsoft Research. It's a follow-up to DialoGPT (thanks for hosting it!).
The architecture is pretty simple: a GPT2Model followed by a torch.nn.Linear(n_embd, 1, bias=False), implemented based on a previous HuggingFace commit.
At first I tried to create a model card for it, but then realized that no existing model architecture in HuggingFace seems to be compatible with DialogRPT. I noticed a lot of BERT-based sequence classification models, but ours is GPT-2-based.
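
For reference, a minimal sketch of the head described above (my own illustration rather than the official DialogRPT code; the class name is hypothetical):

import torch
from transformers import GPT2Model

class GPT2RankerSketch(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.transformer = GPT2Model.from_pretrained("gpt2-medium")
        # a single-unit linear head on top of the hidden states, as described above
        self.score = torch.nn.Linear(self.transformer.config.n_embd, 1, bias=False)

    def forward(self, input_ids):
        hidden = self.transformer(input_ids).last_hidden_state
        # score the sequence from the last token's hidden state (one common choice)
        return self.score(hidden[:, -1, :])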

If there's a simple fix (or I missed something) please let me know!
If implementation in modeling_gpt2.py is necessary, I'm also glad to help!

Open source status

  • [x] the model implementation is available: (https://github.com/golsun/DialogRPT)
  • [x] the model weights are available: (https://github.com/golsun/DialogRPT)
  • [x] who are the authors: @golsun @dreasysnail
New model


All 12 comments

Hi @golsun! Thanks a lot for opening an issue and offering to contribute it!

Indeed, there is no GPT2ForSequenceClassification model in the library (yet!). I'm adding it right now with the goal of supporting DialogRPT. I'll get back to you in a bit.

Hi @golsun! GPT2ForSequenceClassification has been implemented in #7501, and I verified that I obtain the same results as those in your README using your examples.

You should now only need to upload your models to the model hub! Some helpers regarding the configuration (a Python sketch follows the list below):

  • You should upload a model configuration on the hub, for every model.
  • You can simply copy-paste the gpt2-medium configuration that you can find here.
  • You will need to add a num_labels=1 field to these configurations.
  • In the architectures field, you should put GPT2ForSequenceClassification.
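
For illustration, one way to make those configuration edits in Python instead of editing the JSON by hand (a sketch assuming the standard transformers API; the local directory name is made up):

from transformers import GPT2Config

# start from the gpt2-medium configuration, as suggested above
config = GPT2Config.from_pretrained("gpt2-medium")
config.num_labels = 1                                     # a single ranking score
config.architectures = ["GPT2ForSequenceClassification"]
config.save_pretrained("DialogRPT-updown")                # hypothetical local directory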

wow, super fast!!!
thank you @LysandreJik, I'll update my repo to reflect this once the pull request is merged.

The pull request is now merged @golsun!

Thank you so much @LysandreJik !
I just tried GPT2ForSequenceClassification and it works! 👍
Then I created this model card, but model = AutoModelForSequenceClassification.from_pretrained("microsoft/DialogRPT-updown") gives me the following error, which can be reproduced with this Notebook:

/content/transformers/src/transformers/modeling_auto.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
   1203                 config.__class__,
   1204                 cls.__name__,
-> 1205                 ", ".join(c.__name__ for c in MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING.keys()),
   1206             )
   1207         )

ValueError: Unrecognized configuration class <class 'transformers.configuration_gpt2.GPT2Config'> for this kind of AutoModel: AutoModelForSequenceClassification.
Model type should be one of DistilBertConfig, AlbertConfig, CamembertConfig, XLMRobertaConfig, BartConfig, LongformerConfig, RobertaConfig, SqueezeBertConfig, BertConfig, XLNetConfig, MobileBertConfig, FlaubertConfig, XLMConfig, ElectraConfig, FunnelConfig, DebertaConfig.

Indeed, this should be solved by #7630.

Thank you @LysandreJik, AutoModelForSequenceClassification works now.
The inference webpage still gives the Unrecognized configuration class error, but I guess it will sync with the latest code soon.
I'm going to introduce the model card in the original repo.
Thanks again for the help!
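
For anyone landing here later, a minimal usage sketch of the now-working path (the context/response pair is invented; the <|endoftext|> separator follows the convention in the DialogRPT README):

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialogRPT-updown")
model = AutoModelForSequenceClassification.from_pretrained("microsoft/DialogRPT-updown")

# context and response separated by <|endoftext|>
inputs = tokenizer("I love NLP!<|endoftext|>Me too!", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits   # shape (1, 1) since num_labels == 1
score = torch.sigmoid(logits)         # ranking score in [0, 1]
print(score.item())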

We just updated the API inference so that it uses the latest code. I've taken the liberty of adding a padding token to your models, in your configuration (pad_token_id: 50256) and in the special_tokens_map.json (pad_token: "<|endoftext|>"), as models need a padding token to run in the API inference.

I've taken these values from your code here and here.

Models should now work correctly in the inference webpage :)
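
For reference, a sketch of the equivalent change made locally with the standard transformers API (the file edits on the hub described above achieve the same thing):

from transformers import GPT2Config, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("microsoft/DialogRPT-updown")
tokenizer.pad_token = "<|endoftext|>"   # written to special_tokens_map.json by save_pretrained

config = GPT2Config.from_pretrained("microsoft/DialogRPT-updown")
config.pad_token_id = 50256             # id of <|endoftext|> in the GPT-2 vocabulary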

Great! Thank you for updating the config and special_tokens_map for us! :)
The inference webpage outputs a score of 1 no matter what the input is. I guess that's because it returns softmax(logits), which is always 1 if num_labels == 1. Maybe the following if-else would fix it?

if num_labels == 1:
    # a single logit: map it to a probability with the sigmoid
    return torch.sigmoid(logits)
else:
    return torch.softmax(logits, dim=-1)

The num_labels == 1 case follows the DialogRPT code here.

You're correct! Solving that in #7726.

Also @golsun, on the inference API you can have custom label names (instead of just LABEL_0 here) if you set your label names in your config.json.

See https://huggingface.co/roberta-large-mnli's config.json file for an example.
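
For instance, a sketch of setting a custom label name on the configuration (the "updown" label name is my assumption, for illustration only):

from transformers import GPT2Config

config = GPT2Config.from_pretrained("microsoft/DialogRPT-updown")
config.id2label = {0: "updown"}             # replaces the default LABEL_0
config.label2id = {"updown": 0}
config.save_pretrained("DialogRPT-updown")  # hypothetical local directory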

Awesome! thank you @LysandreJik @julien-c
