Transformers: RoBERTa support

Created on 19 Jul 2019 · 14 comments · Source: huggingface/transformers

https://twitter.com/sleepinyourhat/status/1151940994688016384

The code/parameters aren't out yet, but I figure it couldn't hurt to put in an obnoxious feature request now!

All 14 comments

Working on the code/paper release as we speak :) It largely follows the existing masked_lm implementation in fairseq. Happy to help get this integrated here.

Hi @myleott, great news :) I'm really excited about the release. I have a few questions: do you plan to perform any comparisons between RoBERTa and BERT on NER (CoNLL-2003)?

I've read the Cloze-driven Pretraining of Self-attention Networks paper, and if I recall correctly, the implementation is currently done in the bi_trans_lm branch in fairseq, but do you have any updates on that? It would be awesome if a pre-trained CNN model from that paper could also be integrated into pytorch-transformers.

Sounds great @myleott. Keep us updated about the release!

Models and README are uploaded: https://github.com/pytorch/fairseq/tree/master/examples/roberta. We submitted the paper to arXiv today so it should be out Sunday evening.
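
For anyone who wants to try the fairseq checkpoints directly before the pytorch-transformers port lands, the fairseq README describes loading them through torch.hub. A minimal sketch, assuming the roberta.large hub name from that README:

import torch

# Load the released checkpoint via torch.hub (model name per the fairseq README)
roberta = torch.hub.load('pytorch/fairseq', 'roberta.large')
roberta.eval()

# BPE-encode a sentence and extract features from the last layer
tokens = roberta.encode('Hello world!')
last_layer_features = roberta.extract_features(tokens)
print(last_layer_features.shape)  # (1, sequence_length, 1024) for roberta.large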

I have a few questions: do you plan to perform any comparisons between RoBERTa and BERT on NER (CoNLL-2003)?

We haven't yet, but it would be interesting to explore. RoBERTa was trained on considerably more data than BERT, so I expect it would do well on NER tasks.

Paper is out. Thanks @myleott!

Work in progress in #964, feel free to chime in :)

Example of how RoBERTa can be used to predict a masked token.

import torch
from pytorch_transformers import RobertaTokenizer, RobertaForMaskedLM

tokenizer = RobertaTokenizer.from_pretrained('roberta-large')
model = RobertaForMaskedLM.from_pretrained('roberta-large')
model.eval()

# move the model to the GPU if one is available
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model.to(device)

text = 'I believe my sister is <mask> because she eats a lot of vegetables .'

# the tokenizer keeps <mask> intact; add_special_tokens later prepends <s> and
# appends </s> to the text, so the masked position shifts right by one
tokenized_text = tokenizer.tokenize(text)
masked_index = tokenized_text.index('<mask>') + 1

input_ids = torch.tensor(tokenizer.encode(text, add_special_tokens=True)).unsqueeze(0)
input_ids_tensor = input_ids.to(device)

with torch.no_grad():
    outputs = model(input_ids_tensor, masked_lm_labels=input_ids_tensor)
    loss, prediction_scores = outputs[:2]

# single best prediction for the masked position
predicted_index = torch.argmax(prediction_scores[0, masked_index]).item()
predicted_token = tokenizer.convert_ids_to_tokens([predicted_index])[0]

# top-20 candidate tokens with their logits
predicted_k_indexes = torch.topk(prediction_scores[0, masked_index], k=20)
predicted_logits_list = predicted_k_indexes[0]
predicted_indexes_list = predicted_k_indexes[1]

for i, item in enumerate(predicted_indexes_list):
    the_index = item.item()
    print("word and logits", tokenizer.decode([the_index]), predicted_logits_list[i].item())

Hi @pwolff, at first glance it looks ok to me. You don't need to send the masked_lm_labels if you don't use the loss though.
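
Following that comment, a sketch of the same forward pass without the labels; in that case the first output is already the prediction scores (this reuses the variables from the example above, it is not an official snippet):

# Without masked_lm_labels, the model returns the prediction scores first
with torch.no_grad():
    prediction_scores = model(input_ids_tensor)[0]
predicted_index = torch.argmax(prediction_scores[0, masked_index]).item()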

@thomwolf hello, I trained RoBERTa on my customized corpus following the fairseq instructions. I am confused about how to generate the RoBERTa vocab.json and merges.txt, because I want to use the pytorch-transformers RobertaTokenizer.

@stefan-it hello, I trained RoBERTa on my customized corpus following the fairseq instructions. I am confused about how to generate the RoBERTa vocab.json and merges.txt, because I want to use the pytorch-transformers RobertaTokenizer.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@songtaoshi I think this can be done via subword-nmt, see this note:

https://github.com/pytorch/fairseq/issues/1163#issuecomment-534098220
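
For the standard fairseq setup, RoBERTa reuses the GPT-2 BPE vocabulary, so the encoder.json / vocab.bpe files referenced in the fairseq preprocessing instructions play the role of the vocab.json / merges.txt that RobertaTokenizer expects. A hedged sketch, where the file paths are placeholders for whatever you produced for your own corpus:

from pytorch_transformers import RobertaTokenizer

# vocab.json corresponds to GPT-2's encoder.json, merges.txt to its vocab.bpe;
# point these at the BPE files you used when preprocessing for fairseq
tokenizer = RobertaTokenizer(vocab_file='vocab.json', merges_file='merges.txt')
print(tokenizer.tokenize('Hello world!'))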

Is this still an issue?

Nope, RoBERTa support was shipped in v1.1.0.
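
A quick way to confirm the v1.1.0 support end to end is a plain feature-extraction pass; a minimal sketch using roberta-base:

import torch
from pytorch_transformers import RobertaModel, RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
model = RobertaModel.from_pretrained('roberta-base')
model.eval()

# encode wraps the text in <s> ... </s> when add_special_tokens=True
input_ids = torch.tensor([tokenizer.encode("Hello world!", add_special_tokens=True)])
with torch.no_grad():
    last_hidden_states = model(input_ids)[0]  # (batch_size, sequence_length, hidden_size)
print(last_hidden_states.shape)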

Thanks all!
