Transformers: RoBERTa support

Created on 19 Jul 2019 · 14 comments · Source: huggingface/transformers

https://twitter.com/sleepinyourhat/status/1151940994688016384

The code/parameters aren't out yet, but I figure it couldn't hurt to put in an obnoxious feature request now!

All 14 comments

Working on the code/paper release as we speak :) It largely follows the existing masked_lm implementation in fairseq. Happy to help get this integrated here.

Hi @myleott, great news :) I'm really excited about the release. I have a few questions: do you plan to perform any comparisons between RoBERTa and BERT on NER (CoNLL-2003)?

I've read the Cloze-driven Pretraining of Self-attention Networks paper, and if I recall correctly, the implementation is currently done in the bi_trans_lm branch in fairseq, but do you have any updates on that? It would be awesome if a pre-trained CNN model from that paper could also be integrated into pytorch-transformers.

Sounds great @myleott. Keep us updated about the release!

Models and README are uploaded: https://github.com/pytorch/fairseq/tree/master/examples/roberta. We submitted the paper to arXiv today so it should be out Sunday evening.
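
For anyone who wants to try the fairseq checkpoints directly before the pytorch-transformers port lands, the fairseq README describes loading them through torch.hub. A minimal sketch, assuming the roberta.large hub name from that README:

import torch

# Load the released checkpoint via torch.hub (model name per the fairseq README)
roberta = torch.hub.load('pytorch/fairseq', 'roberta.large')
roberta.eval()

# BPE-encode a sentence and extract features from the last layer
tokens = roberta.encode('Hello world!')
last_layer_features = roberta.extract_features(tokens)
print(last_layer_features.shape)  # (1, sequence_length, 1024) for roberta.large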

I have a few questions: do you plan to perform any comparisons between RoBERTa and BERT on NER (CoNLL-2003)?

We haven't yet, but it would be interesting to explore. RoBERTa was trained on considerably more data than BERT, so I expect it would do well on NER tasks.

Paper is out. Thanks @myleott!

Work in progress in #964, feel free to chime in :)

Example of how RoBERTa can be used to predict a masked token.

import torch
from pytorch_transformers import RobertaTokenizer, RobertaForMaskedLM

tokenizer = RobertaTokenizer.from_pretrained('roberta-large')
model = RobertaForMaskedLM.from_pretrained('roberta-large')
model.eval()

# move the model to the GPU if one is available
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model.to(device)

text = 'I believe my sister is <mask> because she eats a lot of vegetables .'

# the tokenizer keeps <mask> intact; add_special_tokens later prepends <s> and
# appends </s> to the text, so the masked position shifts right by one
tokenized_text = tokenizer.tokenize(text)
masked_index = tokenized_text.index('<mask>') + 1

input_ids = torch.tensor(tokenizer.encode(text, add_special_tokens=True)).unsqueeze(0)
input_ids_tensor = input_ids.to(device)

with torch.no_grad():
    outputs = model(input_ids_tensor, masked_lm_labels=input_ids_tensor)
    loss, prediction_scores = outputs[:2]

# single best prediction for the masked position
predicted_index = torch.argmax(prediction_scores[0, masked_index]).item()
predicted_token = tokenizer.convert_ids_to_tokens([predicted_index])[0]

# top-20 candidate tokens with their logits
predicted_k_indexes = torch.topk(prediction_scores[0, masked_index], k=20)
predicted_logits_list = predicted_k_indexes[0]
predicted_indexes_list = predicted_k_indexes[1]

for i, item in enumerate(predicted_indexes_list):
    the_index = item.item()
    print("word and logits", tokenizer.decode([the_index]), predicted_logits_list[i].item())

Hi @pwolff, at first glance it looks ok to me. You don't need to send the masked_lm_labels if you don't use the loss though.
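
Following that comment, a sketch of the same forward pass without the labels; in that case the first output is already the prediction scores (this reuses the variables from the example above, it is not an official snippet):

# Without masked_lm_labels, the model returns the prediction scores first
with torch.no_grad():
    prediction_scores = model(input_ids_tensor)[0]
predicted_index = torch.argmax(prediction_scores[0, masked_index]).item()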

@thomwolf hello, I trained RoBERTa on my customized corpus following the fairseq instructions. I am confused about how to generate the RoBERTa vocab.json and merges.txt, because I want to use the pytorch-transformers RobertaTokenizer.

@stefan-it hello, I trained RoBERTa on my customized corpus following the fairseq instructions. I am confused about how to generate the RoBERTa vocab.json and merges.txt, because I want to use the pytorch-transformers RobertaTokenizer.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@songtaoshi I think this can be done via subword-nmt, see this note:

https://github.com/pytorch/fairseq/issues/1163#issuecomment-534098220
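
For the standard fairseq setup, RoBERTa reuses the GPT-2 BPE vocabulary, so the encoder.json / vocab.bpe files referenced in the fairseq preprocessing instructions play the role of the vocab.json / merges.txt that RobertaTokenizer expects. A hedged sketch, where the file paths are placeholders for whatever you produced for your own corpus:

from pytorch_transformers import RobertaTokenizer

# vocab.json corresponds to GPT-2's encoder.json, merges.txt to its vocab.bpe;
# point these at the BPE files you used when preprocessing for fairseq
tokenizer = RobertaTokenizer(vocab_file='vocab.json', merges_file='merges.txt')
print(tokenizer.tokenize('Hello world!'))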

Is this still an issue?

Nope, RoBERTa support was shipped in v1.1.0.
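
A quick way to confirm the v1.1.0 support end to end is a plain feature-extraction pass; a minimal sketch using roberta-base:

import torch
from pytorch_transformers import RobertaModel, RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
model = RobertaModel.from_pretrained('roberta-base')
model.eval()

# encode wraps the text in <s> ... </s> when add_special_tokens=True
input_ids = torch.tensor([tokenizer.encode("Hello world!", add_special_tokens=True)])
with torch.no_grad():
    last_hidden_states = model(input_ids)[0]  # (batch_size, sequence_length, hidden_size)
print(last_hidden_states.shape)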

Thanks all!
