I have been looking into doing some seq2seq tasks in huggingface-transformers using the BertGeneration or EncoderDecoderModel classes.
But so far I have only found simple examples like the one below in the API documentation.
>>> import torch
>>> from transformers import BertTokenizer, EncoderDecoderModel
>>> tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
>>> model = EncoderDecoderModel.from_encoder_decoder_pretrained('bert-base-uncased', 'bert-base-uncased') # initialize Bert2Bert from pre-trained checkpoints
>>> # forward
>>> input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0) # Batch size 1
>>> outputs = model(input_ids=input_ids, decoder_input_ids=input_ids)
>>> # training
>>> outputs = model(input_ids=input_ids, decoder_input_ids=input_ids, labels=input_ids, return_dict=True)
>>> loss, logits = outputs.loss, outputs.logits
>>> # save and load from pretrained
>>> model.save_pretrained("bert2bert")
>>> model = EncoderDecoderModel.from_pretrained("bert2bert")
>>> # generation
>>> generated = model.generate(input_ids, decoder_start_token_id=model.config.decoder.pad_token_id)
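Going a bit beyond that snippet, the furthest I get on my own is batching inputs with the tokenizer and decoding the generated ids back to text (just a rough sketch continuing from the example above, not something from the docs):
>>> # batch two sentences and mask padding out of the loss
>>> batch = tokenizer(["Hello, my dog is cute", "Hello, my cat is cute"], padding=True, return_tensors="pt")
>>> labels = batch.input_ids.clone()
>>> labels[labels == tokenizer.pad_token_id] = -100  # ignore pad tokens in the loss
>>> outputs = model(input_ids=batch.input_ids, attention_mask=batch.attention_mask, decoder_input_ids=batch.input_ids, labels=labels, return_dict=True)
>>> # turn the generated ids back into strings
>>> generated = model.generate(batch.input_ids, attention_mask=batch.attention_mask, decoder_start_token_id=model.config.decoder.pad_token_id)
>>> print(tokenizer.batch_decode(generated, skip_special_tokens=True))
But that is still just the toy example from the docs, which brings me to my question: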
Is there any Jupyter notebook or detailed example that uses the BertGeneration or EncoderDecoderModel classes specifically? I know these classes were only released quite recently...
It would be a great help if I could find one. Thanks!
Releasing in ~1 week - it's almost ready :-)
Thanks for letting me know! :)
I've released two condensed notebooks as mentioned here: https://discuss.huggingface.co/t/leveraging-pre-trained-checkpoints-for-summarization/835/13?u=patrickvonplaten
Will also release a longer educational blog post in a bit.
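Roughly, the kind of setup involved looks like this (a very condensed sketch of warm-starting a bert2bert model for summarization, with placeholder article/summary strings; the notebooks cover the real data preprocessing and the Trainer setup):
>>> from transformers import BertTokenizer, EncoderDecoderModel
>>> tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
>>> model = EncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-uncased", "bert-base-uncased")
>>> # tell the model how to start, end and pad decoder sequences
>>> model.config.decoder_start_token_id = tokenizer.cls_token_id
>>> model.config.eos_token_id = tokenizer.sep_token_id
>>> model.config.pad_token_id = tokenizer.pad_token_id
>>> article = "a long input document ..."   # placeholder, use your dataset here
>>> summary = "a short target summary ..."  # placeholder, use your dataset here
>>> inputs = tokenizer(article, truncation=True, max_length=512, return_tensors="pt")
>>> targets = tokenizer(summary, truncation=True, max_length=128, return_tensors="pt")
>>> labels = targets.input_ids.clone()
>>> labels[labels == tokenizer.pad_token_id] = -100  # mask padding in the loss
>>> loss = model(input_ids=inputs.input_ids, attention_mask=inputs.attention_mask, decoder_input_ids=targets.input_ids, decoder_attention_mask=targets.attention_mask, labels=labels, return_dict=True).loss
>>> loss.backward()  # plug this into your own training loop or the Trainer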