Transformers: How to make some structural changes to the EncoderDecoderModel?

Created on 22 Oct 2020  ·  14 Comments  ·  Source: huggingface/transformers

❓ Questions & Help

Details


Hey, I use EncoderDecoderModel for abstractive summarization. I load the bert2bert model like this:
model = EncoderDecoderModel.from_encoder_decoder_pretrained('bert-base-uncased', 'bert-base-uncased')

And I want to make some structural changes to the output layer of the decoder model.

For example, in one decoder step, the output hidden state of the BERT decoder is a vector s. I use another network to get a vector w that makes the summarization more accurate. I want to concatenate the two vectors in the output layer and use the resulting vector to generate a word from the vocabulary.

How can I do this?
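
In tensor terms, the computation being asked about looks roughly like the sketch below. The sizes and the single projection layer are placeholders for illustration, not an existing API.

```python
import torch
import torch.nn as nn

# hypothetical sizes: BERT hidden size, size of the extra vector w, BERT vocab size
hidden_size, extra_dim, vocab_size = 768, 128, 30522

# one projection from the concatenated vector to the vocabulary
proj = nn.Linear(hidden_size + extra_dim, vocab_size)

s = torch.randn(1, hidden_size)  # decoder hidden state for one step
w = torch.randn(1, extra_dim)    # vector produced by the auxiliary network
logits = proj(torch.cat([s, w], dim=-1))  # scores over the vocabulary
next_token = logits.argmax(dim=-1)        # greedy pick of the generated word
```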


Most helpful comment

Hey @yhznb,

We try to use the GitHub issues mainly for bugs in the library. For more customized questions, it would be great if you could use https://discuss.huggingface.co/ instead.

Regarding your question, I would just add a layer to BertLMHeadModel wherever you want to, and then build your EncoderDecoderModel from BertModel (encoder) & your use-case-specific BertLMHeadModel (decoder).

All 14 comments

Hey @yhznb,

We try to use the GitHub issues mainly for bugs in the library. For more customized questions, it would be great if you could use https://discuss.huggingface.co/ instead.

Regarding your question, I would just add a layer to BertLMHeadModel wherever you want to, and then build your EncoderDecoderModel from BertModel (encoder) & your use-case-specific BertLMHeadModel (decoder).
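
A minimal sketch of that idea, assuming a fixed-size extra vector; ConcatHeadBertDecoder, extra_features, and extra_dim are hypothetical names introduced here, not part of the library:

```python
import torch
import torch.nn as nn
from transformers import BertConfig, BertLMHeadModel

class ConcatHeadBertDecoder(BertLMHeadModel):
    """BertLMHeadModel whose logits come from [hidden_state ; extra_vector]."""

    def __init__(self, config, extra_dim=128):
        super().__init__(config)
        # fuse the concatenation back down to hidden_size so the pretrained
        # LM head (self.cls) can still be reused for the vocabulary projection
        self.fuse = nn.Linear(config.hidden_size + extra_dim, config.hidden_size)

    def forward(self, input_ids=None, extra_features=None, **kwargs):
        kwargs.pop("labels", None)  # loss computation is omitted in this sketch
        outputs = self.bert(input_ids=input_ids, **kwargs)
        hidden = outputs[0]  # (batch, seq_len, hidden_size)
        if extra_features is not None:  # (batch, extra_dim) from the other network
            expanded = extra_features.unsqueeze(1).expand(-1, hidden.size(1), -1)
            hidden = torch.tanh(self.fuse(torch.cat([hidden, expanded], dim=-1)))
        logits = self.cls(hidden)  # vocabulary logits
        return (logits,) + outputs[1:]

# the decoder needs is_decoder=True and cross-attention to attend to the encoder
config = BertConfig.from_pretrained(
    "bert-base-uncased", is_decoder=True, add_cross_attention=True
)
decoder = ConcatHeadBertDecoder.from_pretrained("bert-base-uncased", config=config)
```

Plugging this decoder into EncoderDecoderModel and generate() would need extra wiring (passing extra_features through and returning the output type the wrapper expects); the sketch only covers the modified head.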

Hey, @patrickvonplaten, I have the same question. Can you provide an example of building the EncoderDecoderModel from BertModel (encoder) & a use-case-specific BertLMHeadModel (decoder)? I can't find this in the official documentation. Thank you very much.
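
For reference, one way to assemble it, based on the EncoderDecoderModel constructor rather than an officially documented recipe (substitute your own BertLMHeadModel subclass for the decoder):

```python
from transformers import BertConfig, BertLMHeadModel, BertModel, EncoderDecoderModel

encoder = BertModel.from_pretrained("bert-base-uncased")

# the decoder must be configured as a decoder with cross-attention
decoder_config = BertConfig.from_pretrained(
    "bert-base-uncased", is_decoder=True, add_cross_attention=True
)
decoder = BertLMHeadModel.from_pretrained("bert-base-uncased", config=decoder_config)

# wrap the two models; the combined config is derived from the two sub-configs
model = EncoderDecoderModel(encoder=encoder, decoder=decoder)
```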

I think the EncoderDecoderModel outputs all the hidden states at once, and I want to control it step by step. For example, I want to change the LM head of the decoder by concatenating another vector. The problem is that the decoder model outputs all the hidden states at once, while I want to control it for step-by-step decoding. In other words, I want to use the concatenated vector as the hidden state for generation and use the generated word vector as the input for the next step. How can I change the model or call the interface properly? Is this possible within the Hugging Face framework?
Thank you very much! @patrickvonplaten
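
For reference, a rough greedy step-by-step loop over the bert2bert model looks like the sketch below (the input text and the 20-step limit are placeholders); the line that takes the argmax of the logits is where a concatenated vector and a custom head could be substituted. Note that an untrained bert2bert will not produce a meaningful summary; this only illustrates the decoding loop.

```python
import torch
from transformers import BertTokenizerFast, EncoderDecoderModel

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"
)
model.eval()

enc = tokenizer("Some article text to summarize.", return_tensors="pt")
decoder_input_ids = torch.tensor([[tokenizer.cls_token_id]])  # [CLS] as start token

with torch.no_grad():
    encoder_outputs = model.encoder(
        input_ids=enc.input_ids, attention_mask=enc.attention_mask
    )
    for _ in range(20):
        out = model.decoder(
            input_ids=decoder_input_ids,
            encoder_hidden_states=encoder_outputs[0],
            encoder_attention_mask=enc.attention_mask,
        )
        next_token_logits = out[0][:, -1, :]  # logits for the last position only
        next_token = next_token_logits.argmax(dim=-1, keepdim=True)
        decoder_input_ids = torch.cat([decoder_input_ids, next_token], dim=-1)
        if next_token.item() == tokenizer.sep_token_id:  # [SEP] as end of sequence
            break

print(tokenizer.decode(decoder_input_ids[0], skip_special_tokens=True))
```

Re-feeding the whole prefix at every step skips key/value caching and is therefore slow, which is the speed concern raised later in this thread.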

I also raised this in the forum. Does this issue need to be closed?
The link is here :
https://discuss.huggingface.co/t/control-encoderdecodermodel-to-generate-tokens-step-by-step/1756

Thank you very much! @patrickvonplaten

Have you solved your question? @AI678 I think it is all about changing the LM head and the calculation of the logits, but I don't know how to change it.

Yes, you are right. @yhznb

Hey @yhznb,

We try to use the GitHub issues mainly for bugs in the library. For more customized questions, it would be great if you could use https://discuss.huggingface.co/ instead.

Regarding your question, I would just add a layer to BertLMHeadModel wherever you want to, and then build your EncoderDecoderModel from BertModel (encoder) & your use-case-specific BertLMHeadModel (decoder).

Sorry, I misunderstood what you meant. This is a feature that still needs to be developed. So, how long would it take to develop this feature? Thank you for your response.

Hey, I have a similar need. I think using only a vanilla bert2bert or roberta2roberta is not sufficient for abstractive summarization; for fluency and information richness, we should consider changing the top layer of the decoder for further learning.

Hey, @patrickvonplaten, when do you plan to release that?

@nlpLover123, you can control it step by step, but I think that is too slow for a large dataset like CNN/DailyMail.
I also want to ask when you plan to release that. @patrickvonplaten
If it needs too much time, maybe I will write an encoder-decoder model from scratch, because I have little time to wait for it.
Thank you very much.

That is too difficult, @AI678. Maybe it is even slower than step-by-step generation.

So I just want to make a specific change to the LM head layer. @moonlightarc

@AI678, I don't think we are planning to release such a feature into the library. It's a very specific request, and I'd suggest that you fork the repo and make the changes according to your needs.
