Allennlp: Remove / move internal transformer implementations

Created on 3 Jan 2019 · 21Comments · Source: allenai/allennlp

We currently have a handful of transformer implementations. At least the following:

https://github.com/allenai/allennlp/blob/master/allennlp/modules/seq2seq_encoders/multi_head_self_attention.py
https://github.com/allenai/allennlp/blob/master/allennlp/modules/seq2seq_encoders/bidirectional_language_model_transformer.py#L139
https://github.com/allenai/allennlp/blob/master/allennlp/modules/openai_transformer.py

We should remove these and rely on an external implementation. Perhaps, Pytorch's nn.Transformer. This will (very likely) be a breaking change for our existing models.

2 Weeks Breaking change

Source

brendan-ai2

Most helpful comment

Ideally we could just use another fully featured pytorch transformer implementation (e.g. https://github.com/pytorch/fairseq or possibly https://github.com/huggingface/pytorch-pretrained-BERT) and not have to worry about maintaining the code in allennlp. fairseq would also give us optimized CNNs and fp16 in addition to self-attention.

matt-peters on 4 Jan 2019

👍4

All 21 comments

@matt-peters during issue review it came up that you might propose standardizing on the openai transformer (assuming experiments work out). Is that accurate?

schmmd on 4 Jan 2019

matt-peters on 4 Jan 2019

👍4

FWIW, looks like the new PyTorch 1.1 includes a torch.nn.MultiheadAttention

nelson-liu on 1 May 2019

Moreover, this module was landed as a part of a bigger Transformer PR

maksym-del on 2 May 2019

Thanks @maxdel, we have put together a PR for Transformer in PyTorch. Feel free to comment.

The transformer module is supported by torch.nn.MultiheadAttention. We have a plan to improve the efficiency of this operator in the near future. @cpuhrsch.

zhangguanheng66 on 2 May 2019

👍1

Hey all, please let us know how we can help you guys deduplicate some of your work and better support this library overall!

We're eager to provide a good set of tools or a full implementation of what you need for your Transformers.

Also, if you have more general feature requests, we'd be eager to hear about them as well.

cpuhrsch on 3 May 2019

👍3

@cpuhrsch Need some help figuring out the decoder abstraction - https://github.com/allenai/allennlp/pull/2517 .

sai-prasanna on 4 May 2019

The new version of pytorch includes a nn.Transformer module

nelson-liu on 8 Aug 2019

👍1

Hello everyone, I just want to check if we could provide any help to apply nn.Transformer module. We want to better support AllenNLP and avoid the duplicate work. We could implement more features if you feel necessary to better serve this repo.

zhangguanheng66 on 20 Aug 2019

Updated description to reflect that we don't intend to maintain our own transformer implementation anymore.

brendan-ai2 on 7 Jan 2020

If you find any problems and difficulties when mitigating to pytorch transformer/multiheadattention, please ping me @zhangguaheng66 on pytorch Github. We are happy to provide some hands-on help.

zhangguanheng66 on 7 Jan 2020

@zhangguanheng66 fantastic and thanks for reaching back out. I'm pretty sure we'll be doing this in the next two months.

schmmd on 8 Jan 2020

👍1

What we should do: remove all of our internal implementations of multi-headed attention / self attention, in favor of huggingface and pytorch implementations. Some of our old models depend on this code, though. As we split out those models into their own repos, our internal implementations should go with them, when it makes sense. If we have any implementations that are not used in one of the models that we are putting into their own repo, they should just be removed.

matt-gardner on 10 Feb 2020

We opened an issue to keep a record of change requests for MHA and transformer. If you have some implementations what are worth landing in pytorch repo, feel free to send requests there. https://github.com/pytorch/pytorch/issues/32590

zhangguanheng66 on 10 Feb 2020

Notes to self:

BidirectionalLanguageModelTransformer should go with the model.
StackedSelfAttentionEncoder can possibly be transparently replaced with a PyTorch implementation if we're careful with the names of the parameters?

dirkgr on 4 Mar 2020

I started adding the transformer implementation from PyTorch here: #3898

dirkgr on 4 Mar 2020

This is done already.

dirkgr on 13 Apr 2020

@dirkgr Nice job for the transformer wrapper.

I saw there is a seq-to-seq encoder MultiHeadSelfAttention. Is there any plan for it?

zhangguanheng66 on 13 Apr 2020

Despite the name, that module is not strictly a transformer as designed by Devlin et al. Also, it was only used by QaNet, so I moved it there: https://github.com/allenai/allennlp-models/blob/master/allennlp_models/rc/qanet/multi_head_self_attention.py

If you need it, feel free to grab it from there and add it to your own code.

dirkgr on 14 Apr 2020

Despite the name, that module is not strictly a transformer as designed by Devlin et al. Also, it was only used by QaNet, so I moved it there: https://github.com/allenai/allennlp-models/blob/master/allennlp_models/rc/qanet/multi_head_self_attention.py

If you need it, feel free to grab it from there and add it to your own code.

Thanks @dirkgr . I'm wondering if you plan to wrap up nn.MultiheadAttention and replace multi_head_self_attention function. To me, they are very similar.

zhangguanheng66 on 14 Apr 2020

nn.MultiheadAttention is not a seq2seq module. It implements the attention part of the transformer architecture. The pytorch transformer code (and thus our wrapper of it) uses nn.MultiheadAttention internally.

If you need it in your own forward() functions, you can just call it! There is no need for AllenNLP to wrap it.

dirkgr on 15 Apr 2020

Was this page helpful?

0 / 5 - 0 ratings