The recently introduced Funnel-Transformer architecture and models would be a great feature for Transformers:
Funnel-Transformer is a new self-attention model that gradually compresses the sequence of hidden states into a shorter one, thereby reducing computation cost. More importantly, by re-investing the FLOPs saved from length reduction into a deeper or wider model, Funnel-Transformer usually has higher capacity at the same FLOPs budget. In addition, with a decoder, Funnel-Transformer can recover a deep representation for each token from the reduced hidden sequence, which enables standard pretraining objectives.
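To make the length-reduction idea concrete, here is a minimal, illustrative PyTorch sketch (not the paper's actual implementation, which pools only the attention query and keeps full-length keys/values at block boundaries): each block of standard self-attention layers is followed by a stride-2 mean pooling of the hidden states, so the sequence gets shorter while depth or width can grow.

```python
import torch
import torch.nn as nn


class ToyFunnelEncoder(nn.Module):
    """Illustrative sketch only: pools the hidden-state sequence between blocks,
    roughly mirroring Funnel-Transformer's length reduction."""

    def __init__(self, hidden_size=256, num_heads=4, layers_per_block=2, num_blocks=3):
        super().__init__()
        make_layer = lambda: nn.TransformerEncoderLayer(
            d_model=hidden_size, nhead=num_heads, batch_first=True
        )
        self.blocks = nn.ModuleList(
            nn.ModuleList(make_layer() for _ in range(layers_per_block))
            for _ in range(num_blocks)
        )
        # Stride-2 mean pooling halves the sequence length after each block.
        self.pool = nn.AvgPool1d(kernel_size=2, stride=2)

    def forward(self, hidden_states):  # (batch, seq_len, hidden_size)
        for i, block in enumerate(self.blocks):
            for layer in block:
                hidden_states = layer(hidden_states)
            if i < len(self.blocks) - 1:  # no pooling after the last block
                hidden_states = self.pool(hidden_states.transpose(1, 2)).transpose(1, 2)
        return hidden_states


x = torch.randn(2, 128, 256)
print(ToyFunnelEncoder()(x).shape)  # torch.Size([2, 32, 256]) -- sequence compressed 4x
```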
The paper can be found here.
Will start to look into this.
@sgugger Any updates on this? Thanks!
The first models are uploaded and the base models are available in PyTorch (FunnelModel has encoder + decoder and FunnelBaseModel just the encoder, for sequence classification and multiple choice) in this branch. Should have all checkpoints on the HuggingFace S3 and all PyTorch models on the same branch by the end of this week.
Note that there might be some changes to the names when this goes through review once it's ready.
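Assuming the class names stay as described above (they may still change during review), usage would presumably follow the standard Transformers pattern; the checkpoint identifier below is a placeholder, not a confirmed final name:

```python
from transformers import FunnelTokenizer, FunnelBaseModel, FunnelModel

# Placeholder checkpoint name -- the final identifiers on the hub may differ.
checkpoint = "funnel-transformer/small"

tokenizer = FunnelTokenizer.from_pretrained(checkpoint)
inputs = tokenizer("Funnel-Transformer compresses the hidden sequence.", return_tensors="pt")

# Encoder-only variant (for sequence classification / multiple choice):
# hidden states come back at the reduced (pooled) length.
base_model = FunnelBaseModel.from_pretrained(checkpoint)
compressed = base_model(**inputs).last_hidden_state

# Encoder + decoder variant: the decoder upsamples back to one vector per input token.
full_model = FunnelModel.from_pretrained(checkpoint)
token_level = full_model(**inputs).last_hidden_state
```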