The recently introduced Funnel-Transformer architecture and models would be a great feature for Transformers:
Funnel-Transformer is a new self-attention model that gradually compresses the sequence of hidden states into a shorter one, thereby reducing computation cost. More importantly, by re-investing the FLOPs saved from length reduction into a deeper or wider model, Funnel-Transformer usually has higher capacity at the same FLOPs budget. In addition, with a decoder, Funnel-Transformer can recover a deep representation for each token from the reduced hidden sequence, which enables standard pretraining objectives.
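To make the length-reduction idea concrete, here is a minimal, illustrative PyTorch sketch (not the paper's actual implementation, which pools only the attention query and keeps full-length keys/values at block boundaries): each block of standard self-attention layers is followed by a stride-2 mean pooling of the hidden states, so the sequence gets shorter while depth or width can grow.

```python
import torch
import torch.nn as nn


class ToyFunnelEncoder(nn.Module):
    """Illustrative sketch only: pools the hidden-state sequence between blocks,
    roughly mirroring Funnel-Transformer's length reduction."""

    def __init__(self, hidden_size=256, num_heads=4, layers_per_block=2, num_blocks=3):
        super().__init__()
        make_layer = lambda: nn.TransformerEncoderLayer(
            d_model=hidden_size, nhead=num_heads, batch_first=True
        )
        self.blocks = nn.ModuleList(
            nn.ModuleList(make_layer() for _ in range(layers_per_block))
            for _ in range(num_blocks)
        )
        # Stride-2 mean pooling halves the sequence length after each block.
        self.pool = nn.AvgPool1d(kernel_size=2, stride=2)

    def forward(self, hidden_states):  # (batch, seq_len, hidden_size)
        for i, block in enumerate(self.blocks):
            for layer in block:
                hidden_states = layer(hidden_states)
            if i < len(self.blocks) - 1:  # no pooling after the last block
                hidden_states = self.pool(hidden_states.transpose(1, 2)).transpose(1, 2)
        return hidden_states


x = torch.randn(2, 128, 256)
print(ToyFunnelEncoder()(x).shape)  # torch.Size([2, 32, 256]) -- sequence compressed 4x
```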
The paper can be found here.
Will start to look into this.
@sgugger Any updates on this? Thanks!
The first models are uploaded and the base models are available in PyTorch (FunnelModel has encoder + decoder and FunnelBaseModel just the encoder, for sequence classification and multiple choice) in this branch. Should have all checkpoints on the HuggingFace S3 and all PyTorch models on the same branch by the end of this week.
Note that there might be some changes to the names when this goes through review once it's ready.
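Assuming the class names stay as described above (they may still change during review), usage would presumably follow the standard Transformers pattern; the checkpoint identifier below is a placeholder, not a confirmed final name:

```python
from transformers import FunnelTokenizer, FunnelBaseModel, FunnelModel

# Placeholder checkpoint name -- the final identifiers on the hub may differ.
checkpoint = "funnel-transformer/small"

tokenizer = FunnelTokenizer.from_pretrained(checkpoint)
inputs = tokenizer("Funnel-Transformer compresses the hidden sequence.", return_tensors="pt")

# Encoder-only variant (for sequence classification / multiple choice):
# hidden states come back at the reduced (pooled) length.
base_model = FunnelBaseModel.from_pretrained(checkpoint)
compressed = base_model(**inputs).last_hidden_state

# Encoder + decoder variant: the decoder upsamples back to one vector per input token.
full_model = FunnelModel.from_pretrained(checkpoint)
token_level = full_model(**inputs).last_hidden_state
```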