Transformers: Add Linformer model

Created on 13 Jun 2020 · 4 Comments · Source: huggingface/transformers

🌟 New model addition

Model description

Linformer: Self-Attention with Linear Complexity

Paper published on arXiv on June 9th: https://arxiv.org/abs/2006.04768

Large transformer models have shown extraordinary success in achieving state-of-the-art results in many natural language processing applications. However, training and deploying these models can be prohibitively costly for long sequences, as the standard self-attention mechanism of the Transformer uses O(n²) time and space with respect to sequence length. In this paper, we demonstrate that the self-attention mechanism can be approximated by a low-rank matrix. We further exploit this finding to propose a new self-attention mechanism, which reduces the overall self-attention complexity from O(n²) to O(n) in both time and space. The resulting linear transformer, the Linformer, performs on par with standard Transformer models, while being much more memory- and time-efficient.
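For anyone skimming the issue, here is a minimal NumPy sketch of the idea in the abstract, not the authors' code: keys and values are projected along the sequence axis from length n down to a fixed k, so the score matrix is n × k instead of n × n. The function names, the shapes, and the random projections `E` and `F` are assumptions for illustration only (in the paper the projections are learned); this is single-head, unbatched, and has no masking.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def full_attention(Q, K, V):
    # Standard scaled dot-product attention: the score matrix is (n, n).
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    return softmax(scores) @ V

def linformer_attention(Q, K, V, E, F):
    # E and F have shape (k, n) and compress the sequence axis of K and V.
    # Hypothetical helper for illustration; not an API of any library.
    d = Q.shape[-1]
    K_proj = E @ K                       # (k, d)
    V_proj = F @ V                       # (k, d)
    scores = Q @ K_proj.T / np.sqrt(d)   # (n, k), linear in n for fixed k
    return softmax(scores) @ V_proj      # (n, d)

rng = np.random.default_rng(0)
n, d, k = 512, 64, 32                    # assumed sizes, chosen arbitrarily
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
# Random projections stand in for the learned ones (assumption).
E = rng.standard_normal((k, n)) / np.sqrt(k)
F = rng.standard_normal((k, n)) / np.sqrt(k)

out = linformer_attention(Q, K, V, E, F)
print(out.shape)
```

With k fixed, time and memory for the score matrix grow linearly in n. Setting k = n and using identity matrices for `E` and `F` recovers standard attention, which is a handy sanity check for any port.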

Open source status

[ ] the model implementation is available: (give details)
[ ] the model weights are available: (give details)
[x] who are the authors: Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma

New model wontfix

AaronFriel

🚀22 🎉15 👀11 👍3

All 4 comments

Here is a PyTorch implementation:
https://github.com/tatp22/linformer-pytorch

flozi00 on 13 Jun 2020

👍5

Another implementation, this one by the authors:
https://github.com/facebookresearch/pytext/pull/1407

bratao on 25 Jul 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.