Paper published June 9th on arXiv: https://arxiv.org/abs/2006.04768
Large transformer models have shown extraordinary success in achieving state-of-the-art results in many natural language processing applications. However, training and deploying these models can be prohibitively costly for long sequences, as the standard self-attention mechanism of the Transformer uses O(n²) time and space with respect to sequence length. In this paper, we demonstrate that the self-attention mechanism can be approximated by a low-rank matrix. We further exploit this finding to propose a new self-attention mechanism, which reduces the overall self-attention complexity from O(n²) to O(n) in both time and space. The resulting linear transformer, the Linformer, performs on par with standard Transformer models, while being much more memory- and time-efficient.
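For anyone skimming, here is a rough single-head NumPy sketch of the idea in the abstract: project the keys and values along the sequence axis from length n down to a fixed k, so the score matrix is n × k instead of n × n. The function names, the shapes, and the random projection matrices `E` and `F` are assumptions for illustration only (in the paper the projections are learned); this is not the authors' code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def full_attention(Q, K, V):
    # Standard scaled dot-product attention: the score matrix is (n, n).
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    return softmax(scores) @ V

def linformer_attention(Q, K, V, E, F):
    # Low-rank variant: E and F have shape (k, n) and compress the
    # sequence axis of K and V, so the score matrix is only (n, k).
    d = Q.shape[-1]
    K_proj = E @ K                       # (k, d)
    V_proj = F @ V                       # (k, d)
    scores = Q @ K_proj.T / np.sqrt(d)   # (n, k)
    return softmax(scores) @ V_proj      # (n, d)

rng = np.random.default_rng(0)
n, d, k = 512, 64, 32                    # hypothetical sizes
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
# Random placeholders; a real model would train these projections.
E = rng.standard_normal((k, n)) / np.sqrt(k)
F = rng.standard_normal((k, n)) / np.sqrt(k)

out = linformer_attention(Q, K, V, E, F)
print(out.shape)  # (512, 64)
```

With k held fixed, the time and memory for the score matrix grow as O(n·k) rather than O(n²), which is where the linear complexity claim comes from. With random `E` and `F` the output will not match `full_attention`; the sketch only shows the shapes and the cost.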
Here is a PyTorch implementation:
https://github.com/tatp22/linformer-pytorch
Another implementation, this one by the authors:
https://github.com/facebookresearch/pytext/pull/1407
Is there a TensorFlow implementation?