Transformer's Growth Is Unstoppable! Summary Of Research On Transformer Improvements Part 3
Three main points:
✔️ Introduction to specific examples of Efficient Transformer models
✔️ Explains methods based on learnable patterns, low-rank factorization, kernels, and recurrence
✔️ These methods achieve attention with linear complexity, O(N), in the best case
Efficient Transformers: A Survey
written by Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler
(Submitted on 14 Sep 2020 (v1), last revised 16 Sep 2020 (this version, v2))
Comments: Published on arXiv
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)
Introduction
Research on making the Transformer more efficient by improving its algorithms (Efficient Transformers) is now very active. Progress in this area is so rapid that many Efficient Transformers have already been proposed, and it has become difficult to grasp the whole picture. In light of this situation, this article provides a comprehensive explanation of these improvements. A general description of Efficient Transformers, their broad classification, and other basics can be found in this article. Here, we give more specific and detailed explanations of the architectures and the time/space complexity of previously proposed Efficient Transformer models.
The models presented in this article are classified into learnable pattern (LP), low-rank factorization (LR), kernel (KR), and recurrence (RC) based approaches (Sections 4.5 - 4.8).
For an explanation of the other classified models, please see this article.
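To make the O(N) claim above concrete before going through the individual models, here is a minimal sketch of the kernel trick used by the kernel-based (KR) models covered below, such as the Linear Transformer: replacing the softmax with a feature map φ lets the matrix product be reassociated so that the N × N attention matrix is never materialized. The ELU-based feature map and all shapes here are illustrative assumptions, not details taken from the survey.

```python
import torch

def elu_feature_map(x):
    # A simple positive feature map phi(x) = ELU(x) + 1, one common
    # choice for kernel-based attention; illustrative, not canonical.
    return torch.nn.functional.elu(x) + 1

def linear_attention(q, k, v):
    """Kernel-based attention in O(N) time and memory.

    Standard attention computes softmax(QK^T)V, materializing an
    N x N matrix. With a kernel feature map phi, the product can be
    reassociated as phi(Q) (phi(K)^T V): phi(K)^T V is only d x e,
    so the cost is linear in sequence length N.
    Shapes: q, k, v are (batch, N, d).
    """
    q, k = elu_feature_map(q), elu_feature_map(k)
    kv = torch.einsum("bnd,bne->bde", k, v)  # (batch, d, e): O(N d e)
    # Normalizer: phi(q_n) . sum_m phi(k_m), per query position n.
    z = 1.0 / (torch.einsum("bnd,bd->bn", q, k.sum(dim=1)) + 1e-6)
    return torch.einsum("bnd,bde,bn->bne", q, kv, z)  # no N x N matrix

# Toy usage: linear attention over a length-1024 sequence.
q = torch.randn(2, 1024, 64)
k = torch.randn(2, 1024, 64)
v = torch.randn(2, 1024, 64)
out = linear_attention(q, k, v)
print(out.shape)  # torch.Size([2, 1024, 64])
```

Because φ(K)ᵀV is only a d × e matrix, both time and memory scale linearly with the sequence length N, which is exactly the best-case complexity quoted in the main points.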
Table of Contents
1. About the computational complexity of the Transformer (explained in another article)
2. Classification of Efficient Transformers (explained in another article)
3. Related information on Efficient Transformers (explained in another article)
4. Specific examples of Efficient Transformers
4.1. Fixed Pattern Based (FP) (explained in another article)
Memory Compressed Transformer
Image Transformer
4.2. Global Memory Based (M) (explained in another article)
Set Transformers
4.3. Combinations of Fixed Patterns (FP) (explained in another article)
Sparse Transformers
Axial Transformers
4.4. Combinations of Fixed Patterns and Global Memory (FP+M) (explained in another article)
Longformer
ETC
BigBird
4.5. Learnable Pattern Based (LP)
Routing Transformers
Reformer
Sinkhorn Transformers
4.6. Low-Rank Factorization Based (LR) (see the sketch after this table of contents)
Linformer
Synthesizers
4.7. Kernel Based (KR)
Performer
Linear Transformers
4.8. Recurrence Based (RC)
Transformer-XL
Compressive Transformers
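As a second concrete example, here is a rough sketch of the low-rank factorization (LR) idea behind models such as Linformer, listed in Section 4.6 above: keys and values are projected along the sequence dimension from length N down to a fixed k, so the attention matrix is N × k instead of N × N. The fixed random projections and all shapes are assumptions for illustration; Linformer itself learns these projections.

```python
import torch

def low_rank_attention(q, k, v, proj_k, proj_v):
    """Linformer-style attention in O(N k) time for a fixed projection size k.

    Keys and values are projected along the *sequence* dimension from
    length N down to k, so the attention matrix is N x k rather than
    N x N. Shapes: q, k, v are (batch, N, d); proj_k, proj_v are (k, N).
    """
    d = q.size(-1)
    k_proj = torch.einsum("kn,bnd->bkd", proj_k, k)  # (batch, k, d)
    v_proj = torch.einsum("kn,bnd->bkd", proj_v, v)  # (batch, k, d)
    attn = torch.softmax(q @ k_proj.transpose(1, 2) / d**0.5, dim=-1)  # (batch, N, k)
    return attn @ v_proj  # (batch, N, d)

# Toy usage: N = 1024 tokens compressed to k = 64 projected positions.
N, k_dim, d = 1024, 64, 32
q = torch.randn(2, N, d)
keys = torch.randn(2, N, d)
vals = torch.randn(2, N, d)
E = torch.randn(k_dim, N) / N**0.5  # hypothetical fixed projection; learned in practice
out = low_rank_attention(q, keys, vals, E, E)
print(out.shape)  # torch.Size([2, 1024, 32])
```

Since k is a constant that does not grow with the sequence length, the N × k attention matrix gives linear overall cost in N, the same best-case complexity as the kernel trick above, but achieved by compressing the sequence axis rather than by removing the softmax.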