Transformer's Growth Is Unstoppable! Summary Of Research On Transformer Improvements Part 3
Three main points:
✔️ Introduction to specific examples of Efficient Transformer models
✔️ Explains methods based on learnable patterns, low-rank factorization, kernels, and recurrence
✔️ These methods achieve attention with linear complexity, O(N), in the best case
Efficient Transformers: A Survey
written by Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler
(Submitted on 14 Sep 2020 (v1), last revised 16 Sep 2020 (this version, v2))
Comments: Published on arXiv
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)
Introduction
Research on making the Transformer more efficient by improving its algorithms (Efficient Transformers) is now very active. Progress in this area is so rapid that many Efficient Transformers have already been proposed, and it has become difficult to grasp the whole picture. In light of this situation, this article provides a comprehensive explanation of these improvements. A general description of Efficient Transformers, their broad classification, and other basics can be found in this article. Here, we give more specific and detailed explanations of the architectures and the time/space complexity of previously proposed Efficient Transformer models.
The models presented in this article are classified into learnable pattern (LP), low-rank factorization (LR), kernel (KR), and recurrence (RC) based approaches (Sections 4.5 - 4.8).
For an explanation of the other classified models, please see this article.
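To make the O(N) claim above concrete before going through the individual models, here is a minimal sketch of the kernel trick used by the kernel-based (KR) models covered below, such as the Linear Transformer: replacing the softmax with a feature map φ lets the matrix product be reassociated so that the N × N attention matrix is never materialized. The ELU-based feature map and all shapes here are illustrative assumptions, not details taken from the survey.

```python
import torch

def elu_feature_map(x):
    # A simple positive feature map phi(x) = ELU(x) + 1, one common
    # choice for kernel-based attention; illustrative, not canonical.
    return torch.nn.functional.elu(x) + 1

def linear_attention(q, k, v):
    """Kernel-based attention in O(N) time and memory.

    Standard attention computes softmax(QK^T)V, materializing an
    N x N matrix. With a kernel feature map phi, the product can be
    reassociated as phi(Q) (phi(K)^T V): phi(K)^T V is only d x e,
    so the cost is linear in sequence length N.
    Shapes: q, k, v are (batch, N, d).
    """
    q, k = elu_feature_map(q), elu_feature_map(k)
    kv = torch.einsum("bnd,bne->bde", k, v)  # (batch, d, e): O(N d e)
    # Normalizer: phi(q_n) . sum_m phi(k_m), per query position n.
    z = 1.0 / (torch.einsum("bnd,bd->bn", q, k.sum(dim=1)) + 1e-6)
    return torch.einsum("bnd,bde,bn->bne", q, kv, z)  # no N x N matrix

# Toy usage: linear attention over a length-1024 sequence.
q = torch.randn(2, 1024, 64)
k = torch.randn(2, 1024, 64)
v = torch.randn(2, 1024, 64)
out = linear_attention(q, k, v)
print(out.shape)  # torch.Size([2, 1024, 64])
```

Because φ(K)ᵀV is only a d × e matrix, both time and memory scale linearly with the sequence length N, which is exactly the best-case complexity quoted in the main points.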
Table of Contents
1. About the computational complexity of the Transformer (explained in another article)
2. Classification of Efficient Transformers (explained in another article)
3. Related information on Efficient Transformers (explained in another article)
4. Specific examples of Efficient Transformers
4.1. Fixed Pattern Based (FP) (explained in another article)
Memory Compressed Transformer
Image Transformer
4.2. Global Memory Based (M) (explained in another article)
Set Transformers
4.3. Combinations of Fixed Patterns (FP) (explained in another article)
Sparse Transformers
Axial Transformers
4.4. Combinations of Fixed Patterns and Global Memory (FP+M) (explained in another article)
Longformer
ETC
BigBird
4.5. Learnable Pattern Based (LP)
Routing Transformers
Reformer
Sinkhorn Transformers
4.6. Low-Rank Factorization Based (LR) (see the sketch after this table of contents)
Linformer
Synthesizers
4.7. Kernel Based (KR)
Performer
Linear Transformers
4.8. Recurrence Based (RC)
Transformer-XL
Compressive Transformers
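As a second concrete example, here is a rough sketch of the low-rank factorization (LR) idea behind models such as Linformer, listed in Section 4.6 above: keys and values are projected along the sequence dimension from length N down to a fixed k, so the attention matrix is N × k instead of N × N. The fixed random projections and all shapes are assumptions for illustration; Linformer itself learns these projections.

```python
import torch

def low_rank_attention(q, k, v, proj_k, proj_v):
    """Linformer-style attention in O(N k) time for a fixed projection size k.

    Keys and values are projected along the *sequence* dimension from
    length N down to k, so the attention matrix is N x k rather than
    N x N. Shapes: q, k, v are (batch, N, d); proj_k, proj_v are (k, N).
    """
    d = q.size(-1)
    k_proj = torch.einsum("kn,bnd->bkd", proj_k, k)  # (batch, k, d)
    v_proj = torch.einsum("kn,bnd->bkd", proj_v, v)  # (batch, k, d)
    attn = torch.softmax(q @ k_proj.transpose(1, 2) / d**0.5, dim=-1)  # (batch, N, k)
    return attn @ v_proj  # (batch, N, d)

# Toy usage: N = 1024 tokens compressed to k = 64 projected positions.
N, k_dim, d = 1024, 64, 32
q = torch.randn(2, N, d)
keys = torch.randn(2, N, d)
vals = torch.randn(2, N, d)
E = torch.randn(k_dim, N) / N**0.5  # hypothetical fixed projection; learned in practice
out = low_rank_attention(q, keys, vals, E, E)
print(out.shape)  # torch.Size([2, 1024, 32])
```

Since k is a constant that does not grow with the sequence length, the N × k attention matrix gives linear overall cost in N, the same best-case complexity as the kernel trick above, but achieved by compressing the sequence axis rather than by removing the softmax.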