Transformer's Growth Is Unstoppable! A Summary of Research on Improving the Transformer, Part 2
3 main points
✔️ Introduces concrete examples of Efficient Transformer models
✔️ Describes methods based on fixed patterns and global memory
✔️ These achieve attention with, at best, linear O(N) complexity
Efficient Transformers: A Survey
written by Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler
(Submitted on 14 Sep 2020 (v1), last revised 16 Sep 2020 (this version, v2))
Comments: Published on arXiv
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)
Introduction
Research on Efficient Transformers, models that improve on the original Transformer architecture, is currently very active. Progress in this area is so rapid that a great many Efficient Transformers have already been proposed, and it is difficult to grasp the whole picture.
In light of this situation, this article provides a comprehensive explanation of these improvements to the Transformer.
For a general description of Efficient Transformers, their broad classification, and other basics, see this article.
This article gives a more specific and detailed description of the architecture and the time and space complexity of previously proposed Efficient Transformer models. The models presented here can be categorized as Fixed Pattern (FP), Global Memory (M), or a combination of the two (FP+M) (Sections 4.1 - 4.4); a small sketch of the FP+M idea follows below.
For explanations of the models in the remaining categories, see this article (to be published tomorrow).
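As a minimal, illustrative sketch (not code from the survey or from any of the models listed below), the snippet constructs an attention mask that combines a fixed local-window pattern (FP) with a few global-memory tokens (M). The sequence length, window size, and number of global tokens are assumptions chosen only for illustration.

```python
import numpy as np

def fp_plus_memory_mask(seq_len: int, window: int, n_global: int) -> np.ndarray:
    """Boolean (seq_len, seq_len) mask; True means the query may attend to the key."""
    idx = np.arange(seq_len)
    # Fixed pattern (FP): each token attends only to a local window around its position.
    mask = np.abs(idx[:, None] - idx[None, :]) <= window
    # Global memory (M): a few designated tokens attend to, and are attended by,
    # every position (the role played by global tokens in Longformer / ETC / BigBird).
    mask[:n_global, :] = True
    mask[:, :n_global] = True
    return mask

mask = fp_plus_memory_mask(seq_len=16, window=2, n_global=2)
# Each row allows only about (2 * window + 1 + n_global) keys, so masked attention
# costs roughly O(N * (window + n_global)), i.e. linear in N rather than O(N^2).
print(mask.sum(axis=1))
```

Because each query attends to only a constant number of keys, the masked attention can be computed in time and memory roughly linear in the sequence length, which is the effect the FP, M, and FP+M models below aim for.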
Table of Contents
1. Computational complexity of Transformers (explained in a separate article)
2. Classification of Efficient Transformers (explained in a separate article)
3. Related information on Efficient Transformers (explained in a separate article)
4. Concrete examples of Efficient Transformers
4.1 Fixed Pattern Based (FP)
・Memory Compressed Transformer
・Memory-compressed Attention
・Local Attention Span
・Image Transformer
4.2 Global Memory Based (M)
・Set Transformers
4.3 Combinations of Fixed Patterns (FP)
・Sparse Transformers
・Axial Transformers
4.4 Combination of Fixed Patterns and Global Memory (FP+M)
・Longformer
・ETC
・BigBird
4.5 Learnable Pattern Based (LP) (another article)
・Routing Transformers
・Reformer
・Sinkhorn Transformers
4.6 Low-rank factorization-based (LR) (another article)
・Linformer
・Synthesizers
4.7 Kernel-based (KR) (another article)
・Performer
・Linear Transformers
4.8 Recursion-based (RC) (another article)
・Transformer-XL
・Compressive Transformers