Transformer's Growth Is Unstoppable! Summary Of Research On Improving Transformer Part 2

Transformer 23/12/2020

3 main points
✔️ Introduces concrete examples of Efficient Transformer models
✔️ Describes methods based on fixed patterns and global memory
✔️ At best, these methods reduce the cost of attention to linear order, O(N)

Efficient Transformers: A Survey
written by Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler
(Submitted on 14 Sep 2020 (v1), last revised 16 Sep 2020 (this version, v2))
Comments: Accepted at arXiv
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)

Introduction

Research on Efficient Transformers, models that improve the efficiency of the original Transformer, is currently very active. The field moves so quickly, and so many Efficient Transformers have already been proposed, that it is very difficult to grasp the whole picture.

In light of this situation, this article provides a comprehensive explanation of improvements to the Transformer.

A general description of Efficient Transformers, their broad classification, and other basics are covered in a separate article.

This article gives a more specific and detailed description of the architectures, and of the time and space complexity, of Efficient Transformer models that have been proposed so far. The models presented here are based on fixed patterns (FP), on global memory (M), or on combinations of these (FP+M); they correspond to sections 4.1 to 4.4 in the outline below.
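To make the two ideas concrete, here is a minimal sketch of an attention mask that combines a fixed local window (FP) with a few global memory tokens (M). It is an illustration only, not the implementation of any model in the survey: the function names `fp_m_attention_mask` and `count_attended_pairs`, the choice of a symmetric sliding window as the fixed pattern, and the choice of the first tokens as global memory are all assumptions made for this example.

```python
def fp_m_attention_mask(n, window, num_global):
    """Return an n x n boolean mask; mask[i][j] is True if query i may attend to key j.

    Hypothetical helper for illustration: a sliding window stands in for the
    fixed pattern, and the first `num_global` positions stand in for global memory.
    """
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        # Fixed pattern (FP): token i attends to neighbours within `window` positions.
        for j in range(max(0, i - window), min(n, i + window + 1)):
            mask[i][j] = True
        # Global memory (M): global tokens attend to, and are attended by, every token.
        for g in range(min(num_global, n)):
            mask[i][g] = True
            mask[g][i] = True
    return mask


def count_attended_pairs(mask):
    """Count the query-key pairs that actually need an attention score."""
    return sum(sum(row) for row in mask)
```

With `window` and `num_global` held constant, each row of the mask contains a bounded number of entries apart from the few global rows, so the number of attended pairs grows roughly in proportion to `n`, whereas full attention always needs `n * n` pairs. Calling `count_attended_pairs(fp_m_attention_mask(n, 2, 1))` for increasing `n` shows this growth directly.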

The models in the remaining categories are explained in another article (to be published tomorrow).

1. Computational complexity of Transformers (explained in a separate article)
2. Classification of Efficient Transformers (explained in another article)
3. Related information on Efficient Transformers (explained in another article)
4. Concrete examples of Efficient Transformers
  4.1 Fixed pattern based (FP)
    ・Memory Compressed Transformer
      ・Memory-compressed Attention
      ・Local Attention Span
    ・Image Transformer
  4.2 Global memory based (M)
    ・Set Transformers
  4.3 Combinations of fixed patterns (FP)
    ・Sparse Transformers
    ・Axial Transformers
  4.4 Combinations of fixed patterns and global memory (FP+M)
    ・Longformer
    ・ETC
    ・BigBird
  4.5 Learnable pattern based (LP) (another article)
    ・Routing Transformers
    ・Reformer
    ・Sinkhorn Transformers
  4.6 Low-rank factorization based (LR) (another article)
    ・Linformer
    ・Synthesizers
  4.7 Kernel based (KR) (another article)
    ・Performer
    ・Linear Transformers
  4.8 Recurrence based (RC) (another article)
    ・Transformer-XL
    ・Compressive Transformers
